Chapter 4
Chars & Strings
IntroductionIn addition to the numeric types we covered in Chapter 3, F# supports character and string data. In this chapter, we’ll look at the different ways in which we can express characters and strings, and the different operations we can perform with them. Character DataIn F#, characters are of type char, and are manifestations of the System.Char type in .NET. We define characters by delimiting symbols in single quotes. Characters are internally represented in Unicode. In the following example, the identifier c is bound to the character x. let c = 'x' You can also represent a character literal as a byte by adding a B suffix to the character. > let c = 'x'B;; Because they are System.Chars, F# characters offer a number of useful methods, most of which are defined as static members of the System.Char class, e.g., Char.IsDigit. While System.Char instances have a few interesting methods in their own right, most of a Char’s usefulness comes from its class methods. Representing Characters by CodeThe character symbols that a computer is capable of displaying all come from a well-known, fixed pool of symbols. These symbols are divided into sets called code pages or character encodings. A code page is basically a finite, fixed set of numbered symbols. Every character is represented by a number corresponding to it entry in the code page. For example, in the ASCII code page, the capital letter ‘A’ corresponds to entry 65. This means that my program can display the capital letter ‘A’ by accessing entry 65 from the ASCII code page. Note that all operating systems and applications use a default code page.
There are several ways to express a character as an entry from a code page (a number). The following list highlights the various alternatives:
· \NNN is a 3-digit integer for representing a character. For example '\065' = 'A' · \uNNNN (Unicode) is a u followed by a hex representation of the number corresponding to the given character. For example, '\u0041' = 'A' · \UNNNNNNNN (long Unicode) is a U followed by a 2-byte hex sequence representing the number corresponding to the given character. For example, '\u00000041' = 'A'
StringsLife wouldn’t be all that interesting if all we had to work with were single characters. F# fully supports strings, which map to the System.String type. Strings are an ordered collection of Unicode characters. F# strings are immutable and are delimited by double quotes. Here are examples: let daughter = "Melissa" Strings can be single- or multi-line. To enter multi-line strings, simply enter a newline at the end of each line of the string: let haiku = "No sky If you want to enter a single, long string on multiple lines, use the backslash character to continue the string on the next line. This will prevent a newline from being inserted. let haiku = "No sky\ The above example will yield a string with no newlines embedded. Strings are Buckets of CharactersStrings are stored in memory as an ordered collection of characters. This ordered collection is an array.[1] This means that we can access the individual elements (characters) of an array by referring to its position in the array. Positions start at 0. For example: let daughter = "kimberly" //
string For those coming from other languages, note the . after the identifier name and before the braces. You need to use this syntax in F# to access array elements. You can also convert a string to a corresponding array of bytes via the B suffix. For example: > let countryBytes = "USA"B When you suffix a string with a B, F# interprets the characters as ASCII bytes. (Darn, there’s that code page thing again).
Escape SequencesThe familiar escape sequences work like they do in other languages such as C and C#, e.g., \n is a line return, \t is a TAB, etc.
Verbatim StringsAll too often in programming, we deal with strings that contain file paths and names that, by necessity, include backslashes and other special symbols that we need to escape. Here is where verbatim strings come to the rescue. Verbatim strings are not interpreted – they are processed as-is. The following example demonstrates using a verbatim string to represent a file path:
let mydir = @"c:\dev\fsharp\examples" The at (@) symbol initiates a verbatim string, allowing for the single backslash (\) to appear without issue. String OperationsStrings in F#, since they map to System.String, enjoy all the functionality the .NET libraries have to offer. Some operations available to our programs are illustrated below. let fullname = "Mary "+ "Smith"
// concatenation Like the numeric conversion operators we discussed in Chapter 3, you can use the string conversion operator to cast a numeric value to a string representation, as in the example above. Since strings are immutable, you cannot execute any operation that would change them, even via accessing their individual components. Given the example above, the line fullname.[0] <- 'L' would cause a compiler error. Building Long StringsWhile you can build a string by concatenating onto its end, this is inefficient for all but the most trivial cases. Every time you append to the end of a string, immutability dictates that F# creates a new string with the combined values. This is a very inefficient way to build a string. To build a string efficiently, you need to tap into the System.Text.StringBuilder class. We will discuss interoperating with the .NET class libraries in a future chapter; however, in order to give you a sneak preview, here is an example of creating a .NET StringBuilder object and using its methods to build a string efficiently. let buf = new System.Text.StringBuilder() buf.Append("Mary had ") buf.Append("a little lamb, it's ") buf.Append("(you know the rest...)") To retreive the built-up string, you can use the ToString() method: let str = buf.ToString() printfn "%s" str What You Need to Know· F# supports characters, which are single Unicode elements delimited by single quotes. F# characters map to the System.Char type, e.g., let c = 'x'. · Most of the interesting things you can do with characters come from the Char class, e.g., System.Char.IsDigit('x'). · F# support strings, which are ordered collections of Unicode characters delimited by double-quotes. · F# strings map to the .NET System.String type and have access to all of its members and operations. · F# strings are immutable. · F# strings support standard escape sequences, e.g., \n. · F# supports verbatim strings, e.g., let dir = @"c:\temp" · You can access the individual elements of a string via array syntax, e.g., s.[0]. · You can concatenate strings via +; however, for efficiency’s sake, you should avoid doing so for all but simple cases. To build up long strings efficiently, avoid + and use the .NET System.Text.StringBuilder class instead.
|
||
FeedbackWe welcome your feedback. If you have comments or questions about this chapter, please feel free to e-mail us at Keep Reading |