Problem. You want to split strings on different delimiters, using either single characters or whole strings. For example, you may need to split a string that contains "\r\n" sequences, which are Windows newlines. Solution. This document contains several tips for the Split method on the string type in the C# programming language.
Input string: One,Two,Three,Four,Five
Delimiter: , (char)
Array: One (string array)
Two
Three
Four
Five
Here we see the basic Split method overload. You already know the general way to do this, but it is good to look at the basic syntax before we move on. This example splits on a single character.
=== Example program for splitting on spaces ===
using System;
class Program
{
    static void Main()
    {
        string s = "there is a cat";
        //
        // Split string on spaces.
        // This will separate all the words.
        //
        string[] words = s.Split(' ');
        foreach (string word in words)
        {
            Console.WriteLine(word);
        }
    }
}
=== Output of the program ===
there
is
a
cat
Description. The input string, which contains four words, is split on spaces and the foreach loop then displays each word. The result value from Split is a string[] array.
Here we split on line breaks using either the Regex.Split method or string Split with the C# new array syntax. Note that a new char array or string array is created in the string Split usages. Split has overloads that accept such an array together with StringSplitOptions, which is used to remove empty strings.
=== Program that splits on lines with Regex ===
using System;
using System.Text.RegularExpressions;
class Program
{
    static void Main()
    {
        string value = "cat\r\ndog\r\nanimal\r\nperson";
        //
        // Split the string on line breaks.
        // The return value from Split is a string[] array.
        //
        string[] lines = Regex.Split(value, "\r\n");
        foreach (string line in lines)
        {
            Console.WriteLine(line);
        }
    }
}
=== Output of the program ===
cat
dog
animal
person
Description. The first example uses Regex. Regex contains the Split method, which is static. It can be used to split strings, although it has different performance properties. The next example shows two ways you can specify an array as the first parameter to string Split.
=== Program that splits on multiple characters ===
using System;
class Program
{
    static void Main()
    {
        //
        // This string is also separated by Windows line breaks.
        //
        string value = "shirt\r\ndress\r\npants\r\njacket";
        //
        // Use a new char[] array of two characters (\r and \n) to break
        // lines into separate strings. Use "RemoveEmptyEntries"
        // to make sure no empty strings get put in the string[] array.
        //
        char[] delimiters = new char[] { '\r', '\n' };
        string[] parts = value.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < parts.Length; i++)
        {
            Console.WriteLine(parts[i]);
        }
        //
        // Same as the previous example, but uses a new string of 2 characters.
        //
        parts = value.Split(new string[] { "\r\n" }, StringSplitOptions.None);
        for (int i = 0; i < parts.Length; i++)
        {
            Console.WriteLine(parts[i]);
        }
    }
}
=== Output of the program ===
(Repeated two times)
shirt
dress
pants
jacket
Overview. One useful overload of Split receives a char[] array as the first parameter. Each char in the array acts as a delimiter that starts a new block.
Using string arrays. Another overload of Split receives a string[] array, so whole strings can serve as delimiters. In the example, the new string[] array is created inline with the Split call.
Explanation of StringSplitOptions. The RemoveEmptyEntries enum value is specified in the char[] example. When two delimiters are adjacent, as '\r' and '\n' are in a Windows line break, Split produces an empty string between them. Passing RemoveEmptyEntries as the second parameter removes these empty results. [C# StringSplitOptions Enumeration - dotnetperls.com]
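To make the effect of StringSplitOptions concrete, here is a minimal sketch that splits the same input with both option values. The class and method names (SplitOptionsDemo, SplitLines) are illustrative and do not come from the original article.

```csharp
using System;

public static class SplitOptionsDemo
{
    // Hypothetical helper: splits on '\r' and '\n' with the given options.
    public static string[] SplitLines(string value, StringSplitOptions options)
    {
        char[] delimiters = new char[] { '\r', '\n' };
        return value.Split(delimiters, options);
    }

    public static void Main()
    {
        string value = "shirt\r\ndress";

        // With None, the empty string between '\r' and '\n' is kept.
        string[] all = SplitLines(value, StringSplitOptions.None);
        Console.WriteLine(all.Length); // 3

        // With RemoveEmptyEntries, only the two words remain.
        string[] nonEmpty = SplitLines(value, StringSplitOptions.RemoveEmptyEntries);
        Console.WriteLine(nonEmpty.Length); // 2
    }
}
```

Note that RemoveEmptyEntries also drops genuinely blank lines in the input, which may or may not be what you want.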
Here we see how you can separate words with Split. Usually, the best way to separate words is to use a Regex that specifies non-word chars. This example separates words in a string based on non-word characters. It eliminates punctuation and whitespace from the return array.
=== Program that separates on non-word pattern ===
using System;
using System.Text.RegularExpressions;
class Program
{
    static void Main()
    {
        string[] w = SplitWords("That is a cute cat, man");
        foreach (string s in w)
        {
            Console.WriteLine(s);
        }
        Console.ReadLine();
    }
    /// <summary>
    /// Take all the words in the input string and separate them.
    /// </summary>
    static string[] SplitWords(string s)
    {
        //
        // Split on all non-word characters.
        // Returns an array of all the words.
        //
        return Regex.Split(s, @"\W+");
        // @      special verbatim string syntax
        // \W+    one or more non-word characters together
    }
}
=== Output of the program ===
That
is
a
cute
cat
man
Word splitting example. Here you can separate parts of your input string based on any character set or range with Regex. Overall, this provides more power than the string Split methods. [C# Regex.Split Method Examples - dotnetperls.com]
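One caveat with this approach: Regex.Split returns an empty string at the start or end of the array when the input begins or ends with a match, for example when a sentence ends in punctuation. The sketch below filters those entries out. The names WordSplitDemo and SplitWordsNoEmpty are illustrative, not part of the original article.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordSplitDemo
{
    // Hypothetical helper: split on non-word characters and drop empty entries.
    public static string[] SplitWordsNoEmpty(string s)
    {
        List<string> words = new List<string>();
        foreach (string part in Regex.Split(s, @"\W+"))
        {
            if (part.Length > 0)
            {
                words.Add(part);
            }
        }
        return words.ToArray();
    }

    public static void Main()
    {
        // The trailing "!" would otherwise leave an empty string at the end.
        foreach (string word in SplitWordsNoEmpty("Hello, world!"))
        {
            Console.WriteLine(word);
        }
    }
}
```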
Here you have a text file containing comma-delimited lines of values. This is called a CSV file, and it is easily dealt with in C#. We use the File.ReadAllLines method here, but you may want StreamReader instead.
Reading the following code. The C# code next reads in both lines of the input file, parses them, and displays the values of each line after the line number. The output section shows how the file was parsed into strings.
=== Contents of input file (TextFile1.txt) ===
Dog,Cat,Mouse,Fish,Cow,Horse,Hyena
Programmer,Wizard,CEO,Rancher,Clerk,Farmer
=== Program that splits lines in file (C#) ===
using System;
using System.IO;
class Program
{
    static void Main()
    {
        int i = 0;
        foreach (string line in File.ReadAllLines("TextFile1.txt"))
        {
            string[] parts = line.Split(',');
            foreach (string part in parts)
            {
                Console.WriteLine("{0}:{1}",
                    i,
                    part);
            }
            i++; // For demo only
        }
    }
}
=== Output of the program ===
0:Dog
0:Cat
0:Mouse
0:Fish
0:Cow
0:Horse
0:Hyena
1:Programmer
1:Wizard
1:CEO
1:Rancher
1:Clerk
1:Farmer
Here we see how you can Split the segments in a Windows local directory into separate strings. Note that directory paths are complex and this may not handle all cases correctly. It is also platform-specific, and you could use System.IO.Path.DirectorySeparatorChar for more flexibility. [C# Path Examples - dotnetperls.com]
=== Program that splits Windows directories (C#) ===
using System;
class Program
{
    static void Main()
    {
        // The directory from Windows
        const string dir = @"C:\Users\Sam\Documents\Perls\Main";
        // Split on directory separator
        string[] parts = dir.Split('\\');
        foreach (string part in parts)
        {
            Console.WriteLine(part);
        }
    }
}
=== Output of the program ===
C:
Users
Sam
Documents
Perls
Main
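For a more portable variant, the following sketch splits on both Path.DirectorySeparatorChar and Path.AltDirectorySeparatorChar instead of a hard-coded backslash. The PathSplitDemo class and SplitPath method are illustrative names, and this still does not handle every path form (UNC paths, for example).

```csharp
using System;
using System.IO;

public static class PathSplitDemo
{
    // Hypothetical helper: split a path on the platform's separator characters.
    public static string[] SplitPath(string path)
    {
        char[] separators = new char[]
        {
            Path.DirectorySeparatorChar,
            Path.AltDirectorySeparatorChar
        };
        return path.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // Build the path with Path.Combine so it uses the current platform's separator.
        string path = Path.Combine(Path.Combine("Users", "Sam"), "Documents");
        foreach (string part in SplitPath(path))
        {
            Console.WriteLine(part);
        }
    }
}
```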
The logic internal to the .NET framework for Split is implemented in managed code. The methods call into the overload with three parameters. The parameters are next checked for validity. Finally, it uses unsafe code to create the separator list, and then a for loop combined with Substring to return the array.
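The loop-plus-Substring idea can be illustrated with a simplified sketch. This is not the framework's actual source; it is a hypothetical single-delimiter version (ManualSplitSketch.SplitOnChar is an invented name) that only shows the general technique of recording where each block starts and copying it out with Substring.

```csharp
using System;
using System.Collections.Generic;

public static class ManualSplitSketch
{
    // Illustrative only: walks the string once and cuts a block at each separator.
    public static string[] SplitOnChar(string value, char separator)
    {
        List<string> parts = new List<string>();
        int start = 0;
        for (int i = 0; i < value.Length; i++)
        {
            if (value[i] == separator)
            {
                parts.Add(value.Substring(start, i - start));
                start = i + 1;
            }
        }
        // The final block runs from the last separator to the end of the string.
        parts.Add(value.Substring(start));
        return parts.ToArray();
    }

    public static void Main()
    {
        foreach (string part in SplitOnChar("One,Two,Three", ','))
        {
            Console.WriteLine(part);
        }
    }
}
```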
The author tested a long string and a short string, having 1200 and 40 chars respectively. String splitting speed varies with the type of strings. The length of the blocks, the number of delimiters, and the total size of the string all factor into performance.
Results. The Regex.Split option generally performed the worst. The author felt that the second or third methods would be the best, after observing performance problems with regular expressions in other situations.
=== Strings used in test ===
//
// Build long string.
//
_test = string.Empty;
for (int i = 0; i < 120; i++)
{
    _test += "01234567\r\n";
}
//
// Build short string.
//
_test = string.Empty;
for (int i = 0; i < 10; i++)
{
    _test += "ab\r\n";
}
=== Example methods tested (100000 iterations) ===
static void Test1()
{
    string[] arr = Regex.Split(_test, "\r\n", RegexOptions.Compiled);
}
static void Test2()
{
    string[] arr = _test.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
}
static void Test3()
{
    string[] arr = _test.Split(new string[] { "\r\n" }, StringSplitOptions.None);
}
Longer strings: 1200 chars. The benchmark for the methods on the long strings is more even: Regex.Split is still the slowest, but it is roughly 2.8 times slower than the char[] Split here, compared with nearly 7 times slower on the short strings. It may be that for very long strings, such as entire files, the Regex method is equivalent or even faster, although this benchmark does not test that case.
=== Benchmark of Split on long strings ===
[1] Regex.Split: 3470 ms
[2] char[] Split: 1255 ms [fastest]
[3] string[] Split: 1449 ms
=== Benchmark of Split on short strings ===
[1] Regex.Split: 434 ms
[2] char[] Split: 63 ms [fastest]
[3] string[] Split: 83 ms
Short strings: 40 chars. This shows the three methods compared to each other on short strings. Method 1 is the Regex method, and it is by far the slowest on the short strings. This may be because of the compilation time. Smaller is better. [This article was last updated for .NET 3.5 SP1.]
Performance recommendation. For programs that use shorter strings, the methods that split based on arrays are faster and simpler, and they will avoid Regex compilation. For somewhat longer strings or files that contain more lines, Regex is appropriate. I show some Split improvements that can improve your program. [C# Split Improvement - dotnetperls.com]
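If you want to repeat a comparison like this on your own machine, a rough harness might look like the sketch below. It assumes Stopwatch-based timing and invented names (SplitBenchmarkSketch, BuildInput, TimeMs); the original article does not show its timing code, and your numbers will differ from those above.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

public static class SplitBenchmarkSketch
{
    // Builds a test string of `count` lines, each terminated by "\r\n".
    public static string BuildInput(string line, int count)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            sb.Append(line).Append("\r\n");
        }
        return sb.ToString();
    }

    // Runs the action repeatedly and returns the elapsed milliseconds.
    public static long TimeMs(Action action, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        string test = BuildInput("01234567", 120); // 1200 chars, like the long-string test
        char[] charDelimiters = new char[] { '\r', '\n' };
        string[] stringDelimiters = new string[] { "\r\n" };
        const int iterations = 100000;

        Console.WriteLine("Regex.Split: {0} ms",
            TimeMs(() => Regex.Split(test, "\r\n", RegexOptions.Compiled), iterations));
        Console.WriteLine("char[] Split: {0} ms",
            TimeMs(() => test.Split(charDelimiters, StringSplitOptions.RemoveEmptyEntries), iterations));
        Console.WriteLine("string[] Split: {0} ms",
            TimeMs(() => test.Split(stringDelimiters, StringSplitOptions.None), iterations));
    }
}
```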
You can use Replace on your string input to substitute special characters in for any escaped characters. This can solve lots of problems on parsing computer-generated code or data. [C# Split Method and Escape Characters - dotnetperls.com]
The author's further research into Split and its performance shows that it is worthwhile to declare your char[] array you are splitting on as a local instance to reduce memory pressure and improve runtime performance.
=== Slow version - before ===
//
// Split on multiple characters using new char[] inline.
//
string t = "string to split, ok";
for (int i = 0; i < 10000000; i++)
{
    string[] s = t.Split(new char[] { ' ', ',' });
}
=== Fast version - after ===
//
// Split on multiple characters using new char[] already created.
//
string t = "string to split, ok";
char[] c = new char[] { ' ', ',' }; // <-- Cache this
for (int i = 0; i < 10000000; i++)
{
    string[] s = t.Split(c);
}
Interpretation of the above code. We see that storing the array of delimiters separately is good. My measurements show the second version is slightly faster, by less than 10%, when the array is stored outside the loop.
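When the same delimiters are used across many method calls rather than inside one loop, one way to apply the same idea is a static readonly field. This is a sketch under that assumption; the class and member names (CachedDelimiters, Delimiters, SplitFields) are hypothetical.

```csharp
using System;

public static class CachedDelimiters
{
    // Allocated once for the lifetime of the program and reused by every call.
    private static readonly char[] Delimiters = new char[] { ' ', ',' };

    public static string[] SplitFields(string value)
    {
        return value.Split(Delimiters);
    }

    public static void Main()
    {
        foreach (string field in SplitFields("string to split, ok"))
        {
            Console.WriteLine("[" + field + "]");
        }
    }
}
```

Because the comma and the space after "split" are adjacent delimiters, the result contains one empty string.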
C# has no explode method exactly like PHP explode, but you can gain the functionality quite easily with Split, for the most part. You can replace explode with the Split method that receives a string[] array. [C# PHP explode Function - dotnetperls.com]
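As a rough sketch of that replacement, the helper below wraps the Split overload that takes a string[] and a maximum count. The Explode name and parameter order are chosen here to resemble PHP and are not part of .NET. It only covers positive limits; PHP's zero and negative limit values behave differently and are not handled.

```csharp
using System;

public static class ExplodeSketch
{
    // Hypothetical helper resembling PHP explode(delimiter, input, limit) for limit > 0.
    public static string[] Explode(string delimiter, string input, int limit)
    {
        return input.Split(new string[] { delimiter }, limit, StringSplitOptions.None);
    }

    public static void Main()
    {
        // With a limit of 2, the last element holds the rest of the string.
        foreach (string part in Explode(",", "a,b,c", 2))
        {
            Console.WriteLine(part);
        }
    }
}
```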
Here we saw several examples and two benchmarks of the Split method in the C# programming language. You can use Split to divide or separate your strings while keeping your code as simple as possible. Sometimes, using IndexOf and Substring together to parse your strings can be more precise and less error-prone. [C# IndexOf String Examples - dotnetperls.com]
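Only digits: "^[0-9]*$"
Only n-digit numbers: "^\d{n}$"
Only numbers with at least n digits: "^\d{n,}$"
Only numbers with m to n digits: "^\d{m,n}$"
Only zero, or numbers that do not start with zero: "^(0|[1-9][0-9]*)$"
Only positive real numbers with two decimal places: "^[0-9]+(\.[0-9]{2})?$"
Only positive real numbers with 1 to 3 decimal places: "^[0-9]+(\.[0-9]{1,3})?$"
Only nonzero positive integers: "^\+?[1-9][0-9]*$"
Only nonzero negative integers: "^\-[1-9][0-9]*$"
Only strings of exactly 3 characters: "^.{3}$"
Only strings made of the 26 English letters: "^[A-Za-z]+$"
Only strings made of the 26 uppercase English letters: "^[A-Z]+$"
Only strings made of the 26 lowercase English letters: "^[a-z]+$"
Only strings made of digits and the 26 English letters: "^[A-Za-z0-9]+$"
Only strings made of digits, the 26 English letters, or underscores: "^\w+$"
Validate a user password: "^[a-zA-Z]\w{5,17}$". The correct format starts with a letter, is 6 to 18 characters long, and contains only letters, digits, and underscores.
Check for characters such as ^%&',;=?$\": "[^%&',;=?$\x22]+". As written, the character class is negated, so the pattern matches runs of characters other than %&',;=?$ and the double quote.
Only Chinese characters: "^[\u4e00-\u9fa5]{0,}$"
Validate an email address: "^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$"
Validate an Internet URL: "^http://([\w-]+\.)+[\w-]+(/[\w-./?%&=]*)?$"
Validate a phone number: "^(\(\d{3,4}\)|\d{3,4}-)?\d{7,8}$". Correct formats: "XXX-XXXXXXX", "XXXX-XXXXXXXX", "XXX-XXXXXXXX", "XXXXXXX", and "XXXXXXXX".
Validate an ID card number (15 or 18 digits): "^(\d{15}|\d{18})$"
Validate the 12 months of a year: "^(0?[1-9]|1[0-2])$". Correct formats: "01" to "09" and "1" to "12".
Validate the 31 days of a month: "^((0?[1-9])|((1|2)[0-9])|30|31)$". Correct formats: "01" to "09" and "1" to "31".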
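Patterns like these can be used from C# with Regex.IsMatch. The following sketch wraps three of them; the ValidationDemo class and its method names are illustrative, and the decimal pattern assumes the dot is escaped so that it matches only a literal decimal point.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ValidationDemo
{
    // Nonzero positive integer, with an optional leading plus sign.
    public static bool IsNonZeroPositiveInteger(string s)
    {
        return Regex.IsMatch(s, @"^\+?[1-9][0-9]*$");
    }

    // Positive number with either no decimals or exactly two decimal places.
    public static bool HasTwoDecimalPlaces(string s)
    {
        return Regex.IsMatch(s, @"^[0-9]+(\.[0-9]{2})?$");
    }

    // Month number from 1 to 12, with an optional leading zero for 1 to 9.
    public static bool IsMonth(string s)
    {
        return Regex.IsMatch(s, @"^(0?[1-9]|1[0-2])$");
    }

    public static void Main()
    {
        Console.WriteLine(IsNonZeroPositiveInteger("42")); // True
        Console.WriteLine(HasTwoDecimalPlaces("3.1"));     // False
        Console.WriteLine(IsMonth("13"));                  // False
    }
}
```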
Using regular expressions to restrict what can be typed into text boxes on a web form:
Restrict input to Chinese characters only: onkeyup="value=value.replace(/[^\u4E00-\u9FA5]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^\u4E00-\u9FA5]/g,''))"
Restrict input to full-width characters only: onkeyup="value=value.replace(/[^\uFF00-\uFFFF]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^\uFF00-\uFFFF]/g,''))"
Restrict input to digits only: onkeyup="value=value.replace(/[^\d]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^\d]/g,''))"
Restrict input to digits and English letters only: onkeyup="value=value.replace(/[\W]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[\W]/g,''))"
A JavaScript program that uses a regular expression to extract the file name from a URL. The result below is page1.
s="http://www.9499.net/page1.htm"
s=s.replace(/(.*\/){0,}([^\.]+).*/ig,"$2")
alert(s)
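Match double-byte characters (including Chinese characters): [^\x00-\xff]
Application: compute the length of a string, counting each double-byte character as 2 and each ASCII character as 1:
String.prototype.len=function(){return this.replace(/[^\x00-\xff]/g,"aa").length;}
Regular expression that matches blank lines: \n[\s| ]*\r
Regular expression that matches HTML tags: /<(.*)>.*<\/\1>|<(.*) \/>/
Regular expression that matches leading and trailing whitespace: (^\s*)|(\s*$)
String.prototype.trim = function()
{
    return this.replace(/(^\s*)|(\s*$)/g, "");
}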
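Using regular expressions to decompose and convert IP addresses:
The following JavaScript program uses a regular expression to match an IP address and convert it to the corresponding numeric value:
function IP2V(ip)
{
    var re = /(\d+)\.(\d+)\.(\d+)\.(\d+)/; // Regular expression that matches an IP address
    if (re.test(ip))
    {
        return RegExp.$1*Math.pow(256,3)+RegExp.$2*Math.pow(256,2)+RegExp.$3*256+RegExp.$4*1;
    }
    else
    {
        throw new Error("Not a valid IP address!");
    }
}
However, if you skip the regular expression and use the split function directly to break up the address, the program may be simpler:
var ip="10.100.20.168"
ip=ip.split(".")
alert("The IP value is: "+(ip[0]*256*256*256+ip[1]*256*256+ip[2]*256+ip[3]*1))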
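The same split-based conversion can be sketched in C# with the Split method. The IpSplitDemo class and IpToNumber method are hypothetical names; the sketch uses 256 as the base because each octet is one byte, and it relies on byte.Parse to reject octets outside 0 to 255.

```csharp
using System;

public static class IpSplitDemo
{
    // Hypothetical helper: converts a dotted IPv4 address to its numeric value.
    public static long IpToNumber(string ip)
    {
        string[] parts = ip.Split('.');
        if (parts.Length != 4)
        {
            throw new FormatException("Not a valid IP address!");
        }
        long value = 0;
        foreach (string part in parts)
        {
            // byte.Parse throws if the octet is not a number from 0 to 255.
            value = value * 256 + byte.Parse(part);
        }
        return value;
    }

    public static void Main()
    {
        Console.WriteLine(IpToNumber("10.100.20.168")); // 174331048
    }
}
```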
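Symbol reference:
Character
Description
\
Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character "n", and '\n' matches a newline character. The sequence '\\' matches "\", and "\(" matches "(".
^
Matches the position at the beginning of the input string. If the RegExp object's Multiline property is set, ^ also matches the position following '\n' or '\r'.
$
Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position preceding '\n' or '\r'.
*
Matches the preceding subexpression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.
+
Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?
Matches the preceding subexpression zero or one time. For example, "do(es)?" matches the "do" in "do" or "does". ? is equivalent to {0,1}.
{n}
n is a nonnegative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}
n is a nonnegative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}
m and n are nonnegative integers, where n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that you cannot put a space between the comma and the numbers.
?
When this character immediately follows any of the other quantifiers (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much of the searched string as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.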
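.
Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]'.
(pattern)
Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0…$9 properties in JScript. To match parenthesis characters, use '\(' or '\)'.
(?:pattern)
Matches pattern but does not capture the match; that is, it is a non-capturing match that is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.
(?=pattern)
Positive lookahead. Matches the search string at any point where a string matching pattern begins. This is a non-capturing match; that is, the match is not captured for later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000" but not "Windows" in "Windows 3.1". Lookaheads do not consume characters; that is, after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern)
Negative lookahead. Matches the search string at any point where a string not matching pattern begins. This is a non-capturing match; that is, the match is not captured for later use. For example, 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1" but not "Windows" in "Windows 2000". Lookaheads do not consume characters; that is, after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
x|y
Matches either x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz]
A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".
[^xyz]
A negative character set. Matches any character not enclosed. For example, '[^abc]' matches the 'p' in "plain".
[a-z]
A range of characters. Matches any character in the specified range. For example, '[a-z]' matches any lowercase alphabetic character in the range 'a' through 'z'.
[^a-z]
A negative range of characters. Matches any character not in the specified range. For example, '[^a-z]' matches any character not in the range 'a' through 'z'.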
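\b
Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".
\B
Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".
\cx
Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range A-Z or a-z. If it is not, c is treated as a literal 'c' character.
\d
Matches a digit character. Equivalent to [0-9].
\D
Matches a non-digit character. Equivalent to [^0-9].
\f
Matches a form-feed character. Equivalent to \x0c and \cL.
\n
Matches a newline character. Equivalent to \x0a and \cJ.
\r
Matches a carriage return character. Equivalent to \x0d and \cM.
\s
Matches any whitespace character, including space, tab, form-feed, and so on. Equivalent to [ \f\n\r\t\v].
\S
Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t
Matches a tab character. Equivalent to \x09 and \cI.
\v
Matches a vertical tab character. Equivalent to \x0b and \cK.
\w
Matches any word character, including underscore. Equivalent to '[A-Za-z0-9_]'.
\W
Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn
Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
\num
Matches num, where num is a positive integer. A reference back to captured matches. For example, '(.)\1' matches two consecutive identical characters.
\n
Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, n is an octal escape value if n is an octal digit (0-7).
\nm
Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither of the preceding conditions holds, \nm matches the octal escape value nm when n and m are octal digits (0-7).
\nml
Matches the octal escape value nml when n is an octal digit (0-3) and m and l are octal digits (0-7).
\un
Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).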
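A few of the entries in this table can be checked from C# with Regex.Match. The sketch below uses a hypothetical helper (RegexSyntaxDemo.FirstMatch) to show greedy versus non-greedy quantifiers, a positive lookahead, and a backreference. The table describes JScript and VBScript regular expressions, but these particular constructs behave the same way in .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSyntaxDemo
{
    // Hypothetical helper: returns the text of the first match, or "" if there is none.
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        // Greedy versus non-greedy quantifiers.
        Console.WriteLine(FirstMatch("oooo", "o+"));  // oooo
        Console.WriteLine(FirstMatch("oooo", "o+?")); // o

        // The lookahead checks for "2000" but does not include it in the match.
        Console.WriteLine("[" + FirstMatch("Windows 2000", "Windows (?=95|98|NT|2000)") + "]");

        // Backreference: two consecutive identical characters.
        Console.WriteLine(FirstMatch("aab", @"(.)\1")); // aa
    }
}
```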