A Collection of Regular Expression Syntax Rules

varsoft

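Microsoft has already documented the regular expression rules in MSDN, so anyone interested can study them there (ms-help://MS.MSDNQTR.2003OCT.1033/cpgenref/html/cpconRegularExpressionsLanguageElements.htm). Below are the syntax element tables I collected, for you to look through yourselves!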

Character Escapes

Escaped character	Description
ordinary characters	Characters other than . $ ^ { [ ( | ) * + ? \ match themselves.
\a	Matches a bell (alarm) \u0007.
\b	Matches a backspace \u0008 if in a [] character class; otherwise, see the note following this table.
\t	Matches a tab \u0009.
\r	Matches a carriage return \u000D.
\v	Matches a vertical tab \u000B.
\f	Matches a form feed \u000C.
\n	Matches a new line \u000A.
\e	Matches an escape \u001B.
\040	Matches an ASCII character as octal (up to three digits); numbers with no leading zero are backreferences if they have only one digit or if they correspond to a capturing group number. (For more information, see Backreferences.) For example, the character `\040` represents a space.
\x20	Matches an ASCII character using hexadecimal representation (exactly two digits).
\cC	Matches an ASCII control character; for example, `\cC` is control-C.
\u0020	Matches a Unicode character using hexadecimal representation (exactly four digits).
\	When followed by a character that is not recognized as an escaped character, matches that character. For example, \* is the same as \x2A.

Note: The escaped character \b is a special case. In a regular expression, \b denotes a word boundary (between \w and \W characters) except within a [] character class, where \b refers to the backspace character. In a replacement pattern, \b always denotes a backspace.
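
A few of these escapes can be tried out directly. What follows is only a minimal sketch and uses Python's `re` module as a stand-in, which is my assumption rather than anything from the MSDN text: the table describes the .NET engine, and Python shares only part of it (`\e` and `\cC`, for instance, are not accepted by `re`). The sample strings are made up for illustration.

```python
import re

# Octal, hexadecimal, and Unicode escapes can all denote the same space character.
space_forms = [r"\040", r"\x20", r"\u0020"]
space_hits = [bool(re.fullmatch(pattern, " ")) for pattern in space_forms]

# A backslash before a metacharacter makes it literal: \* behaves like \x2A.
star_literal = re.findall(r"\*", "a*b*c")
star_hex = re.findall(r"\x2A", "a*b*c")

# Outside a character class, \b is a zero-width word boundary.
whole_word = re.findall(r"\bcat\b", "cat concat cat")

# Inside a character class, \b is the backspace character \u0008.
backspaces = re.findall(r"[\b]", "a\x08b")

print(space_hits, star_literal, star_hex, whole_word, backspaces)
```

The last two patterns show the special case from the note: the same `\b` is an assertion in one position and a literal character in the other.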

Character Classes

A character class is a set of characters that will find a match if any one of the characters included in the set matches. The following table summarizes character matching syntax.

Character class	Description
.	Matches any character except \n. If modified by the Singleline option, a period character matches any character. For more information, see Regular Expression Options.
[aeiou]	Matches any single character included in the specified set of characters.
[^aeiou]	Matches any single character not in the specified set of characters.
[0-9a-fA-F]	Use of a hyphen (`-`) allows specification of contiguous character ranges.
\p{name}	Matches any character in the named character class specified by {name}. Supported names are Unicode groups and block ranges. For example, Ll, Nd, Z, IsGreek, IsBoxDrawing.
\P{name}	Matches text not included in groups and block ranges specified in {name}.
\w	Matches any word character. Equivalent to the Unicode character categories `[\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]`. If ECMAScript-compliant behavior is specified with the ECMAScript option, \w is equivalent to [a-zA-Z_0-9].
\W	Matches any nonword character. Equivalent to the Unicode categories `[^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]`. If ECMAScript-compliant behavior is specified with the ECMAScript option, \W is equivalent to [^a-zA-Z_0-9].
\s	Matches any white-space character. Equivalent to the Unicode character categories `[\f\n\r\t\v\x85\p{Z}]`. If ECMAScript-compliant behavior is specified with the ECMAScript option, \s is equivalent to [ \f\n\r\t\v].
\S	Matches any non-white-space character. Equivalent to the Unicode character categories `[^\f\n\r\t\v\x85\p{Z}]`. If ECMAScript-compliant behavior is specified with the ECMAScript option, \S is equivalent to [^ \f\n\r\t\v].
\d	Matches any decimal digit. Equivalent to \p{Nd} for Unicode and [0-9] for non-Unicode, ECMAScript behavior.
\D	Matches any nondigit. Equivalent to \P{Nd} for Unicode and [^0-9] for non-Unicode, ECMAScript behavior.
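
The character classes in the table can be explored with a short sketch. It again assumes Python's `re` module instead of .NET, so it is only an approximation: the standard `re` module has no `\p{name}` syntax, and its `re.ASCII` flag is used here as a rough analogue of the ECMAScript option for `\d`. The input strings are invented for the example.

```python
import re

# Positive set, negated set, and a set built from ranges.
vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regex")
hex_runs = re.findall(r"[0-9a-fA-F]+", "0xBEEF")

# "\u0663" is ARABIC-INDIC DIGIT THREE, a decimal digit outside ASCII.
sample = "7 and \u0663"

# By default \d follows Unicode decimal digits (comparable to \p{Nd}).
digits_unicode = re.findall(r"\d", sample)

# With re.ASCII, \d is narrowed to [0-9] (assumed analogue of ECMAScript mode).
digits_ascii = re.findall(r"\d", sample, flags=re.ASCII)

# \w covers letters, digits, and the underscore; \s covers white space.
words = re.findall(r"\w+", "snake_case, two words")
gaps = re.findall(r"\s", "a b\tc\nd")

print(vowels, non_vowels, hex_runs, digits_unicode, digits_ascii, words, gaps)
```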

You can find the Unicode category a character belongs to with the method

Regular Expression Options

and ECMAScript are not allowed inline.

RegexOption member	Inline character	Description
None	N/A	Specifies that no options are set.
IgnoreCase	i	Specifies case-insensitive matching.
Multiline	m	Specifies multiline mode. Changes the meaning of ^ and $ so that they match at the beginning and end, respectively, of any line, not just the beginning and end of the whole string.
ExplicitCapture	n	Specifies that the only valid captures are explicitly named or numbered groups of the form (?<name>…). This allows parentheses to act as noncapturing groups without the syntactic clumsiness of (?:…).
Compiled	N/A	Specifies that the regular expression will be compiled to an assembly. Generates Microsoft intermediate language (MSIL) code for the regular expression; yields faster execution at the expense of startup time.
Singleline	s	Specifies single-line mode. Changes the meaning of the period character (.) so that it matches every character (instead of every character except \n).
IgnorePatternWhitespace	x	Specifies that unescaped white space is excluded from the pattern and enables comments following a number sign (#). (For a list of escaped white-space characters, see Character Escapes.) Note that white space is never eliminated from within a character class.
RightToLeft	N/A	Specifies that the search moves from right to left instead of from left to right. A regular expression with this option moves to the left of the starting position instead of to the right. (Therefore, the starting position should be specified as the end of the string instead of the beginning.) This option cannot be specified in midstream, to prevent the possibility of crafting regular expressions with infinite loops. However, the (?<) lookbehind constructs provide something similar that can be used as a subexpression. RightToLeft changes the search direction only. It does not reverse the substring that is searched for. The lookahead and lookbehind assertions do not change: lookahead looks to the right; lookbehind looks to the left.
ECMAScript	N/A	Specifies that ECMAScript-compliant behavior is enabled for the expression. This option can be used only in conjunction with the IgnoreCase and Multiline flags. Use of ECMAScript with any other flags results in an exception.
CultureInvariant	N/A	Specifies that cultural differences in language are ignored. See Performing Culture-Insensitive Operations in the RegularExpressions Namespace for more information.
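
To see what some of these options do in practice, here is a minimal sketch in Python's `re` module. The mapping of names is my own assumption of the nearest equivalents (IgnoreCase to `re.IGNORECASE`, Multiline to `re.MULTILINE`, Singleline to `re.DOTALL`, IgnorePatternWhitespace to `re.VERBOSE`); the remaining options in the table are .NET-specific and are not shown. The sample text is made up.

```python
import re

text = "First line\nsecond LINE"

# IgnoreCase, as a flag and as the inline (?i) form.
ignore_case = re.findall(r"line", text, flags=re.IGNORECASE)
inline_case = re.findall(r"(?i)line", text)

# Multiline: ^ matches at the start of every line, not only the whole string.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
string_start = re.findall(r"^\w+", text)

# Singleline (re.DOTALL in Python): the period also matches \n.
dot_default = re.search(r"First.*LINE", text)
dot_all = re.search(r"First.*LINE", text, flags=re.DOTALL)

# IgnorePatternWhitespace (re.VERBOSE in Python): unescaped white space in the
# pattern is ignored and # starts a comment.
year_month_pattern = re.compile(
    r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    """,
    re.VERBOSE,
)
year_month = year_month_pattern.search("released 2004-10-26").groups()

print(ignore_case, line_starts, dot_default, year_month)
```

Without the Singleline equivalent, `.*` stops at the line break, so the search that spans both lines finds nothing.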

Atomic Zero-Width Assertions

Assertion	Description
^	Specifies that the match must occur at the beginning of the string or the beginning of the line. For more information, see the Multiline option in Regular Expression Options.
$	Specifies that the match must occur at the end of the string, before \n at the end of the string, or at the end of the line. For more information, see the Multiline option in Regular Expression Options.
\A	Specifies that the match must occur at the beginning of the string (ignores the Multiline option).
\Z	Specifies that the match must occur at the end of the string or before \n at the end of the string (ignores the Multiline option).
\z	Specifies that the match must occur at the end of the string (ignores the Multiline option).
\G	Specifies that the match must occur at the point where the previous match ended. When used with `Match.NextMatch()`, this ensures that matches are all contiguous.
\b	Specifies that the match must occur on a boundary between \w (alphanumeric) and \W (nonalphanumeric) characters. The match must occur on word boundaries — that is, at the first or last characters in words separated by any nonalphanumeric characters.
\B	Specifies that the match must not occur on a \b boundary.
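
The anchors above can be checked with a small sketch, once more assuming Python's `re` module as a stand-in for .NET. `\Z`, `\z`, and `\G` are deliberately left out: Python gives `\Z` a different meaning (end of string only) and has no `\G`, so those rows cannot be demonstrated faithfully here. The sample strings are invented.

```python
import re

text = "one two\nthree four"

# With the Multiline option, ^ and $ apply to every line.
first_words = re.findall(r"^\w+", text, flags=re.MULTILINE)
last_words = re.findall(r"\w+$", text, flags=re.MULTILINE)

# \A ignores the Multiline option and only matches at the start of the string.
first_word_only = re.findall(r"\A\w+", text, flags=re.MULTILINE)

# Without Multiline, $ still matches before a \n at the end of the string.
before_final_newline = re.search(r"end$", "the end\n") is not None

# \b requires a word boundary; \B requires that there is none.
whole_words = re.findall(r"\bcat\b", "cat scatter cat.")
inner_only = re.findall(r"\Bcat\B", "cat scatter cat.")

print(first_words, last_words, first_word_only, whole_words, inner_only)
```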

Quantifiers

Quantifier	Description
*	Specifies zero or more matches; for example, `\w*` or `(abc)*`. Equivalent to {0,}.
+	Specifies one or more matches; for example, `\w+` or `(abc)+`. Equivalent to {1,}.
?	Specifies zero or one matches; for example, `\w?` or `(abc)?`. Equivalent to {0,1}.
{n}	Specifies exactly n matches; for example, `(pizza){2}`.
{n,}	Specifies at least n matches; for example, `(abc){2,}`.
{n,m}	Specifies at least n, but no more than m, matches.
*?	Specifies the first match that consumes as few repeats as possible (equivalent to lazy *).
+?	Specifies as few repeats as possible, but at least one (equivalent to lazy `+`).
??	Specifies zero repeats if possible, or one (lazy `?`).
{n}?	Equivalent to {n} (lazy {n}).
{n,}?	Specifies as few repeats as possible, but at least n (lazy {n,}).
{n,m}?	Specifies as few repeats as possible between n and m (lazy {n,m}).
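
The difference between greedy and lazy quantifiers is easiest to see side by side. This is a minimal sketch assuming Python's `re` module, whose quantifier syntax matches the table as far as these cases go; the input strings are made up.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy + takes as much as possible; lazy +? takes as little as possible.
greedy = re.findall(r"<.+>", html)
lazy = re.findall(r"<.+?>", html)

# {n} and {n,} applied to a group.
exact = re.fullmatch(r"(pizza){2}", "pizzapizza") is not None
at_least = re.findall(r"(?:abc){2,}", "abc abcabc abcabcabc")

# {n,m} prefers the upper bound; {n,m}? prefers the lower bound.
bounded_greedy = re.match(r"a{2,4}", "aaaaaa").group()
bounded_lazy = re.match(r"a{2,4}?", "aaaaaa").group()

# * is shorthand for {0,}.
star_equivalent = re.findall(r"\d*x", "x 1x 22x") == re.findall(r"\d{0,}x", "x 1x 22x")

print(greedy, lazy, at_least, bounded_greedy, bounded_lazy, star_equivalent)
```

The greedy pattern swallows the whole string in a single match, while the lazy one stops at each closing angle bracket and returns the four tags separately.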

Grouping Constructs

Grouping constructs allow you to capture groups of subexpressions and to increase the efficiency of regular expressions with noncapturing lookahead and lookbehind modifiers. The following table describes the Regular Expression Grouping Constructs.

Grouping construct	Description
()	Captures the matched substring (or noncapturing group; for more information, see the ExplicitCapture option in

2004-10-26 19:58