A Collection of Regular Expression Syntax Rules

turnmissile's Blog http://blog.csdn.net/turnmissile/

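Microsoft has already documented the regular expression rules in MSDN, and interested readers can study them there (ms-help://MS.MSDNQTR.2003OCT.1033/cpgenref/html/cpconRegularExpressionsLanguageElements.htm). Below are some tables of syntax elements that I collected, for you to explore on your own.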


Character Escapes

Escaped character

Description

ordinary characters

Characters other than . $ ^ { [ ( | ) * + ? \ match themselves.

\a

Matches a bell (alarm) \u0007.

\b

Matches a backspace \u0008 if in a [] character class; otherwise, see the note following this table.

\t

Matches a tab \u0009.

\r

Matches a carriage return \u000D.

\v

Matches a vertical tab \u000B.

\f

Matches a form feed \u000C.

\n

Matches a new line \u000A.

\e

Matches an escape \u001B.

\040

Matches an ASCII character as octal (up to three digits); numbers with no leading zero are backreferences if they have only one digit or if they correspond to a capturing group number. (For more information, see Backreferences.) For example, the character \040 represents a space.

\x20

Matches an ASCII character using hexadecimal representation (exactly two digits).

\cC

Matches an ASCII control character; for example, \cC is control-C.

\u0020

Matches a Unicode character using hexadecimal representation (exactly four digits).

\

When followed by a character that is not recognized as an escaped character, matches that character. For example, \* is the same as \x2A.

Note: The escaped character \b is a special case. In a regular expression, \b denotes a word boundary (between \w and \W characters) except within a [] character class, where \b refers to the backspace character. In a replacement pattern, \b always denotes a backspace.
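A few of the escapes above can be tried out in a short sketch. It uses Python's `re` module as a stand-in (an assumption; the tables describe the .NET Regex class) and sticks to escapes that Python spells the same way. `\e` and `\cC` are left out because Python's `re` does not accept them.

```python
import re

# Four spellings of the space character U+0020: octal, hex, Unicode, literal.
space_patterns = [r"\040", r"\x20", r"\u0020", " "]
all_match_space = all(re.fullmatch(p, " ") for p in space_patterns)

# A backslash before a metacharacter makes it literal: \* is the same as \x2A.
stars = re.findall(r"\*", "2*3*4")
stars_hex = re.findall(r"\x2A", "2*3*4")

# Outside a character class, \b is a zero-width word boundary ...
whole_words = re.findall(r"\bcat\b", "cat concat cat.")
# ... but inside [] it is the backspace character U+0008.
backspaces = re.findall(r"[\b]", "a\x08b")

print(all_match_space, stars, whole_words, backspaces)
```

The word "concat" is skipped by `\bcat\b` because there is no boundary between "n" and "c".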

Character Classes

A character class is a set of characters that will find a match if any one of the characters included in the set matches. The following table summarizes character matching syntax.

Character class

Description

.

Matches any character except \n. If modified by the Singleline option, a period character matches any character. For more information, see Regular Expression Options.

[aeiou]

Matches any single character included in the specified set of characters.

[^aeiou]

Matches any single character not in the specified set of characters.

[0-9a-fA-F]

Use of a hyphen (-) allows specification of contiguous character ranges.

\p{name}

Matches any character in the named character class specified by {name}. Supported names are Unicode groups and block ranges. For example, Ll, Nd, Z, IsGreek, IsBoxDrawing.

\P{name}

Matches text not included in groups and block ranges specified in {name}.

\w

Matches any word character. Equivalent to the Unicode character categories [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \w is equivalent to [a-zA-Z_0-9].

\W

Matches any nonword character. Equivalent to the Unicode categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \W is equivalent to [^a-zA-Z_0-9].

\s

Matches any white-space character. Equivalent to the Unicode character categories [\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \s is equivalent to [ \f\n\r\t\v].

\S

Matches any non-white-space character. Equivalent to the Unicode character categories [^\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \S is equivalent to [^ \f\n\r\t\v].

\d

Matches any decimal digit. Equivalent to \p{Nd} for Unicode and [0-9] for non-Unicode, ECMAScript behavior.

\D

Matches any nondigit. Equivalent to \P{Nd} for Unicode and [^0-9] for non-Unicode, ECMAScript behavior.
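The set, range, and shorthand classes in this table can be sketched as follows. Python's `re` module is used as a stand-in for the .NET engine (an assumption), and `re.ASCII` is only a rough analogue of the ECMAScript option, not the same switch. `\p{name}` is not shown because Python's `re` does not support it.

```python
import re

vowels = re.findall(r"[aeiou]", "regex")        # positive set
others = re.findall(r"[^aeiou]", "regex")       # negated set
hex_runs = re.findall(r"[0-9a-fA-F]+", "0x1F")  # ranges written with a hyphen

# "7" followed by ARABIC-INDIC DIGIT THREE (U+0663, Unicode category Nd).
digits = "7\u0663"
unicode_digits = re.findall(r"\d", digits)
# re.ASCII narrows \d to [0-9], similar in spirit to ECMAScript behavior.
ascii_digits = re.findall(r"\d", digits, re.ASCII)

# \S+ collects runs of non-white-space characters.
tokens = re.findall(r"\S+", "a\tb \n c")

print(vowels, others, hex_runs, len(unicode_digits), ascii_digits, tokens)
```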

You can find the Unicode category a character belongs to with the method

Regular Expression Options

and ECMAScript are not allowed inline.

RegexOption member

Inline character

Description

None

N/A

Specifies that no options are set.

IgnoreCase

i

Specifies case-insensitive matching.

Multiline

m

Specifies multiline mode. Changes the meaning of ^ and $ so that they match at the beginning and end, respectively, of any line, not just the beginning and end of the whole string.

ExplicitCapture

n

Specifies that the only valid captures are explicitly named or numbered groups of the form (?<name>…). This allows parentheses to act as noncapturing groups without the syntactic clumsiness of (?:…).

Compiled

N/A

Specifies that the regular expression will be compiled to an assembly. Generates Microsoft intermediate language (MSIL) code for the regular expression; yields faster execution at the expense of startup time.

Singleline

s

Specifies single-line mode. Changes the meaning of the period character (.) so that it matches every character (instead of every character except \n).

IgnorePatternWhitespace

x

Specifies that unescaped white space is excluded from the pattern and enables comments following a number sign (#). (For a list of escaped white-space characters, see Character Escapes.) Note that white space is never eliminated from within a character class.

RightToLeft

N/A

Specifies that the search moves from right to left instead of from left to right. A regular expression with this option moves to the left of the starting position instead of to the right. (Therefore, the starting position should be specified as the end of the string instead of the beginning.) This option cannot be specified in midstream, to prevent the possibility of crafting regular expressions with infinite loops. However, the (?<) lookbehind constructs provide something similar that can be used as a subexpression.

RightToLeft changes the search direction only. It does not reverse the substring that is searched for. The lookahead and lookbehind assertions do not change: lookahead looks to the right; lookbehind looks to the left.

ECMAScript

N/A

Specifies that ECMAScript-compliant behavior is enabled for the expression. This option can be used only in conjunction with the IgnoreCase and Multiline flags. Use of ECMAScript with any other flags results in an exception.

CultureInvariant

N/A

Specifies that cultural differences in language are ignored. See Performing Culture-Insensitive Operations in the RegularExpressions Namespace for more information.
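The effect of the i, m, s, and x options can be illustrated with a small sketch. It again assumes Python's `re` module as a stand-in: the inline letters are the same, but the flag names differ (`DOTALL` for Singleline, `VERBOSE` for IgnorePatternWhitespace), and options such as ExplicitCapture, RightToLeft, and Compiled have no direct counterpart there.

```python
import re

text = "first line\nSecond line"

# IgnoreCase, written inline as (?i).
ignore_case = re.findall(r"(?i)second", text)

# Multiline (m): ^ matches at the start of every line, not only the string.
default_starts = re.findall(r"^\w+", text)
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)

# Singleline (s): the period also matches \n.
dot_default = re.fullmatch(r".+", text)
dot_singleline = re.fullmatch(r".+", text, re.DOTALL)

# IgnorePatternWhitespace (x): unescaped white space is dropped from the
# pattern and comments after # are allowed.
date = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", re.VERBOSE)
year_month = date.match("2003-10").groups()

print(ignore_case, default_starts, multiline_starts, year_month)
```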

Atomic Zero-Width Assertions

Assertion

Description

^

Specifies that the match must occur at the beginning of the string or the beginning of the line. For more information, see the Multiline option in Regular Expression Options.

$

Specifies that the match must occur at the end of the string, before \n at the end of the string, or at the end of the line. For more information, see the Multiline option in Regular Expression Options.

\A

Specifies that the match must occur at the beginning of the string (ignores the Multiline option).

\Z

Specifies that the match must occur at the end of the string or before \n at the end of the string (ignores the Multiline option).

\z

Specifies that the match must occur at the end of the string (ignores the Multiline option).

\G

Specifies that the match must occur at the point where the previous match ended. When used with Match.NextMatch(), this ensures that matches are all contiguous.

\b

Specifies that the match must occur on a boundary between \w (alphanumeric) and \W (nonalphanumeric) characters. The match must occur on word boundaries — that is, at the first or last characters in words separated by any nonalphanumeric characters.

\B

Specifies that the match must not occur on a \b boundary.
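A short sketch of some of these assertions, assuming Python's `re` module as a stand-in for the .NET engine. Only `^`, `$`, `\A`, `\b`, and `\B` are shown: Python's `re` has no `\G`, and its `\Z` matches only at the very end of the string (like the `\z` described above), so those anchors are omitted to avoid confusion.

```python
import re

text = "cat\nconcat\n"

# ^ with Multiline matches at every line start; \A ignores Multiline.
caret_hits = re.findall(r"^c", text, re.MULTILINE)
string_start_hits = re.findall(r"\Ac", text, re.MULTILINE)

# $ without Multiline still matches before a final \n.
before_final_newline = re.search(r"concat$", text) is not None

# \b requires a word boundary at that position, \B forbids one.
at_boundary = [m.start() for m in re.finditer(r"\bcat", text)]
inside_word = [m.start() for m in re.finditer(r"\Bcat", text)]

print(caret_hits, string_start_hits, before_final_newline, at_boundary, inside_word)
```

The two position lists do not overlap: the "cat" at offset 0 starts a word, while the one at offset 7 sits inside "concat".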

Quantifiers

Quantifier

Description

*

Specifies zero or more matches; for example, \w* or (abc)*. Equivalent to {0,}.

+

Specifies one or more matches; for example, \w+ or (abc)+. Equivalent to {1,}.

?

Specifies zero or one matches; for example, \w? or (abc)?. Equivalent to {0,1}.

{n}

Specifies exactly n matches; for example, (pizza){2}.

{n,}

Specifies at least n matches; for example, (abc){2,}.

{n,m}

Specifies at least n, but no more than m, matches.

*?

Specifies the first match that consumes as few repeats as possible (equivalent to lazy *).

+?

Specifies as few repeats as possible, but at least one (equivalent to lazy +).

??

Specifies zero repeats if possible, or one (lazy ?).

{n}?

Equivalent to {n} (lazy {n}).

{n,}?

Specifies as few repeats as possible, but at least n (lazy {n,}).

{n,m}?

Specifies as few repeats as possible between n and m (lazy {n,m}).
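The difference between the greedy quantifiers and their lazy forms is easiest to see side by side. The sketch below assumes Python's `re` module, whose quantifier syntax matches the table; the sample strings are made up for illustration.

```python
import re

html = "<a><b>"
greedy = re.findall(r"<.+>", html)   # + takes as much as it can
lazy = re.findall(r"<.+?>", html)    # +? takes as little as it can

digits = "1 22 333 4444"
greedy_range = re.findall(r"\d{2,3}", digits)
lazy_range = re.findall(r"\d{2,3}?", digits)

# {n} is exact, so (pizza){2} needs two copies back to back.
two_pizzas = re.fullmatch(r"(pizza){2}", "pizzapizza") is not None
one_pizza = re.fullmatch(r"(pizza){2}", "pizza") is not None

# ? prefers one repeat, ?? prefers zero.
optional_greedy = re.match(r"ab?", "ab").group()
optional_lazy = re.match(r"ab??", "ab").group()

print(greedy, lazy, greedy_range, lazy_range, optional_greedy, optional_lazy)
```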

Grouping Constructs

Grouping constructs allow you to capture groups of subexpressions and to increase the efficiency of regular expressions with noncapturing lookahead and lookbehind modifiers. The following table describes the Regular Expression Grouping Constructs.

Grouping construct

Description

()

Captures the matched substring (or acts as a noncapturing group; for more information, see the ExplicitCapture option under Regular Expression Options above).
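Capturing with () and the noncapturing form (?:…) mentioned under ExplicitCapture can be sketched like this, assuming Python's `re` module as a stand-in. Named groups are not shown because Python writes them as (?P<name>…) rather than (?<name>…).

```python
import re

# () captures its substring; (?:...) groups without capturing.
m = re.match(r"(\d+)-(?:\d+)-(\d+)", "2003-10-31")
captured = m.groups()

# A captured group can be reused as a backreference: (\w)\1 finds a
# character that is immediately repeated.
doubled = re.findall(r"(\w)\1", "book keeper")

print(captured, doubled)
```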
