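Regex Cheat Sheet: A Complete Regular Expression Reference

Regular expressions (regex) are among the most powerful tools in a developer's toolbox. Whether you are validating user input, parsing log files, or running complex search-and-replace operations, a regex lets you describe exactly the text you want to match. This cheat sheet covers the core concepts: anchors, character classes, quantifiers, groups, lookaround assertions, and flags.

Test every pattern live in our Regex Tester.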

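Basic Syntax: Anchors and Character Classes

Anchors and character classes are the foundation of every regex pattern. Anchors specify where in the string a match may occur; character classes define which characters may match.

Anchors

Anchors do not match characters. They match positions in the string.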

Pattern	Description	Example
`^`	Start of string (or line with `m` flag)	`^Hello` matches "Hello World"
`$`	End of string (or line with `m` flag)	`world$` matches "Hello world"
`\b`	Word boundary	`\bcat\b` matches "cat" but not "catch"
`\B`	Non-word boundary	`\Bcat\B` matches "concatenate"
`\A`	Absolute start of string (Python, Ruby)	`\AHello`
`\Z`	End of string (Python: absolute end; Ruby: also before a final newline, use `\z` for the absolute end)	`bye\Z`
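
As a small illustration of the table above, the following Python sketch contrasts `^` with `\A` under the multiline flag and shows `\b` and `\B` at work. The sample strings and variable names are made up for this example.

```python
import re

text = "Hello World\nHello again"

# Without MULTILINE, ^ matches only at the very start of the string.
default_hits = re.findall(r"^Hello", text)

# With MULTILINE, ^ also matches right after each newline.
multiline_hits = re.findall(r"^Hello", text, re.MULTILINE)

# \A is the absolute start, so MULTILINE does not change it.
absolute_hits = re.findall(r"\AHello", text, re.MULTILINE)

# \b needs a word boundary on both sides of "cat".
whole_words = re.findall(r"\bcat\b", "cat catch concat")

# \B needs the opposite: "cat" must sit inside a longer word.
inside_word = re.search(r"\Bcat\B", "concatenate") is not None

print(len(default_hits), len(multiline_hits), len(absolute_hits))  # 1 2 1
print(whole_words, inside_word)  # ['cat'] True
```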

Character Classes

A character class matches any one character from a set at a single position.

Pattern	Description	Equivalent
`[abc]`	Match a, b, or c	--
`[^abc]`	Match anything except a, b, c	--
`[a-z]`	Match any lowercase letter	--
`[A-Z]`	Match any uppercase letter	--
`[0-9]`	Match any digit	`\d`
`.`	Match any character (except newline by default)	--
`\d`	Digit	`[0-9]`
`\D`	Non-digit	`[^0-9]`
`\w`	Word character	`[a-zA-Z0-9_]`
`\W`	Non-word character	`[^a-zA-Z0-9_]`
`\s`	Whitespace (space, tab, newline)	`[ \t\n\r\f\v]`
`\S`	Non-whitespace	`[^ \t\n\r\f\v]`
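
The equivalents in the table describe ASCII matching. As a hedged illustration in Python (where `str` patterns are Unicode-aware by default), the sketch below shows a few classes in use and how the `re.ASCII` flag narrows `\w` back to `[a-zA-Z0-9_]`. The sample strings are invented for this example.

```python
import re

sample = "Order #42: 3 items"

# \d+ extracts runs of digits.
digits = re.findall(r"\d+", sample)

# A negated class matches anything NOT listed (here: not a vowel, not whitespace).
consonant_runs = re.findall(r"[^aeiou\s]+", "hello world")

# In Python 3, \w on str patterns also matches non-ASCII letters such as "é".
word = "caf\u00e9"
unicode_words = re.findall(r"\w+", word)
ascii_words = re.findall(r"\w+", word, re.ASCII)

print(digits)          # ['42', '3']
print(consonant_runs)  # ['h', 'll', 'w', 'rld']
print(unicode_words, ascii_words)
```

Other engines make different default choices, so it is worth checking how `\w`, `\d`, and `\s` treat non-ASCII input in the language you use.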

Special Characters (Metacharacters)

These characters have a special meaning in a regex. To match one literally, escape it with a backslash.

Special characters that need escaping:
.  ^  $  *  +  ?  {  }  [  ]  \  |  (  )

To match a literal dot:   \.
To match a literal star:  \*
To match a literal pipe:  \|
To match a backslash:     \\
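
When the text to match comes from user input, escaping by hand is error-prone. A minimal Python sketch, assuming the standard `re.escape` helper and an invented input string:

```python
import re

user_input = "1+1=2 (approx.)"

# re.escape backslash-escapes the characters that are special in a regex.
literal = re.escape(user_input)

# The escaped pattern matches the original text exactly.
escaped_matches = re.fullmatch(literal, user_input) is not None

# Used unescaped, "+" acts as a quantifier and "." as a wildcard,
# so the text does not even match itself.
raw_matches = re.fullmatch(user_input, user_input) is not None

print(escaped_matches, raw_matches)  # True False
```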

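Quantifiers: How Many Times to Match

A quantifier controls how many times the preceding element must occur. By default quantifiers are greedy: they match as much as possible. Appending ? makes a quantifier lazy, so it matches as little as possible.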

Greedy	Lazy	Description
`*`	`*?`	0 or more times
`+`	`+?`	1 or more times
`?`	`??`	0 or 1 time (optional)
`{n}`	`{n}?`	Exactly n times
`{n,}`	`{n,}?`	n or more times
`{n,m}`	`{n,m}?`	Between n and m times
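
The rows above can be tried out with a short Python sketch. The inputs are arbitrary examples chosen for illustration.

```python
import re

# ? makes the preceding element optional.
optional_u = re.fullmatch(r"colou?r", "color") is not None

# {n} requires exactly n repetitions, so four digits do not fit \d{3}.
exactly_three = re.fullmatch(r"\d{3}", "1234") is not None

# {n,m} is greedy by default and takes as many as allowed (up to 4 here).
greedy_range = re.search(r"\d{2,4}", "123456").group()

# The lazy form {n,m}? stops as soon as the minimum is satisfied.
lazy_range = re.search(r"\d{2,4}?", "123456").group()

print(optional_u, exactly_three)  # True False
print(greedy_range, lazy_range)   # 1234 12
```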

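Greedy vs. Lazy Quantifiers

In the sample string below, the greedy pattern <.*> matches everything from the first < to the last >, while the lazy pattern <.*?> matches only the first tag.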

// Input string:
const str = '<b>bold</b> and <i>italic</i>';

// Greedy: matches from first < to LAST >
str.match(/<.*>/)[0];
// Result: '<b>bold</b> and <i>italic</i>'

// Lazy: matches from first < to FIRST >
str.match(/<.*?>/)[0];
// Result: '<b>'
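
The same comparison in Python, plus a common alternative to the lazy quantifier: a negated character class such as `[^>]*`, which cannot run past the closing bracket in the first place. This is a sketch for illustration, not a recommendation to parse HTML with regex.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: runs from the first "<" to the last ">".
greedy = re.search(r"<.*>", html).group()

# Lazy: stops at the first ">" it can.
lazy = re.search(r"<.*?>", html).group()

# Negated class: "anything except >" never crosses a tag boundary.
tags = re.findall(r"<[^>]*>", html)

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>
print(tags)    # ['<b>', '</b>', '<i>', '</i>']
```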

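Grouping and Capturing

Groups let you treat several characters as a single unit, apply a quantifier to a subexpression, and extract the part of the text that matched.

Capturing Groups

Wrap a subexpression in parentheses (...) to capture what it matches. Backreferences such as \1 and \2 refer back to the captured text.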

// Capturing group example
const dateRegex = /^(\d{4})-(\d{2})-(\d{2})$/;
const match = '2026-02-10'.match(dateRegex);
// match[0] = '2026-02-10'  (full match)
// match[1] = '2026'        (year)
// match[2] = '02'          (month)
// match[3] = '10'          (day)

// Backreference: match repeated words
const repeated = /\b(\w+)\s+\1\b/;
repeated.test('the the');  // true
repeated.test('the cat');  // false
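
Backreferences are also available in replacement strings. As a hedged Python sketch, the same repeated-word pattern can be used with `re.sub` to collapse doubled words; the sample sentence is invented.

```python
import re

# \1 in the pattern must repeat exactly what group 1 captured.
repeated = re.compile(r"\b(\w+)\s+\1\b")

# \1 in the replacement inserts the captured word once.
fixed = repeated.sub(r"\1", "This is is a test test sentence")

print(fixed)  # This is a test sentence
```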

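Named Groups

Named groups use the (?<name>...) syntax, or (?P<name>...) in Python, to make patterns and match results easier to read.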

// Named groups in JavaScript
const dateRegex = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;
const match = '2026-02-10'.match(dateRegex);
// match.groups.year  = '2026'
// match.groups.month = '02'
// match.groups.day   = '10'

# Named groups in Python
import re
pattern = r'^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})$'
m = re.match(pattern, '2026-02-10')
# m.group('year')  = '2026'
# m.group('month') = '02'
# m.group('day')   = '10'

Non-Capturing Groups

Use (?:...) when you need grouping but do not need to capture the match.

// Non-capturing group
const regex = /(?:https?|ftp):\/\/[^\s]+/;
// Groups the protocol options without capturing them
// Only the full URL is in match[0]
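
The choice between capturing and non-capturing groups can change what an API returns. In Python, for instance, `re.findall` returns the group contents when the pattern has a capturing group. The sketch below uses made-up URLs to show the difference.

```python
import re

text = "see https://a.example and ftp://b.example"

# With a capturing group, findall returns only what the group captured.
capturing = re.findall(r"(https?|ftp)://\S+", text)

# With a non-capturing group, findall returns the full matches.
non_capturing = re.findall(r"(?:https?|ftp)://\S+", text)

print(capturing)      # ['https', 'ftp']
print(non_capturing)  # ['https://a.example', 'ftp://b.example']
```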

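Alternation (OR)

Use the pipe | to match one of several alternatives. Wrap the alternatives in a group to limit how far the | reaches.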

// Alternation examples
/cat|dog/           // Match "cat" or "dog"
/(red|blue) car/    // Match "red car" or "blue car"
/^(GET|POST|PUT|DELETE)\s/  // Match HTTP methods
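
Alternation has the lowest precedence of any regex operator, which is why the grouped forms above matter. A small Python sketch, with sample words chosen for illustration, shows what happens when anchors are combined with an ungrouped `|`:

```python
import re

# Parsed as (^cat) OR (dog$): each anchor applies to one branch only.
loose = re.compile(r"^cat|dog$")

# Grouping makes both anchors apply to every alternative.
strict = re.compile(r"^(?:cat|dog)$")

loose_category = loose.search("category") is not None   # starts with "cat"
loose_hotdog = loose.search("hotdog") is not None       # ends with "dog"
strict_category = strict.search("category") is not None
strict_dog = strict.search("dog") is not None

print(loose_category, loose_hotdog)  # True True
print(strict_category, strict_dog)   # False True
```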

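Lookaround Assertions (Zero-Width Assertions)

A lookaround checks whether a pattern is present before or after the current position without consuming any characters. This makes it well suited to matches that depend on context.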

Syntax	Type	Description
`(?=...)`	Positive lookahead	What follows must match
`(?!...)`	Negative lookahead	What follows must NOT match
`(?<=...)`	Positive lookbehind	What precedes must match
`(?<!...)`	Negative lookbehind	What precedes must NOT match

Positive Lookahead (?=...)

Matches at a position that is followed by the given pattern.

// Match a number followed by "px"
/\d+(?=px)/
// "20px 30em 40px" → matches "20" and "40" (not "30")

// Password: must contain at least one digit
/^(?=.*\d).{8,}$/
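
Both lookahead patterns above behave the same way in Python. The following sketch runs them against a few invented inputs; the variable names are only for illustration.

```python
import re

# The lookahead checks for "px" but leaves it out of the match.
px_values = re.findall(r"\d+(?=px)", "20px 30em 40px")

# (?=.*\d) asserts "a digit exists somewhere ahead", then .{8,} checks length.
password_re = re.compile(r"^(?=.*\d).{8,}$")
ok = password_re.match("password1") is not None
no_digit = password_re.match("password") is not None
too_short = password_re.match("pass1") is not None

print(px_values)                # ['20', '40']
print(ok, no_digit, too_short)  # True False False
```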

Negative Lookahead (?!...)

Matches at a position that is not followed by the given pattern.

// Match "cat" NOT followed by "fish"
/cat(?!fish)/
// "catfish catdog" → matches "cat" in "catdog" only

// Match numbers NOT followed by a unit
// (the \d alternative stops \d+ from backtracking to a shorter number, e.g. "2" in "20px")
/\d+(?!\d|\s*(?:px|em|rem|%))/
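
A negative lookahead placed after a greedy `\d+` can be satisfied by backtracking: if the full number is followed by a unit, the engine may give back a digit and succeed on a shorter number. The Python sketch below compares a pattern without a digit guard against one that adds `\d` to the lookahead. The input string is made up for this example.

```python
import re

css = "20px 15 3em"

# Without a guard, "20" fails the lookahead, but "2" (followed by "0px") passes.
naive = re.findall(r"\d+(?!\s*(?:px|em|rem|%))", css)

# Adding \d to the lookahead forbids stopping in the middle of a number.
guarded = re.findall(r"\d+(?!\d|\s*(?:px|em|rem|%))", css)

print(naive)    # ['2', '15']
print(guarded)  # ['15']
```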

Positive Lookbehind (?<=...)

Matches at a position that is preceded by the given pattern.

// Match a number preceded by "$"
/(?<=\$)\d+(\.\d{2})?/
// "$49.99 and €29.99" → matches "49.99" only

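// Extract value after "price:"
// (variable-length lookbehind: works in JavaScript, but some engines,
// such as Python's re module, only accept fixed-width lookbehind)
/(?<=price:\s*)\d+/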

Negative Lookbehind (?<!...)

Matches at a position that is not preceded by the given pattern.

// Match "cat" NOT preceded by "wild"
/(?<!wild)cat/
// "wildcat housecat" → matches "cat" in "housecat" only

// Match digits not preceded by a minus sign
/(?<!-)\b\d+\b/

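Note: Lookbehind support varies between regex engines. JavaScript supports it from ES2018 onward, but older engines may not. Go's RE2-based regexp package supports no lookaround assertions at all.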
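
As one concrete case of limited support, Python's built-in `re` module accepts only fixed-width lookbehind. The sketch below shows the compile error for a variable-width lookbehind and a common workaround that uses a capturing group instead. The sample strings are invented.

```python
import re

# \s* can match any number of characters, so this lookbehind is rejected.
try:
    re.compile(r"(?<=price:\s*)\d+")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# Workaround: match the prefix normally and capture only the value.
value = re.search(r"price:\s*(\d+)", "price:   42 USD").group(1)

# A fixed-width lookbehind such as (?<=\$) is fine.
fixed = re.search(r"(?<=\$)\d+", "cost: $49").group()

print(variable_width_ok, value, fixed)  # False 42 49
```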

Regex Flags (Modifiers)

Flags change how the regex engine interprets a pattern.

Flag	Name	Description	Example
`g`	Global	Find all matches, not just the first	`/cat/g` finds all "cat" occurrences
`i`	Case-insensitive	Match upper and lowercase interchangeably	`/hello/i` matches "Hello", "HELLO"
`m`	Multiline	`^` and `$` match line starts/ends	`/^start/m` matches at each line start
`s`	Dotall (Single-line)	`.` matches newline characters too	`/a.b/s` matches "a\nb"
`u`	Unicode	Enable full Unicode matching	`/\u{1F600}/u` matches emoji
`y`	Sticky	Match only at `lastIndex` position	Used for tokenizing / lexing

// Combining flags in JavaScript
const regex = /^hello world$/gim;

# In Python, flags are constants combined with |
import re
pattern = re.compile(r'^hello world$', re.IGNORECASE | re.MULTILINE)

// In Go, use inline flags:
// (?i) for case-insensitive, (?m) for multiline, (?s) for dotall
regexp.MustCompile("(?im)^hello world$")
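
To see what the flags change in practice, here is a short Python sketch covering dotall, an inline flag, and the combined case-insensitive multiline pattern from above. The test strings are assumptions made for the example.

```python
import re

# By default "." does not match a newline; re.DOTALL (the "s" flag) lifts that.
no_dotall = re.search(r"a.b", "a\nb")
with_dotall = re.search(r"a.b", "a\nb", re.DOTALL)

# Python also accepts inline flags at the start of a pattern.
inline = re.search(r"(?i)hello", "HELLO")

# IGNORECASE + MULTILINE: ^ and $ match per line, case is ignored.
combined = re.compile(r"^hello world$", re.IGNORECASE | re.MULTILINE)
lines = combined.findall("Hello World\nhello world")

print(no_dotall is None, with_dotall is not None, inline is not None)  # True True True
print(lines)  # ['Hello World', 'hello world']
```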

Common Patterns Quick Reference

These are some of the most frequently used regex patterns. You can try any of them in our Regex Tester.

Use Case	Pattern
Email	`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`
URL (HTTP/S)	`^https?:\/\/[^\s]+$`
IPv4 Address	`^((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}$`
Date (YYYY-MM-DD)	`^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$`
Time (HH:MM:SS)	`^([01]\d|2[0-3]):[0-5]\d:[0-5]\d$`
Hex Color	`^#([0-9A-Fa-f]{3}|[0-9A-Fa-f]{6})$`
Strong Password	`^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&]).{8,}$`
Phone (E.164)	`^\+[1-9]\d{1,14}$`
UUID	`^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$`
Semantic Version	`^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-[\w.]+)?(\+[\w.]+)?$`
HTML Tag	`<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>(.*?)<\/\1>`
Trim Whitespace	`^\s+|\s+$`
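
A quick way to sanity-check patterns like these is to run them against inputs that should pass and inputs that should fail. The following Python sketch does that for three of the patterns; the `PATTERNS` dictionary and the `is_valid` helper are hypothetical names introduced only for this example.

```python
import re

# Three patterns from the table, keyed by hypothetical names.
PATTERNS = {
    "date": r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$",
    "time": r"^([01]\d|2[0-3]):[0-5]\d:[0-5]\d$",
    "hex_color": r"^#([0-9A-Fa-f]{3}|[0-9A-Fa-f]{6})$",
}

def is_valid(kind: str, value: str) -> bool:
    """Return True if the whole value matches the named pattern."""
    return re.fullmatch(PATTERNS[kind], value) is not None

print(is_valid("date", "2026-02-10"))   # True
print(is_valid("date", "2026-13-01"))   # False (month 13)
print(is_valid("time", "24:00:00"))     # False (hour 24)
print(is_valid("hex_color", "#ffff"))   # False (4 digits)
print(is_valid("date", "2026-02-31"))   # True: shape only, not a calendar check
```

The last line is a reminder that a regex validates the shape of the input. Checks such as "does this day exist in this month" are better done in code after the regex has passed.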


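Regex in Different Programming Languages

Regex syntax is largely shared across languages, but each language has its own API for creating, testing, and applying patterns.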

JavaScript

// Creating regex in JavaScript
const regex1 = /^\d+$/;             // Literal syntax
const regex2 = new RegExp('^\\d+$'); // Constructor (needs double-escape)

// Testing
regex1.test('12345');                // true

// Matching
'hello world'.match(/\w+/g);        // ['hello', 'world']

// Replacing
'2026-02-10'.replace(
  /^(\d{4})-(\d{2})-(\d{2})$/,
  '$2/$3/$1'
);  // '02/10/2026'

// matchAll (ES2020) - get all matches with groups
const text = 'Price: $10, Tax: $2';
for (const m of text.matchAll(/\$(\d+)/g)) {
  console.log(m[0], m[1]);
  // '$10' '10', then '$2' '2'
}

// Named groups (ES2018+)
const { groups } = '2026-02-10'.match(
  /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
);
console.log(groups.year);  // '2026'

Python

import re

# Compile for reuse (recommended)
pattern = re.compile(r'^\d+$')

# Test if the entire string matches
pattern.match('12345')      # Match object (truthy)
pattern.match('abc')        # None (falsy)

# Search anywhere in the string
re.search(r'\d+', 'abc 123 def')  # Finds '123'

# Find all matches
re.findall(r'\w+', 'hello world')  # ['hello', 'world']

# Replace
re.sub(
    r'^(\d{4})-(\d{2})-(\d{2})$',
    r'\2/\3/\1',
    '2026-02-10'
)  # '02/10/2026'

# Named groups
m = re.match(
    r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})',
    '2026-02-10'
)
m.group('year')   # '2026'
m.group('month')  # '02'

# Flags
text = 'Start here\nstart again'
re.findall(r'^start', text, re.MULTILINE | re.IGNORECASE)  # ['Start', 'start']

# Split by pattern
re.split(r'[,;\s]+', 'a, b; c  d')  # ['a', 'b', 'c', 'd']
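
One Python-specific detail is worth a short sketch: `$` also matches just before a trailing newline, so `match` with `^...$` is slightly more permissive than it looks. When the whole string must match, `fullmatch` is the stricter choice. The input values below are invented.

```python
import re

pattern = re.compile(r"^\d+$")

# $ matches before a final "\n", so this still succeeds.
loose = pattern.match("12345\n") is not None

# fullmatch requires the pattern to cover the entire string.
strict = pattern.fullmatch("12345\n") is not None
clean = pattern.fullmatch("12345") is not None

print(loose, strict, clean)  # True False True
```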

Go (Golang)

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Compile (panics on invalid pattern)
    re := regexp.MustCompile(`^\d+$`)

    // Test
    fmt.Println(re.MatchString("12345"))  // true
    fmt.Println(re.MatchString("abc"))    // false

    // Find first match
    re2 := regexp.MustCompile(`\d+`)
    fmt.Println(re2.FindString("abc 123 def"))  // "123"

    // Find all matches
    fmt.Println(re2.FindAllString("10 cats and 20 dogs", -1))
    // ["10", "20"]

    // Replace
    re3 := regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)
    result := re3.ReplaceAllString("2026-02-10", "$2/$3/$1")
    fmt.Println(result)  // "02/10/2026"

    // Named groups
    re4 := regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)
    match := re4.FindStringSubmatch("2026-02-10")
    for i, name := range re4.SubexpNames() {
        if name != "" {
            fmt.Printf("%s: %s\n", name, match[i])
        }
    }
    // year: 2026, month: 02, day: 10

    // Inline flags: (?i) case-insensitive, (?m) multiline, (?s) dotall
    re5 := regexp.MustCompile(`(?i)hello`)
    fmt.Println(re5.MatchString("HELLO"))  // true
}

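Go's regexp package uses RE2 syntax, which does not support lookahead, lookbehind, or backreferences. This is a deliberate design decision that guarantees linear-time matching.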

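Performance Tips and Best Practices

Be specific: when you know which characters to expect, use a class such as [a-zA-Z] instead of the dot. Precise patterns are faster and less error-prone.

Avoid catastrophic backtracking: nested quantifiers such as (a+)+ can lead to exponential running time in backtracking engines.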

// BAD: Catastrophic backtracking risk
const bad = /^(a+)+$/;
bad.test('aaaaaaaaaaaaaaaaaaaaa!');  // Slow: work roughly doubles with each extra 'a'

// GOOD: Flatten nested quantifiers
const good = /^a+$/;
good.test('aaaaaaaaaaaaaaaaaaaaa!'); // Instant: false
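
Before replacing a nested pattern with a flattened one, it helps to confirm that both accept and reject the same inputs. A minimal Python sketch, using deliberately short strings so that the nested pattern stays fast, could look like this; the sample list is an assumption for the example.

```python
import re

nested = re.compile(r"^(a+)+$")  # risky on long non-matching input
flat = re.compile(r"^a+$")       # same language, no nested repetition

# Short samples only: the nested pattern's cost grows quickly with length.
samples = ["a", "aaaa", "", "aab", "a" * 12 + "!"]

# Both patterns should agree on every sample.
agree = all(
    (nested.match(s) is not None) == (flat.match(s) is not None)
    for s in samples
)

print(agree)  # True
```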

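Use non-capturing groups: when you do not need the matched text, (?:...) avoids the bookkeeping of storing a capture and keeps group numbers stable.

Compile once, use many times: in Python and Go, compile the regex once and reuse the compiled object.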

# Python: compile once, reuse many times
import re
email_re = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')

# Hypothetical sample input
addresses = ['user@example.com', 'not-an-email']

# Fast: uses pre-compiled pattern
for addr in addresses:
    if email_re.match(addr):
        print(f"Valid: {addr}")

Test incrementally: build complex patterns step by step, and test each part on its own before combining them.


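Frequently Asked Questions

What is a regex cheat sheet, and why would I need one?

A regex cheat sheet is a quick reference that gathers syntax, metacharacters, quantifiers, flags, and common patterns in one place. Even experienced developers cannot remember every regex construct. A cheat sheet saves time and reduces mistakes when writing patterns.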

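What is the difference between .* and .*? in a regex?

.* is greedy and matches as many characters as possible (the longest match). .*? is lazy and matches as few characters as possible (the shortest match). Use the lazy form when the match should stop at the first delimiter.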

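Do all programming languages support the same regex syntax?

Most languages support Perl-style syntax (PCRE or a close variant), and core features such as character classes, quantifiers, and groups are available almost everywhere. Advanced features differ: JavaScript only gained lookbehind in ES2018, and Go (RE2) supports neither lookaround assertions nor backreferences.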

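How do I test and debug a regex pattern?

The most effective way is an interactive regex tester that shows matches in real time. With DevToolBox's Regex Tester you can enter a pattern and a test string, see the matches highlighted, and inspect the capture groups. Build the pattern step by step and test each part separately.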

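What is catastrophic backtracking, and how do I avoid it?

Catastrophic backtracking occurs when a regex engine needs exponential time to determine that a string does not match. It is usually caused by nested quantifiers such as (a+)+. To avoid it, use precise character classes, avoid nested repetition, use atomic groups where the engine supports them, and test performance against long non-matching strings.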