3rdparty/boost_1_81_0/libs/regex/doc/faq.qbk 4.56 KB
73ef4ff3   Hu Chunming   Commit third-party libraries
  [/ 
    Copyright 2006-2007 John Maddock.
    Distributed under the Boost Software License, Version 1.0.
    (See accompanying file LICENSE_1_0.txt or copy at
    http://www.boost.org/LICENSE_1_0.txt).
  ]
  
  [section:faq FAQ]
  
  [*Q.] I can't get regex++ to work with escape characters, what's going on?
  
  [*A.] If you embed regular expressions in C++ code, remember that escape 
  characters are processed twice: once by the C++ compiler, and once by the 
  Boost.Regex expression compiler. So to pass the regular expression \d+ 
  to Boost.Regex, you need to embed "\\d+" in your code. Likewise, to match a 
  literal backslash you will need to embed "\\\\" in your code.
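
The double-escaping rule above can be sketched as follows. This is a minimal illustration rather than library code: it uses `std::regex` from `<regex>` as a stand-in so that it builds without Boost, on the assumption that the same string literals behave identically when passed to `boost::regex`. The function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Ordinary string literal: the C++ compiler turns "\\d+" into the three
// characters \d+ before the regex compiler ever sees them.
inline bool is_all_digits(const std::string& text) {
    static const std::regex digits("\\d+");
    return std::regex_match(text, digits);
}

// Raw string literal (C++11): no compiler-level escaping takes place, so the
// pattern is written exactly as the regex compiler should see it.
inline bool is_all_digits_raw(const std::string& text) {
    static const std::regex digits(R"(\d+)");
    return std::regex_match(text, digits);
}

// Four backslashes in the source become two in the pattern, which the regex
// compiler reads as one literal backslash to match.
inline bool contains_backslash(const std::string& text) {
    static const std::regex backslash("\\\\");
    return std::regex_search(text, backslash);
}
```

Where a C++11 compiler is available, the raw string form avoids the first level of escaping altogether, which tends to make longer expressions easier to read.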
  
  [*Q.] No matter what I do regex_match always returns false, what's going on?
  
  [*A.] The algorithm regex_match only succeeds if the expression matches *all*
  of the text. If you want to *find* a sub-string within the text that matches
  the expression, use regex_search instead.
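
The difference can be seen in a short sketch. It uses `std::regex` as a stand-in so that it compiles without Boost, on the assumption that `boost::regex_match` and `boost::regex_search` behave the same way for this case. The helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// True only when the expression matches the entire text.
inline bool matches_whole(const std::string& text, const std::string& expr) {
    return std::regex_match(text, std::regex(expr));
}

// True when some sub-string of the text matches the expression.
inline bool matches_anywhere(const std::string& text, const std::string& expr) {
    return std::regex_search(text, std::regex(expr));
}
```

With the text "abc 123 def" and the expression "[0-9]+", `matches_whole` returns false because the letters and spaces are not covered by the expression, while `matches_anywhere` returns true because "123" is found inside the text.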
  
  [*Q.] Why does using parentheses in a POSIX regular expression change the 
  result of a match?
  
  [*A.] For POSIX (extended and basic) regular expressions, but not for Perl regexes, 
  parentheses don't only mark sub-expressions; they also determine what the best match is. 
  When the expression is compiled as a POSIX basic or extended regex, Boost.Regex 
  follows the POSIX standard leftmost-longest rule for determining what matched. 
  So if there is more than one possible match after considering the whole expression, 
  it looks next at the first sub-expression, then the second sub-expression, 
  and so on. So...
  
  "'''(0*)([0-9]*)'''" against "00123" would produce
  $1 = "00"
  $2 = "123"
  
  whereas
  
  "0*([0-9]*)" against "00123" would produce
  $1 = "00123"
  
  If you think about it, had $1 only matched the "123", this would be "less good" 
  than the match "00123" which is both further to the left and longer. If you 
  want $1 to match only the "123" part, then you need to use something like:
  
  "0*([1-9][0-9]*)"
  
  as the expression.
  
  [*Q.] Why don't character ranges work properly (POSIX mode only)?
  
  [*A.] The POSIX standard specifies that character range expressions are 
  locale sensitive - so for example the expression [A-Z] will match any 
  collating element that collates between 'A' and 'Z'. That means that for 
  most locales other than "C" or "POSIX", [A-Z] would match the single 
  character 't' for example, which is not what most people expect - or 
  at least not what most people have come to expect from regular 
  expression engines. For this reason, the default behaviour of Boost.Regex 
  (perl mode) is to turn locale sensitive collation off by not setting the 
  `regex_constants::collate` compile time flag. However, if you set a non-default 
  compile time flag - for example `regex_constants::extended` or 
  `regex_constants::basic` - then locale dependent collation will be enabled. 
  This also applies to the POSIX API functions, which use either 
  `regex_constants::extended` or `regex_constants::basic` internally. 
  [Note - when `regex_constants::nocollate` is in effect, the library behaves 
  "as if" the LC_COLLATE locale category were always "C", regardless of what 
  it's actually set to - end note].
  
  [*Q.] Why are there no throw specifications on any of the functions? 
  What exceptions can the library throw?
  
  [*A.] Not all compilers support (or honor) throw specifications; others 
  support them but with reduced efficiency. Throw specifications may be added 
  at a later date as compilers begin to handle this better. The library 
  should throw only three types of exception. First, [boost::regex_error] can be 
  thrown by [basic_regex] when compiling a regular expression. Second, `std::runtime_error` 
  can be thrown when a call to `basic_regex::imbue` tries to open a message 
  catalogue that doesn't exist, when a call to [regex_search] or [regex_match] 
  results in an "everlasting" search, or when a call to `RegEx::GrepFiles` or 
  `RegEx::FindFiles` tries to open a file that cannot be opened. Finally, 
  `std::bad_alloc` can be thrown by just about any of the functions in this library.
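
One way to arrange handlers for these three exception types is sketched below. It uses `std::regex` as a stand-in so that it builds without Boost; like `boost::regex_error`, `std::regex_error` derives from `std::runtime_error`, so the more specific handler has to come first. The function name `try_compile` and the returned labels are hypothetical.

```cpp
#include <new>
#include <regex>
#include <stdexcept>
#include <string>

// Returns an empty string if the expression compiles, otherwise a label
// naming the kind of exception that was caught.
inline std::string try_compile(const std::string& expr) {
    try {
        std::regex re(expr);          // compiling may throw regex_error
        return "";
    } catch (const std::regex_error&) {
        // Most specific handler first: regex_error is a runtime_error.
        return "regex_error";
    } catch (const std::runtime_error&) {
        return "runtime_error";
    } catch (const std::bad_alloc&) {
        return "bad_alloc";
    }
}
```

For example, `try_compile("(a+b")` reports `regex_error` because of the unmatched parenthesis, whereas `try_compile("a+b")` returns an empty string.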
  
  [*Q.] Why can't I use the "convenience" versions of regex_match / 
  regex_search / regex_grep / regex_format / regex_merge?
  
  [*A.] These versions may or may not be available depending upon the 
  capabilities of your compiler. The rules determining the format of 
  these functions are quite complex, and only the versions visible to 
  a standards-compliant compiler are given in the help. To find out 
  what your compiler supports, run <boost/regex.hpp> through your 
  C++ pre-processor, and search the output file for the function 
  that you are interested in. Note, however, that very few current
  compilers still have problems with these overloaded functions.
  
  [endsect]