Blame view

3rdparty/boost_1_81_0/libs/spirit/doc/rationale.qbk 5.84 KB
73ef4ff3   Hu Chunming   提交三方库
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
  [/==============================================================================
      Copyright (C) 2001-2011 Joel de Guzman
      Copyright (C) 2001-2011 Hartmut Kaiser
  
      Distributed under the Boost Software License, Version 1.0. (See accompanying
      file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
  ===============================================================================/]
  
  [section Rationale]
  
  [heading Naming]
  
  Why use the name "Spirit", "Qi" and "Karma"? Because xpressive names
  have a better spirit, brings qi to your software and will enhance your
  karma so they can heal your (con)fusion and make you wave like a phoenix
  from the ashes. (Joachim Faulhaber)
  
  [heading Type Erasure: From static to dynamic C++]
  
  Rules straddle the border between static and dynamic C++. In effect, a
  rule transforms compile-time polymorphism (using templates) into
  run-time polymorphism (using virtual functions). This is necessary due
  to C++'s inability to automatically declare a variable of a type deduced
  from an arbitrarily complex expression in the right-hand side (rhs) of
  an assignment. Basically, we want to do something like:
  
      T rule = an_arbitrarily_complex_expression;
  
  without having to know or care about the resulting type of the
  right-hand side (rhs) of the assignment expression. This can be done
  with modern C++ 0x compilers using `auto`:
  
      auto rule = an_arbitrarily_complex_expression;
  
  Apart from this, we also need a facility to forward declare an unknown
  type:
  
      T rule;
      ...
      rule = a | b;
  
  These limitations lead us to this implementation of rules. This comes at
  the expense of the overhead of a type-erased call which is an indirect
  function call that connot be inlined, once through each invocation of a
  rule.
  
  [heading Multiple declaration]
  
  Some BNF variants allow multiple declarations of a rule. The
  declarations are taken as alternatives. Example:
  
     r = a;
     r = b;
  
  is equivalent to:
  
     r = a | b;
  
  Spirit v1.3 allowed this behavior. However, the current version of
  Spirit no longer allows this because experience shows that this behavior
  leads to unwanted gotchas (for instance, it does not allow rules to be
  held in containers). In the current release of Spirit, a second
  assignment to a rule will simply redefine it. The old definition is
  destructed. This follows more closely C++ semantics and is more in line
  with what the user expects the rule to behave.
  
  [heading Sequencing Syntax]
  
  The comma operator as in `a, b` seems to be a better candidate,
  syntax-wise. But then the problem is with its precedence. It has the
  lowest precedence in C/C++, which makes it virtually useless.
  
  Bjarne Stroustrup, in his article "Generalizing Overloading for C++2000"
  talks about overloading whitespace. Such a feature would allow
  juxtapositioning of parser objects exactly as we do in (E)BNF (e.g. a b
  | c instead of a >> b | c). Unfortunately, the article was dated April
  1, 1998. Oh well.
  
  [heading Forward iterators]
  
  In general, the expected iterator is at least a standard conforming
  forward iterator. Forward iterators are needed for backtracking where
  the iterator needs to be saved and restored later. Generally speaking,
  Spirit is a backtracking parser. The implication of this is that at some
  point, the iterator position will have to be saved to allow the parser
  to backtrack to a previous point. Thus, for backtracking to work, the
  framework requires at least a forward iterator.
  
  There are remedies of course. In cases where we need to use input
  iterators, you can use the __multi_pass__ iterator to make the forward
  iterators.
  
  Some parsers might require more specialized iterators (bi-directional or
  even random access). Perhaps in the future, deterministic parsers when
  added to the framework, will perform no backtracking and will need just
  a single token lookahead, hence will require input iterators only.
  
  [heading Exhaustive backtracking and greedy RD]
  
  Spirit doesn't do exhaustive backtracking like regular expressions are
  expected to. For example:
  
      *char_('a') >> char_('a');
  
  will always fail to match because Spirit's Kleene star does not back off
  when the rest of the rule fails to match.
  
  Actually, there's a solution to this greedy RD problem. Such a scheme is
  discussed in section 6.6.2 of Parsing Techniques: A Practical Guide. The
  trick involves passing a tail parser (in addition to the scanner) to
  each parser. The start parser will then simply be:
  
      start >> end;
  
  (end is the start's tail).
  
  Spirit is greedy --using straight forward, naive RD. It is certainly
  possible to implement the fully backtracking scheme presented above, but
  there will be also certainly be a performance hit. The scheme will
  always try to match all possible parser paths (full parser hierarchy
  traversal) until it reaches a point of certainty, that the whole thing
  matches or fails to match.
  
  [:Backtracking and Greedy RD
  
  Spirit is quite consistent and intuitive about when it backtracks and to
  where, although it may not be obvious to those coming from different
  backgrounds. In general, any (sub)parser will, given the same input,
  always match the same portion of the input (or fail to match the input
  at all). This means that Spirit is inherently greedy. Spirit will only
  backtrack when a (sub)parser fails to match the input, and it will
  always backtrack to the next choice point upward (not backward) in the
  parser structure. In other words abb|ab will match `"ab"`, as will
  `a(bb|b)`, but `(ab|a)b` won't because the `(ab|a)` subparser will
  always match the `'b'` after the `'a'` if it is available.
  
  --Rainer Deyke]
  
  This is the very nature of __peg__.
  
  There's a strong preference on "simplicity with all the knobs when you
  need them" approach, right now. On the other hand, the flexibility of
  Spirit makes it possible to have different optional schemes available.
  It might be possible to implement an exhaustive backtracking RD scheme
  as an optional feature in the future.
  
  [endsect]