  [/==============================================================================
      Copyright (C) 2001-2011 Joel de Guzman
      Copyright (C) 2001-2011 Hartmut Kaiser
  
      Distributed under the Boost Software License, Version 1.0. (See accompanying
      file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
  ===============================================================================/]
  
  [section Introduction]
  
  Boost Spirit is an object-oriented, recursive-descent parser and
  output generation library for C++. It allows you to write grammars and
  format descriptions using a syntax similar to Extended Backus-Naur
  Form (EBNF)[footnote [@http://www.cl.cam.ac.uk/%7Emgk25/iso-14977.pdf
  ISO-EBNF]] directly in C++. These inline grammar
  specifications can mix freely with other C++ code and, thanks to the
  generative power of C++ templates, are immediately executable. In
  contrast, conventional compiler-compilers or parser-generators have
  to perform an additional translation step from the source EBNF code to
  C or C++ code.
  
  The syntax and semantics of the library's API directly form domain-specific 
  embedded languages (DSELs). In fact, Spirit exposes three different DSELs to 
  the user: 
  
  * one for creating parser grammars, 
  * one for the specification of the required tokens to be used for parsing, 
  * and one for the description of the required output formats. 
  
  Since the target input grammars and output formats are written entirely in C++, 
  we do not need any separate tools to compile, preprocess or integrate them 
  into the build process. __spirit__ allows seamless integration of the parsing 
  and output generation process with other C++ code. This often allows for 
  simpler and more efficient code.
  
  Both the created parsers and generators are fully attributed, which allows you 
  to easily build and handle hierarchical data structures in memory. These data 
  structures resemble the structure of the input data and can directly be used 
  to generate arbitrarily-formatted output.
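As a rough illustration of what "fully attributed" buys you, here is a hand-written sketch in standard C++. It is not Spirit code, and the function names `parse_int_list` and `generate_int_list` are hypothetical: the parser stores every integer it matches in a `std::vector<int>`, and a generator then renders that same vector in a different format. It assumes a comma-separated list of unsigned decimal integers and does no overflow checking.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Parser side: matches  int (',' int)*  and, like an attributed parser,
// synthesizes a std::vector<int> holding every matched integer.
// Assumes unsigned decimal integers; no overflow checking.
bool parse_int_list(const std::string& in, std::vector<int>& out) {
    std::vector<int> values;
    std::size_t pos = 0;
    for (;;) {
        std::size_t start = pos;
        int value = 0;
        while (pos < in.size() && in[pos] >= '0' && in[pos] <= '9')
            value = value * 10 + (in[pos++] - '0');
        if (pos == start) return false;      // expected an integer
        values.push_back(value);
        if (pos == in.size()) break;         // whole input consumed
        if (in[pos++] != ',') return false;  // expected ','
    }
    out = values;  // only touch `out` on success
    return true;
}

// Generator side: renders the same in-memory data in another format.
std::string generate_int_list(const std::vector<int>& v, const std::string& sep) {
    std::string text;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (i != 0) text += sep;
        text += std::to_string(v[i]);
    }
    return text;
}
```

Parsing `"1,8,10"` fills the vector with `1, 8, 10`, and `generate_int_list(v, "; ")` turns it into `"1; 8; 10"`. The in-memory vector is the only link between the two sides, which is the role attributes play in Spirit.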
  
  The [link spirit.spiritstructure figure] below depicts the overall structure 
  of the Boost Spirit library. The library consists of four major parts:
  
  * __classic__: This is the almost-unchanged code base taken from the 
    former Boost Spirit V1.8 distribution. It has been moved into the namespace 
    `boost::spirit::classic`. A special compatibility layer has been added to 
    ensure complete compatibility with existing code using Spirit V1.8.
  * __qi__: This is the parser library allowing you to build recursive 
    descent parsers. The exposed domain-specific language can be used to describe
    the grammars to implement, and the rules for storing the parsed information.
  * __lex__: This is the library used to create tokenizers (lexers). The
    domain-specific language exposed by __lex__ allows you to define regular 
    expressions used to match tokens (create token definitions), associate these 
    regular expressions with code to be executed whenever they are matched, and 
    to add the token definitions to the lexical analyzer.
  * __karma__: This is the generator library allowing you to create code for 
    recursive descent, data type-driven output formatting. The exposed
    domain-specific language is almost equivalent to the parser description language 
    used in __qi__, except that it is used to describe the required output 
    format to generate from a given data structure.
  
  [fig spiritstructure.png..The overall structure of the Boost Spirit library..spirit.spiritstructure]
  
  
  The three components, __qi__, __karma__ and __lex__, are designed to be used
  either standalone or together. The general methodology is to use the token 
  sequence generated by __lex__ as the input for a parser generated by __qi__.
  On the opposite side of the equation, the hierarchical data structures generated 
  by __qi__ are used as input for the output generators created using __karma__. 
  However, there is nothing to stop you from using any of these components on 
  its own.
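The division of labor between lexer and parser can be sketched without Spirit. In the hypothetical example below, `tokenize` plays the part of a __lex__ lexer (characters in, tokens out, blanks dropped) and `parse_sum` plays the part of a __qi__ parser that consumes the token sequence rather than raw characters. The token set and the sum-only grammar are assumptions chosen to keep the sketch short.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Token kinds produced by the lexing stage (hypothetical, for illustration).
enum class Tok { Int, Plus, Minus };

struct Token {
    Tok kind;
    int value;  // only meaningful for Tok::Int
};

// Stage 1 (the role __lex__ plays): turn characters into tokens, dropping blanks.
bool tokenize(const std::string& in, std::vector<Token>& out) {
    for (std::size_t i = 0; i < in.size();) {
        char c = in[i];
        if (c == ' ') {
            ++i;
        } else if (c == '+') {
            out.push_back({Tok::Plus, 0});
            ++i;
        } else if (c == '-') {
            out.push_back({Tok::Minus, 0});
            ++i;
        } else if (std::isdigit(static_cast<unsigned char>(c))) {
            int v = 0;
            while (i < in.size() && std::isdigit(static_cast<unsigned char>(in[i])))
                v = v * 10 + (in[i++] - '0');
            out.push_back({Tok::Int, v});
        } else {
            return false;  // no token definition matches this character
        }
    }
    return true;
}

// Stage 2 (the role __qi__ plays): parse  Int (('+' | '-') Int)*  over the
// token sequence and synthesize the resulting sum as the attribute.
bool parse_sum(const std::vector<Token>& toks, int& result) {
    std::size_t i = 0;
    if (i == toks.size() || toks[i].kind != Tok::Int) return false;
    int acc = toks[i++].value;
    while (i < toks.size()) {
        Tok op = toks[i++].kind;
        if (op == Tok::Int || i == toks.size() || toks[i].kind != Tok::Int)
            return false;  // expected an operator followed by an integer
        acc = (op == Tok::Plus) ? acc + toks[i].value : acc - toks[i].value;
        ++i;
    }
    result = acc;
    return true;
}
```

Because the parser only sees `Token` values, it never deals with blanks or digit sequences; that separation is what a lexer buys in a Spirit application as well.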
  
  The [link spirit.spiritkarmaflow figure] below shows the typical data flow of 
  some input being converted to an internal representation. After an 
  (optional) transformation, this data is converted back into a different, 
  external representation. The picture highlights Spirit's place in this data 
  transformation flow.
  
  [fig spiritkarmaflow.png..The place of __qi__ and __karma__ in a data transformation flow of a typical application..spirit.spiritkarmaflow]
  
  [heading A Quick Overview of Parsing with __qi__]
  
  __qi__ is Spirit's sublibrary dealing with generating parsers based on a given
  target grammar (essentially a format description of the input data to read).
  
  A simple EBNF grammar snippet: 
  
      group       ::= '(' expression ')'
      factor      ::= integer | group
      term        ::= factor (('*' factor) | ('/' factor))*
      expression  ::= term (('+' term) | ('-' term))*
  
  is approximated using facilities of Spirit's /Qi/ sublibrary as seen in this 
  code snippet: 
  
      group       = '(' >> expression >> ')';
      factor      = integer | group;
      term        = factor >> *(('*' >> factor) | ('/' >> factor));
      expression  = term >> *(('+' >> term) | ('-' >> term));
  
  Through the magic of expression templates, this is perfectly valid and 
  executable C++ code. The production rule `expression` is, in fact, an object that 
  has a member function `parse` that does the work, given source text written in 
  the grammar that we have just declared. Yes, it's a calculator. For now, we 
  simplify by skipping the type declarations and the definition of the 
  rule `integer` invoked by `factor`. The production rule `expression` in our 
  grammar specification, traditionally called the `start` symbol, can recognize 
  inputs such as: 
  
      12345
      -12345
      +12345
      1 + 2
      1 * 2
      1/2 + 3/4
      1 + 2 + 3 + 4
      1 * 2 * 3 * 4
      (1 + 2) * (3 + 4)
      (-1 + 2) * (3 + -4)
      1 + ((6 * 200) - 20) / 6
      (1 + (2 + (3 + (4 + 5))))
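For comparison, the same grammar can be recognized by a hand-written recursive-descent recognizer in standard C++, with one function per production. This is a sketch, not what __qi__ generates: it only answers whether the input matches. It assumes that `integer` is an optionally signed run of decimal digits and that blanks between tokens are skipped, since the snippet above leaves both open. The names `CalcRecognizer` and `recognize` are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// One member function per production of the calculator grammar.
// Each returns true on a match and backtracks `pos` where a partial
// match has to be undone.
struct CalcRecognizer {
    std::string s;
    std::size_t pos = 0;

    // Skips blanks (assumed to be the only whitespace allowed).
    void skip() { while (pos < s.size() && s[pos] == ' ') ++pos; }

    // Consumes the character c if it comes next.
    bool eat(char c) {
        skip();
        if (pos < s.size() && s[pos] == c) { ++pos; return true; }
        return false;
    }

    // integer (assumed): optional sign followed by one or more digits.
    bool integer() {
        skip();
        std::size_t start = pos;
        if (pos < s.size() && (s[pos] == '+' || s[pos] == '-')) ++pos;
        std::size_t digits = pos;
        while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) ++pos;
        if (pos == digits) { pos = start; return false; }
        return true;
    }

    // group ::= '(' expression ')'
    bool group() {
        std::size_t save = pos;
        if (eat('(') && expression() && eat(')')) return true;
        pos = save;
        return false;
    }

    // factor ::= integer | group
    bool factor() { return integer() || group(); }

    // term ::= factor (('*' factor) | ('/' factor))*
    bool term() {
        if (!factor()) return false;
        for (;;) {
            std::size_t save = pos;
            if ((eat('*') || eat('/')) && factor()) continue;
            pos = save;  // undo a dangling operator
            return true;
        }
    }

    // expression ::= term (('+' term) | ('-' term))*
    bool expression() {
        if (!term()) return false;
        for (;;) {
            std::size_t save = pos;
            if ((eat('+') || eat('-')) && term()) continue;
            pos = save;
            return true;
        }
    }
};

// True if the whole input is an `expression`.
inline bool recognize(const std::string& input) {
    CalcRecognizer r;
    r.s = input;
    if (!r.expression()) return false;
    r.skip();
    return r.pos == r.s.size();
}
```

Usage: `recognize("(1 + 2) * (3 + 4)")` returns `true`, while `recognize("(1 + 2")` returns `false`. Note how closely each function mirrors its EBNF production; Spirit lets you write the productions themselves and derives this kind of code for you.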
  
  Of course, we have modified the original EBNF syntax to conform to C++ 
  syntax rules. Most notably, we see an abundance of shift `>>` operators. 
  Since there is no 'empty' operator in C++, it is simply not possible to 
  write something like:
  
      a b
  
  as seen in math syntax, for example, to mean multiplication or, in our case, 
  as seen in EBNF syntax to mean sequencing (b should follow a). __qi__ 
  uses the shift `>>` operator instead for this purpose. We take the `>>` operator, 
  with arrows pointing to the right, to mean "is followed by". Thus we write: 
  
      a >> b
  
  The alternative operator `|` and the parentheses `()` remain as is. The 
  assignment operator `=` is used in place of EBNF's `::=`. Last but not least, 
  the Kleene star `*`, which is a postfix operator in EBNF, becomes a prefix 
  operator. Instead of: 
  
      a* //... in EBNF syntax,
  
  we write: 
  
      *a //... in Spirit.
  
  since C and C++ have no postfix `*` operator. Finally, we terminate each 
  rule with the ubiquitous semicolon, `;`. 
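The operator mapping above can be imitated in plain C++ with operator overloading. The following is a toy sketch, not Spirit's implementation: it type-erases parsers with `std::function` instead of building expression templates, handles only single characters, synthesizes no attributes, and cannot express recursive rules such as `expression`. The names `Parser`, `lit`, `digit` and `parse_all` are hypothetical helpers defined here, not Spirit's own components.

```cpp
#include <cassert>
#include <functional>
#include <string>

// A toy parser: tries to match a prefix of [first, last). On success it
// advances `first`; on failure it leaves `first` where it was.
struct Parser {
    std::function<bool(const char*&, const char*)> fn;
    bool operator()(const char*& first, const char* last) const {
        return fn(first, last);
    }
};

// Matches one literal character.
inline Parser lit(char c) {
    return Parser{[c](const char*& first, const char* last) {
        if (first != last && *first == c) { ++first; return true; }
        return false;
    }};
}

// Matches one decimal digit.
inline Parser digit() {
    return Parser{[](const char*& first, const char* last) {
        if (first != last && *first >= '0' && *first <= '9') { ++first; return true; }
        return false;
    }};
}

// Sequence, a >> b: "a is followed by b". Backtracks if b fails.
inline Parser operator>>(Parser a, Parser b) {
    return Parser{[a, b](const char*& first, const char* last) {
        const char* save = first;
        if (a(first, last) && b(first, last)) return true;
        first = save;
        return false;
    }};
}
inline Parser operator>>(char a, Parser b) { return lit(a) >> b; }
inline Parser operator>>(Parser a, char b) { return a >> lit(b); }

// Alternative, a | b: try a, and if it fails try b.
inline Parser operator|(Parser a, Parser b) {
    return Parser{[a, b](const char*& first, const char* last) {
        return a(first, last) || b(first, last);
    }};
}

// Kleene star, *a: match a zero or more times (prefix, unlike EBNF's a*).
inline Parser operator*(Parser a) {
    return Parser{[a](const char*& first, const char* last) {
        for (;;) {
            const char* before = first;
            // Stop on failure, or if a matched without consuming input.
            if (!a(first, last) || first == before) break;
        }
        return true;
    }};
}

// True if p matches the whole of s.
inline bool parse_all(const Parser& p, const std::string& s) {
    const char* first = s.data();
    const char* last = first + s.size();
    return p(first, last) && first == last;
}
```

Usage: with `Parser number = digit() >> *digit();`, the rule `'[' >> number >> *(',' >> number) >> ']'` accepts `[1,8,10]` and rejects `[1,,8]`. The mixed `char`/`Parser` overloads are what let a bare `'['` appear in the expression, a trick Spirit also relies on.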
  
  
  [heading A Quick Overview of Output Generation with __karma__]
  
  Spirit not only allows you to describe the structure of the input, it also enables 
  the specification of the output format for your data in a similar way, and based 
  on a single syntax and compatible semantics.
  
  Let's assume we need to generate a textual representation from a simple data 
  structure such as a `std::vector<int>`. Conventional code would probably look like this:
  
      std::vector<int> v (initialize_and_fill());
      std::vector<int>::iterator end = v.end();
      for (std::vector<int>::iterator it = v.begin(); it != end; ++it)
          std::cout << *it << std::endl;
  
  which is not very flexible and quite difficult to maintain when it comes to 
  changing the required output format. Spirit's sublibrary /Karma/ allows you to 
  specify output formats for arbitrary data structures in a very flexible way. 
  The following snippet is the /Karma/ format description used to create the 
  same output as the traditional code above:
  
      *(int_ << eol)
  
  Here are some more examples of format descriptions for different output 
  representations of the same `std::vector<int>`:
  
  [table Different output formats for `std::vector<int>` 
      [ [Format]                         [Example]          [Description] ]
      [ [`'[' << *(int_ << ',') << ']'`] [`[1,8,10,]`]      [Comma-separated list of integers] ]
      [ [`*('(' << int_ << ')' << ',')`] [`(1),(8),(10),`]  [Comma-separated list of integers in parentheses] ]
      [ [`*hex`]                         [`18a`]            [A list of hexadecimal numbers] ]
      [ [`*(double_ << ',')`]            [`1.0,8.0,10.0,`]  [A list of floating point numbers] ]
  ]
  
  We will see later in this documentation how it is possible to avoid printing 
  the trailing `','`.
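For comparison, the table's formats can be written by hand with standard streams. This sketch assumes the vector holds `1, 8, 10`, as in the Example column, and that `double_` prints one fractional digit for these values (emulated here with `std::fixed` and `std::setprecision(1)`); the function names are hypothetical. Each function hard-codes one format in its loop, which is exactly the inflexibility the one-line /Karma/ descriptions avoid.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// '[' << *(int_ << ',') << ']': every element is followed by ','.
std::string bracketed(const std::vector<int>& v) {
    std::ostringstream os;
    os << '[';
    for (int x : v) os << x << ',';
    os << ']';
    return os.str();
}

// *('(' << int_ << ')' << ','): each element wrapped in parentheses.
std::string parenthesized(const std::vector<int>& v) {
    std::ostringstream os;
    for (int x : v) os << '(' << x << ')' << ',';
    return os.str();
}

// *hex: lowercase hexadecimal digits, with no separator at all.
std::string hex_list(const std::vector<int>& v) {
    std::ostringstream os;
    os << std::hex;
    for (int x : v) os << x;
    return os.str();
}

// *(double_ << ','): each int converted to double; one fractional digit
// is assumed here to match the table.
std::string double_list(const std::vector<int>& v) {
    std::ostringstream os;
    os << std::fixed << std::setprecision(1);
    for (int x : v) os << static_cast<double>(x) << ',';
    return os.str();
}
```

Changing any of these outputs means editing a loop body, whereas with /Karma/ only the format expression changes.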
  
  Overall, the syntax is similar to __qi__ with the exception that we use the `<<` 
  operator for output concatenation. This should be easy to understand as it 
  follows the conventions used in the Standard's I/O streams. 
  
  Another important feature of __karma__ is that it allows you to fully decouple 
  the data type from the output format. You can use the same output format with 
  different data types as long as they conform conceptually. The next table gives 
  some related examples.
  
  [table Different data types usable with the output format `*(int_ << eol)`
      [ [Data type]                 [Description] ]
      [ [`int i[4]`]                [C-style arrays] ]
      [ [`std::vector<int>`]        [Standard vector] ]
      [ [`std::list<int>`]          [Standard list] ]
      [ [`boost::array<long, 20>`]  [Boost array] ]
  ]
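In plain C++, the closest analogue is a function template over any range whose elements convert to `int`. The sketch below (the name `one_per_line` is hypothetical) writes each element followed by a newline, standing in for `*(int_ << eol)`. It substitutes `std::array` for `boost::array` so that it needs only the standard library. Here only the data type varies; the format itself is still hard-coded in the loop body.

```cpp
#include <array>
#include <cassert>
#include <list>
#include <sstream>
#include <string>
#include <vector>

// One output format, many data types: works for C arrays, std::vector,
// std::list, std::array, or any other range whose elements convert to int.
template <typename Range>
std::string one_per_line(const Range& r) {
    std::ostringstream os;
    for (const auto& x : r) os << static_cast<int>(x) << '\n';
    return os.str();
}
```

For example, `int a[4] = {1, 8, 10, 3};` and `std::list<int>{1, 8, 10, 3}` both yield `"1\n8\n10\n3\n"`.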
  
  [endsect]