Blame view

3rdparty/boost_1_81_0/libs/locale/doc/conversions.txt 3.97 KB
73ef4ff3   Hu Chunming   提交三方库
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
  //
  // Copyright (c) 2009-2011 Artyom Beilis (Tonkikh)
  //
  // Distributed under the Boost Software License, Version 1.0.
  // https://www.boost.org/LICENSE_1_0.txt
  
  /*!
  \page conversions Text Conversions
  
  Boost.Locale provides several functions for basic string manipulation:
  
  - \ref boost::locale::to_upper "to_upper": convert a string to upper case
  - \ref boost::locale::to_upper "to_lower": convert a string to lower case
  - \ref boost::locale::to_title "to_title": convert a string to title case
  - \ref boost::locale::fold_case "fold_case": makes a string case-agnostic (see \ref term_case_folding "case folding")
  - \ref boost::locale::normalize "normalize": convert equivalent code points to a consistent binary form (\ref term_normalization "normalization")
  
  All these functions receive an \c std::locale object as parameter or use a global locale by default.
  
  Global locale is used in all examples below.
  
  \section conversions_case Case Handing
  
  For example:
  \code
      std::string grussen = "grüßEN";
      std::cout   <<"Upper "<< boost::locale::to_upper(grussen) << std::endl
                  <<"Lower "<< boost::locale::to_lower(grussen) << std::endl
                  <<"Title "<< boost::locale::to_title(grussen) << std::endl
                  <<"Fold  "<< boost::locale::fold_case(grussen) << std::endl;
  \endcode
  
  Would print:
  
  \verbatim
  Upper GRÜSSEN
  Lower grüßen
  Title Grüßen
  Fold  grüssen
  \endverbatim
  
  There are existing functions \c to_upper and \c to_lower in the Boost.StringAlgo library, however the
  Boost.Locale functions operate on an entire string instead of performing incorrect character-by-character conversions.
  
  For example:
  
  \code
      std::wstring grussen = L"grüßen";
      std::wcout << boost::algorithm::to_upper_copy(grussen) << " " << boost::locale::to_upper(grussen) << std::endl;
  \endcode
  
  Would give in output:
  
  \verbatim
  GRÜßEN GRÜSSEN
  \endverbatim
  
  Where a letter "ß" was not converted correctly to double-S in first case because of a limitation of \c std::ctype facet.
  
  This is even more problematic in case of UTF-8 encodings where non US-ASCII are not converted at all.
  For example, this code:
  
  \code
      std::string grussen = "grüßen";
      std::cout << boost::algorithm::to_upper_copy(grussen) << " " << boost::locale::to_upper(grussen) << std::endl;
  \endcode
  
  Would modify ASCII characters only
  
  \verbatim
  GRüßEN GRÜSSEN
  \endverbatim
  
  \section conversions_normalization Unicode Normalization
  
  Unicode normalization is the process of converting strings to a standard form, suitable for text processing and
  comparison. For example, character "ü" can be represented by a single code point or a combination of the character "u" and the
  diaeresis "¨". Normalization is an important part of Unicode text processing.
  
  Unicode defines four normalization forms. Each specific form is selected by a flag passed
  to \ref boost::locale::normalize() "normalize" function:
  
  - NFD - Canonical decomposition - boost::locale::norm_nfd
  - NFC - Canonical decomposition followed by canonical composition - boost::locale::norm_nfc or boost::locale::norm_default
  - NFKD - Compatibility decomposition - boost::locale::norm_nfkd
  - NFKC - Compatibility decomposition followed by canonical composition - boost::locale::norm_nfkc
  
  For more details on normalization forms, read <a href="http://unicode.org/reports/tr15/#Norm_Forms">this report on unicode.org</a>.
  
  \section conversions_notes Notes
  
  -   \ref boost::locale::normalize() "normalize" operates only on Unicode-encoded strings, i.e.: UTF-8, UTF-16 and UTF-32 depending on the
      character width. So be careful when using non-UTF encodings as they may be treated incorrectly.
  -   \ref boost::locale::fold_case() "fold_case" is generally a locale-independent operation, but it receives a locale as a parameter to
      determine the 8-bit encoding.
  -   All of these functions can work with an STL string, a NULL terminated string, or a range defined by two pointers. They always
      return a newly created STL string.
  -   The length of the string may change; see the above example.
  */