//
// Copyright (c) 2009-2011 Artyom Beilis (Tonkikh)
//
// Distributed under the Boost Software License, Version 1.0.
// https://www.boost.org/LICENSE_1_0.txt

/*!
\page conversions Text Conversions

Boost.Locale provides several functions for basic string manipulation:

- \ref boost::locale::to_upper "to_upper": convert a string to upper case
- \ref boost::locale::to_lower "to_lower": convert a string to lower case
- \ref boost::locale::to_title "to_title": convert a string to title case
- \ref boost::locale::fold_case "fold_case": make a string case-agnostic (see \ref term_case_folding "case folding")
- \ref boost::locale::normalize "normalize": convert equivalent code points to a consistent binary form
  (\ref term_normalization "normalization")

All of these functions accept a \c std::locale object as a parameter and use the global locale by default.
The global locale is used in all examples below.

\section conversions_case Case Handling

For example:

\code
    std::string grussen = "grüßEN";
    std::cout << "Upper " << boost::locale::to_upper(grussen) << std::endl
              << "Lower " << boost::locale::to_lower(grussen) << std::endl
              << "Title " << boost::locale::to_title(grussen) << std::endl
              << "Fold " << boost::locale::fold_case(grussen) << std::endl;
\endcode

would print:

\verbatim
Upper GRÜSSEN
Lower grüßen
Title Grüßen
Fold grüssen
\endverbatim

The Boost.StringAlgo library already provides \c to_upper and \c to_lower functions. However, the Boost.Locale
functions operate on the entire string instead of performing incorrect character-by-character conversions.

For example:

\code
    std::wstring grussen = L"grüßen";
    std::wcout << boost::algorithm::to_upper_copy(grussen) << " " << boost::locale::to_upper(grussen) << std::endl;
\endcode

would output:

\verbatim
GRÜßEN GRÜSSEN
\endverbatim

In the first case the letter "ß" was not converted to "SS" because of a limitation of the \c std::ctype facet,
which maps each character to exactly one character and therefore cannot change the length of the string.
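
The one-character-at-a-time limitation of \c std::ctype can be sketched with the standard library alone.
The helper \c per_char_upper below is hypothetical: it is not part of Boost.Locale or Boost.StringAlgo, and it
only illustrates, under the assumption of a C++11 compiler, why a per-character mapping cannot turn "ß" into "SS".

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical helper (not part of Boost.Locale or Boost.StringAlgo):
// upper-cases a wide string one character at a time using std::ctype.
std::wstring per_char_upper(const std::wstring& in,
                            const std::locale& loc = std::locale::classic())
{
    const std::ctype<wchar_t>& ct = std::use_facet<std::ctype<wchar_t>>(loc);
    std::wstring out;
    out.reserve(in.size());
    for (wchar_t c : in)
        out += ct.toupper(c); // one character in, exactly one character out
    // A one-to-one mapping can never change the length of the string,
    // so U+00DF ("ß") cannot expand to the two characters "SS".
    assert(out.size() == in.size());
    return out;
}
```

Calling \c per_char_upper(L"gr\u00FC\u00DFen") returns a string that is still six characters long.
With the default classic locale only the ASCII letters are certain to change, and whatever locale is passed,
"ß" remains a single character.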
This is even more problematic with UTF-8 encoded strings, where non-ASCII characters are not converted at all.
For example, this code:

\code
    std::string grussen = "grüßen";
    std::cout << boost::algorithm::to_upper_copy(grussen) << " " << boost::locale::to_upper(grussen) << std::endl;
\endcode

would modify only the ASCII characters:

\verbatim
GRüßEN GRÜSSEN
\endverbatim

\section conversions_normalization Unicode Normalization

Unicode normalization is the process of converting strings to a standard form, suitable for text processing and
comparison. For example, the character "ü" can be represented by a single code point or by a combination of the
character "u" and the diaeresis "¨". Normalization is an important part of Unicode text processing.

Unicode defines four normalization forms. Each form is selected by a flag passed
to the \ref boost::locale::normalize() "normalize" function:

- NFD - Canonical decomposition - boost::locale::norm_nfd
- NFC - Canonical decomposition followed by canonical composition - boost::locale::norm_nfc or boost::locale::norm_default
- NFKD - Compatibility decomposition - boost::locale::norm_nfkd
- NFKC - Compatibility decomposition followed by canonical composition - boost::locale::norm_nfkc

For more details on normalization forms, see Unicode Standard Annex #15, "Unicode Normalization Forms", on unicode.org.

\section conversions_notes Notes

- \ref boost::locale::normalize() "normalize" operates only on Unicode-encoded strings, that is, UTF-8, UTF-16 or UTF-32
  depending on the character width. Be careful when using non-UTF encodings, as they may be treated incorrectly.
- \ref boost::locale::fold_case() "fold_case" is generally a locale-independent operation, but it receives a locale
  as a parameter to determine the 8-bit encoding.
- All of these functions can work with an STL string, a null-terminated string, or a range defined by two pointers.
  They always return a newly created STL string.
- The length of the string may change, as the conversion of "ß" to "SS" in the example above shows.

*/