C++ - Do C++11 regular expressions work with UTF-8 strings?
If I want to use C++11's regular expressions with Unicode strings, will they work with char* as UTF-8, or do I have to convert them to a wchar_t* string?
You would need to test the compiler and system you are using, but in theory it will be supported if your system has a UTF-8 locale. The following test returned true for me on Clang/OS X.
    #include <locale>
    #include <regex>
    #include <string>

    bool test_unicode()
    {
        std::locale old;  // copy of the current global locale
        std::locale::global(std::locale("en_US.UTF-8"));
        std::regex pattern("[[:alpha:]]+", std::regex_constants::extended);
        bool result = std::regex_match(std::string("abcdéfg"), pattern);
        std::locale::global(old);  // restore the previous global locale
        return result;
    }

Note: This was compiled in a file that was UTF-8 encoded.
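One thing to keep in mind when trying this elsewhere: the test assumes a locale named en_US.UTF-8 is installed, and the std::locale constructor throws std::runtime_error when it does not recognize the name. A minimal sketch for probing that first might look like this (the helper name locale_available is made up for illustration, it is not part of the test above):

```cpp
#include <locale>
#include <stdexcept>

// Hypothetical helper: returns true if the named locale can be constructed
// on this system, false if std::locale rejects the name.
bool locale_available(const char* name)
{
    try {
        std::locale loc(name);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

Calling locale_available("en_US.UTF-8") before running the test helps tell a missing locale apart from a regex library that does not handle the input. Locale names are platform-specific, so the exact spelling may differ on your system.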
Just to be safe, I also tried a version of the test with the é written as explicit hex bytes. That worked as well.
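    bool test_unicode2()
    {
        std::locale old;  // copy of the current global locale
        std::locale::global(std::locale("en_US.UTF-8"));
        std::regex pattern("[[:alpha:]]+", std::regex_constants::extended);
        // "\xc3\xa9" is the UTF-8 encoding of é; the literal is split so that
        // the following 'f' is not read as part of the hex escape.
        bool result = std::regex_match(std::string("abcd\xc3\xa9""fg"), pattern);
        std::locale::global(old);  // restore the previous global locale
        return result;
    }

Update: test_unicode() still works for me: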
    $ file regex-test.cpp
    regex-test.cpp: UTF-8 Unicode c program text
    $ g++ --version
    Configured with: --prefix=/Applications/Xcode-8.2.1.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
    Apple LLVM version 8.0.0 (clang-800.0.42.1)
    Target: x86_64-apple-darwin15.6.0
    Thread model: posix
    InstalledDir: /Applications/Xcode-8.2.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
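Both tests swap the global locale and restore it afterwards, which affects the whole program while the test runs. As an alternative sketch (not what I tested above), std::basic_regex has an imbue() member that sets the locale for a single regex object. The function name all_alpha is just for illustration:

```cpp
#include <locale>
#include <regex>
#include <string>

// Sketch: match [[:alpha:]]+ against `text`, using `loc` for character
// classification, without touching the global locale.
bool all_alpha(const std::string& text, const std::locale& loc)
{
    std::regex pattern;
    pattern.imbue(loc);  // must be called before the pattern is assigned
    pattern.assign("[[:alpha:]]+", std::regex_constants::extended);
    return std::regex_match(text, pattern);
}
```

The counterpart to test_unicode2() would be all_alpha("abcd\xc3\xa9""fg", std::locale("en_US.UTF-8")), assuming that locale name exists on your system. Note that std::regex on a std::string works on individual char values, so the result for non-ASCII text depends on how the locale classifies each byte. As with the tests above, treat it as something to check on your own compiler and platform.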