c++ - Do C++11 regular expressions work with UTF-8 strings?


If I want to use C++11's regular expressions with Unicode strings, will they work with char* holding UTF-8, or do I have to convert them to a wchar_t* string?

You need to test the compiler and system you are using, but in theory it is supported if your system has a UTF-8 locale. The following test returned true for me on Clang/OS X.

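    #include <locale>
    #include <regex>
    #include <string>

    bool test_unicode() {
        std::locale old;  // copy of the current global locale
        // Throws std::runtime_error if the system has no locale by this name.
        std::locale::global(std::locale("en_US.UTF-8"));

        // std::regex_traits picks up the global locale when the regex is constructed.
        std::regex pattern("[[:alpha:]]+", std::regex_constants::extended);
        bool result = std::regex_match(std::string("abcdéfg"), pattern);

        std::locale::global(old);  // restore the previous global locale
        return result;
    }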

Note: this was compiled from a UTF-8 encoded source file.


Just to be safe, I also tried a version that spells the string out with explicit hex escapes. That worked too.

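    bool test_unicode2() {
        std::locale old;
        std::locale::global(std::locale("en_US.UTF-8"));

        std::regex pattern("[[:alpha:]]+", std::regex_constants::extended);
        // "\xc3\xa9" is 'é' (U+00E9) in UTF-8. The literal is split because a \x
        // escape consumes every following hex digit, and 'f' is a hex digit.
        bool result = std::regex_match(std::string("abcd\xc3\xa9""fg"), pattern);

        std::locale::global(old);
        return result;
    }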

Update: test_unicode() still works for me:

    $ file regex-test.cpp
    regex-test.cpp: UTF-8 Unicode c program text

    $ g++ --version
    Configured with: --prefix=/Applications/Xcode-8.2.1.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
    Apple LLVM version 8.0.0 (clang-800.0.42.1)
    Target: x86_64-apple-darwin15.6.0
    Thread model: posix
    InstalledDir: /Applications/Xcode-8.2.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
