Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/2867123/
Convert UTF-16 to UTF-8 under Windows and Linux, in C
Asked by DooriBar
I was wondering whether there is a recommended 'cross' Windows and Linux method for converting strings from UTF-16LE to UTF-8, or should one use different methods for each environment?
I've managed to google a few references to 'iconv', but for some reason I can't find samples of basic conversions, such as converting a wchar_t UTF-16 string to UTF-8.
Can anybody recommend a method that would be 'cross'? If you know of references or a guide with samples, I would very much appreciate it.
Thanks, Doori Bar
Accepted answer by DooriBar
Thanks guys, this is how I managed to solve the 'cross' Windows and Linux requirement:
- Downloaded and installed: MinGW and MSYS
- Downloaded the libiconv source package
- Compiled libiconv via MSYS
That's about it.
Answer by user4657497
Change encoding to UTF-8 with PowerShell (its "Unicode" encoding name means UTF-16LE):
powershell -Command "Get-Content PATH\temp.txt -Encoding Unicode | Set-Content -Encoding UTF8 PATH2\temp.txt"
Answer by Alex B
If you don't want to use ICU:
- Windows: WideCharToMultiByte
- Linux: iconv (Glibc)
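The per-platform split in the list above can be hidden behind one function. The following is only a minimal sketch, not code from the original answer: the function name utf16le_to_utf8 and its return convention are hypothetical, the Windows branch assumes the input buffer is suitably aligned for wchar_t, and the other branch assumes an iconv implementation that knows the "UTF-16LE" name (glibc and libiconv both do).

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>

/* Hypothetical wrapper: converts src_bytes bytes of UTF-16LE to UTF-8.
 * Returns the number of UTF-8 bytes written, or (size_t)-1 on failure.
 * No terminator is appended. */
size_t utf16le_to_utf8(const void *src, size_t src_bytes,
                       char *dst, size_t dst_cap)
{
    if (src_bytes == 0)
        return 0;               /* WideCharToMultiByte rejects empty input */
    if (dst_cap == 0)
        return (size_t)-1;      /* a zero capacity would only query the size */
    /* Assumption: src is aligned for wchar_t (16-bit on Windows). */
    int n = WideCharToMultiByte(CP_UTF8, 0, (const wchar_t *)src,
                                (int)(src_bytes / sizeof(wchar_t)),
                                dst, (int)dst_cap, NULL, NULL);
    return n > 0 ? (size_t)n : (size_t)-1;
}

#else
#include <iconv.h>

/* Same hypothetical interface, implemented with iconv. */
size_t utf16le_to_utf8(const void *src, size_t src_bytes,
                       char *dst, size_t dst_cap)
{
    iconv_t cd = iconv_open("UTF-8", "UTF-16LE");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in = (char *)src;     /* iconv takes char **, it does not modify the input */
    char *out = dst;
    size_t in_left = src_bytes; /* lengths are byte counts */
    size_t out_left = dst_cap;

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);

    return rc == (size_t)-1 ? (size_t)-1 : dst_cap - out_left;
}
#endif
```

A caller passes the raw UTF-16LE bytes and an output buffer; four output bytes per two input bytes is always enough capacity. With libiconv (for example under MinGW, as in the accepted answer) the program may also need to be linked with -liconv.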
Answer by Hans Passant
The open source ICU library is very commonly used.
Answer by Daniel King
I ran into this problem too, and solved it with the Boost.Locale library:
try
{
    // UTF-16LE code units -> UTF-8
    std::string utf8 = boost::locale::conv::utf_to_utf<char, short>(
        (short*)wcontent.c_str(),
        (short*)(wcontent.c_str() + wcontent.length()));
    // UTF-8 -> ISO-8859-1 (Latin-1)
    content = boost::locale::conv::from_utf(utf8, "ISO-8859-1");
}
catch (const boost::locale::conv::conversion_error& e)
{
    std::cout << "Failed to convert from UTF-8 to " << toEncoding << "!" << std::endl;
    break;
}
The boost::locale::conv::utf_to_utf function converts a buffer encoded as UTF-16LE to UTF-8. The boost::locale::conv::from_utf function then converts the UTF-8 buffer to an ANSI encoding; make sure you name the right target encoding (here I use ISO-8859-1, i.e. Latin-1).
Another reminder: on Linux wchar_t (the element type of std::wstring) is 4 bytes wide, but on Windows it is 2 bytes wide, so you had better not use std::wstring to hold a UTF-16LE buffer.
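One way to follow this advice in plain C is to keep UTF-16 code units in a fixed-width type such as uint16_t, which is 2 bytes on both platforms. The sketch below is not part of the original answer: the helper name utf16_units_to_utf8 is hypothetical, and it assumes the code units are already in host byte order (on a little-endian machine a UTF-16LE buffer can be used as-is). It decodes surrogate pairs by hand and needs no external library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts n UTF-16 code units (host byte order) to
 * UTF-8. Returns the number of bytes written, or (size_t)-1 if the input
 * contains an unpaired surrogate or cap is too small. No terminator is
 * appended. */
size_t utf16_units_to_utf8(const uint16_t *src, size_t n,
                           unsigned char *dst, size_t cap)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = src[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i + 1 >= n || src[i + 1] < 0xDC00 || src[i + 1] > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00u);
            i++;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;  /* low surrogate without a high one */
        }

        /* Number of UTF-8 bytes this code point needs. */
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (cap - out < need)
            return (size_t)-1;

        if (need == 1) {
            dst[out++] = (unsigned char)cp;
        } else if (need == 2) {
            dst[out++] = (unsigned char)(0xC0 | (cp >> 6));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[out++] = (unsigned char)(0xE0 | (cp >> 12));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            dst[out++] = (unsigned char)(0xF0 | (cp >> 18));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Because the storage type is the same width everywhere, the same buffer layout works on Windows and Linux; an output capacity of 3 bytes per input code unit is always sufficient.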
Answer by Remy Lebeau
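#include <iconv.h>

wchar_t *src = ...;    /* UTF-16LE input; wchar_t is 16-bit only on Windows */
size_t srclen = ...;   /* input length in bytes, not characters */
char *dst = ...;       /* output buffer for UTF-8 */
size_t dstlen = ...;   /* output capacity in bytes */
/* Name the byte order explicitly: plain "UTF-16" relies on a BOM or an implementation default. */
iconv_t conv = iconv_open("UTF-8", "UTF-16LE");
iconv(conv, (char **)&src, &srclen, &dst, &dstlen);   /* returns (size_t)-1 on error */
iconv_close(conv);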
Answer by M.M
If you have MSYS2 installed, then the iconv package (which is installed by default) lets you use:
iconv -f utf-16le -t utf-8 <input.txt >output.txt

