Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/2867123/

Date: 2020-09-02 05:28:51  Source: igfitidea

Convert UTF-16 to UTF-8 under Windows and Linux, in C

c unicode utf-8 utf-16

Asked by DooriBar

Is there a recommended cross-platform (Windows and Linux) method for converting strings from UTF-16LE to UTF-8, or should one use a different method on each platform?

I've managed to google a few references to 'iconv', but for some reason I can't find samples of basic conversions, such as converting a wchar_t UTF-16 string to UTF-8.

Can anybody recommend a method that is cross-platform? If you know of references or a guide with samples, I would very much appreciate it.

Thanks, Doori Bar

Accepted answer by DooriBar

Thanks guys, this is how I managed to solve the cross-platform (Windows and Linux) requirement:

  1. Downloaded and installed MinGW and MSYS.
  2. Downloaded the libiconv source package.
  3. Compiled libiconv via MSYS.

That's about it.

Answer by user4657497

Change encoding to UTF-8 with PowerShell:

powershell -Command "Get-Content PATH\temp.txt -Encoding Unicode | Set-Content -Encoding UTF8 PATH2\temp.txt"

Answer by Alex B

If you don't want to use ICU:

  1. Windows: WideCharToMultiByte
  2. Linux: iconv (glibc)
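
The two options above can be hidden behind one function with a preprocessor switch. The following is only a minimal sketch, not code from the answer: the helper name utf16le_to_utf8 is hypothetical, the input is assumed to be UTF-16LE without a BOM, and on Windows the buffer is assumed to be suitably aligned for wchar_t.

```c
#include <assert.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <iconv.h>
#endif

/* Hypothetical helper: converts len_bytes of UTF-16LE data into a newly
   allocated, NUL-terminated UTF-8 string. Returns NULL on failure.
   The caller frees the result. */
char *utf16le_to_utf8(const void *src, size_t len_bytes)
{
    assert(src != NULL || len_bytes == 0);
#ifdef _WIN32
    int units = (int)(len_bytes / 2);   /* wchar_t is one UTF-16 unit here */
    int need;
    char *out;
    if (units == 0)
        return calloc(1, 1);            /* empty input gives an empty string */
    /* First call asks how many UTF-8 bytes are required. */
    need = WideCharToMultiByte(CP_UTF8, 0, (const wchar_t *)src, units,
                               NULL, 0, NULL, NULL);
    if (need <= 0)
        return NULL;
    out = malloc((size_t)need + 1);
    if (out == NULL)
        return NULL;
    /* Second call performs the conversion into the sized buffer. */
    WideCharToMultiByte(CP_UTF8, 0, (const wchar_t *)src, units,
                        out, need, NULL, NULL);
    out[need] = '\0';
    return out;
#else
    /* A 2-byte UTF-16 unit never needs more than 3 UTF-8 bytes. */
    size_t cap = (len_bytes / 2) * 3 + 1;
    char *out = malloc(cap);
    char *in = (char *)src;             /* iconv() takes non-const pointers */
    char *dst = out;
    size_t in_left = len_bytes;
    size_t out_left = cap - 1;
    size_t rc;
    iconv_t cd;
    if (out == NULL)
        return NULL;
    cd = iconv_open("UTF-8", "UTF-16LE");   /* (tocode, fromcode) */
    if (cd == (iconv_t)-1) {
        free(out);
        return NULL;
    }
    /* iconv() advances in/dst and decrements the remaining byte counts. */
    rc = iconv(cd, &in, &in_left, &dst, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1) {
        free(out);
        return NULL;
    }
    *dst = '\0';
    return out;
#endif
}
```

A caller passes the raw bytes and their length in bytes, then frees the returned string. On systems where iconv is not part of the C library, linking with -liconv may be required.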

Answer by Hans Passant

The open source ICU library is very commonly used.

Answer by Daniel King

I have run into this problem too; I solved it by using the Boost.Locale library:

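// Needs <boost/locale.hpp>, <iostream> and <string>.
try
{
    // Reinterpret the buffer as 16-bit units and convert UTF-16 to UTF-8.
    // The cast is only valid where wchar_t is 2 bytes wide (e.g. Windows).
    std::string utf8 = boost::locale::conv::utf_to_utf<char, short>(
                        (short*)wcontent.c_str(),
                        (short*)(wcontent.c_str() + wcontent.length()));
    // Convert the UTF-8 string to the target 8-bit encoding.
    content = boost::locale::conv::from_utf(utf8, "ISO-8859-1");
}
catch (const boost::locale::conv::conversion_error& e)  // catch by const reference
{
    std::cout << "Failed to convert from UTF-8 to " << toEncoding << "!" << std::endl;
    break;
}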

The boost::locale::conv::utf_to_utf function converts a buffer encoded as UTF-16LE to UTF-8. The boost::locale::conv::from_utf function then converts that UTF-8 buffer to an ANSI encoding; make sure the target encoding is right (here I use Latin-1, ISO-8859-1).

Another reminder: on Linux the wchar_t elements of std::wstring are 4 bytes wide, but on Windows they are 2 bytes wide, so you had better not use std::wstring to hold a UTF-16LE buffer.
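
The same caveat applies to wchar_t in C. One way to sidestep it, sketched here under the assumption that a fixed-width type from <stdint.h> is acceptable, is to store UTF-16 code units in uint16_t. The names utf16_unit, utf16_len and wchar_is_utf16_sized are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical alias: one UTF-16 code unit, 2 bytes wide on every platform. */
typedef uint16_t utf16_unit;

/* Hypothetical helper: number of code units before the terminating zero,
   the UTF-16 counterpart of strlen(). */
size_t utf16_len(const utf16_unit *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Returns 1 if wchar_t has the size of a UTF-16 code unit on this platform
   (typically Windows), 0 otherwise (typically Linux, where it is 4 bytes). */
int wchar_is_utf16_sized(void)
{
    return sizeof(wchar_t) == sizeof(utf16_unit);
}
```

With a type like this, the byte length handed to a converter is utf16_len(s) * sizeof(utf16_unit) regardless of how wide wchar_t happens to be.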

Answer by Remy Lebeau

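wchar_t *src = ...;   /* UTF-16 input (assumes a 2-byte wchar_t, as on Windows) */
size_t srclen = ...;  /* bytes of input remaining, not characters */
char *dst = ...;      /* output buffer */
size_t dstlen = ...;  /* bytes of space left in the output buffer */
char *in = (char *)src;  /* iconv() advances the input and output pointers */
iconv_t conv = iconv_open("UTF-8", "UTF-16");  /* arguments are (tocode, fromcode) */
if (conv != (iconv_t)-1) {
    iconv(conv, &in, &srclen, &dst, &dstlen);  /* returns (size_t)-1 on error */
    iconv_close(conv);
}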

Answer by Kevin Smyth

There's also utfcpp, which is a header-only library.
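
utfcpp is a C++ library, so it does not help in a pure C project. For that case the conversion is small enough to write by hand. What follows is a dependency-free sketch, not utfcpp's API: the function name utf16le_to_utf8_raw and its error convention are assumptions made for this example. It decodes surrogate pairs and rejects unpaired surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reads n_units UTF-16 code units stored little-endian
   in src and writes UTF-8 bytes to dst (capacity dst_cap bytes).
   Returns the number of bytes written, or (size_t)-1 on an unpaired
   surrogate or insufficient space. The output is not NUL-terminated. */
size_t utf16le_to_utf8_raw(const unsigned char *src, size_t n_units,
                           unsigned char *dst, size_t dst_cap)
{
    size_t o = 0;
    size_t i;
    for (i = 0; i < n_units; i++) {
        /* Assemble one code unit from two little-endian bytes. */
        uint32_t cp = (uint32_t)src[2 * i] | ((uint32_t)src[2 * i + 1] << 8);
        size_t need;
        if (cp >= 0xD800 && cp <= 0xDBFF) {          /* high surrogate */
            uint32_t lo;
            if (i + 1 >= n_units)
                return (size_t)-1;
            lo = (uint32_t)src[2 * i + 2] | ((uint32_t)src[2 * i + 3] << 8);
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            i++;                                     /* consumed two units */
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;                       /* lone low surrogate */
        }
        need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (need > dst_cap - o)
            return (size_t)-1;
        if (need == 1) {
            dst[o++] = (unsigned char)cp;
        } else if (need == 2) {
            dst[o++] = (unsigned char)(0xC0 | (cp >> 6));
            dst[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[o++] = (unsigned char)(0xE0 | (cp >> 12));
            dst[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            dst[o++] = (unsigned char)(0xF0 | (cp >> 18));
            dst[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            dst[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

Because the input is read byte by byte, the function behaves the same on little-endian and big-endian hosts and does not depend on the width of wchar_t. A destination buffer of 3 bytes per input code unit is always large enough.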

Answer by M.M

If you have MSYS2 installed, then the iconv package (which is installed by default) lets you use:

 iconv -f utf-16le -t utf-8 <input.txt >output.txt