Original source: http://stackoverflow.com/questions/6588068/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Which encoding opens CSV files correctly with Excel on both Mac and Windows?
Asked by Timm
We have a web app that exports CSV files containing foreign characters, encoded as UTF-8 with no BOM. Both Windows and Mac users get garbage characters in Excel. I tried converting to UTF-8 with BOM; Excel/Win is fine with it, but Excel/Mac shows gibberish. I'm using Excel 2003/Win and Excel 2011/Mac. Here are all the encodings I tried:
Encoding  BOM  Win                          Mac
--------  ---  ---------------------------  -------------------
utf-8     --   scrambled                    scrambled
utf-8     BOM  WORKS                        scrambled
utf-16    --   file not recognized          file not recognized
utf-16    BOM  file not recognized          Chinese gibberish
utf-16LE  --   file not recognized          file not recognized
utf-16LE  BOM  characters OK,               same as Win
               row data all in first field
The best one is UTF-16LE with BOM, but the file is not recognized as CSV: all row data ends up in the first field. The field separator is a comma, but switching to a semicolon doesn't change anything.
Is there any encoding that works in both worlds?
Accepted answer by Timm
The lowdown is: There is no solution. Excel 2011/Mac cannot correctly interpret a CSV file containing umlauts and diacritical marks no matter what encoding or hoop jumping you do. I'd be glad to hear someone tell me different!
Answered by mikezter
Excel Encodings
I found the WINDOWS-1252 encoding to be the least frustrating when dealing with Excel. Since it's basically Microsoft's own proprietary character set, one can assume it will work on both the Mac and the Windows version of MS Excel. Both versions at least include a corresponding "File origin" or "File encoding" selector which correctly reads the data.
Depending on your system and the tools you use, this encoding could also be named CP1252, ANSI, Windows (ANSI), MS-ANSI or just Windows, among other variations.
This encoding is a superset of ISO-8859-1 (aka LATIN1 and others), so you can fall back to ISO-8859-1 if you cannot use WINDOWS-1252 for some reason. Be advised that ISO-8859-1 is missing some characters from WINDOWS-1252, as shown here:
| Char | ANSI | Unicode | ANSI Hex | Unicode Hex | HTML entity | Unicode Name | Unicode Range |
| € | 128 | 8364 | 0x80 | U+20AC | &euro; | euro sign | Currency Symbols |
| ‚ | 130 | 8218 | 0x82 | U+201A | &sbquo; | single low-9 quotation mark | General Punctuation |
| ƒ | 131 | 402 | 0x83 | U+0192 | &fnof; | Latin small letter f with hook | Latin Extended-B |
| „ | 132 | 8222 | 0x84 | U+201E | &bdquo; | double low-9 quotation mark | General Punctuation |
| … | 133 | 8230 | 0x85 | U+2026 | &hellip; | horizontal ellipsis | General Punctuation |
| † | 134 | 8224 | 0x86 | U+2020 | &dagger; | dagger | General Punctuation |
| ‡ | 135 | 8225 | 0x87 | U+2021 | &Dagger; | double dagger | General Punctuation |
| ˆ | 136 | 710 | 0x88 | U+02C6 | &circ; | modifier letter circumflex accent | Spacing Modifier Letters |
| ‰ | 137 | 8240 | 0x89 | U+2030 | &permil; | per mille sign | General Punctuation |
| Š | 138 | 352 | 0x8A | U+0160 | &Scaron; | Latin capital letter S with caron | Latin Extended-A |
| ‹ | 139 | 8249 | 0x8B | U+2039 | &lsaquo; | single left-pointing angle quotation mark | General Punctuation |
| Œ | 140 | 338 | 0x8C | U+0152 | &OElig; | Latin capital ligature OE | Latin Extended-A |
| Ž | 142 | 381 | 0x8E | U+017D | | Latin capital letter Z with caron | Latin Extended-A |
| ‘ | 145 | 8216 | 0x91 | U+2018 | &lsquo; | left single quotation mark | General Punctuation |
| ’ | 146 | 8217 | 0x92 | U+2019 | &rsquo; | right single quotation mark | General Punctuation |
| “ | 147 | 8220 | 0x93 | U+201C | &ldquo; | left double quotation mark | General Punctuation |
| ” | 148 | 8221 | 0x94 | U+201D | &rdquo; | right double quotation mark | General Punctuation |
| • | 149 | 8226 | 0x95 | U+2022 | &bull; | bullet | General Punctuation |
| – | 150 | 8211 | 0x96 | U+2013 | &ndash; | en dash | General Punctuation |
| — | 151 | 8212 | 0x97 | U+2014 | &mdash; | em dash | General Punctuation |
| ˜ | 152 | 732 | 0x98 | U+02DC | &tilde; | small tilde | Spacing Modifier Letters |
| ™ | 153 | 8482 | 0x99 | U+2122 | &trade; | trade mark sign | Letterlike Symbols |
| š | 154 | 353 | 0x9A | U+0161 | &scaron; | Latin small letter s with caron | Latin Extended-A |
| › | 155 | 8250 | 0x9B | U+203A | &rsaquo; | single right-pointing angle quotation mark | General Punctuation |
| œ | 156 | 339 | 0x9C | U+0153 | &oelig; | Latin small ligature oe | Latin Extended-A |
| ž | 158 | 382 | 0x9E | U+017E | | Latin small letter z with caron | Latin Extended-A |
| Ÿ | 159 | 376 | 0x9F | U+0178 | &Yuml; | Latin capital letter Y with diaeresis | Latin Extended-A |
Note that the euro sign is among the characters missing from ISO-8859-1. This table comes from Alan Wood's website.
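To check whether a particular export would lose characters when falling back from WINDOWS-1252 to ISO-8859-1, something like the following could be used. This is only a sketch: the function name `missing_from_latin1` is made up for illustration, and it relies on Python's built-in `cp1252` and `iso-8859-1` codecs.

```python
def missing_from_latin1(text):
    """Return characters that WINDOWS-1252 can encode but ISO-8859-1 cannot."""
    missing = []
    for ch in text:
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            continue  # not representable in WINDOWS-1252 either, so not our concern here
        try:
            ch.encode("iso-8859-1")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# The euro sign and the en dash are in the table above; the o-umlaut is in both encodings.
print([hex(ord(c)) for c in missing_from_latin1("5 \u20ac \u2013 \u00f6")])
# prints ['0x20ac', '0x2013']
```

An empty result means the fallback to ISO-8859-1 is lossless for that text.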
Conversion
Conversion is done differently in every tool and language. However, suppose you have a file query_result.csv which you know is UTF-8 encoded. Convert it to WINDOWS-1252 using iconv:
iconv -f UTF-8 -t WINDOWS-1252 query_result.csv > query_result-win.csv
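Where iconv is not available, the same conversion can be sketched in Python. The helper name `utf8_to_cp1252` is hypothetical, and the choice to fail on unrepresentable characters by default is an assumption about what most exports would want, not part of the original answer.

```python
def utf8_to_cp1252(src_path, dst_path, errors="strict"):
    """Re-encode a UTF-8 file as WINDOWS-1252.

    With errors="strict" a character that WINDOWS-1252 cannot represent
    raises UnicodeEncodeError; errors="replace" writes "?" instead.
    """
    # "utf-8-sig" reads plain UTF-8 and also strips a leading BOM if one is present.
    # newline="" leaves the file's line endings exactly as they are.
    with open(src_path, "r", encoding="utf-8-sig", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="cp1252", errors=errors, newline="") as dst:
        dst.write(text)


# Example call, using the file names from the iconv command above:
# utf8_to_cp1252("query_result.csv", "query_result-win.csv")
```

Failing loudly by default makes it obvious when the data contains characters outside WINDOWS-1252, such as Cyrillic or CJK text, for which this encoding is not an option.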
Answered by Duncan Smart
For UTF-16LE with BOM, if you use tab characters as your delimiters instead of commas, Excel will recognise the fields. The reason it works is that Excel actually ends up using its Unicode *.txt parser.
Caveat: If the file is edited in Excel and saved, it will be saved as tab-delimited ASCII. The problem now is that when you re-open the file, Excel assumes it's real CSV (with commas), sees that it's not Unicode, and so parses it as comma-delimited, making a hash of it!
Update: The above caveat doesn't appear to be happening for me today, in Excel 2010 (Windows) at least, although there does appear to be a difference in saving behaviour if:
- you edit and quit Excel (tries to save as 'Unicode *.txt')
compared to:
- editing and closing just the file (works as expected).
Answered by Walter Tross
You have only tried comma-separated and semicolon-separated CSV. If you had tried tab-separated CSV (also called TSV) you would have found the answer:
UTF-16LE with BOM (byte order mark), tab-separated
But: In a comment you mention that TSV is not an option for you (I haven't been able to find this requirement in your question though). That's a pity. It often means that you allow manual editing of TSV files, which probably is not a good idea. Visual checking of TSV files is not a problem. Furthermore, editors can be set to display a special character to mark tabs.
And yes, I tried this out on Windows and Mac.
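A minimal sketch of producing such a file in Python follows. The function name `rows_to_excel_tsv_bytes` is an illustrative assumption; the format itself (BOM bytes FF FE, then UTF-16LE text with tab separators and CRLF line endings) is the one described above.

```python
import csv
import io


def rows_to_excel_tsv_bytes(rows):
    """Serialize rows as tab-separated text, encoded as UTF-16LE with a BOM."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, delimiter="\t", lineterminator="\r\n")
    writer.writerows(rows)
    # Python's "utf-16-le" codec does not emit a BOM, so prepend FF FE by hand.
    # Encoding the whole text at once also encodes the tabs and CRLFs as UTF-16LE.
    return b"\xff\xfe" + buf.getvalue().encode("utf-16-le")


data = rows_to_excel_tsv_bytes([["name", "city"], ["J\u00fcrgen", "K\u00f6ln"]])
print(data[:2].hex())  # prints fffe
```

The returned bytes can be written to disk in binary mode or sent directly as a download from a web app.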
Answered by Timm
Here's the clincher on importing UTF-8 encoded CSV into Excel 2011 for Mac. Microsoft says: "Excel for Mac does not currently support UTF-8." (Source: "Excel for Mac 2011 and UTF-8".)
Yay, way to go MS!
Answered by brablc
The best workaround for reading UTF-8 CSV files on Mac is to convert them into XLSX format. I have found a script made by Konrad Foerstner, which I have improved a little bit by adding support for different delimiter characters.
Download the script from GitHub: https://github.com/brablc/clit/blob/master/csv2xlsx.py. In order to run it you will need to install the Python module openpyxl for Excel file manipulation: sudo easy_install openpyxl.
Answered by user525081
In my case it seems that Excel 2011 for Mac OS is not using Encoding.GetEncoding("10000") (Mac Roman), as I had assumed and wasted two days on, but the same ISO encoding as on Windows. The best proof of this is to create a file with special characters in Excel 2011 for Mac, save it as CSV, and then open it in a Mac text editor: the characters are scrambled.
For me the following approach worked, meaning that the CSV export opens in Excel 2011 on Mac OS with its special Western European characters intact:
// Requires: using System.Text;
// exportText holds the CSV content to export (defined elsewhere).
Encoding isoMacOS = Encoding.GetEncoding("iso-8859-1");
Encoding defaultEncoding = Encoding.Default;
// Convert the string into a byte array.
byte[] defaultEncodingBytes = defaultEncoding.GetBytes(exportText);
// Perform the conversion from one encoding to the other.
byte[] ansiBytes = Encoding.Convert(defaultEncoding, isoMacOS, defaultEncodingBytes);
// Decode the ISO-8859-1 bytes back into a string for output.
string decodedString = isoMacOS.GetString(ansiBytes);
Answered by Craig Stuntz
UTF-8 with no BOM currently works for me in Excel Mac 2011 14.3.2.
UTF-8 + BOM kind of works, but the BOM is rendered as gibberish.
UTF-16 works if you Import the file and complete the wizard, but not if you just double-click it.
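Since results differ so much between the BOM and no-BOM variants, it can help to confirm which variant an export actually produces before testing it in Excel. The following is a rough sketch; `sniff_bom` is a hypothetical helper built on the BOM constants in Python's standard `codecs` module.

```python
import codecs


def sniff_bom(data):
    """Return a label for the BOM at the start of `data`, or None if there is none.

    Limitation: a UTF-32LE BOM (FF FE 00 00) also starts with FF FE and would be
    reported as "utf-16-le" here; UTF-32 is not considered in this sketch.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None


print(sniff_bom(b"\xff\xfea\x00"))  # prints utf-16-le
```

Passing the first few bytes of the file, read in binary mode, is enough.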
Answered by Nuno
On my Mac, TextWrangler identified a CSV file created with Excel as having "Western" encoding.
After some googling I have made this small script (I am not sure about Windows availability, maybe with Cygwin?):
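$ cat /usr/local/bin/utf8.sh
#!/bin/bash
# Usage: utf8.sh FILE
INPUTFILE="$1"
# Convert from MacRoman to UTF-8 (-c drops unconvertible characters)
# and turn classic Mac CR line endings into LF.
iconv -f macroman -c -t UTF-8 "$INPUTFILE" | tr '\r' '\n' > /tmp/file.$$.csv
# Move the original to ms_trash, then put the converted file in its place.
mv "$INPUTFILE" ms_trash
mv /tmp/file.$$.csv "$INPUTFILE"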
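For systems without iconv and tr (the Windows case the answer is unsure about), the conversion step of the script can be approximated in Python. This is a sketch under assumptions: `macroman_to_utf8` is a made-up name, it works on bytes rather than files, and like the `tr` call it turns every CR into LF.

```python
def macroman_to_utf8(data):
    """Decode MacRoman bytes, turn CR into LF, and return UTF-8 bytes."""
    text = data.decode("mac_roman")
    # Same effect as tr '\r' '\n': each CR becomes an LF.
    text = text.replace("\r", "\n")
    return text.encode("utf-8")


# 0x8E is "e with acute accent" in MacRoman.
print(macroman_to_utf8(b"caf\x8e\rnext line"))
# prints b'caf\xc3\xa9\nnext line'
```

Reading the input with `open(path, "rb")` and writing the result with `open(path, "wb")` would complete the file-level equivalent.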
Answered by Marek Dem?ák
In my case this worked (Mac, Excel 2011, both Cyrillic and Latin characters with Czech diacritics):
- Charset UTF-16LE (plain UTF-16 was not enough)
- BOM "\xFF\xFE"
- \t (tab) as separator
- Don't forget to also encode the separator and the CRLFs :-)
- Use iconv instead of mb_convert_encoding (in PHP)