Which encoding opens CSV files correctly with Excel on both Mac and Windows?

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/6588068/

Time: 2020-09-09 08:34:02  Source: igfitidea

Which encoding opens CSV files correctly with Excel on both Mac and Windows?

Tags: windows, excel, macos, csv, utf

Asked by Timm

We have a web app that exports CSV files containing foreign characters as UTF-8, no BOM. Both Windows and Mac users get garbage characters in Excel. I tried converting to UTF-8 with BOM; Excel/Win is fine with it, Excel/Mac shows gibberish. I'm using Excel 2003/Win and Excel 2011/Mac. Here are all the encodings I tried:

Encoding  BOM      Win                            Mac
--------  ---      ----------------------------   ------------
utf-8     --       scrambled                      scrambled
utf-8     BOM      WORKS                          scrambled
utf-16    --       file not recognized            file not recognized
utf-16    BOM      file not recognized            Chinese gibberish
utf-16LE  --       file not recognized            file not recognized
utf-16LE  BOM      characters OK,                 same as Win
                   row data all in first field

The best one is UTF-16LE with BOM, but the CSV is not recognized as such. The field separator is comma, but semicolon doesn't change things.

Is there any encoding that works in both worlds?

Accepted answer by Timm

The lowdown is: There is no solution. Excel 2011/Mac cannot correctly interpret a CSV file containing umlauts and diacritical marks no matter what encoding or hoop jumping you do. I'd be glad to hear someone tell me different!

Answer by mikezter

Excel Encodings

I found the WINDOWS-1252 encoding to be the least frustrating when dealing with Excel. Since it's basically Microsoft's own proprietary character set, one can assume it will work on both the Mac and the Windows versions of MS-Excel. Both versions at least include a corresponding "File origin" or "File encoding" selector which correctly reads the data.

Depending on your system and the tools you use, this encoding could also be named CP1252, ANSI, Windows (ANSI), MS-ANSI or just Windows, among other variations.

This encoding is a superset of ISO-8859-1 (aka LATIN1 and others) as far as printable characters go, so you can fall back to ISO-8859-1 if you cannot use WINDOWS-1252 for some reason. Be advised that ISO-8859-1 is missing some characters from WINDOWS-1252, as shown here:

| Char | ANSI | Unicode | ANSI Hex | Unicode Hex | HTML entity | Unicode Name                               | Unicode Range            |
| €    | 128  | 8364    | 0x80     | U+20AC      | &euro;      | euro sign                                  | Currency Symbols         |
| ‚    | 130  | 8218    | 0x82     | U+201A      | &sbquo;     | single low-9 quotation mark                | General Punctuation      |
| ƒ    | 131  | 402     | 0x83     | U+0192      | &fnof;      | Latin small letter f with hook             | Latin Extended-B         |
| „    | 132  | 8222    | 0x84     | U+201E      | &bdquo;     | double low-9 quotation mark                | General Punctuation      |
| …    | 133  | 8230    | 0x85     | U+2026      | &hellip;    | horizontal ellipsis                        | General Punctuation      |
| †    | 134  | 8224    | 0x86     | U+2020      | &dagger;    | dagger                                     | General Punctuation      |
| ‡    | 135  | 8225    | 0x87     | U+2021      | &Dagger;    | double dagger                              | General Punctuation      |
| ˆ    | 136  | 710     | 0x88     | U+02C6      | &circ;      | modifier letter circumflex accent          | Spacing Modifier Letters |
| ‰    | 137  | 8240    | 0x89     | U+2030      | &permil;    | per mille sign                             | General Punctuation      |
| Š    | 138  | 352     | 0x8A     | U+0160      | &Scaron;    | Latin capital letter S with caron          | Latin Extended-A         |
| ‹    | 139  | 8249    | 0x8B     | U+2039      | &lsaquo;    | single left-pointing angle quotation mark  | General Punctuation      |
| Œ    | 140  | 338     | 0x8C     | U+0152      | &OElig;     | Latin capital ligature OE                  | Latin Extended-A         |
| Ž    | 142  | 381     | 0x8E     | U+017D      |             | Latin capital letter Z with caron          | Latin Extended-A         |
| ‘    | 145  | 8216    | 0x91     | U+2018      | &lsquo;     | left single quotation mark                 | General Punctuation      |
| ’    | 146  | 8217    | 0x92     | U+2019      | &rsquo;     | right single quotation mark                | General Punctuation      |
| “    | 147  | 8220    | 0x93     | U+201C      | &ldquo;     | left double quotation mark                 | General Punctuation      |
| ”    | 148  | 8221    | 0x94     | U+201D      | &rdquo;     | right double quotation mark                | General Punctuation      |
| •    | 149  | 8226    | 0x95     | U+2022      | &bull;      | bullet                                     | General Punctuation      |
| –    | 150  | 8211    | 0x96     | U+2013      | &ndash;     | en dash                                    | General Punctuation      |
| —    | 151  | 8212    | 0x97     | U+2014      | &mdash;     | em dash                                    | General Punctuation      |
| ˜    | 152  | 732     | 0x98     | U+02DC      | &tilde;     | small tilde                                | Spacing Modifier Letters |
| ™    | 153  | 8482    | 0x99     | U+2122      | &trade;     | trade mark sign                            | Letterlike Symbols       |
| š    | 154  | 353     | 0x9A     | U+0161      | &scaron;    | Latin small letter s with caron            | Latin Extended-A         |
| ›    | 155  | 8250    | 0x9B     | U+203A      | &rsaquo;    | single right-pointing angle quotation mark | General Punctuation      |
| œ    | 156  | 339     | 0x9C     | U+0153      | &oelig;     | Latin small ligature oe                    | Latin Extended-A         |
| ž    | 158  | 382     | 0x9E     | U+017E      |             | Latin small letter z with caron            | Latin Extended-A         |
| Ÿ    | 159  | 376     | 0x9F     | U+0178      | &Yuml;      | Latin capital letter Y with diaeresis      | Latin Extended-A         |

Note that the euro sign is among the characters missing from ISO-8859-1. This table can be found on Alan Wood's site.
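The gap between the two encodings can be checked directly. The following is only a minimal sketch in Python, assuming the standard codec names `cp1252` (for WINDOWS-1252) and `latin-1` (for ISO-8859-1); the helper name `encodable` is made up for this illustration.

```python
def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of text can be encoded in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Three characters from the table: euro sign, horizontal ellipsis, OE ligature.
for ch in "\u20ac\u2026\u0152":
    print(f"U+{ord(ch):04X}",
          "cp1252:", encodable(ch, "cp1252"),
          "latin-1:", encodable(ch, "latin-1"))

# In WINDOWS-1252 the euro sign is the single byte 0x80.
euro_byte = "\u20ac".encode("cp1252")
print(euro_byte)
```

Characters that both encodings share, such as é or ü, encode to the same byte in either one, which is why the fallback to ISO-8859-1 only loses the rows listed in the table.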

Conversion

Conversion is done differently in every tool and language. However, suppose you have a file query_result.csv which you know is UTF-8 encoded. Convert it to WINDOWS-1252 using iconv:

iconv -f UTF-8 -t WINDOWS-1252 query_result.csv > query_result-win.csv
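Where iconv is not available, the same conversion can be sketched in Python. This is an illustration under assumptions, not part of the original answer: the function name `utf8_to_cp1252` is hypothetical, `cp1252` is Python's codec name for WINDOWS-1252, and `utf-8-sig` is used for reading so that a leading BOM, if present, is dropped instead of causing an encoding error.

```python
def utf8_to_cp1252(src: str, dst: str, errors: str = "strict") -> None:
    """Re-encode a UTF-8 text file as WINDOWS-1252.

    With errors="strict" a character that has no WINDOWS-1252 equivalent
    raises UnicodeEncodeError; errors="replace" writes "?" instead.
    """
    # newline="" keeps the line endings exactly as they are in the source file.
    with open(src, "r", encoding="utf-8-sig", newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding="cp1252", errors=errors, newline="") as fout:
        fout.write(text)
```

A call such as `utf8_to_cp1252("query_result.csv", "query_result-win.csv")` would mirror the iconv command above. The strict default is a deliberate choice: silently replacing characters hides data loss that is hard to notice later in Excel.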

Answer by Duncan Smart

For UTF-16LE with BOM, if you use tab characters as your delimiters instead of commas, Excel will recognise the fields. The reason it works is that Excel actually ends up using its Unicode *.txt parser.

Caveat: If the file is edited in Excel and saved, it will be saved as tab-delimited ASCII. The problem now is that when you re-open the file Excel assumes it's real CSV (with commas), sees that it's not Unicode, so parses it as comma-delimited - and hence will make a hash of it!

Update: The above caveat doesn't appear to be happening for me today, in Excel 2010 (Windows) at least, although there does appear to be a difference in saving behaviour if:

  • you edit and quit Excel (tries to save as 'Unicode *.txt')

compared to:

  • editing and closing just the file (works as expected).

Answer by Walter Tross

You have only tried comma-separated and semicolon-separated CSV. If you had tried tab-separated CSV (also called TSV) you would have found the answer:

UTF-16LE with BOM (byte order mark), tab-separated
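A file in that format could be produced along these lines. This is a minimal Python sketch under stated assumptions: the function name `to_excel_tsv_bytes` and the sample rows are invented for illustration, and CRLF line endings are an assumption rather than something this answer specifies.

```python
import csv
import io


def to_excel_tsv_bytes(rows) -> bytes:
    """Serialize rows as tab-separated text, encoded as UTF-16LE with a BOM."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, delimiter="\t", lineterminator="\r\n")
    writer.writerows(rows)
    # The "utf-16-le" codec writes no BOM by itself, so U+FEFF is prepended;
    # it encodes to the bytes FF FE.
    return ("\ufeff" + buf.getvalue()).encode("utf-16-le")


data = to_excel_tsv_bytes([["name", "city"], ["J\u00fcrgen", "Z\u00fcrich"]])
print(data[:2].hex())  # fffe
```

The bytes would then be written with a file opened in binary mode (`"wb"`), so that no further newline or encoding translation is applied on top.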

But: In a comment you mention that TSV is not an option for you (I haven't been able to find this requirement in your question though). That's a pity. It often means that you allow manual editing of TSV files, which probably is not a good idea. Visual checking of TSV files is not a problem. Furthermore, editors can be set to display a special character to mark tabs.

And yes, I tried this out on Windows and Mac.

Answer by Timm

Here's the clincher on importing UTF-8-encoded CSV into Excel 2011 for Mac. Microsoft says: "Excel for Mac does not currently support UTF-8." (Source: "Excel for Mac 2011 and UTF-8".)

Yay, way to go MS!

Answer by brablc

The best workaround for reading CSV files with UTF-8 on Mac is to convert them into XLSX format. I have found a script made by Konrad Foerstner, which I have improved a little bit by adding support for different delimiter characters.

Download the script from Github: https://github.com/brablc/clit/blob/master/csv2xlsx.py. In order to run it you will need to install the Python module openpyxl for Excel file manipulation: sudo easy_install openpyxl.

Answer by user525081

In my case it seems that Excel 2011 for Mac OS is not using Encoding.GetEncoding("10000"), as I thought (and wasted two days on), but the same ISO encoding as on Microsoft OS. The best proof of this is to make a file with special characters in Excel 2011 for Mac, save it as CSV and then open it in a Mac text editor: the characters are scrambled.

For me this approach worked, meaning that a CSV export opened in Excel 2011 on Mac OS shows the special Western European characters correctly:

using System.Text;

// exportText holds the CSV content to be exported.
Encoding isoMacOS = Encoding.GetEncoding("iso-8859-1");
Encoding defaultEncoding = Encoding.Default;

// Convert the string into a byte array.
byte[] defaultEncodingBytes = defaultEncoding.GetBytes(exportText);

// Perform the conversion from one encoding to the other.
byte[] ansiBytes = Encoding.Convert(defaultEncoding, isoMacOS, defaultEncodingBytes);

// Decode the ISO-8859-1 bytes back into a string.
string decodedString = isoMacOS.GetString(ansiBytes);

Answer by Craig Stuntz

UTF-8 with no BOM currently works for me in Excel Mac 2011 14.3.2.

UTF-8 + BOM kind of works, but the BOM is rendered as gibberish.

UTF-16 works if you import the file and complete the wizard, but not if you just double-click it.

Answer by Nuno

On my Mac OS, Text Wrangler identified a CSV file created with Excel as having "Western" encoding.

After some googling I have made this small script (I am not sure about Windows availability, maybe with Cygwin?):

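$ cat /usr/local/bin/utf8.sh
#!/bin/bash

# File to convert, taken here from the first argument
# (the listing as posted showed an empty string at this point).
INPUTFILE="$1"

# Convert from Mac Roman to UTF-8, dropping unconvertible characters (-c),
# and turn classic Mac CR line endings into LF.
iconv -f macroman -c -t UTF-8 "$INPUTFILE" | tr '\r' '\n' > /tmp/file.$$.csv

# Move the original to ms_trash, then put the converted file in its place.
mv "$INPUTFILE" ms_trash
mv /tmp/file.$$.csv "$INPUTFILE"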

Answer by Marek Dem?ák

In my case this worked (Mac, Excel 2011, both Cyrillic and Latin characters with Czech diacritics):

  • Charset UTF-16LE (simply UTF-16 was not enough)
  • BOM "\xFF\xFE"
  • \t (tab) as separator
  • Don't forget to also encode the separator and the CRLFs :-)
  • Use iconv instead of mb_convert_encoding
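The point about encoding the separator and the CRLFs can be made concrete at the byte level. The sketch below is in Python rather than the PHP this answer implies, purely as an illustration; the helper name `encode_row` and the sample fields are assumptions.

```python
BOM = b"\xff\xfe"  # the byte order mark from the list above


def encode_row(fields) -> bytes:
    """Join fields with tabs, end with CRLF, then encode the whole line as UTF-16LE."""
    line = "\t".join(fields) + "\r\n"
    return line.encode("utf-16-le")


row = encode_row(["P\u0159\u00edli\u0161", "\u041f\u0440\u0438\u0432\u0435\u0442"])
payload = BOM + row

# Every character takes two bytes here, separator and line ending included.
print("\t".encode("utf-16-le").hex())    # 0900
print("\r\n".encode("utf-16-le").hex())  # 0d000a00
```

Building the line as text first and encoding it once avoids the mistake of gluing single-byte tabs or line breaks between fields that are already UTF-16LE encoded, which would shift every following character by one byte.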