Python pandas to_csv output quoting issue

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/21147058/

Time: 2020-08-18 22:10:48  Source: igfitidea

pandas to_csv output quoting issue

python file-io pandas

Asked by user3199761

I'm having trouble getting pandas dataframe.to_csv(...) to quote strings correctly in its output.

import pandas as pd

text = 'this is "out text"'
df = pd.DataFrame(index=['1'],columns=['1','2'])
df.loc['1','1']=123
df.loc['1','2']=text
df.to_csv('foo.txt',index=False,header=False)

The output is:

123,"this is ""out text"""

But I would like:

123,this is "out text"

Does anyone know how to get this right?

Accepted answer by DSM

You could pass quoting=csv.QUOTE_NONE, for example:

>>> df.to_csv('foo.txt',index=False,header=False)
>>> !cat foo.txt
123,"this is ""out text"""
>>> import csv
>>> df.to_csv('foo.txt',index=False,header=False, quoting=csv.QUOTE_NONE)
>>> !cat foo.txt
123,this is "out text"

but in my experience it's better to quote more, rather than less.
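
A minimal sketch of that trade-off, assuming a pandas version where to_csv returns the CSV text when no path is given. The frame is rebuilt here for illustration, and the "red, green" value is a made-up example of a field that contains the separator:

```python
import csv

import pandas as pd

df = pd.DataFrame({"1": [123], "2": ['this is "out text"']})

# Default (QUOTE_MINIMAL): the field holds a quote character, so it is
# wrapped in quotes and the inner quotes are doubled.
quoted = df.to_csv(index=False, header=False).strip()

# QUOTE_NONE: the field is written exactly as it is.
unquoted = df.to_csv(index=False, header=False, quoting=csv.QUOTE_NONE).strip()

print(quoted)
print(unquoted)

# Hypothetical field containing the separator: with QUOTE_NONE and no
# escapechar there is no way to write it, so the csv module raises.
risky = pd.DataFrame({"1": [123], "2": ["red, green"]})
try:
    risky.to_csv(index=False, header=False, quoting=csv.QUOTE_NONE)
    failed = False
except csv.Error:
    failed = True
print(failed)
```

So QUOTE_NONE gives the output asked for in the question, but it only stays safe as long as no value contains the separator.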

Answer by ericmjl

As opposed to writing 'foo.txt', write 'foo.csv'. That solved the issue. When the CSV file is read in Excel, the extra quotation marks are absent.

Answer by Owen

Note: there is currently a small error in the Pandas to_string documentation. It says:

  • quoting : int, Controls whether quotes should be recognized. Values are taken from csv.QUOTE_* values. Acceptable values are 0, 1, 2, and 3 for QUOTE_MINIMAL, QUOTE_ALL, QUOTE_NONE, and QUOTE_NONNUMERIC,
    respectively.

But this reverses how csv defines the QUOTE_NONE and QUOTE_NONNUMERIC variables.

In [13]: import csv
In [14]: csv.QUOTE_NONE
Out[14]: 3
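
To check all four constants at once, a short sketch using only the standard csv module (the `values` dict is just a helper name for this example):

```python
import csv

names = ("QUOTE_MINIMAL", "QUOTE_ALL", "QUOTE_NONNUMERIC", "QUOTE_NONE")

# Look up each constant on the csv module and show its integer value.
values = {name: getattr(csv, name) for name in names}
for name, value in values.items():
    print(name, value)
```

This prints 0, 1, 2 and 3 in that order, so QUOTE_NONNUMERIC is 2 and QUOTE_NONE is 3. Passing the named constant rather than a bare integer avoids the mix-up entirely.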

Answer by alvas

To use quoting=csv.QUOTE_NONE, you also need to set the escapechar (otherwise to_csv raises an error as soon as a value contains the separator), e.g.

# Create a tab-separated file with quotes
$ echo abc$'\t'defg$'\t'$'"xyz"' > in.tsv
$ cat in.tsv
abc defg    "xyz"

# Gotcha: the quotes around `"..."` disappear
$ python3
>>> import pandas as pd
>>> import csv
>>> df = pd.read_csv("in.tsv", sep="\t")
>>> df
Empty DataFrame
Columns: [abc, defg, xyz]
Index: []


# When reading in pandas, to keep the `"..."` quotes,
# you have to explicitly turn quote handling off
>>> df = pd.read_csv("in.tsv", sep="\t", quoting=csv.QUOTE_NONE)
>>> df
Empty DataFrame
Columns: [abc, defg, "xyz"]
Index: []

# To print out without the quotes.
>>> df.to_csv("out.tsv", sep="\t", quoting=csv.QUOTE_NONE, quotechar="", escapechar="\\")
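
The same idea as a self-contained sketch that does not need a file on disk. The column names col1 to col3 and the sample values are assumptions made up for this example, and io.StringIO stands in for in.tsv:

```python
import csv
import io

import pandas as pd

# Hypothetical tab-separated input with a header row and one data row.
raw = 'col1\tcol2\tcol3\nabc\tdefg\t"xyz"\n'

# Default parsing treats "..." as quoting and drops the quote characters.
default_df = pd.read_csv(io.StringIO(raw), sep="\t")

# quoting=csv.QUOTE_NONE keeps them as part of the value.
literal_df = pd.read_csv(io.StringIO(raw), sep="\t", quoting=csv.QUOTE_NONE)

print(default_df.loc[0, "col3"])
print(literal_df.loc[0, "col3"])

# On output, escapechar is what lets QUOTE_NONE cope with a value that
# contains the separator: the tab inside the value gets a backslash prefix.
tricky = pd.DataFrame({"v": ["a\tb"]})
escaped = tricky.to_csv(sep="\t", index=False, header=False,
                        quoting=csv.QUOTE_NONE, escapechar="\\").strip()
print(repr(escaped))
```

Whoever reads the file back then has to undo the backslash escaping, for example by passing the same escapechar to read_csv.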

Answer by penduDev

To use quoting=csv.QUOTE_NONE without escapechar:

Replace the comma character , (Unicode: U+002C) in your df with a single low-9 quotation mark character ‚ (Unicode: U+201A).

After this, you can simply use:

import csv
df.to_csv('foo.txt', index=False, header=False, quoting=csv.QUOTE_NONE)
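
A minimal sketch of the replacement step, which the answer describes but does not show. The sample frame is made up, and using DataFrame.replace with regex=True to substitute inside every string cell is one possible way to do it, not necessarily what the answerer had in mind:

```python
import csv

import pandas as pd

# Hypothetical data with a comma inside a text field.
df = pd.DataFrame({"id": [1], "note": ["red, green"]})

# Swap every comma (U+002C) in string cells for the look-alike
# single low-9 quotation mark (U+201A); numeric cells are left alone.
cleaned = df.replace(",", "\u201a", regex=True)

# With no real commas left in the data, QUOTE_NONE needs no escapechar.
line = cleaned.to_csv(index=False, header=False, quoting=csv.QUOTE_NONE).strip()
print(line)
```

Note that this changes the data itself: anyone reading the file back gets U+201A where the commas used to be.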