Python: Writing a pandas DataFrame to a CSV file
Note: this page is a side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/16923281/
Writing a pandas DataFrame to CSV file
Asked by user7289
I have a dataframe in pandas which I would like to write to a CSV file. I am doing this using:
df.to_csv('out.csv')
And getting the error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u03b1' in position 20: ordinal not in range(128)
Is there any way to get around this easily (i.e. I have unicode characters in my data frame)? And is there a way to write to a tab-delimited file instead of a CSV, using e.g. a 'to-tab' method (which I don't think exists)?
Accepted answer by Andy Hayden
Answer by Harsha Komarraju
You can sometimes run into these problems even when you specify UTF-8 encoding. I recommend specifying the encoding when reading the file and the same encoding when writing it. This might solve your problem.
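A minimal sketch of that advice, assuming the input really is UTF-8. The file names, the sample data, and the temporary directory are hypothetical stand-ins and are not part of the original answer.

```python
import os
import tempfile

import pandas as pd

ENCODING = "utf-8"  # assumption: the source file is UTF-8 encoded

with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "in.csv")    # hypothetical input file
    out_path = os.path.join(tmp, "out.csv")  # hypothetical output file

    # Create a small input file containing a non-ASCII character (Greek alpha).
    with open(in_path, "w", encoding=ENCODING) as f:
        f.write("name,value\n\u03b1-helix,1\n")

    # Read and write with the same, explicitly stated encoding.
    df = pd.read_csv(in_path, encoding=ENCODING)
    df.to_csv(out_path, encoding=ENCODING, index=False)

    # Read the result back as text to confirm the character survived.
    with open(out_path, encoding=ENCODING) as f:
        written = f.read()
```

If the input is in a different encoding, the same name would be passed to both calls instead of "utf-8".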
Answer by Glen Thompson
If you are having issues encoding to 'utf-8' and want to go cell by cell, you could try the following.
Python 2
(Where "df" is your DataFrame object.)
for column in df.columns:
    for idx in df[column].index:
        x = df.get_value(idx,column)
        try:
            x = unicode(x.encode('utf-8','ignore'),errors ='ignore') if type(x) == unicode else unicode(str(x),errors='ignore')
            df.set_value(idx,column,x)
        except Exception:
            print 'encoding error: {0} {1}'.format(idx,column)
            df.set_value(idx,column,'')
            continue
Then try:
df.to_csv(file_name)
You can check the type (str or unicode) of the first value in each column with:
for column in df.columns:
    print '{0} {1}'.format(str(type(df[column][0])),str(column))
Warning: errors='ignore' will simply drop any character that cannot be decoded, e.g.
IN: unicode('Regenexx\xae',errors='ignore')
OUT: u'Regenexx'
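The same silent loss can be reproduced in Python 3, where the error handler is passed to str.encode instead. This is only an illustrative sketch and not part of the original answer; it also shows errors='replace', which leaves a visible placeholder rather than dropping the character.

```python
text = "Regenexx\xae"  # the string "Regenexx" followed by the registered-trademark sign

# errors="ignore" silently drops characters the target codec cannot represent.
dropped = text.encode("ascii", errors="ignore").decode("ascii")

# errors="replace" substitutes a "?" so the loss stays visible.
replaced = text.encode("ascii", errors="replace").decode("ascii")

print(dropped)   # Regenexx
print(replaced)  # Regenexx?
```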
Python 3
for column in df.columns:
    for idx in df[column].index:
        # get_value/set_value were removed in pandas 1.0; df.at is the scalar accessor
        x = df.at[idx, column]
        try:
            x = x if type(x) == str else str(x).encode('utf-8','ignore').decode('utf-8','ignore')
            df.at[idx, column] = x
        except Exception:
            print('encoding error: {0} {1}'.format(idx,column))
            df.at[idx, column] = ''
            continue
Answer by Sayan Sil
When you store a DataFrame object in a CSV file using the to_csv method, you probably won't need to store the index of each row of the DataFrame.
You can avoid that by passing False to the index parameter.
Somewhat like:
df.to_csv(file_name, encoding='utf-8', index=False)
So if your DataFrame object is something like:
  Color  Number
0   red      22
1  blue      10
The csv file will store:
Color,Number
red,22
blue,10
instead of the following, which is what you get when the default value index=True is used:
,Color,Number
0,red,22
1,blue,10
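The difference is easy to check without touching the disk, because to_csv returns the CSV text as a string when no path is given. A small sketch using the Color/Number frame from this answer:

```python
import pandas as pd

df = pd.DataFrame({"Color": ["red", "blue"], "Number": [22, 10]})

# With no path argument, to_csv returns the CSV content as a string.
with_index = df.to_csv()                # default: index=True
without_index = df.to_csv(index=False)

print(with_index.splitlines())     # [',Color,Number', '0,red,22', '1,blue,10']
print(without_index.splitlines())  # ['Color,Number', 'red,22', 'blue,10']
```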
Answer by Yury Wallet
This may not be the answer for this case, but I got the same error message with .to_csv. I then tried .toCSV('name.csv') and the error message was different ("'SparseDataFrame' object has no attribute 'toCSV'"). So I solved the problem by converting the DataFrame to a dense DataFrame:
df.to_dense().to_csv("submission.csv", index = False, sep=',', encoding='utf-8')
Answer by cs95
To write a pandas DataFrame to a CSV file, you will need DataFrame.to_csv. This function offers many arguments with reasonable defaults that you will more often than not need to override to suit your specific use case. For example, you might want to use a different separator, change the datetime format, or drop the index when writing. to_csv has arguments you can pass to address these requirements.
Here's a table listing some common scenarios of writing to CSV files and the corresponding arguments you can use for them.


Footnotes
- The default separator is assumed to be a comma (','). Don't change this unless you know you need to.
- By default, the index of df is written as the first column. If your DataFrame does not have an index (IOW, df.index is the default RangeIndex), then you will want to set index=False when writing. To explain this in a different way, if your data DOES have an index, you can (and should) use index=True or just leave it out completely (as the default is True).
- It would be wise to set this parameter if you are writing string data so that other applications know how to read your data. This will also avoid any potential UnicodeEncodeErrors you might encounter while saving.
- Compression is recommended if you are writing large DataFrames (>100K rows) to disk as it will result in much smaller output files. OTOH, it will mean the write time will increase (and consequently, the read time since the file will need to be decompressed).
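The options these footnotes discuss can be combined in a single call. The following is a minimal sketch rather than the answer's own example: the sample data, the file name, and the choice of gzip are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data containing non-ASCII strings.
df = pd.DataFrame({"name": ["\u03b1", "\u03b2"], "value": [1, 2]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.tsv.gz")  # hypothetical output file

    df.to_csv(
        path,
        sep="\t",            # tab instead of the default comma
        index=False,         # df only has the default RangeIndex
        encoding="utf-8",    # explicit encoding for string data
        compression="gzip",  # smaller file, slower write and read
    )

    # Reading back needs the same separator, encoding, and compression.
    restored = pd.read_csv(path, sep="\t", encoding="utf-8", compression="gzip")
```

The round trip through read_csv is only there to show that the same settings must be supplied when the file is read again.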
Answer by Harvey
Example of exporting to a file with a full path on Windows, writing the column headers as the first row:
df.to_csv(r'C:\Users\John\Desktop\export_dataframe.csv', index=False, header=True)
Example if you want to store the file in a folder in the same directory as your script, with utf-8 encoding and tab as the separator:
df.to_csv(r'./export/dftocsv.csv', sep='\t', encoding='utf-8', header=True)
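One thing to watch for with a path like ./export/...: as far as I know, to_csv does not create missing folders, so the export folder has to exist before the call. A hedged sketch, in which a temporary directory stands in for the script's directory and the sample data is hypothetical:

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"Color": ["red", "blue"], "Number": [22, 10]})

with tempfile.TemporaryDirectory() as tmp:
    # Assumption: tmp stands in for the directory your script lives in.
    export_dir = Path(tmp) / "export"
    export_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed

    out_path = export_dir / "dftocsv.csv"
    df.to_csv(out_path, sep="\t", encoding="utf-8", index=False, header=True)

    # Read the file back as text to inspect what was written.
    written = out_path.read_text(encoding="utf-8")
```

Here index=False is added so that only the two data columns are written; leave it out if you want the index as the first column.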

