Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/34550120/

Time: 2020-09-14 00:26:58  Source: igfitidea

Pandas escape carriage return in to_csv

python pandas

Asked by Kamil Sindi

I have a string column that sometimes has carriage returns in the string:

import pandas as pd
from io import StringIO

datastring = StringIO("""\
country  metric           2011   2012
USA      GDP              7      4
USA      Pop.             2      3
GB       GDP              8      7
""")
df = pd.read_table(datastring, sep=r'\s\s+', engine='python')  # columns are separated by two or more spaces
df.metric = df.metric + '\r'  # append carriage return

print(df)
  country  metric  2011  2012
0     USA   GDP\r     7     4
1     USA  Pop.\r     2     3
2      GB   GDP\r     8     7

When writing to and reading from csv, the dataframe gets corrupted:

df.to_csv('data.csv', index=False)  # do not write the index column

print(pd.read_csv('data.csv'))
  country metric  2011  2012
0     USA    GDP   NaN   NaN
1     NaN      7     4   NaN
2     USA   Pop.   NaN   NaN
3     NaN      2     3   NaN
4      GB    GDP   NaN   NaN
5     NaN      8     7   NaN

Question

What's the best way to fix this? The one obvious method is to just clean the data first:

df.metric = df.metric.str.replace('\r', '')
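
If the carriage returns carry no information, the cleaning approach can be sketched as below. The frame is a small stand-in for the one in the question, and `regex=False` is an added assumption so that the pattern is treated as a literal string regardless of the pandas version.

```python
import pandas as pd

# Small stand-in for the frame in the question (illustrative data only)
df = pd.DataFrame({
    "country": ["USA", "USA", "GB"],
    "metric": ["GDP\r", "Pop.\r", "GDP\r"],
})

# Drop every carriage return; regex=False treats "\r" as a literal string
df["metric"] = df["metric"].str.replace("\r", "", regex=False)

print(df["metric"].tolist())
```

This changes the data itself, so it only fits when the `\r` characters are unwanted.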

Answered by Mike Müller

Specify the line terminator when reading the file back:

print(pd.read_csv('data.csv', lineterminator='\n'))  # only '\n' ends a row, so '\r' stays inside the field

  country  metric  2011  2012
0     USA   GDP\r     7     4
1     USA  Pop.\r     2     3
2      GB   GDP\r     8     7
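
The same idea can be tried without touching the disk. The sketch below assumes the file holds a bare `\r` inside the metric field (the CSV text is written by hand for illustration) and uses the current `lineterminator` spelling, which is only supported by the default C parser.

```python
from io import StringIO

import pandas as pd

# Hand-written CSV text with a bare "\r" inside the metric field
# (an assumption about what the file in the question contains).
csv_text = (
    "country,metric,2011,2012\n"
    "USA,GDP\r,7,4\n"
    "USA,Pop.\r,2,3\n"
    "GB,GDP\r,8,7\n"
)

# Default parsing: "\r" is also treated as a row break.
broken = pd.read_csv(StringIO(csv_text))

# With lineterminator="\n", only "\n" ends a row and "\r" stays in the field.
fixed = pd.read_csv(StringIO(csv_text), lineterminator="\n")

print(broken.shape, fixed.shape)
print(fixed["metric"].tolist())
```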

UPDATE:

In more recent versions of pandas (the original answer is from 2015) the name of the argument changed to lineterminator.
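
A different option, not taken from the answer above, is to quote string fields when writing so that the `\r` sits inside a quoted field and the reader needs no special argument. This is a minimal sketch that assumes quoting every non-numeric field is acceptable for the output file; the frame is illustrative.

```python
import csv
from io import StringIO

import pandas as pd

# Illustrative frame with a trailing carriage return in "metric"
df = pd.DataFrame({
    "country": ["USA", "USA", "GB"],
    "metric": ["GDP\r", "Pop.\r", "GDP\r"],
    "2011": [7, 2, 8],
    "2012": [4, 3, 7],
})

# QUOTE_NONNUMERIC wraps every string field in quotes, so the "\r"
# is written inside a quoted field instead of looking like a row break.
buffer = StringIO()
df.to_csv(buffer, index=False, quoting=csv.QUOTE_NONNUMERIC)

# Read the text back with default settings.
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored["metric"].tolist())
```

The trade-off is a slightly larger file, since every string value (including the header) is quoted.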
