pandas dataframe to_csv works with sep='\n' but not sep='\t'
Original question: http://stackoverflow.com/questions/47388570/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Jan
I am trying to write my large dataframe to a CSV file, but tab separation with sep='\t' does not work. I then tested with a newline, sep='\n', and that seems to work: all the elements are broken onto separate lines. What could be wrong here?
The code is as simple as:
df_M.to_csv('report'+filename, header=True, sep='\t', index=False)
Here is an example of the data (the protein column is very long). I have marked the column boundaries with |:
"protein | cl | pept | [M] | [M+1H+]1+ | [M+2H+]2+"
"ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - Homo sapiens (Human)| 0| AWAVAR| 672.37072| 673.378| out-of-range"
"ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - Homo sapiens (Human)| 0| TPVSDR| 673.3394900000002| 674.3467700000002| out-of-range"
"ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - Homo sapiens (Human)| 0| NYAEAK| 694.3285900000001| 695.3358700000001| out-of-range"
"
Accepted answer by T_cat
Are you saving the data in .tsv format? Your data is a TSV file, because you are separating the fields with '\t', which is a tab. A CSV file must be separated by ",".
If you want to save the data in .csv format, you need to separate the fields with ",".
Link to the CSV RFC: http://www.ietf.org/rfc/rfc4180.txt
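To make the CSV/TSV distinction above concrete, here is a minimal sketch. The two-row frame is made up for illustration and is not the asker's data. It writes the same frame once with sep=',' and once with sep='\t' into in-memory buffers. Note that to_csv writes whatever separator it is given; the file extension is only a naming convention.

```python
import io

import pandas as pd

# Hypothetical two-row frame, used only for illustration.
df = pd.DataFrame({"pept": ["AWAVAR", "TPVSDR"], "cl": [0, 0]})

# Comma-separated output (what RFC 4180 describes as CSV).
csv_buf = io.StringIO()
df.to_csv(csv_buf, sep=",", index=False)
csv_text = csv_buf.getvalue()

# Tab-separated output (usually saved with a .tsv extension).
tsv_buf = io.StringIO()
df.to_csv(tsv_buf, sep="\t", index=False)
tsv_text = tsv_buf.getvalue()

print(repr(csv_text.splitlines()[0]))
print(repr(tsv_text.splitlines()[0]))
```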
Answer by nsaura
You can try:
df_M.to_csv('report'+filename, header=True, sep='\t', index=False, encoding='utf-8')
See also: Pandas Data Frame to_csv with more separator
Or it may be a version problem, since I couldn't reproduce the issue. Check pd.__version__ (the latest version being '0.21.0').
Hope this is useful.
Answer by jezrael
The problem is that every row is wrapped in ", so you get a one-column DataFrame.
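A minimal sketch of that diagnosis, assuming the file looks like the quoted sample in the question (the shortened rows below are made up for illustration). With the default quoting, each quoted line is parsed as a single field, so the frame has one column and to_csv with sep='\t' has nothing to put a tab between:

```python
import io

import pandas as pd

# Shortened stand-in for the asker's file: every line is wrapped in quotes.
raw = (
    '"protein | cl | pept"\n'
    '"ALBU_HUMAN_UPS Serum albumin| 0| AWAVAR"\n'
    '"ALBU_HUMAN_UPS Serum albumin| 0| TPVSDR"\n'
)

# Default quoting: the | characters sit inside quotes, so they are not
# treated as separators and each line becomes one field.
df = pd.read_csv(io.StringIO(raw), sep="|")
print(df.shape)

# Writing a one-column frame with sep='\t' produces no tabs at all.
buf = io.StringIO()
df.to_csv(buf, sep="\t", index=False)
out = buf.getvalue()
print("\t" in out)
```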
So you need quoting=3 (QUOTE_NONE), and then you remove the leftover " characters with strip:
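import pandas as pd
# quoting=3 is csv.QUOTE_NONE: treat " as an ordinary character
df_M = pd.read_csv('test.csv', sep='|', quoting=3, skipinitialspace=True)
# strip the leftover quotes from the first column, the last column and the header
df_M.iloc[:, 0] = df_M.iloc[:, 0].str.strip('"')
df_M.iloc[:, -1] = df_M.iloc[:, -1].str.strip('"')
df_M.columns = df_M.columns.str.strip('"')
print(df_M)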
                                             protein  cl    pept  \
0  ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - ...   0  AWAVAR
1  ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - ...   0  TPVSDR
2  ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - ...   0  NYAEAK

         [M]  [M+1H+]1+     [M+2H+]2+
0  672.37072  673.37800  out-of-range
1  673.33949  674.34677  out-of-range
2  694.32859  695.33587  out-of-range
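For anyone who wants to try this approach without a file on disk, here is a self-contained sketch. It assumes the same quoted layout, with shortened made-up rows, and uses the named constant csv.QUOTE_NONE instead of the literal 3. It also strips the padding spaces from the header, which the snippet above does not do:

```python
import csv
import io

import pandas as pd

# Shortened stand-in for the asker's file: every line is wrapped in quotes.
raw = (
    '"protein | cl | pept"\n'
    '"ALBU_HUMAN_UPS Serum albumin| 0| AWAVAR"\n'
    '"ALBU_HUMAN_UPS Serum albumin| 0| TPVSDR"\n'
)

# QUOTE_NONE makes the parser treat " as ordinary text, so | splits the fields.
df_M = pd.read_csv(
    io.StringIO(raw), sep="|", quoting=csv.QUOTE_NONE, skipinitialspace=True
)

# Remove the leftover quotes (and padding spaces) from the header ...
df_M.columns = df_M.columns.str.strip('" ')

# ... and the leftover quotes from the first and last columns.
first, last = df_M.columns[0], df_M.columns[-1]
df_M[first] = df_M[first].str.strip('"')
df_M[last] = df_M[last].str.strip('"')

print(df_M)
```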
Another solution is to read the data into one column and then split it:
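import pandas as pd
# '^' does not occur in the data, so each line is read as a single column
df = pd.read_csv('test.csv', sep='^')
# the only column name holds the whole header line; split it into a flat list
cols = df.columns[0].split('|')
df_M = df.iloc[:, 0].str.split('|', expand=True)
df_M.columns = cols
print(df_M)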
                                             protein cl    pept  \
0  ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - ...  0  AWAVAR
1  ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - ...  0  TPVSDR
2  ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - ...  0  NYAEAK

                 [M]          [M+1H+]1+     [M+2H+]2+
0          672.37072            673.378  out-of-range
1  673.3394900000002  674.3467700000002  out-of-range
2  694.3285900000001  695.3358700000001  out-of-range
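A self-contained sketch of this second approach, again on shortened made-up rows. One thing worth knowing: str.split leaves every value as a string, including the numeric-looking ones, which is probably why the numbers above are printed unrounded. The extra strip step is my addition to remove the padding spaces around each piece:

```python
import io

import pandas as pd

# Shortened stand-in for the asker's file: every line is wrapped in quotes.
raw = (
    '"protein | cl | pept"\n'
    '"ALBU_HUMAN_UPS Serum albumin| 0| AWAVAR"\n'
    '"ALBU_HUMAN_UPS Serum albumin| 0| TPVSDR"\n'
)

# '^' never occurs in the data, so each line lands in a single column;
# the default quoting removes the surrounding quotes.
df = pd.read_csv(io.StringIO(raw), sep="^")

# Split the single header string and the single data column on |.
cols = [name.strip() for name in df.columns[0].split("|")]
df_M = df.iloc[:, 0].str.split("|", expand=True)
df_M.columns = cols

# Strip the padding spaces left around each piece (all values are strings).
df_M = df_M.apply(lambda col: col.str.strip())

print(df_M)
```

If numeric columns are needed afterwards, pd.to_numeric can be applied to the relevant columns.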
Finally, to_csv works nicely:
df_M.to_csv('report'+filename, header=True, sep='\t', index=False)
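As a quick sanity check, here is a sketch that writes a properly split frame with sep='\t' to an in-memory buffer and reads it back. The frame is made up for illustration and the buffer stands in for the 'report'+filename path:

```python
import io

import pandas as pd

# Hypothetical, already-split frame standing in for df_M.
df_M = pd.DataFrame(
    {
        "protein": ["ALBU_HUMAN_UPS Serum albumin"] * 2,
        "cl": [0, 0],
        "pept": ["AWAVAR", "TPVSDR"],
    }
)

# Same call as above, but writing to a buffer instead of a file.
buf = io.StringIO()
df_M.to_csv(buf, header=True, sep="\t", index=False)
text = buf.getvalue()
first_line = text.splitlines()[0]

# Reading the text back with the same separator recovers three columns.
round_trip = pd.read_csv(io.StringIO(text), sep="\t")
print(repr(first_line))
print(round_trip.shape)
```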