pandas dataframe to_csv works with sep='\n' but not sep='\t'

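Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/47388570/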
Date: 2020-09-14 04:47:32  Source: igfitidea

Tags: python, python-2.7, pandas, csv

Question by Jan

I am trying to write my large DataFrame to a CSV file, but tab separation with sep='\t' does not work. I then tested with a newline, sep='\n', and that seems to work: every element is broken onto its own line. What could be wrong here?

The code is as simple as:

df_M.to_csv('report'+filename, header=True, sep='\t', index=False)

Here is an example of the data (the protein column is very long); I marked where the columns are separated with |:

"protein |  cl      | pept |    [M] |  [M+1H+]1+ |  [M+2H+]2+"      
"ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - Homo sapiens (Human)|    0|   AWAVAR|        672.37072|            673.378| out-of-range"        
"ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - Homo sapiens (Human)|    0|  TPVSDR| 673.3394900000002|  674.3467700000002|  out-of-range"       
"ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - Homo sapiens (Human)|    0|  NYAEAK| 694.3285900000001|  695.3358700000001|  out-of-range"       
"

Accepted answer by T_cat

Are you saving the data in .tsv format? Your data is a TSV file, because you are separating it with '\t', which is a tab. A CSV file must be separated by ",".

If you want to save the data in .csv format, you need to separate it with ",".

Link to the CSV RFC (RFC 4180): http://www.ietf.org/rfc/rfc4180.txt
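
A minimal sketch of that advice, assuming the goal is a file that a spreadsheet program opens as separate columns: write commas to a .csv file, or keep the tabs and give the file a .tsv extension. The helper save_table and its extension-to-separator mapping are hypothetical, not part of pandas; only DataFrame.to_csv and its sep argument are real.

```python
import os
import tempfile

import pandas as pd


def save_table(df, path):
    """Hypothetical helper: choose the separator from the file extension."""
    ext = os.path.splitext(path)[1].lower()
    # RFC 4180 CSV is comma-separated; tab-separated data conventionally uses .tsv.
    # Any other extension raises KeyError on purpose.
    sep = {".csv": ",", ".tsv": "\t"}[ext]
    df.to_csv(path, sep=sep, header=True, index=False)


df = pd.DataFrame({"pept": ["AWAVAR", "TPVSDR"], "cl": [0, 0]})

out_dir = tempfile.mkdtemp()
save_table(df, os.path.join(out_dir, "report.csv"))  # comma-separated
save_table(df, os.path.join(out_dir, "report.tsv"))  # tab-separated

with open(os.path.join(out_dir, "report.tsv")) as f:
    print(repr(f.read()))
```

Tying the separator to the extension keeps the file name honest about its contents, which is what the accepted answer is pointing at.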

Answer by nsaura

You can try:

df_M.to_csv('report'+filename, header=True, sep='\t', index=False, encoding='utf-8')

See also: Pandas Data Frame to_csv with more separator

Or it may be a version problem, since I couldn't reproduce it. Check pd.__version__; the latest version at the time of writing was '0.21.0'.

Hope this is useful

Answer by jezrael

The problem is that every row is wrapped in ", so you get a one-column DataFrame.
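
A small reproduction of that diagnosis, assuming the file really contains lines wrapped in double quotes like the sample in the question (three shortened lines are pasted inline via io.StringIO instead of reading test.csv). With the default quoting, read_csv treats each quoted line as one field, so the | characters inside never split anything, and to_csv then has no column boundaries at which to put tabs.

```python
import io

import pandas as pd

# Shortened lines shaped like the question's sample: each whole line is wrapped in '"'
raw = (
    '"protein | cl | pept | [M] | [M+1H+]1+ | [M+2H+]2+"\n'
    '"ALBU_HUMAN_UPS Serum albumin|0|AWAVAR|672.37072|673.378|out-of-range"\n'
    '"ALBU_HUMAN_UPS Serum albumin|0|TPVSDR|673.33949|674.34677|out-of-range"\n'
)

# Default quoting: everything between the quotes is a single field, '|' included
df = pd.read_csv(io.StringIO(raw), sep="|")
print(df.shape)  # (2, 1): a single column

# With only one column there is nothing for sep='\t' to separate
out = df.to_csv(sep="\t", index=False)
print("\t" in out)  # False
```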

So you need quoting=3 (QUOTE_NONE), and then remove the leftover " characters with strip:

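import pandas as pd

# quoting=3 is csv.QUOTE_NONE: '"' is read as a plain character, so '|' splits each row
df_M = pd.read_csv('test.csv', sep='|', quoting=3, skipinitialspace=True)
# the quote characters are now stuck to the first and last fields; strip them
df_M.iloc[:, 0] = df_M.iloc[:, 0].str.strip('"')
df_M.iloc[:, -1] = df_M.iloc[:, -1].str.strip('"')
df_M.columns = df_M.columns.str.strip('"')
print(df_M)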

                                            protein   cl         pept   \
0  ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - ...         0  AWAVAR   
1  ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - ...         0  TPVSDR   
2  ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - ...         0  NYAEAK   

        [M]   [M+1H+]1+      [M+2H+]2+  
0  672.37072   673.37800  out-of-range  
1  673.33949   674.34677  out-of-range  
2  694.32859   695.33587  out-of-range  
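
A self-contained variant of the first approach, assuming two of the sample rows are pasted inline via io.StringIO instead of read from test.csv. It differs from the answer's code in two small, optional ways: it names csv.QUOTE_NONE instead of the bare 3, and it strips spaces as well as quotes, so the names come out without padding ('protein' rather than 'protein ').

```python
import csv
import io

import pandas as pd

raw = (
    '"protein |  cl      | pept |    [M] |  [M+1H+]1+ |  [M+2H+]2+"\n'
    '"ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - Homo sapiens (Human)|    0|   AWAVAR|        672.37072|            673.378| out-of-range"\n'
    '"ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - Homo sapiens (Human)|    0|  TPVSDR| 673.3394900000002|  674.3467700000002|  out-of-range"\n'
)

# csv.QUOTE_NONE (== 3): '"' is an ordinary character, so '|' now splits each line
df_M = pd.read_csv(io.StringIO(raw), sep="|", quoting=csv.QUOTE_NONE, skipinitialspace=True)

# Quotes and padding survive in the header and on the first/last fields; strip both
df_M.columns = df_M.columns.str.strip(' "')
first, last = df_M.columns[0], df_M.columns[-1]
df_M[first] = df_M[first].str.strip(' "')
df_M[last] = df_M[last].str.strip(' "')
print(df_M)
```

Assigning by column name rather than through iloc keeps the update a plain column replacement, which behaves the same across pandas versions.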

Another solution is to read the data into one column and then split it:

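import pandas as pd

# '^' does not occur in the data, so each whole line is read into a single column
df = pd.read_csv('test.csv', sep='^')
# the single column name is the whole header line; split it into the real names
cols = df.columns[0].split('|')
df_M = df.iloc[:, 0].str.split('|', expand=True)
df_M.columns = cols
print(df_M)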

                                            protein    cl            pept   \
0  ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - ...          0     AWAVAR   
1  ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - ...          0     TPVSDR   
2  ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - ...          0     NYAEAK   

                 [M]            [M+1H+]1+        [M+2H+]2+  
0           672.37072              673.378    out-of-range  
1   673.3394900000002    674.3467700000002    out-of-range  
2   694.3285900000001    695.3358700000001    out-of-range  
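
A self-contained sketch of this second approach, again assuming the sample rows are pasted inline via io.StringIO. It additionally strips the padding spaces from names and values, which the answer's code leaves in place (hence the wide columns in the output above).

```python
import io

import pandas as pd

raw = (
    '"protein |  cl      | pept |    [M] |  [M+1H+]1+ |  [M+2H+]2+"\n'
    '"ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - Homo sapiens (Human)|    0|   AWAVAR|        672.37072|            673.378| out-of-range"\n'
    '"ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - Homo sapiens (Human)|    0|  TPVSDR| 673.3394900000002|  674.3467700000002|  out-of-range"\n'
)

# '^' never occurs in the data, so every line lands in one column;
# the default quoting also removes the wrapping '"' characters
df = pd.read_csv(io.StringIO(raw), sep="^")

# The lone column name is the whole header line: split it into the real names
cols = [name.strip() for name in df.columns[0].split("|")]

# Split each row on '|' into separate columns, then drop the padding spaces
df_M = df.iloc[:, 0].str.split("|", expand=True)
df_M = df_M.apply(lambda col: col.str.strip())
df_M.columns = cols
print(df_M)
```

This route leaves every value as a string; pd.to_numeric could convert the cl and mass columns afterwards if numeric types are needed.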

Finally, to_csv works fine:

df_M.to_csv('report'+filename, header=True, sep='\t', index=False)
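
A quick way to confirm that the tabs are really there, assuming a small stand-in for the cleaned six-column df_M (built inline from the sample values rather than from test.csv) and an in-memory buffer in place of 'report'+filename: every written line should contain five tabs, and reading the text back with sep='\t' should recover the same columns.

```python
import io

import pandas as pd

# Stand-in for the cleaned six-column df_M (values taken from the question's sample)
df_M = pd.DataFrame({
    "protein": ["ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - Homo sapiens (Human)"] * 2,
    "cl": [0, 0],
    "pept": ["AWAVAR", "TPVSDR"],
    "[M]": [672.37072, 673.33949],
    "[M+1H+]1+": [673.378, 674.34677],
    "[M+2H+]2+": ["out-of-range", "out-of-range"],
})

buf = io.StringIO()
df_M.to_csv(buf, header=True, sep="\t", index=False)
text = buf.getvalue()

# Six columns -> five tab characters on every line, header included
assert all(line.count("\t") == 5 for line in text.splitlines())

# Reading it back with the same separator recovers the same columns
round_trip = pd.read_csv(io.StringIO(text), sep="\t")
print(round_trip.columns.tolist())
```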