pandas DataFrame to_csv works with sep='\n' but not with sep='\t'

Question

Asked by Jan

I am trying to write my large DataFrame to a CSV file, but the tab separator sep='\t' does not work. I then tested with the newline separator sep='\n', and that seems to work: it breaks all the elements onto separate lines. What could be wrong here?

The code is as simple as:

df_M.to_csv('report'+filename, header=True, sep='\t', index=False)

Here is an example of the data (the protein column is very long). I have marked where the fields should be separated with |:

"protein |  cl      | pept |    [M] |  [M+1H+]1+ |  [M+2H+]2+"      
"ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - Homo sapiens (Human)|    0|   AWAVAR|        672.37072|            673.378| out-of-range"        
"ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - Homo sapiens (Human)|    0|  TPVSDR| 673.3394900000002|  674.3467700000002|  out-of-range"       
"ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - Homo sapiens (Human)|    0|  NYAEAK| 694.3285900000001|  695.3358700000001|  out-of-range"       

Answer 1

Accepted answer by T_cat

Are you saving the data in .tsv format? Your data is a TSV file, because you are separating the fields with '\t', which is a tab. A CSV file must be separated by ",".

If you want to save the data in .csv format, you need to separate the fields with ",".

Link to the CSV RFC: http://www.ietf.org/rfc/rfc4180.txt
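As a small illustration of the separator choice discussed above, the sketch below writes the same frame once with a tab and once with a comma. The DataFrame is made up for the example (it is not the asker's data). Note that to_csv uses whatever single-character sep it is given; the file extension does not change the delimiter.

```python
import pandas as pd

# Hypothetical three-column frame standing in for the real data.
df = pd.DataFrame({"protein": ["ALBU_HUMAN"], "cl": [0], "pept": ["AWAVAR"]})

# With no path argument, to_csv returns the text instead of writing a file.
tsv_text = df.to_csv(sep="\t", index=False)  # tab-separated (TSV)
csv_text = df.to_csv(sep=",", index=False)   # comma-separated (CSV)

print(repr(tsv_text))
print(repr(csv_text))
```

If the tab-separated text shows no tabs for your own frame, the frame most likely has only one column.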

Answer 2

Answered by nsaura

You can try:

df_M.to_csv('report'+filename, header=True, sep='\t', index=False, encoding='utf-8')

See also: Pandas Data Frame to_csv with more separator

Alternatively, it may be a version problem, since I couldn't reproduce the issue. Check pd.__version__, given that the latest version is '0.21.0'.

Hope this is useful.

Answer 3

Answered by jezrael

The problem is that every row is wrapped in ", so the whole line is read as a single field and you get a one-column DataFrame.
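A minimal sketch of that diagnosis, using a shortened, made-up version of the sample data (the variable names are illustrative): with the default quoting, each quoted line becomes one field, so the frame has a single column and to_csv has nothing to put a tab between.

```python
import io
import pandas as pd

# Shortened stand-in for the file: every line is wrapped in double quotes.
raw = '"protein|cl|pept"\n"ALBU_HUMAN|0|AWAVAR"\n"ALBU_HUMAN|0|TPVSDR"\n'

# Default quoting treats each quoted line as a single field.
one_col = pd.read_csv(io.StringIO(raw), sep="|")
print(one_col.shape)  # one column only

# Writing a one-column frame produces no separators at all.
written = one_col.to_csv(sep="\t", index=False)
print("\t" in written)
```

Checking df_M.shape before calling to_csv is a quick way to tell whether the separator or the frame itself is the issue.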

So you need quoting=3 (QUOTE_NONE) when reading, and then remove the leftover " characters with strip:

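import csv
import pandas as pd

# csv.QUOTE_NONE (== 3) makes the parser treat " as an ordinary character
df_M = pd.read_csv('test.csv', sep='|', quoting=csv.QUOTE_NONE, skipinitialspace=True)
# the quotes remain on the first and last columns and on the header, so strip them
df_M.iloc[:, 0] = df_M.iloc[:, 0].str.strip('"')
df_M.iloc[:, -1] = df_M.iloc[:, -1].str.strip('"')
df_M.columns = df_M.columns.str.strip('"')
print(df_M)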

                                            protein   cl         pept   \
0  ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - ...         0  AWAVAR   
1  ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - ...         0  TPVSDR   
2  ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - ...         0  NYAEAK   

        [M]   [M+1H+]1+      [M+2H+]2+  
0  672.37072   673.37800  out-of-range  
1  673.33949   674.34677  out-of-range  
2  694.32859   695.33587  out-of-range

Another solution is to read the data into one column and then split it:

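import pandas as pd

# '^' is assumed not to occur in the data, so each line is read as one column
df = pd.read_csv('test.csv', sep='^')
# the single header string holds all column names; take it as a flat list
cols = df.columns.str.split('|')[0]
df_M = df.iloc[:, 0].str.split('|', expand=True)
df_M.columns = cols
print(df_M)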

                                            protein    cl            pept   \
0  ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - ...          0     AWAVAR   
1  ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - ...          0     TPVSDR   
2  ALBU_HUMAN_UPS Serum albumin (Chain 26-609) - ...          0     NYAEAK   

                 [M]            [M+1H+]1+        [M+2H+]2+  
0           672.37072              673.378    out-of-range  
1   673.3394900000002    674.3467700000002    out-of-range  
2   694.3285900000001    695.3358700000001    out-of-range

Finally, to_csv works nicely:

df_M.to_csv('report'+filename, header=True, sep='\t', index=False)
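Putting the steps together, here is a hedged end-to-end sketch on a shortened, made-up sample (read from a string rather than test.csv; the variable names are illustrative): parse with QUOTE_NONE, strip the stray quotes, then confirm that the tab-separated output really contains tabs.

```python
import csv
import io
import pandas as pd

# Shortened stand-in for the quoted, pipe-delimited input file.
raw = '"protein|cl|pept"\n"ALBU_HUMAN|0|AWAVAR"\n"ALBU_HUMAN|0|TPVSDR"\n'

# QUOTE_NONE keeps the quotes as plain characters, so '|' splits the fields.
df = pd.read_csv(io.StringIO(raw), sep="|", quoting=csv.QUOTE_NONE)

# The quotes survive on the first and last columns and on the header.
first, last = df.columns[0], df.columns[-1]
df[first] = df[first].str.strip('"')
df[last] = df[last].str.strip('"')
df.columns = df.columns.str.strip('"')

# With three real columns, sep='\t' now has something to separate.
out = df.to_csv(sep="\t", index=False)
print(out)
```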
