Original question: http://stackoverflow.com/questions/37357727/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
pandas: write tab-separated dataframe with literal tabs with no quotes
Asked by Dima Lituiev
I have to reformat my data for a genetics software package that requires each column to be split into two, e.g. 0 -> G G; 1 -> A G; 2 -> A A. The output file is supposed to be tab-delimited. I am trying to do it in pandas:
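import csv
import pandas as pd
import numpy as np

# Random genotype codes (0, 1 or 2) in five columns named a..e
df = pd.DataFrame(np.random.randint(0, 3, size=(10, 5)),
                  columns=[chr(c) for c in range(97, 97 + 5)])

def fake_alleles(x):
    # Each code becomes two alleles joined by a literal tab
    if x == 0:
        return "A\tA"
    if x == 1:
        return "A\tG"
    if x == 2:
        return "G\tG"

plinkpast6 = df.applymap(fake_alleles)
# This call fails: the cell values contain the separator itself
plinkpast6.to_csv("test.ped", sep="\t", quoting=csv.QUOTE_NONE)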
This gives me the error "Error: need to escape, but no escapechar set". Are there other ways to do this with pandas?
Answered by piRSquared
sep="\t" tells pandas to take each element of a dataframe row and insert a "\t" between them. The problem is that the elements themselves contain "\t", which conflicts with the separator. With quoting=csv.QUOTE_NONE the writer cannot quote those values, so it wants the "\t"s inside the elements to be escaped, and no escapechar has been set. I suspect you want your final output to be 6 columns.
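To see where the message comes from, here is a minimal sketch that reproduces it with Python's standard csv module alone, which pandas relies on when writing. The helper name write_row is made up for this illustration. The second call shows what setting an escapechar does: the embedded tabs are written with a backslash in front, which is probably not what a downstream genetics tool expects.

```python
import csv
import io

def write_row(row, **kwargs):
    """Write one row as tab-separated text with quoting disabled (illustrative helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", quoting=csv.QUOTE_NONE, **kwargs)
    writer.writerow(row)
    return buf.getvalue()

# Fields that contain the delimiter cannot be written without quoting or escaping
try:
    write_row(["A\tG", "G\tG"])
    error_message = None
except csv.Error as exc:
    error_message = str(exc)

print(error_message)

# With an escapechar the write succeeds, but each embedded tab gains a backslash prefix
escaped = write_row(["A\tG", "G\tG"], escapechar="\\")
print(repr(escaped))
```

So the cleaner fix is to make sure no cell contains a tab in the first place, by splitting each value into two real columns before writing.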
Try this:
import csv
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0, 3, size=(10, 20)))

def fake_alleles(x):
    if x == 0:
        return "A\tA"
    if x == 1:
        return "A\tG"
    if x == 2:
        return "G\tG"

# Use the first three columns, so the result has 3 * 2 = 6 columns
plinkpast6 = df.iloc[:, :3].applymap(fake_alleles)
# Split every cell on the tab so that no value contains the separator any more
plinkpast6 = plinkpast6.stack().str.split('\t', expand=True).unstack()
plinkpast6.to_csv("test.ped", sep="\t", quoting=csv.QUOTE_NONE)
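As far as I can tell, the unstack() call above groups the columns as all first alleles followed by all second alleles, and to_csv also writes the header and index by default. If the target format needs the two alleles of each genotype side by side with no header or index, one possible alternative is sketched below. It swaps the string-splitting approach for a NumPy lookup table. The names ALLELES and split_genotypes and the "_1"/"_2" column suffixes are assumptions made for this example, not part of the original answer.

```python
import io
import numpy as np
import pandas as pd

# Assumed lookup table: genotype code -> (allele 1, allele 2)
ALLELES = np.array([["A", "A"], ["A", "G"], ["G", "G"]])

def split_genotypes(df):
    """Expand each genotype column into two adjacent allele columns."""
    codes = df.to_numpy()                                # shape (n_rows, n_cols)
    pairs = ALLELES[codes]                               # shape (n_rows, n_cols, 2)
    flat = pairs.reshape(len(df), 2 * df.shape[1])       # alleles of one genotype stay adjacent
    columns = [f"{c}_{i}" for c in df.columns for i in (1, 2)]
    return pd.DataFrame(flat, index=df.index, columns=columns)

df = pd.DataFrame([[0, 1, 2], [2, 2, 0]], columns=["a", "b", "c"])
alleles = split_genotypes(df)

# Written to an in-memory buffer here; pass a file path such as "test.ped" instead
buf = io.StringIO()
alleles.to_csv(buf, sep="\t", header=False, index=False)
text = buf.getvalue()
print(text)
```

Because no cell ever contains a tab, the default quoting settings never add quotes, and the escaping error cannot occur.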