pandas: write tab-separated dataframe with literal tabs with no quotes

Question

Asked by Dima Lituiev

I have to reformat my data for a genetics program that requires each column to be split into two, e.g. 0 -> G G; 1 -> A G; 2 -> A A. The output file must be tab-delimited. I am trying to do this in pandas:

import csv
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0,3, size = (10,5)), 
                  columns=[ chr(c) for c in range(97, 97+5) ])

def fake_alleles(x):
    if x==0:
        return "A\tA"
    if x==1:
        return "A\tG"
    if x==2:
        return "G\tG"

plinkpast6 = df.applymap(fake_alleles)
plinkpast6.to_csv("test.ped", sep="\t", quoting=csv.QUOTE_NONE)

This fails with "Error: need to escape, but no escapechar set". Is there another way to do this with pandas?

Answer 1

Answered by piRSquared

sep="\t" tells to_csv to take each element of a dataframe row and insert a "\t" between them. The problem is that the elements themselves contain "\t", so the writer cannot tell a separator from data. It wants the "\t" characters inside the elements to be escaped, and with quoting=csv.QUOTE_NONE and no escapechar set it has no way to do that. I suspect you want your final output to have 6 columns.
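
The same error can be reproduced without pandas. The following is a minimal sketch using Python's standard csv module directly, on the assumption that to_csv hands its rows to a csv writer internally; try_write is a hypothetical helper name introduced only for this illustration.

```python
import csv
import io


def try_write(row):
    """Write one row tab-delimited with QUOTE_NONE; return the text or the error."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", quoting=csv.QUOTE_NONE)
    try:
        writer.writerow(row)
    except csv.Error as exc:
        # Raised when a field contains the delimiter and cannot be quoted or escaped
        return f"csv.Error: {exc}"
    return buf.getvalue()


embedded = try_write(["A\tG", "G\tG"])      # each field contains a tab
separate = try_write(["A", "G", "G", "G"])  # one allele per field

print(repr(embedded))
print(repr(separate))
```

The first call fails because each field contains the delimiter. The second succeeds because every allele is its own field, which is why splitting the cells before writing avoids the problem.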

Try this:

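import csv
import pandas as pd
import numpy as np

# 10 rows x 20 columns of random genotype codes (0, 1 or 2)
df = pd.DataFrame(np.random.randint(0, 3, size=(10, 20)))

def fake_alleles(x):
    # Encode a genotype code as two alleles joined by a tab
    if x == 0:
        return "A\tA"
    if x == 1:
        return "A\tG"
    if x == 2:
        return "G\tG"

# Use only the first 3 columns, giving 6 allele columns after the split.
# (applymap is deprecated since pandas 2.1; DataFrame.map is its replacement.)
plinkpast6 = df.iloc[:, :3].applymap(fake_alleles)
# Split every cell on the tab so each allele gets its own column.
# Note: unstack() puts the split position on the outer column level, so the
# columns come out as the 3 first alleles followed by the 3 second alleles.
plinkpast6 = plinkpast6.stack().str.split('\t', expand=True).unstack()
# No field contains a tab any more, so QUOTE_NONE no longer needs an escapechar
plinkpast6.to_csv("test.ped", sep="\t", quoting=csv.QUOTE_NONE)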
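
If the two alleles of each genotype need to sit next to each other in the file, one alternative is to build the two allele columns per genotype column directly instead of joining and re-splitting on tabs. This is a sketch and not the answerer's method. The names to_allele_pairs, FIRST_ALLELE and SECOND_ALLELE are hypothetical, and header=False and index=False assume the target program expects neither a header row nor an index column.

```python
import csv
import io

import numpy as np
import pandas as pd

# Same encoding as fake_alleles, split into first and second allele
FIRST_ALLELE = {0: "A", 1: "A", 2: "G"}
SECOND_ALLELE = {0: "A", 1: "G", 2: "G"}


def to_allele_pairs(genotypes):
    """Return a frame with two adjacent allele columns per genotype column."""
    columns = {}
    for name in genotypes.columns:
        columns[f"{name}_1"] = genotypes[name].map(FIRST_ALLELE)
        columns[f"{name}_2"] = genotypes[name].map(SECOND_ALLELE)
    # Dict insertion order keeps each pair side by side
    return pd.DataFrame(columns, index=genotypes.index)


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 3, size=(10, 3)), columns=list("abc"))
pairs = to_allele_pairs(df)

# Written to an in-memory buffer here; pass a path such as "test.ped" instead
buf = io.StringIO()
pairs.to_csv(buf, sep="\t", header=False, index=False, quoting=csv.QUOTE_NONE)
print(buf.getvalue())
```

Because no cell contains a tab, QUOTE_NONE works without an escapechar, and each output row holds twice as many fields as the input has columns.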
