Pandas: Split a dataframe on ID and write to csv with generated filenames

Question

Asked by user2165857

I have a pandas dataframe I would like to iterate over. For instance, a simplified version of my dataframe is:

chr    start    end    Gene    Value   MoreData
chr1    123    123    HAPPY    41.1    3.4
chr1    125    129    HAPPY    45.9    4.5
chr1    140    145    HAPPY    39.3   4.1
chr1    342    355    SAD    34.2    9.0
chr1    360    361    SAD    44.3    8.1
chr1    390    399    SAD    29.0   7.2
chr1    400    411    SAD    35.6   6.5
chr1    462    470    LEG    20.0    2.7

I would like to iterate over each unique gene and create a new file named:

for Gene in df: ## this is where I need the most help

    OutFileName = Gene+".pdf"

For the above example I should get three iterations with 3 outfiles and 3 dataframes:

HAPPY.pdf

chr1    123    123    HAPPY    41.1    3.4 
chr1    125    129    HAPPY    45.9    4.5 
chr1    140    145    HAPPY    39.3   4.1

SAD.pdf

chr1    342    355    SAD    34.2    9.0 
chr1    360    361    SAD  44.3    8.1 
chr1    390    399    SAD    29.0   7.2 
chr1    400    411    SAD    35.6   6.5

LEG.pdf

chr1    462    470    LEG    20.0    2.7

Each resulting chunk of the data frame will be sent to another function that performs the analysis and returns the contents to be written to file.

Answer 1

Answered by EdChum

You can obtain the unique values by calling unique, iterate over them, build each filename, and write the matching rows out to csv:

In [78]:
genes = df['Gene'].unique()
for gene in genes:
    outfilename = gene + '.pdf'
    print(outfilename)
    df[df['Gene'] == gene].to_csv(outfilename)
HAPPY.pdf
SAD.pdf
LEG.pdf
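
For anyone who wants to run the loop above end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and, as an assumption of mine, writes into a temporary directory and passes index=False so the row index is not written as an extra column. Note that to_csv always writes plain CSV text, whatever extension the filename carries.

```python
import os
import tempfile

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "chr": ["chr1"] * 8,
    "start": [123, 125, 140, 342, 360, 390, 400, 462],
    "end": [123, 129, 145, 355, 361, 399, 411, 470],
    "Gene": ["HAPPY", "HAPPY", "HAPPY", "SAD", "SAD", "SAD", "SAD", "LEG"],
    "Value": [41.1, 45.9, 39.3, 34.2, 44.3, 29.0, 35.6, 20.0],
    "MoreData": [3.4, 4.5, 4.1, 9.0, 8.1, 7.2, 6.5, 2.7],
})

# Assumption: a scratch directory, so the example does not litter the
# current working directory.
out_dir = tempfile.mkdtemp()

written = []
for gene in df["Gene"].unique():  # unique() keeps order of first appearance
    subset = df[df["Gene"] == gene]  # boolean mask selects this gene's rows
    path = os.path.join(out_dir, gene + ".pdf")
    subset.to_csv(path, index=False)  # CSV content despite the .pdf name
    written.append(path)
```

Each pass rescans the whole frame to build the boolean mask, which is fine for a handful of genes but is one reason the groupby approach tends to scale better.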

A more pandas-thonic method is to groupby on 'Gene' and then iterate over the groups:

In [93]:

gp = df.groupby('Gene')
# gp.groups is a dict mapping each 'Gene' value to the index labels of its rows
for g in gp.groups.items():
    print(df.loc[g[1]])   

    chr  start  end   Gene  Value  MoreData
0  chr1    123  123  HAPPY   41.1       3.4
1  chr1    125  129  HAPPY   45.9       4.5
2  chr1    140  145  HAPPY   39.3       4.1
    chr  start  end Gene  Value  MoreData
3  chr1    342  355  SAD   34.2       9.0
4  chr1    360  361  SAD   44.3       8.1
5  chr1    390  399  SAD   29.0       7.2
6  chr1    400  411  SAD   35.6       6.5
    chr  start  end Gene  Value  MoreData
7  chr1    462  470  LEG     20       2.7
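
Iterating over the groupby object directly also works: each step yields a (key, sub-frame) pair, so the lookup through gp.groups and df.loc is not needed. The sketch below combines that with the question's requirement of sending each chunk to an analysis function before writing. The analyze function and its Span column are hypothetical placeholders of mine, not anything from the question, and the temporary output directory is likewise an assumption.

```python
import os
import tempfile

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "chr": ["chr1"] * 8,
    "start": [123, 125, 140, 342, 360, 390, 400, 462],
    "end": [123, 129, 145, 355, 361, 399, 411, 470],
    "Gene": ["HAPPY", "HAPPY", "HAPPY", "SAD", "SAD", "SAD", "SAD", "LEG"],
    "Value": [41.1, 45.9, 39.3, 34.2, 44.3, 29.0, 35.6, 20.0],
    "MoreData": [3.4, 4.5, 4.1, 9.0, 8.1, 7.2, 6.5, 2.7],
})


def analyze(chunk):
    """Hypothetical stand-in for the asker's analysis function."""
    # Returns a copy of the chunk with an illustrative extra column.
    return chunk.assign(Span=chunk["end"] - chunk["start"])


out_dir = tempfile.mkdtemp()  # assumption: scratch directory for the output

results = {}
# sort=False keeps the groups in order of first appearance instead of
# sorting the keys alphabetically.
for gene, chunk in df.groupby("Gene", sort=False):
    out_name = gene + ".pdf"
    results[out_name] = analyze(chunk)
    results[out_name].to_csv(os.path.join(out_dir, out_name), index=False)
```

Without sort=False, groupby sorts the keys, so the groups would come out as HAPPY, LEG, SAD.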
