Pandas: split dataframe on ID and write to csv with generated filenames
Source: http://stackoverflow.com/questions/26103676/ (StackOverflow, licensed under CC BY-SA 4.0; attribute any reuse to the original authors)
Asked by user2165857
I have a pandas dataframe I would like to iterate over. For instance, a simplified version of my dataframe is:
chr start end Gene Value MoreData
chr1 123 123 HAPPY 41.1 3.4
chr1 125 129 HAPPY 45.9 4.5
chr1 140 145 HAPPY 39.3 4.1
chr1 342 355 SAD 34.2 9.0
chr1 360 361 SAD 44.3 8.1
chr1 390 399 SAD 29.0 7.2
chr1 400 411 SAD 35.6 6.5
chr1 462 470 LEG 20.0 2.7
I would like to iterate over each unique gene and create a new file named after it:
for Gene in df:  ## this is where I need the most help
    OutFileName = Gene + ".pdf"
For the above example I should get three iterations, with 3 output files and 3 dataframes:
HAPPY.pdf
chr1 123 123 HAPPY 41.1 3.4
chr1 125 129 HAPPY 45.9 4.5
chr1 140 145 HAPPY 39.3 4.1
SAD.pdf
chr1 342 355 SAD 34.2 9.0
chr1 360 361 SAD 44.3 8.1
chr1 390 399 SAD 29.0 7.2
chr1 400 411 SAD 35.6 6.5
LEG.pdf
chr1 462 470 LEG 20.0 2.7
Each resulting chunk of the dataframe will be sent to another function that performs the analysis and returns the contents to be written to file.
Answered by EdChum
You can obtain the unique values by calling unique on the 'Gene' column, iterate over them, build each filename, and write the matching rows out to csv:
In [78]:
genes = df['Gene'].unique()
for gene in genes:
    outfilename = gene + '.pdf'
    print(outfilename)
    df[df['Gene'] == gene].to_csv(outfilename)
HAPPY.pdf
SAD.pdf
LEG.pdf
A more idiomatic pandas approach is to group by 'Gene' and then iterate over the groups:
In [93]:
gp = df.groupby('Gene')
# gp.groups is a dict mapping each 'Gene' value to the index labels of its rows
for gene, idx in gp.groups.items():
    print(df.loc[idx])
chr start end Gene Value MoreData
0 chr1 123 123 HAPPY 41.1 3.4
1 chr1 125 129 HAPPY 45.9 4.5
2 chr1 140 145 HAPPY 39.3 4.1
chr start end Gene Value MoreData
3 chr1 342 355 SAD 34.2 9.0
4 chr1 360 361 SAD 44.3 8.1
5 chr1 390 399 SAD 29.0 7.2
6 chr1 400 411 SAD 35.6 6.5
chr start end Gene Value MoreData
7 chr1 462 470 LEG 20 2.7
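The groupby example above only prints each group. As a minimal sketch of the remaining step, a GroupBy object can also be iterated directly, yielding (name, sub-dataframe) pairs that can be written straight to files. The helper name split_to_csv, the temporary output directory, and the .csv extension are assumptions made for illustration (the question used .pdf, but to_csv writes CSV text); the sample data is copied from the question.

```python
import os
import tempfile

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "chr": ["chr1"] * 8,
    "start": [123, 125, 140, 342, 360, 390, 400, 462],
    "end": [123, 129, 145, 355, 361, 399, 411, 470],
    "Gene": ["HAPPY"] * 3 + ["SAD"] * 4 + ["LEG"],
    "Value": [41.1, 45.9, 39.3, 34.2, 44.3, 29.0, 35.6, 20.0],
    "MoreData": [3.4, 4.5, 4.1, 9.0, 8.1, 7.2, 6.5, 2.7],
})


def split_to_csv(frame, key, out_dir):
    """Hypothetical helper: write one CSV file per unique value of `key`."""
    written = []
    # Iterating a GroupBy yields (group_name, sub_dataframe) pairs.
    # sort=False keeps groups in order of first appearance.
    for name, group in frame.groupby(key, sort=False):
        path = os.path.join(out_dir, f"{name}.csv")
        group.to_csv(path, index=False)  # index=False omits the row labels
        written.append(path)
    return written


out_dir = tempfile.mkdtemp()  # assumed output location for this sketch
paths = split_to_csv(df, "Gene", out_dir)
print([os.path.basename(p) for p in paths])
```

If each chunk has to go through an analysis function first, as the question describes, that call would fit inside the loop just before to_csv.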

