Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/21169362/

Date: 2020-09-13 21:34:50  Source: igfitidea

Iterating through a list of Pandas DataFrames

Tags: list, pandas

Asked by user2587593

I currently have a series of 18 DataFrames (each representing a different year), each consisting of 3 columns and a varying number of rows that hold the normalized mutual information scores for amino acid residue positions, like:

Year1

Pos1   Pos2   MI_Score
40     40     1.00    
40     44     0.53
40     70     0.23
44     44     1.00    
44     70     0.90
...

I would like to iterate through this list of DataFrames and trim off the rows that have Mutual Information scores less than 0.50 as well as the ones that are mutual information scores for a residue paired with itself. Here is what I've tried so far:

MIs = [MI_95,MI_96,MI_97,MI_98,MI_99,MI_00,MI_01,MI_02,MI_03,MI_04,MI_05,MI_06,MI_07,MI_08,MI_09,MI_10,MI_11,MI_12,MI_13] 
for MI in MIs:    
    p = []
    for q in range(0, len(MI)):
        if MI[0][q] != MI[1][q]:
            if MI[2][q] > 0.5:
                p.append([MI[0][q],MI[1][q],MI[2][q]])
    MI = pd.DataFrame(p) 

Yet this only trims the first item in MIs. Can someone help me find a way to iterate through the whole list and trim each dataframe?

Thanks

Answered by Dan Allan

Avoid loops where possible. They are much slower, and usually less immediately easy to read, than "vectorized" methods that operate on all the data together. Here's one way.

In [17]: self_paired = df['Pos1'] == df['Pos2']

In [18]: low_MI = df['MI_Score'] < 0.50

In [19]: df[~(low_MI | self_paired)]
Out[19]:
   Pos1  Pos2  MI_Score
1    40    44      0.53
4    44    70      0.90

[2 rows x 3 columns]
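
To cover the "whole list" part of the question, the same boolean masks can be applied to every DataFrame and the results collected into a new list. Note that in the original loop, `MI = pd.DataFrame(p)` only rebinds the loop variable, so the DataFrames stored in `MIs` are never replaced. The following is a minimal sketch, not part of the original answer: the helper `trim`, its `threshold` parameter, the toy frames `year1` and `year2`, and the result list `trimmed` are hypothetical names used for illustration, and it assumes every DataFrame has the columns `Pos1`, `Pos2`, and `MI_Score` shown in the question.

```python
import pandas as pd


def trim(df, threshold=0.50):
    """Return df without self-paired rows and without rows scoring below threshold."""
    self_paired = df["Pos1"] == df["Pos2"]
    low_mi = df["MI_Score"] < threshold
    return df[~(low_mi | self_paired)]


# Toy stand-ins for the yearly DataFrames (hypothetical data, not the asker's).
year1 = pd.DataFrame({
    "Pos1": [40, 40, 40, 44, 44],
    "Pos2": [40, 44, 70, 44, 70],
    "MI_Score": [1.00, 0.53, 0.23, 1.00, 0.90],
})
year2 = pd.DataFrame({
    "Pos1": [10, 10, 12],
    "Pos2": [10, 12, 12],
    "MI_Score": [1.00, 0.75, 1.00],
})
MIs = [year1, year2]

# Build a new list instead of rebinding the loop variable,
# so that every trimmed DataFrame is actually kept.
trimmed = [trim(df) for df in MIs]
```

Each entry of `trimmed` keeps the original row index of its source DataFrame; calling `.reset_index(drop=True)` on the result of `trim` would renumber the rows if that is preferred.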