Pandas drop function: unalignable boolean Series

Question

Asked by fred

I have two DataFrames. The first, df0:

Name       CHR  MAPINFO     PMG         APA 
cg13869341  1   15865   0.8954256   0.8409144
cg14008030  1   18827   0.5941512   0.712414
cg12045430  1   29407   0.1110794   0.1302404
cg20826792  1   29425   0.177532    0.1304049
cg00381604  1   29435   0.09003246  0.04180672
cg20253340  1   68849   0.4738799   0.444899

and the second, df1:

probe   Chromosome  Gstart  Gend
A_23_P11744     1   4363    39806
A_33_P3365932   1   4363    39806
A_32_P923011    1   24554   46081

I would like to iterate over df0["MAPINFO"], drop the rows that don't match a condition, and append the means to another df. My code is as follows:

for pos in df0['MAPINFO']:
    cond = (( pos < df1['Gstart']) & ( pos > df1['Gend']))
    print df0.drop(df0[cond].index.values).mean(axis=0, skipna=True, level=None)

which gives the following error message:

/usr/lib64/python2.7/site-packages/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/core/frame.py:2021: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  "DataFrame index.", UserWarning)
Traceback (most recent call last):
  File "/home/ferreirafm/bin/cpg_means.py", line 239, in <module>
    main()
  File "/home/ferreirafm/bin/cpg_means.py", line 231, in main
    import2df(infprobe, infchrom)
  File "/home/ferreirafm/bin/cpg_means.py", line 20, in import2df
    df0.drop(df0[cond].index.values)#.mean(axis=0, skipna=True, level=None)
  File "/usr/lib64/python2.7/site-packages/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/core/frame.py", line 1995, in __getitem__
    return self._getitem_array(key)
  File "/usr/lib64/python2.7/site-packages/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/core/frame.py", line 2027, in _getitem_array
    key = _check_bool_indexer(self.index, key)
  File "/usr/lib64/python2.7/site-packages/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/core/indexing.py", line 1017, in _check_bool_indexer
    raise IndexingError('Unalignable boolean Series key provided')
pandas.core.indexing.IndexingError: Unalignable boolean Series key provided

I'm almost sure this piece of code used to work in a previous version of pandas, but I can't figure out what's going wrong. Any help is appreciated.
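A minimal sketch of what goes wrong, assuming a recent pandas and keeping only the MAPINFO, Gstart and Gend columns from the tables above. The condition is built by comparing a scalar with df1's columns, so cond is a boolean Series that carries df1's index (three labels, 0 to 2), while df0 has six rows (labels 0 to 5). Indexing df0 with cond makes pandas align the key to df0's index; labels 3 to 5 have no boolean value, so the key is unalignable. Separately, pos < Gstart and pos > Gend can never both be true here, because Gstart is smaller than Gend in every row of df1.

```python
import pandas as pd

# Toy frames with the same shapes as in the question:
# df0 has 6 rows (index 0..5), df1 has 3 rows (index 0..2).
df0 = pd.DataFrame({"MAPINFO": [15865, 18827, 29407, 29425, 29435, 68849]})
df1 = pd.DataFrame({"Gstart": [4363, 4363, 24554],
                    "Gend": [39806, 39806, 46081]})

pos = df0["MAPINFO"].iloc[0]
# Scalar vs. df1 columns -> boolean Series indexed like df1, not like df0.
cond = (pos < df1["Gstart"]) & (pos > df1["Gend"])
print(cond.index.tolist())  # [0, 1, 2]

# df0[cond] must align cond to df0's labels 0..5; labels 3..5 are missing,
# so pandas raises instead of guessing.
try:
    df0[cond]
    error_name = None
except Exception as exc:  # IndexingError; its import path varies by pandas version
    error_name = type(exc).__name__
print(error_name)
```

The fix is therefore to build the mask from df0 itself (one entry per df0 row), which is what the accepted answer below does.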

Expected results: the last row of df0 should be dropped, because its MAPINFO (68849) lies outside every Gstart/Gend range in df1 (the largest Gend is 46081). The result should then be the column means (of PMG and APA) over the rows of df0 that are kept. That is, the filtered df will be:

Name       CHR  MAPINFO     PMG         APA 
cg13869341  1   15865   0.8954256   0.8409144
cg14008030  1   18827   0.5941512   0.712414
cg12045430  1   29407   0.1110794   0.1302404
cg20826792  1   29425   0.177532    0.1304049
cg00381604  1   29435   0.09003246  0.04180672

The last row of df0 ("cg20253340 1 68849 0.4738799 0.444899") is removed, and the means by row are then taken.

Answer 1

Accepted answer by lowtech

My solution would be to build a boolean index that implements the inclusion criterion, and then just use it:

import pandas as pd

df0 = pd.DataFrame.from_records([["cg13869341", 1, 15865, 0.8954256, 0.8409144],
                                 ["cg14008030", 1, 18827, 0.5941512, 0.712414],
                                 ["cg12045430", 1, 29407, 0.1110794, 0.1302404],
                                 ["cg20826792", 1, 29425, 0.177532, 0.1304049],
                                 ["cg00381604", 1, 29435, 0.09003246, 0.04180672],
                                 ["cg20253340", 1, 68849, 0.4738799, 0.444899]],
                                columns=["Name", "CHR", "MAPINFO", "PMG", "APA"])

df1 = pd.DataFrame.from_records([["A_23_P11744", 1, 4363, 39806],
                                 ["A_33_P3365932", 1, 4363, 39806],
                                 ["A_32_P923011", 1, 24554, 46081]],
                                columns=["probe", "Chromosome", "Gstart", "Gend"])

# F is True for each df0 row whose MAPINFO lies inside at least one [Gstart, Gend]
# range of df1. F is built from df0.MAPINFO, so it shares df0's index and aligns cleanly.
F = df0.MAPINFO.apply(lambda x: ((df1.Gstart <= x) & (x <= df1.Gend)).any())
print(df0[F])  # as you expected

# mean by rows
res = df0[F].copy()  # explicit copy, so adding a column does not trigger SettingWithCopyWarning
res['mean'] = res[['PMG', 'APA']].mean(axis=1)
print(res)

# mean by columns
print(df0[F][['PMG', 'APA']].mean(axis=0))
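As a hedged alternative to the apply call above (not part of the accepted answer), the same inclusion test can be vectorized with NumPy broadcasting, assuming pandas 0.24 or later for Series.to_numpy. The resulting mask is a plain array with one entry per df0 row, so there is nothing to align: it can be passed to df0[...] directly or, closer to the question's original approach, turned into labels for drop via df0.index[~mask].

```python
import pandas as pd

df0 = pd.DataFrame.from_records(
    [["cg13869341", 1, 15865, 0.8954256, 0.8409144],
     ["cg14008030", 1, 18827, 0.5941512, 0.712414],
     ["cg12045430", 1, 29407, 0.1110794, 0.1302404],
     ["cg20826792", 1, 29425, 0.177532, 0.1304049],
     ["cg00381604", 1, 29435, 0.09003246, 0.04180672],
     ["cg20253340", 1, 68849, 0.4738799, 0.444899]],
    columns=["Name", "CHR", "MAPINFO", "PMG", "APA"])
df1 = pd.DataFrame.from_records(
    [["A_23_P11744", 1, 4363, 39806],
     ["A_33_P3365932", 1, 4363, 39806],
     ["A_32_P923011", 1, 24554, 46081]],
    columns=["probe", "Chromosome", "Gstart", "Gend"])

# Broadcast a (6, 1) column of positions against the (3,) arrays of bounds:
# inside[i, j] is True when df0 row i lies in df1 range j.
pos = df0["MAPINFO"].to_numpy()[:, None]
inside = (df1["Gstart"].to_numpy() <= pos) & (pos <= df1["Gend"].to_numpy())

# Keep a row if it falls in at least one range (plain NumPy bool array, length 6).
mask = inside.any(axis=1)

# drop-based form, as in the question: drop the labels of rows outside every range.
kept = df0.drop(df0.index[~mask])
means = kept[["PMG", "APA"]].mean(axis=0)
print(kept)
print(means)
```

Broadcasting builds a len(df0) x len(df1) boolean matrix, so memory grows with the product of the two table sizes; the apply version avoids that matrix at the cost of a Python-level loop over df0.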
