pandas: find the set of column indices for non-zero values in each row of a pandas DataFrame

Question

Question by Qiang Li

Is there a good way to find the set of column indices of the non-zero values in each row of a pandas DataFrame? Do I have to traverse the DataFrame row by row?

For example, the data frame is

c1  c2  c3  c4 c5 c6 c7 c8  c9
 1   1   0   0  0  0  0  0   0
 1   0   0   0  0  0  0  0   0
 0   1   0   0  0  0  0  0   0
 1   0   0   0  0  0  0  0   0
 0   1   0   0  0  0  0  0   0
 0   0   0   0  0  0  0  0   0
 0   2   1   1  1  1  1  0   2
 1   5   5   0  0  1  0  4   6
 4   3   0   1  1  1  1  5  10
 3   5   2   4  1  2  2  1   3
 6   4   0   1  0  0  0  0   0
 3   9   1   0  1  0  2  1   0

The output is expected to be

['c1','c2']
['c1']
['c2']
...
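For reference, the plain row-by-row traversal the question asks about can be sketched as below. It rebuilds the example table from above; the names rows and result are illustrative only, and this is the baseline the answers try to improve on, not one of them.

```python
import pandas as pd

# The example table from the question, one inner list per row
rows = [
    [1, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 2, 1, 1, 1, 1, 1, 0, 2],
    [1, 5, 5, 0, 0, 1, 0, 4, 6],
    [4, 3, 0, 1, 1, 1, 1, 5, 10],
    [3, 5, 2, 4, 1, 2, 2, 1, 3],
    [6, 4, 0, 1, 0, 0, 0, 0, 0],
    [3, 9, 1, 0, 1, 0, 2, 1, 0],
]
df = pd.DataFrame(rows, columns=[f"c{i}" for i in range(1, 10)])

# Row-by-row traversal: for each row, keep the labels of its non-zero cells
result = [list(row.index[row.to_numpy() != 0]) for _, row in df.iterrows()]

for labels in result[:3]:
    print(labels)  # ['c1', 'c2'], then ['c1'], then ['c2']
```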

Answer 1

Accepted answer by Younggun Kim

It seems you have to traverse the DataFrame by row.

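cols = df.columns
# boolean frame: True where the value is positive (every value in this example is >= 0)
bt = df.apply(lambda x: x > 0)
# for each row, keep the column labels whose mask entry is True
bt.apply(lambda x: list(cols[x.values]), axis=1)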

and you will get:

0                                 [c1, c2]
1                                     [c1]
2                                     [c2]
3                                     [c1]
4                                     [c2]
5                                       []
6             [c2, c3, c4, c5, c6, c7, c9]
7                 [c1, c2, c3, c6, c8, c9]
8         [c1, c2, c4, c5, c6, c7, c8, c9]
9     [c1, c2, c3, c4, c5, c6, c7, c8, c9]
10                            [c1, c2, c4]
11                [c1, c2, c3, c5, c7, c8]
dtype: object
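The lambda above tests x > 0, which works here because every value is non-negative, while the question asks for non-zero values. A minimal sketch that compares with != 0 (so negative entries also count) and builds the per-row lists from a plain NumPy mask instead of a second apply; nonzero_columns is a hypothetical helper name, and no timing is claimed for it.

```python
import pandas as pd


def nonzero_columns(df: pd.DataFrame) -> pd.Series:
    """Return a Series holding, for each row, the list of columns with non-zero values."""
    labels = df.columns.to_numpy()  # column labels as a NumPy array
    mask = df.to_numpy() != 0       # 2-D boolean mask, one mask row per DataFrame row
    # Boolean-index the label array with each mask row
    return pd.Series([list(labels[row]) for row in mask], index=df.index)


# Small made-up frame with a negative value, which a "> 0" test would miss
demo = pd.DataFrame({"c1": [1, 0, -2], "c2": [0, 0, 3]})
print(nonzero_columns(demo))
```

Keeping the original index on the returned Series means each list can still be matched to its row, including the empty list for an all-zero row.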

If performance matters, try passing raw=True when creating the boolean DataFrame, as below:

%timeit df.apply(lambda x: x > 0, raw=True).apply(lambda x: list(cols[x.values]), axis=1)
1000 loops, best of 3: 812 μs per loop

In this measurement that is roughly three times faster. For comparison, here is the result with raw=False (the default):

%timeit df.apply(lambda x: x > 0).apply(lambda x: list(cols[x.values]), axis=1)
100 loops, best of 3: 2.59 ms per loop
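In both timings the first apply only builds a boolean mask. As a sketch on a small made-up frame, the same mask can come from a direct comparison such as df > 0, which avoids calling a Python function per column; this only checks that the three masks agree and is not a re-run of the timings above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"c1": [1, 0, 4], "c2": [0, 0, 3], "c3": [2, 0, 0]})

via_raw = df.apply(lambda x: x > 0, raw=True)  # the lambda receives NumPy arrays
via_series = df.apply(lambda x: x > 0)         # the lambda receives one Series per column
vectorized = df > 0                            # a single vectorized comparison

# Compare the True/False values of the three masks
print(np.array_equal(via_raw.to_numpy(), vectorized.to_numpy()))
print(np.array_equal(via_series.to_numpy(), vectorized.to_numpy()))
```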

Answer 2

Answer by Dickster

How about this approach?

import numpy as np

#create a True / False data frame
df_boolean = df>0

#a little helper method that uses boolean slicing internally
def bar(x,columns):
    return ','.join(list(columns[x]))

#apply the helper to each row (axis=1)
df_boolean['result'] = df_boolean.apply(lambda x: bar(x,df_boolean.columns),axis=1)

# filter out the empty "rows" and grab the result column
df_result = df_boolean[df_boolean['result'] != '']['result']

#append an axis, just so each line will output a list
lst_result = df_result.values[:,np.newaxis]

print('\n'.join([str(myelement) for myelement in lst_result]))

and this produces:

['c1,c2']
['c1']
['c2']
['c1']
['c2']
['c2,c3,c4,c5,c6,c7,c9']
['c1,c2,c3,c6,c8,c9']
['c1,c2,c4,c5,c6,c7,c8,c9']
['c1,c2,c3,c4,c5,c6,c7,c8,c9']
['c1,c2,c4']
['c1,c2,c3,c5,c7,c8']
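Because the filtered result keeps the original row labels, you can still tell which row each string came from; the all-zero row 5 is simply absent. If you need lists rather than comma-joined strings, a sketch on a small made-up frame can split them back with str.split. It assumes the column names themselves contain no commas.

```python
import pandas as pd

df = pd.DataFrame({"c1": [1, 0, 0], "c2": [1, 0, 2], "c3": [0, 0, 1]})

mask = df > 0
# Same idea as bar(): join the labels of the True cells in each row
joined = mask.apply(lambda r: ",".join(r.index[r.to_numpy()]), axis=1)
nonempty = joined[joined != ""]     # drop all-zero rows, keep the original labels
as_lists = nonempty.str.split(",")  # back to one list per remaining row

print(as_lists.to_dict())
```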

Answer 3

Answer by Andy Hayden

A potentially better data structure than a Series of lists is a stacked Series:

In [11]: res = df[df!=0].stack()

In [12]: res
Out[12]:
0   c1     1
    c2     1
1   c1     1
2   c2     1
3   c1     1
...

And you can iterate over the original rows:

In [13]: res.loc[0]
Out[13]:
c1    1
c2    1
dtype: float64

In [14]: res.loc[0].index
Out[14]: Index(['c1', 'c2'], dtype='object')
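To turn the stacked Series back into one list of column labels per original row, a sketch can loop over the first index level and reuse res.loc[i].index as above. It uses a small made-up frame and calls dropna() explicitly as a precaution: newer pandas versions are moving to a stack() implementation that, as far as I know, no longer drops NaN by default, so check this against your installed version.

```python
import pandas as pd

df = pd.DataFrame({"c1": [1, 0, 3], "c2": [1, 0, 0], "c3": [0, 0, 2]})

# Mask zeros as NaN, stack into (row, column) pairs, and drop the NaN cells
res = df[df != 0].stack().dropna()

# One entry per row that has at least one non-zero value
per_row = {i: list(res.loc[i].index) for i in res.index.get_level_values(0).unique()}
print(per_row)
```

Rows whose values are all zero (row 1 here) do not appear in the stacked result, so they are missing from per_row as well.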

Note: I thought you used to be able to return a list from an apply (to create a DataFrame with list elements), but this no longer appears to be the case.
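For what it's worth, current pandas documents a result_type parameter for DataFrame.apply (added in 0.23). A sketch, assuming a recent version, that asks explicitly for a Series of lists so list results are not expanded into columns; the two-column frame is made up, and its first row's list happens to be as long as the number of columns.

```python
import pandas as pd

df = pd.DataFrame({"c1": [1, 0], "c2": [1, 2]})

# result_type="reduce" asks apply to return a Series, keeping each list intact
s = df.apply(lambda r: list(r.index[r.to_numpy() != 0]), axis=1, result_type="reduce")
print(s.tolist())
```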