Note: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/39151978/

Time: 2020-09-14 01:53:48  Source: igfitidea

Pandas: Use index value in apply function over a column

Tags: python, pandas, dataframe, apply

Asked by Fizi

I was wondering if there's a way to use the index value while using apply over a column of a dataframe. Suppose I have a df like:

  col1  col2
0    a    [0,1,2]
1    b    [0,2]
2    c    [0,1,2]

I want to write an apply function on df.col2 that removes each row's index value from the list in col2, leaving a df like:

  col1  col2
0    a    [1,2]
1    b    [0,2]
2    c    [0,1]

The index value may or may not be in the list, but if it is in the list it should be removed. Note that this isn't the actual use case, but it is similar to what I need. I have

df.col2.apply(lambda x: f(x))

and in f(x) I want to be able to access the index value of x if that's possible, or use a workaround. I know that df.apply() can work on the column values and df.index.map() can work on the index. Is there a method in Pandas that combines the use cases of the two in a single elegant solution? Thanks for the help.

UPDATE: the index is an integer value, constrained to be consecutive whole numbers. col2 holds a list for each index. I want to check whether the index is in that list and remove it if it is. So, say, for row index 3 we have the list [27,36,3,9,7]; I want to drop 3 from the list. The list is unordered.

Answered by fuglede

If I understand your question correctly, this should do the job:

df.apply(lambda x: x.name in x.col2 and x.col2.remove(x.name), axis=1)

With the example from the original post:

In [226]: df
Out[226]: 
  col1       col2
0    a  [0, 1, 2]
1    b     [0, 2]
2    c  [0, 1, 2]

In [227]: df.apply(lambda x: x.name in x.col2 and x.col2.remove(x.name), axis=1);

In [228]: df
Out[228]: 
  col1    col2
0    a  [1, 2]
1    b  [0, 2]
2    c  [0, 1]
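
Note that the one-liner above works through a side effect: list.remove() mutates each list in place, and the value returned by apply is discarded. It also removes only the first occurrence of the index value. If the original lists need to stay untouched, a possible alternative is sketched below. It is only a sketch, not part of the original answer; the names cleaned and result are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col1": list("abc"),
                   "col2": [[0, 1, 2], [0, 2], [0, 1, 2]]})

# Pair each index label with its list and build a NEW list that keeps
# every value different from the label (the original lists are not modified).
cleaned = [[v for v in values if v != idx]
           for idx, values in zip(df.index, df["col2"])]

# Wrap the result in a Series aligned on the original index before assigning.
result = df.assign(col2=pd.Series(cleaned, index=df.index))
print(result)
```

Unlike list.remove(), the comprehension drops every occurrence of the index value, not just the first one.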

Answered by piRSquared

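import pandas as pd

def name_drop(x):
    # x is one row (a Series); x.name is that row's index label
    x_ = x.drop('col2')
    _x = pd.Series(x.col2)
    _x = _x[_x != x.name].tolist()
    # Series.append was removed in pandas 2.0; pd.concat replaces it
    x = pd.concat([x_, pd.Series({'col2': _x})])
    return x

df.apply(name_drop, axis=1)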
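
The reason this works is that with axis=1, apply passes each row to the function as a Series whose .name attribute is the row's index label. A minimal sketch of that behaviour, using the example frame from the question (the variable names labels and has_own are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({"col1": list("abc"),
                   "col2": [[0, 1, 2], [0, 2], [0, 1, 2]]})

# With axis=1 each row arrives as a Series; row.name is its index label.
labels = df.apply(lambda row: row.name, axis=1)

# The label can then be compared against the values held in that row.
has_own = df.apply(lambda row: row.name in row["col2"], axis=1)

print(labels.tolist())
print(has_own.tolist())
```

By contrast, df.col2.apply(f) hands f only the cell values, which is why the index is not visible there.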

Answered by Siraj S.

Maybe you can try this. It will not delete the index values from the list, but will replace them with nan:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a':list('mno'),'b':[[1,2,3],[1,3,4],[5,6,2]]})
df1 = df.b.apply(pd.Series)
df['b'] = np.array(df1[df1.apply(lambda x: x!=df.index.values)]).tolist()

Out[111]:
   a                b
0  m  [1.0, 2.0, 3.0]
1  n  [nan, 3.0, 4.0]
2  o  [5.0, 6.0, nan]
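
The output above keeps a nan in place of each removed value and turns the integers into floats. If the values should actually be dropped, one possible follow-up is to discard the nan entries row by row. This is a hedged sketch rather than part of the original answer: it assumes the lists hold only integers, and the names wide and masked are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': list('mno'), 'b': [[1, 2, 3], [1, 3, 4], [5, 6, 2]]})

# Expand the lists into one column per position.
wide = df['b'].apply(pd.Series)

# Keep a value only where it differs from its row's index label;
# everything else becomes NaN (axis=0 aligns the labels on the rows).
masked = wide.where(wide.ne(df.index.to_series(), axis=0))

# Drop the NaN entries per row and cast back to int (assumes integer data).
df['b'] = pd.Series(
    [row.dropna().astype(int).tolist() for _, row in masked.iterrows()],
    index=df.index,
)
print(df)
```

Because dropna() also discards the padding that apply(pd.Series) adds for shorter lists, this should cope with lists of unequal length as well.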