Access index in Python pandas.Series.apply

Warning: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/18316211/

Time: 2020-08-19 10:27:48  Source: igfitidea

Tags: python, pandas

Asked by elyase

Let's say I have a MultiIndex Series s:

>>> s
     values
a b
1 2  0.1 
3 6  0.3
4 4  0.7

and I want to apply a function which uses the index of the row:

def f(x):
   # conditions or computations using the indexes
   if x.index[0] and ...: 
   other = sum(x.index) + ...
   return something

How can I do s.apply(f) for such a function? What is the recommended way to do this kind of operation? I expect to get a new Series with the same MultiIndex, whose values are the result of applying this function to each row.

Accepted answer by Dan Allan

I don't believe apply has access to the index; it passes each element to the function as a NumPy scalar, not as a Series, as you can see:

In [27]: s.apply(lambda x: type(x))
Out[27]: 
a  b
1  2    <type 'numpy.float64'>
3  6    <type 'numpy.float64'>
4  4    <type 'numpy.float64'>

To get around this limitation, promote the indexes to columns, apply your function, and recreate a Series with the original index.

Series(s.reset_index().apply(f, axis=1).values, index=s.index)
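A minimal runnable sketch of this workaround, assuming the question's two-level index with levels named a and b, a Series named 'values', and a hypothetical f that mixes both index levels with the value (none of these names come from the answer itself):

```python
import pandas as pd

# Rebuild a Series like the question's: two index levels named 'a' and 'b'.
idx = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=['a', 'b'])
s = pd.Series([0.1, 0.3, 0.7], index=idx, name='values')

def f(row):
    # Hypothetical rule: after reset_index the index levels are ordinary columns.
    if row['a'] == row['b']:
        return -row['values']
    return row['a'] + row['b'] + row['values']

# Promote the index to columns, apply row-wise, then restore the original index.
result = pd.Series(s.reset_index().apply(f, axis=1).values, index=s.index)
print(result)
```

Taking .values before rebuilding the Series discards the default RangeIndex that reset_index introduced, so the original MultiIndex is attached positionally.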

Other approaches might use s.index.get_level_values, which often gets a little ugly in my opinion, or row-by-row iteration (s.items() on a Series; iterrows() exists only on DataFrames), which is likely to be slower, perhaps depending on exactly what f does.
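Both alternatives can be sketched on the same kind of Series. The level names and the computation are hypothetical; the point is that get_level_values is called on the index, and that a plain Series is iterated with items(), which yields (index_tuple, value) pairs:

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=['a', 'b'])
s = pd.Series([0.1, 0.3, 0.7], index=idx)

# Vectorized: pull each index level out as a NumPy array.
a = s.index.get_level_values('a').to_numpy()
b = s.index.get_level_values('b').to_numpy()
vectorized = pd.Series(np.where(a == b, -s.values, a + b + s.values), index=s.index)

# Row by row: unpack each MultiIndex tuple alongside its value.
looped = pd.Series(
    [-v if i == j else i + j + v for (i, j), v in s.items()],
    index=s.index,
)
print(vectorized)
print(looped)
```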

Answer by Jeff

Make it a DataFrame, and return scalars from the function if you want (so the result is a Series).

Setup

In [11]: s = Series([1,2,3],dtype='float64',index=['a','b','c'])

In [12]: s
Out[12]: 
a    1
b    2
c    3
dtype: float64

Printing function

In [13]: def f(x):
   ....:     print(type(x), x)
   ....:     return x
   ....: 

In [14]: pd.DataFrame(s).apply(f)
<class 'pandas.core.series.Series'> a    1
b    2
c    3
Name: 0, dtype: float64
<class 'pandas.core.series.Series'> a    1
b    2
c    3
Name: 0, dtype: float64
Out[14]: 
   0
a  1
b  2
c  3

Since you can return anything here, just return scalars (the row's index label is available through its name attribute):

In [15]: pd.DataFrame(s).apply(lambda x: 5 if x.name == 'a' else x[0], axis=1)
Out[15]: 
a    5
b    2
c    3
dtype: float64
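The same trick carries over to the question's MultiIndex: when a one-column DataFrame is applied row-wise, x.name is the full index tuple. A sketch, assuming the column is named 'values' via to_frame and using a hypothetical rule:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=['a', 'b'])
s = pd.Series([0.1, 0.3, 0.7], index=idx)

def f(row):
    a, b = row.name          # the row's index label, here an (a, b) tuple
    value = row['values']    # the single column created by to_frame below
    return value * 10 if a == b else value + a + b

# Scalar returns make apply produce a Series that keeps the MultiIndex.
result = s.to_frame('values').apply(f, axis=1)
print(result)
```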

Answer by Andy Hayden

You may find it faster to use where rather than apply here:

In [11]: s = pd.Series([1., 2., 3.], index=['a' ,'b', 'c'])

In [12]: s.where(s.index != 'a', 5)
Out[12]: 
a    5
b    2
c    3
dtype: float64

You can also apply NumPy-style logic/functions to any of the parts:

In [13]: (2 * s + 1).where((s.index == 'b') | (s.index == 'c'), -s)
Out[13]: 
a   -1
b    5
c    7
dtype: float64

In [14]: (2 * s + 1).where(s.index != 'a', -s)
Out[14]: 
a   -1
b    5
c    7
dtype: float64

I recommend testing the speed, since how it compares with apply will depend on the function. That said, I find apply calls more readable...
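One way to run that comparison is with the standard timeit module. This is a sketch on a larger, made-up Series (string labels 'k0', 'k1', ...); no timing figures are implied, since they depend on the machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: large enough for the timings to be measurable.
n = 10_000
s = pd.Series(np.arange(n, dtype='float64'), index=[f'k{i}' for i in range(n)])

def with_where():
    # Vectorized: keep values where the condition holds, else use 5.
    return s.where(s.index != 'k0', 5)

def with_apply():
    # Row-wise: the label is available as x.name, the value as x[0].
    return s.to_frame(0).apply(lambda x: 5 if x.name == 'k0' else x[0], axis=1)

# Sanity check that both produce the same numbers before timing them.
assert np.allclose(with_where().to_numpy(dtype=float), with_apply().to_numpy(dtype=float))

for fn in (with_where, with_apply):
    print(fn.__name__, timeit.timeit(fn, number=3))
```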

Answer by Vladimir Leontiev

You can access the whole row as an argument inside the function if you use DataFrame.apply() instead of Series.apply().

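import numpy as np
import pandas as pd

def f1(row):
    # 0 if the value in column 'I' is below 0.5, else 1
    if row['I'] < 0.5:
        return 0
    else:
        return 1

def f2(row):
    # 0 if column 'N1' (computed by f1) is 1, else 1
    if row['N1'] == 1:
        return 0
    else:
        return 1

# One column of random values; apply(axis=1) passes each whole row as a Series
df4 = pd.DataFrame(np.random.rand(6, 1), columns=list('I'))
df4['N1'] = df4.apply(f1, axis=1)
df4['N2'] = df4.apply(f2, axis=1)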
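Inside DataFrame.apply(..., axis=1) the row's index label is also available as row.name, which is what ties this approach back to the question. A sketch with a hypothetical function f3 and column N3 that combine a column value with the index:

```python
import numpy as np
import pandas as pd

df4 = pd.DataFrame(np.random.rand(6, 1), columns=list('I'))

def f3(row):
    # row.name is this row's index label (0..5 for the default RangeIndex).
    return 1 if row['I'] >= 0.5 and row.name % 2 == 0 else 0

df4['N3'] = df4.apply(f3, axis=1)
print(df4)
```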

Answer by nehz

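Convert to a DataFrame and apply along rows. You can access the index as x.name; x is now also a Series with one value. The trailing [0] selects column 0 of the result, which assumes f returns a Series; if f returns a scalar, apply already yields a Series and the [0] should be dropped.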

s.to_frame(0).apply(f, axis=1)[0]
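A sketch of both cases, using hypothetical functions f_series and f_scalar on a small Series with labels 'a', 'b', 'c':

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])

def f_series(x):
    # x is a one-value row Series; x.name is the index label.
    return x * 10 if x.name == 'b' else x

def f_scalar(x):
    # Same rule, but return the bare value instead of a Series.
    return x[0] * 10 if x.name == 'b' else x[0]

# Returning a Series gives a DataFrame, so [0] picks column 0 back out.
r1 = s.to_frame(0).apply(f_series, axis=1)[0]
# Returning a scalar already gives a Series, so no [0] is needed.
r2 = s.to_frame(0).apply(f_scalar, axis=1)
print(r1)
print(r2)
```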

Answer by waterproof

Use reset_index() to convert the Series to a DataFrame and the index to a column, and then apply your function to the DataFrame.

The tricky part is knowing how reset_index()names the columns, so here are a couple of examples.

With a Singly Indexed Series

s=pd.Series({'idx1': 'val1', 'idx2': 'val2'})

def use_index_and_value(row):
    return 'I made this with index {} and value {}'.format(row['index'], row[0])

s2 = s.reset_index().apply(use_index_and_value, axis=1)

# The new Series has an auto-index;
# You'll want to replace that with the index from the original Series
s2.index = s.index
s2

Output:

idx1    I made this with index idx1 and value val1
idx2    I made this with index idx2 and value val2
dtype: object

With a Multi-Indexed Series

Same concept here, but you'll need to access the index values as row['level_*'], because that's where Series.reset_index() places them.

s=pd.Series({
    ('idx(0,0)', 'idx(0,1)'): 'val1',
    ('idx(1,0)', 'idx(1,1)'): 'val2'
})

def use_index_and_value(row):
    return 'made with index: {},{} & value: {}'.format(
        row['level_0'],
        row['level_1'],
        row[0]
    )

s2 = s.reset_index().apply(use_index_and_value, axis=1)

# Replace auto index with the index from the original Series
s2.index = s.index
s2

Output:

idx(0,0)  idx(0,1)    made with index: idx(0,0),idx(0,1) & value: val1
idx(1,0)  idx(1,1)    made with index: idx(1,0),idx(1,1) & value: val2
dtype: object

If your series or indexes have names, you will need to adjust accordingly.
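For example, once the levels and the Series are named, reset_index() uses those names as column labels instead of level_0, level_1 and 0. A sketch with hypothetical names outer, inner and payload:

```python
import pandas as pd

# Tuple keys produce a MultiIndex; name='payload' names the values.
s = pd.Series(
    {('x0', 'y0'): 'val1', ('x1', 'y1'): 'val2'},
    name='payload',
)
s.index.names = ['outer', 'inner']   # hypothetical level names

def use_index_and_value(row):
    # Named levels and the Series name become the column labels.
    return 'made with index: {},{} & value: {}'.format(
        row['outer'], row['inner'], row['payload']
    )

s2 = s.reset_index().apply(use_index_and_value, axis=1)
s2.index = s.index
print(s2)
```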
