Access index in pandas.Series.apply
Note: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/18316211/
Asked by elyase
Let's say I have a MultiIndex Series s:
>>> s
values
a b
1 2 0.1
3 6 0.3
4 4 0.7
and I want to apply a function that uses the index of each row:
def f(x):
    # conditions or computations using the indexes
    if x.index[0] and ...:
        other = sum(x.index) + ...
    return something
How can I do s.apply(f) for such a function? What is the recommended way to perform this kind of operation? I expect to obtain a new Series that holds the result of applying the function to each row and keeps the same MultiIndex.
Accepted answer by Dan Allan
I don't believe apply has access to the index; it treats each row as a numpy object, not a Series, as you can see:
In [27]: s.apply(lambda x: type(x))
Out[27]:
a b
1 2 <type 'numpy.float64'>
3 6 <type 'numpy.float64'>
4 4 <type 'numpy.float64'>
To get around this limitation, promote the indexes to columns, apply your function, and recreate a Series with the original index.
Series(s.reset_index().apply(f, axis=1).values, index=s.index)
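As a minimal sketch of that one-liner in use, the snippet below rebuilds a Series shaped like the one in the question. The level names "a" and "b", the Series name "val", and the rule inside f are assumptions chosen for illustration; they are not part of the original answer.

```python
import pandas as pd

# Sample data shaped like the question's Series (names are assumed).
index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index, name="val")

def f(row):
    # Hypothetical rule: add both index levels to the stored value.
    return row["a"] + row["b"] + row["val"]

# reset_index() turns the levels into columns "a" and "b"; the data column
# takes the Series name. The result is then re-attached to the original index.
result = pd.Series(s.reset_index().apply(f, axis=1).to_numpy(), index=s.index)
print(result)
```

Taking the raw values before rebuilding the Series matters: apply returns a Series with a fresh integer index, and passing it directly together with index=s.index would try to align the two indexes instead of relabeling the values.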
Other approaches might use s.index.get_level_values, which often gets a little ugly in my opinion, or row-by-row iteration (for example s.items(), or iterrows() after converting to a DataFrame), which is likely to be slower, perhaps depending on exactly what f does.
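For comparison, here is a rough sketch of those two alternatives side by side. The sample data, the level names "a" and "b", and the rule (value plus both index levels) are assumptions made for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)

# Alternative 1: pull each level out of the index and combine it with the
# values in one vectorized expression.
a = s.index.get_level_values("a").to_numpy()
b = s.index.get_level_values("b").to_numpy()
vectorized = s + a + b

# Alternative 2: iterate over (index tuple, value) pairs in plain Python.
looped = pd.Series([val + sum(idx) for idx, val in s.items()], index=s.index)

print(vectorized)
print(looped)
```

Both produce the same values on the original MultiIndex; which one is faster depends on the data size and on how much of the function can be expressed as array operations.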
Answer by Jeff
Make it a frame; return scalars if you want (so the result is a Series).
Setup
In [11]: s = Series([1,2,3],dtype='float64',index=['a','b','c'])
In [12]: s
Out[12]:
a 1
b 2
c 3
dtype: float64
Printing function
In [13]: def f(x):
   ....:     print(type(x), x)
   ....:     return x
   ....:
In [14]: pd.DataFrame(s).apply(f)
<class 'pandas.core.series.Series'> a 1
b 2
c 3
Name: 0, dtype: float64
<class 'pandas.core.series.Series'> a 1
b 2
c 3
Name: 0, dtype: float64
Out[14]:
0
a 1
b 2
c 3
Since you can return anything here, just return the scalars (access the index via the name attribute):
In [15]: pd.DataFrame(s).apply(lambda x: 5 if x.name == 'a' else x[0], axis=1)
Out[15]:
a 5
b 2
c 3
dtype: float64
Answer by Andy Hayden
You may find it faster to use where rather than apply here:
In [11]: s = pd.Series([1., 2., 3.], index=['a' ,'b', 'c'])
In [12]: s.where(s.index != 'a', 5)
Out[12]:
a 5
b 2
c 3
dtype: float64
You can also apply numpy-style logic and functions to any of the parts:
In [13]: (2 * s + 1).where((s.index == 'b') | (s.index == 'c'), -s)
Out[13]:
a -1
b 5
c 7
dtype: float64
In [14]: (2 * s + 1).where(s.index != 'a', -s)
Out[14]:
a -1
b 5
c 7
dtype: float64
I recommend testing for speed, as the efficiency relative to apply will depend on the function. That said, I find that apply calls are more readable...
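The same where idea carries over to the MultiIndex case from the question. The sketch below is only an illustration: the sample data, the level names "a" and "b", and the condition are assumptions.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)

# Extract each level so it can be used in a boolean condition.
a = s.index.get_level_values("a")
b = s.index.get_level_values("b")

# where keeps the value where the condition is True and substitutes the
# second argument elsewhere: here, keep rows whose two levels are equal.
result = s.where(a == b, 0.0)
print(result)
```

Note that where keeps values on True and replaces on False, which is the opposite of what the name sometimes suggests to newcomers; mask is the inverse operation.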
Answer by Vladimir Leontiev
You can access the whole row as an argument inside the function if you use DataFrame.apply() instead of Series.apply().
def f1(row):
    if row['I'] < 0.5:
        return 0
    else:
        return 1
def f2(row):
    if row['N1']==1:
        return 0
    else:
        return 1
import pandas as pd
import numpy as np
df4 = pd.DataFrame(np.random.rand(6,1), columns=list('I'))
df4['N1']=df4.apply(f1, axis=1)
df4['N2']=df4.apply(f2, axis=1)
Answer by nehz
Convert to a DataFrame and apply along rows. You can access the index as x.name. x is also a Series now, with 1 value.
s.to_frame(0).apply(f, axis=1)[0]
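A small sketch of this approach on a MultiIndex Series follows. The sample data and the function f are hypothetical. It also assumes f returns a scalar, in which case apply already yields a Series and no trailing [0] selection is needed; the [0] is relevant when f returns the row itself and the result is a DataFrame.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)

def f(x):
    # With a MultiIndex, the row label x.name is a tuple of level values.
    a, b = x.name
    # x is a one-element Series; iloc[0] reads its single value by position.
    return x.iloc[0] if a == b else 0.0

result = s.to_frame(0).apply(f, axis=1)
print(result)
```

Because the frame keeps the original index, the result comes back already labeled with the same MultiIndex.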
Answer by waterproof
Use reset_index() to convert the Series to a DataFrame and the index to a column, and then apply your function to the DataFrame.
The tricky part is knowing how reset_index() names the columns, so here are a couple of examples.
With a Singly Indexed Series
s=pd.Series({'idx1': 'val1', 'idx2': 'val2'})
def use_index_and_value(row):
    return 'I made this with index {} and value {}'.format(row['index'], row[0])
s2 = s.reset_index().apply(use_index_and_value, axis=1)
# The new Series has an auto-index;
# You'll want to replace that with the index from the original Series
s2.index = s.index
s2
Output:
idx1 I made this with index idx1 and value val1
idx2 I made this with index idx2 and value val2
dtype: object
With a Multi-Indexed Series
Same concept here, but you'll need to access the index values as row['level_*'] because that's where Series.reset_index() places them.
s=pd.Series({
    ('idx(0,0)', 'idx(0,1)'): 'val1',
    ('idx(1,0)', 'idx(1,1)'): 'val2'
})
def use_index_and_value(row):
    return 'made with index: {},{} & value: {}'.format(
        row['level_0'],
        row['level_1'],
        row[0]
    )
s2 = s.reset_index().apply(use_index_and_value, axis=1)
# Replace auto index with the index from the original Series
s2.index = s.index
s2
Output:
idx(0,0) idx(0,1) made with index: idx(0,0),idx(0,1) & value: val1
idx(1,0) idx(1,1) made with index: idx(1,0),idx(1,1) & value: val2
dtype: object
If your Series or its index levels have names, reset_index() uses those names for the columns instead, so you will need to adjust the keys accordingly.
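To illustrate the named case, here is a minimal sketch. The names "first", "second" and "payload", the sample values, and the describe function are all made up for this example.

```python
import pandas as pd

s = pd.Series(
    ["val1", "val2"],
    index=pd.MultiIndex.from_tuples(
        [("x0", "y0"), ("x1", "y1")], names=["first", "second"]
    ),
    name="payload",
)

# Named levels become columns with those names; the data column takes the
# Series name instead of the default 0.
frame = s.reset_index()
print(list(frame.columns))

def describe(row):
    return "{}/{}: {}".format(row["first"], row["second"], row["payload"])

s2 = frame.apply(describe, axis=1)
# Replace the auto index with the index from the original Series.
s2.index = s.index
print(s2)
```

Checking frame.columns once before writing the row function is a cheap way to avoid guessing between 'index', 'level_0', 0, and any custom names.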

