Python: Apply a Function on a DataFrame Index
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/20025325/
Apply Function on DataFrame Index
Asked by Alex Rothberg
What is the best way to apply a function over the index of a Pandas DataFrame?
Currently I am using this verbose approach:
pd.DataFrame({"Month": df.reset_index().Date.apply(foo)})
where Date is the name of the index and foo is the name of the function that I am applying.
Answered by firelynx
As already suggested by HYRY in the comments, Series.map is the way to go here. Just set the index to the resulting series.
Simple example:
df = pd.DataFrame({'d': [1, 2, 3]}, index=['FOO', 'BAR', 'BAZ'])
df
     d
FOO  1
BAR  2
BAZ  3
df.index = df.index.map(str.lower)
df
     d
foo  1
bar  2
baz  3
Index != Series
As pointed out by @OP, the df.index.map(str.lower) call returns a numpy array.
This is because dataframe indices are based on numpy arrays, not Series.
The only way of making the index into a Series is to create a Series from it.
pd.Series(df.index.map(str.lower))
Caveat
The Index class now subclasses the StringAccessorMixin, which means that you can do the above operation as follows:
df.index.str.lower()
This still produces an Index object, not a Series.
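To make the return types discussed in this answer concrete, here is a small sketch that reuses the illustrative frame from the simple example. It assumes a reasonably recent pandas release, where Index.map returns an Index; older versions returned a numpy array, as described above.

```python
import pandas as pd

df = pd.DataFrame({'d': [1, 2, 3]}, index=['FOO', 'BAR', 'BAZ'])

# Index.map applies the function element-wise to every label.
# On recent pandas versions the result is an Index, not a Series.
mapped = df.index.map(str.lower)

# The .str accessor produces the same labels without passing a function.
via_str = df.index.str.lower()

# Wrapping the result in pd.Series gives a Series with a default integer index.
as_series = pd.Series(mapped)

print(type(mapped).__name__, type(via_str).__name__, type(as_series).__name__)
```

Either result can be assigned back to df.index; only the pd.Series wrapper changes the container type.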
Answered by suraj747
Assuming that you want to make a column in your current DataFrame by applying your function "foo" to the index, you could write...
df['Month'] = df.index.map(foo)
To generate the series alone you could instead do ...
pd.Series({x: foo(x) for x in df.index})
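A runnable sketch of both variants follows. The body of foo and the sample dates are assumptions made purely for illustration (the question never shows what foo does); here it is taken to extract the month number from a timestamp.

```python
import pandas as pd

def foo(ts):
    # Hypothetical stand-in for the question's function:
    # return the month number of a timestamp.
    return ts.month

# Illustrative data with an index named "Date", as in the question.
idx = pd.DatetimeIndex(['2013-01-15', '2013-02-15', '2013-03-15'], name='Date')
df = pd.DataFrame({'value': [10, 20, 30]}, index=idx)

# Variant 1: add a column computed from the index.
df['Month'] = df.index.map(foo)

# Variant 2: build a stand-alone Series keyed by the index labels.
months = pd.Series({x: foo(x) for x in df.index})

print(df)
print(months)
```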
Answered by choldgraf
A lot of answers are returning the Index as an array, which loses information about the index name etc. (though you could do pd.Series(index.map(myfunc), name=index.name)). It also won't work for a MultiIndex.
The way that I worked with this is to use "rename":
import numpy as np
import pandas as pd

mix = pd.MultiIndex.from_tuples([[1, 'hi'], [2, 'there'], [3, 'dude']], names=['num', 'name'])
data = np.random.randn(3)
df = pd.Series(data, index=mix)
print(df)
num  name
1    hi       1.249914
2    there   -0.414358
3    dude     0.987852
dtype: float64
# Define a few dictionaries to denote the mapping
rename_dict = {i: i*100 for i in df.index.get_level_values('num')}
rename_dict.update({i: i+'_yeah!' for i in df.index.get_level_values('name')})
df = df.rename(index=rename_dict)
print(df)
num  name
100  hi_yeah!       1.249914
200  there_yeah!   -0.414358
300  dude_yeah!     0.987852
dtype: float64
The only trick with this is that your index needs to have unique labels between the different MultiIndex levels, but maybe someone more clever than me knows how to get around that. For my purposes this works 95% of the time.
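One possible way around the unique-label restriction, offered as a sketch rather than as this answer's method: in recent pandas versions rename accepts a level argument, so each level can be renamed on its own. The level names and values below are made up for illustration, and the two levels deliberately share labels.

```python
import pandas as pd

# Both levels contain the labels 1 and 2, which a single
# combined rename dictionary could not tell apart.
mix = pd.MultiIndex.from_tuples([(1, 1), (2, 2)], names=['num', 'rank'])
s = pd.Series([0.5, 1.5], index=mix)

# Rename one level at a time so the shared labels cannot collide.
s = s.rename(index=lambda i: i * 100, level='num')
s = s.rename(index=lambda i: i + 1, level='rank')

print(s)
```

Because each call targets a single level, a function (or a dictionary) for one level never touches the labels of the other.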
Answered by normanius
You can always convert an index using its to_series() method, and then either apply or map, according to your preferences/needs.
ret = df.index.map(foo) # Returns pd.Index
ret = df.index.to_series().map(foo) # Returns pd.Series
ret = df.index.to_series().apply(foo) # Returns pd.Series
All of the above can be assigned directly to a new or existing column of df:
df["column"] = ret
Just for completeness: pd.Index.map, pd.Series.map and pd.Series.apply all operate element-wise. I often use map to apply lookups represented by dicts or pd.Series. apply is more generic because you can pass any function along with additional args or kwargs. The differences between apply and map are further discussed in this SO thread. I don't know why pd.Index.apply was omitted.
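The difference between a map lookup and apply with extra arguments can be sketched as follows. The lookup dictionary, the tag helper and the column names are hypothetical and exist only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'d': [1, 2, 3]}, index=['FOO', 'BAR', 'BAZ'])

# map with a dict acts as a lookup; labels missing from the dict become NaN.
lookup = {'FOO': 'first', 'BAR': 'second'}
df['position'] = df.index.to_series().map(lookup)

def tag(label, prefix, suffix=''):
    # Hypothetical helper that takes extra positional and keyword arguments.
    return prefix + label.lower() + suffix

# apply forwards args and keyword arguments to the function.
df['tagged'] = df.index.to_series().apply(tag, args=('<',), suffix='>')

print(df)
```

Since to_series() keeps the original labels as the index of the resulting Series, the assignment lines up with the rows of df.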

