Python: applying a function over a DataFrame index

Question

Asked by Alex Rothberg

What is the best way to apply a function over the index of a Pandas DataFrame? Currently I am using this verbose approach:

pd.DataFrame({"Month": df.reset_index().Date.apply(foo)})

where Date is the name of the index and foo is the name of the function that I am applying.

Answer 1

Answered by firelynx

As already suggested by HYRY in the comments, map is the way to go here: the index has its own map method that works like Series.map. Just set the index to the result.

Simple example:

import pandas as pd

df = pd.DataFrame({'d': [1, 2, 3]}, index=['FOO', 'BAR', 'BAZ'])
df
        d
FOO     1
BAR     2
BAZ     3

df.index = df.index.map(str.lower)
df
        d
foo     1
bar     2
baz     3

Index != Series

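As pointed out by the OP, the df.index.map(str.lower) call does not return a Series. In older pandas versions it returned a numpy array, and in recent versions it returns an Index. This is because DataFrame indices are based on numpy arrays, not Series.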

The only way of making the index into a Series is to create a Series from it.

pd.Series(df.index.map(str.lower))

Caveat

The Index class now exposes the .str accessor (at the time of writing, by subclassing StringAccessorMixin), which means that you can do the above operation as follows:

df.index.str.lower()

This still produces an Index object, not a Series.
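
To make the Index versus Series distinction concrete, here is a minimal sketch that reuses the example frame from this answer and checks what each call hands back. It assumes a recent pandas version, where Index.map returns an Index rather than a numpy array.

```python
import pandas as pd

df = pd.DataFrame({'d': [1, 2, 3]}, index=['FOO', 'BAR', 'BAZ'])

mapped = df.index.map(str.lower)   # calls str.lower on each label
via_str = df.index.str.lower()     # same result through the .str accessor
as_series = pd.Series(mapped)      # explicit conversion to a Series

# Print the type of each result to see which container you got.
for name, obj in [('map', mapped), ('str', via_str), ('Series', as_series)]:
    print(name, type(obj).__name__, list(obj))
```

Note that pd.Series(mapped) gets a fresh integer index, so it is no longer labelled by the original index values.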

Answer 2

Answered by suraj747

Assuming that you want to make a column in your current DataFrame by applying your function "foo" to the index, you could write:

df['Month'] = df.index.map(foo)

To generate the Series alone, you could instead do:

pd.Series({x: foo(x) for x in df.index})
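
As a rough end-to-end sketch of the question's scenario, the snippet below builds a small frame with a DatetimeIndex named Date and derives a Month column from it. The sample data and the body of foo are assumptions made up for illustration; the question does not say what foo actually does.

```python
import pandas as pd

def foo(ts):
    # Hypothetical stand-in for the OP's function: extract the month number.
    return ts.month

dates = pd.DatetimeIndex(['2020-01-15', '2020-02-15', '2020-03-15'], name='Date')
df = pd.DataFrame({'value': [10, 20, 30]}, index=dates)

# New column computed from the index labels.
df['Month'] = df.index.map(foo)

# Stand-alone Series that keeps the original index as its labels.
months = pd.Series(df.index.map(foo), index=df.index, name='Month')
print(df)
```

Passing index=df.index when building the stand-alone Series keeps the dates as labels, which the plain pd.Series(df.index.map(foo)) form would replace with 0, 1, 2.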

Answer 3

Answered by choldgraf

Many of the answers return the index as an array, which loses information such as the index name (though you could do pd.Series(index.map(myfunc), name=index.name)). That approach also won't work for a MultiIndex.

The way that I worked with this is to use "rename":

import numpy as np
import pandas as pd

mix = pd.MultiIndex.from_tuples([[1, 'hi'], [2, 'there'], [3, 'dude']], names=['num', 'name'])
data = np.random.randn(3)
df = pd.Series(data, index=mix)
print(df)
num  name 
1    hi       1.249914
2    there   -0.414358
3    dude     0.987852
dtype: float64

# Define a few dictionaries to denote the mapping
rename_dict = {i: i*100 for i in df.index.get_level_values('num')}
rename_dict.update({i: i+'_yeah!' for i in df.index.get_level_values('name')})
df = df.rename(index=rename_dict)
print(df)
num  name       
100  hi_yeah!       1.249914
200  there_yeah!   -0.414358
300  dude_yeah!     0.987852
dtype: float64

The only catch is that the labels must be unique across the different MultiIndex levels, because a single dictionary maps all of them, but maybe someone more clever than me knows how to get around that. For my purposes this works 95% of the time.
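
One possible way around the uniqueness restriction, sketched here under the assumption that your pandas version supports the level argument of rename, is to rename one level at a time. The index below is made up so that both levels share the labels 'a', 'b' and 'c'; a single dictionary could not tell them apart.

```python
import pandas as pd

# Hypothetical index whose two levels contain overlapping labels.
mix = pd.MultiIndex.from_tuples(
    [('a', 'a'), ('b', 'c'), ('c', 'b')], names=['first', 'second']
)
s = pd.Series([0.1, 0.2, 0.3], index=mix)

# Restrict the mapping to one level; the other level is left untouched.
renamed = s.rename(index=lambda label: label.upper(), level='first')
print(renamed)
```

The mapper can be a function, as here, or a dictionary, and the same level argument is accepted by DataFrame.rename.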

Answer 4

Answered by normanius

You can always convert an index using its to_series() method, and then use either apply or map, according to your preferences/needs.

ret = df.index.map(foo)                # Returns pd.Index
ret = df.index.to_series().map(foo)    # Returns pd.Series
ret = df.index.to_series().apply(foo)  # Returns pd.Series

All of the above can be assigned directly to a new or existing column of df:

df["column"] = ret

Just for completeness: pd.Index.map, pd.Series.map and pd.Series.apply all operate element-wise. I often use map to apply lookups represented by dicts or pd.Series. apply is more generic because you can pass any function along with additional args or kwargs. The differences between apply and map are further discussed in this SO thread. I don't know why pd.Index.apply was omitted.
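
To illustrate the difference, here is a small sketch of both styles: a dictionary lookup through map, and a function with extra arguments through apply. The lookup table and the add_affix helper are hypothetical names invented for this example.

```python
import pandas as pd

df = pd.DataFrame({'d': [1, 2, 3]}, index=['FOO', 'BAR', 'BAZ'])

# map with a dict: each index label is looked up in the table.
lookup = {'FOO': 'first', 'BAR': 'second', 'BAZ': 'third'}
df['rank'] = df.index.map(lookup)

def add_affix(label, prefix, suffix=''):
    # Hypothetical helper taking one positional and one keyword extra.
    return prefix + label.lower() + suffix

# apply forwards extra positional (args) and keyword arguments to the function.
df['tag'] = df.index.to_series().apply(add_affix, args=('<',), suffix='>')
print(df)
```

Because to_series() keeps the original index as the labels of the new Series, the assignment to df['tag'] aligns row by row without further work.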
