Source: Stack Overflow, http://stackoverflow.com/questions/19782586/. This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors.

Applying a lambda function to a column fails in pandas

Tags: python, python-2.7, pandas

Asked by u2130573

I don't understand why different indexing methods behave inconsistently when I apply a function column-wise.

The data frame is:

import pandas as pd
df = pd.DataFrame([(1, 'Hello'), (2, 'World')])
df.columns = ['A', 'B']

I want to apply a lambda to the second column, but it says the Series object has no attribute 'upper':

print df.iloc[:, 1:2].apply(lambda x: x.upper()).head()
AttributeError: ("'Series' object has no attribute 'upper'", u'occurred at index B')
print df.loc[:, ['B']].apply(lambda x: x.upper()).head()
AttributeError: ("'Series' object has no attribute 'upper'", u'occurred at index B')

However, the following indexing method works:

print df.loc[:, 'B'].apply(lambda x: x.upper()).head()

Why? I thought the three indexing methods were equivalent. Printed out, all three give almost the same result. The first two give:

       B
0  Hello
1  World

while print df.loc[:, 'B'] gives:

0    Hello
1    World
Name: B, dtype: object

What do the differences mean?

Answered by BrenBarn

When you index with 'B' you get a Series. When you index with 1:2 or with ['B'], you get a DataFrame with one column. When you use apply on a Series, your function is called on each element. When you use apply on a DataFrame, your function is called on each column.
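
To make that concrete, here is a minimal sketch that asks each apply call what kind of argument it hands to the function. The variable names per_element and per_column are illustrative and not from the original post.

```python
import pandas as pd

df = pd.DataFrame([(1, 'Hello'), (2, 'World')], columns=['A', 'B'])

# Series.apply: the function is called once per element, so x is a str.
per_element = df.loc[:, 'B'].apply(lambda x: type(x).__name__)

# DataFrame.apply: the function is called once per column, so x is a Series.
per_column = df.loc[:, ['B']].apply(lambda x: type(x).__name__)

print(per_element.tolist())  # ['str', 'str']
print(per_column.tolist())   # ['Series']
```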

So no, they aren't equivalent. When you have a Series, you can use your function the way you want. When you have a one-column DataFrame, you can't, because the function gets passed the whole column as its argument, and the column is a Series, which doesn't have an upper method.
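
If you need to keep the one-column DataFrame, one option (an addition here, not part of the original answer) is to write the function so that it expects a Series and uses the .str accessor. A small sketch, with upper_df as an illustrative name:

```python
import pandas as pd

df = pd.DataFrame([(1, 'Hello'), (2, 'World')], columns=['A', 'B'])

# The lambda receives the whole column 'B' as a Series,
# so it uses the vectorized string accessor instead of str.upper.
upper_df = df.loc[:, ['B']].apply(lambda col: col.str.upper())

print(type(upper_df).__name__)  # DataFrame
print(upper_df['B'].tolist())   # ['HELLO', 'WORLD']
```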

You can see that they aren't the same because the results are different when you print them out. Yes, they're almost the same, but not the same. The first one has a column header, indicating that it's a DataFrame; the second has no column header but has the "Name" at the bottom, indicating it's a Series.
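
Rather than reading the printed output, you can also check the type and shape directly. A short sketch (the names as_series, as_frame and as_slice are just for illustration):

```python
import pandas as pd

df = pd.DataFrame([(1, 'Hello'), (2, 'World')], columns=['A', 'B'])

as_series = df.loc[:, 'B']    # scalar label -> Series
as_frame = df.loc[:, ['B']]   # list of labels -> one-column DataFrame
as_slice = df.iloc[:, 1:2]    # slice -> one-column DataFrame

for name, obj in [('as_series', as_series),
                  ('as_frame', as_frame),
                  ('as_slice', as_slice)]:
    # ndim is 1 for a Series and 2 for a DataFrame.
    print(name, type(obj).__name__, obj.ndim, obj.shape)
```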

Answered by Roman Pekar

As @BrenBarn mentioned, the difference is that df.iloc[:, 1:2] gives you a DataFrame with one column, while df.loc[:, 'B'] gives you a Series. One small addition: to convert a one-column DataFrame into a Series, you can use the DataFrame.squeeze() method:

>>> df.iloc[:, 1:2]
       B
0  Hello
1  World
>>> df.iloc[:, 1:2].squeeze()
0    Hello
1    World
Name: B, dtype: object

and then you can use apply (you don't have to use lambda, BTW):

>>> df.iloc[:, 1:2].squeeze().apply(str.upper)
0    HELLO
1    WORLD
Name: B, dtype: object
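
One caveat worth keeping in mind: squeeze() with no arguments collapses every axis of length 1, so a selection that happens to contain a single row comes back as a scalar rather than a Series. A sketch of that case, assuming you only want to drop the column axis (the names one_row, collapsed and still_series are illustrative):

```python
import pandas as pd

df = pd.DataFrame([(1, 'Hello'), (2, 'World')], columns=['A', 'B'])

one_row = df.iloc[:1, 1:2]  # 1x1 DataFrame

# Both axes have length 1, so the result is a scalar, not a Series.
collapsed = one_row.squeeze()

# Squeezing only the column axis keeps a one-element Series.
still_series = one_row.squeeze(axis='columns')

print(isinstance(collapsed, pd.Series))         # False
print(isinstance(still_series, pd.Series))      # True
print(still_series.apply(str.upper).tolist())   # ['HELLO']
```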

If you want to apply upper to a DataFrame, you can use DataFrame.applymap():

>>> df.iloc[:, 1:2].applymap(str.upper)
       B
0  HELLO
1  WORLD
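
A note for newer pandas: as far as I know, DataFrame.applymap was deprecated in pandas 2.1 in favor of DataFrame.map, which does the same element-wise job. A hedged sketch that uses whichever one is available (the elementwise name is only illustrative):

```python
import pandas as pd

df = pd.DataFrame([(1, 'Hello'), (2, 'World')], columns=['A', 'B'])
one_col = df.iloc[:, 1:2]

# Prefer DataFrame.map where it exists; fall back to applymap on older versions.
elementwise = getattr(one_col, 'map', None) or one_col.applymap
upper_df = elementwise(str.upper)

print(upper_df['B'].tolist())  # ['HELLO', 'WORLD']
```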