Applying a lambda function to a column fails in Pandas

Question

Asked by u2130573

I don't understand why different indexing methods behave inconsistently when I apply a function column-wise.

The data frame is:

import pandas as pd
df = pd.DataFrame([(1, 'Hello'), (2, 'World')])
df.columns = ['A', 'B']

I want to apply a lambda to the second column, but the error seems to say that it cannot be applied to a Series object:

print(df.iloc[:, 1:2].apply(lambda x: x.upper()).head())
AttributeError: ("'Series' object has no attribute 'upper'", u'occurred at index B')
print(df.loc[:, ['B']].apply(lambda x: x.upper()).head())
AttributeError: ("'Series' object has no attribute 'upper'", u'occurred at index B')

However, the following indexing method works fine:

print(df.loc[:, 'B'].apply(lambda x: x.upper()).head())

Why? I thought the three indexing methods were equivalent. All three print almost the same result. The first two print:

       B
0  Hello
1  World

and print(df.loc[:, 'B']) gives:

0    Hello
1    World
Name: B, dtype: object

What do the differences mean?

Answer 1

Answered by BrenBarn

When you index with 'B', you get a Series. When you index with 1:2 or with ['B'], you get a DataFrame with one column. When you use apply on a Series, your function is called on each element. When you use apply on a DataFrame, your function is called on each column.

So no, they aren't equivalent. When you have a Series, you can use your function as you intended. When you have a one-column DataFrame, you can't, because the function is passed the whole column as its argument, and that column is a Series, which has no upper method.
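
To see what the function actually receives in each case, here is a minimal sketch. The helper name describe is made up for illustration; it simply reports the type of whatever apply hands to it.

```python
import pandas as pd

df = pd.DataFrame([(1, 'Hello'), (2, 'World')], columns=['A', 'B'])

def describe(x):
    # Hypothetical helper: return the type name of whatever apply passes in.
    return type(x).__name__

# Series.apply: the function receives each element (a str here).
series_result = df.loc[:, 'B'].apply(describe)

# DataFrame.apply: the function receives each column as a whole Series.
frame_result = df.loc[:, ['B']].apply(describe)

print(series_result.tolist())  # ['str', 'str']
print(frame_result.tolist())   # ['Series']
```

Since x is a Series in the DataFrame case, calling x.upper() on it is what raises the AttributeError from the question.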

You can see that they aren't the same because the results are different when you print them out. Yes, they're almost the same, but not the same. The first one has a column header, indicating that it's a DataFrame; the second has no column header but has the "Name" at the bottom, indicating it's a Series.
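
If the printed output is too easy to misread, checking the type and shape directly is less ambiguous. A small sketch, where the variable names are only illustrative:

```python
import pandas as pd

df = pd.DataFrame([(1, 'Hello'), (2, 'World')], columns=['A', 'B'])

as_series = df.loc[:, 'B']          # scalar label -> Series
as_frame_label = df.loc[:, ['B']]   # list of labels -> DataFrame
as_frame_slice = df.iloc[:, 1:2]    # slice -> DataFrame

for name, obj in [('loc scalar', as_series),
                  ('loc list', as_frame_label),
                  ('iloc slice', as_frame_slice)]:
    # ndim is 1 for a Series and 2 for a DataFrame.
    print(name, type(obj).__name__, obj.ndim, obj.shape)
```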

Answer 2

Answered by Roman Pekar

As @BrenBarn mentioned, the difference is that df.iloc[:, 1:2] gives you a DataFrame with one column, while df.loc[:, 'B'] gives you a Series. As a small addition: to convert a one-column DataFrame into a Series, you can use the DataFrame.squeeze() method:

>>> df.iloc[:, 1:2]
       B
0  Hello
1  World
>>> df.iloc[:, 1:2].squeeze()
0    Hello
1    World
Name: B, dtype: object

and then you can use apply (you don't have to use lambda, BTW):

>>> df.iloc[:, 1:2].squeeze().apply(str.upper)
0    HELLO
1    WORLD
Name: B, dtype: object
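
One thing to watch with squeeze(): with no arguments it collapses every length-1 axis, so a DataFrame with a single row and a single column comes back as a scalar, not a Series. The sketch below uses a made-up one-row frame (not the data from the question) and passes axis=1 so that only the column axis is squeezed:

```python
import pandas as pd

# Illustrative one-row frame; not the data from the question.
one_row = pd.DataFrame([(1, 'Hello')], columns=['A', 'B'])
cell = one_row.iloc[:, 1:2]      # 1x1 DataFrame

scalar = cell.squeeze()          # both axes squeezed: a plain str
column = cell.squeeze(axis=1)    # only the column axis squeezed: a Series

print(type(scalar).__name__)     # str
print(type(column).__name__)     # Series

# The Series still supports element-wise apply.
upper = column.apply(str.upper)
print(upper.tolist())            # ['HELLO']
```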

If you want to apply upper to a DataFrame, you can use DataFrame.applymap():

>>> df.iloc[:, 1:2].applymap(str.upper)
       B
0  HELLO
1  WORLD
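
A note for newer pandas: as far as I know, DataFrame.applymap was deprecated in pandas 2.1 in favor of DataFrame.map, which does the same element-wise application. A rough sketch that works on either side of that change, assuming the installed version has at least one of the two methods:

```python
import pandas as pd

df = pd.DataFrame([(1, 'Hello'), (2, 'World')], columns=['A', 'B'])
one_col = df.loc[:, ['B']]  # one-column DataFrame

# Prefer DataFrame.map where it exists (pandas >= 2.1);
# fall back to applymap on older versions.
if hasattr(pd.DataFrame, 'map'):
    upper_df = one_col.map(str.upper)
else:
    upper_df = one_col.applymap(str.upper)

print(upper_df)
```

Either branch calls str.upper on every cell, so the result stays a DataFrame.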
