pandas: filling a column with numpy arrays

Question

Asked by Nic

I am using python2.7 and pandas 0.11.0.

I am trying to fill a column of a dataframe using DataFrame.apply(func). The func() function is supposed to return a numpy array with three elements.

import pandas as pd
import numpy as np

df= pd.DataFrame(np.random.randn(4, 3), columns=list('ABC'))
print(df)

              A         B         C
    0  0.910142  0.788300  0.114164
    1 -0.603282 -0.625895  2.843130
    2  1.823752 -0.091736 -0.107781
    3  0.447743 -0.163605  0.514052

The function used for testing purposes:

def test(row):
   # some complex calc here 
   # based on the values from different columns 
   return np.array((1,2,3))

df['D'] = df.apply(test, axis=1)

[...]
ValueError: Wrong number of items passed 1, indices imply 3

The funny thing is that when I create the dataframe from scratch, it works fine and gives the expected result:

dic = {'A': {0: 0.9, 1: -0.6, 2: 1.8, 3: 0.4}, 
     'C': {0: 0.1, 1: 2.8, 2: -0.1, 3: 0.5}, 
     'B': {0: 0.7, 1: -0.6, 2: -0.1, 3: -0.1},
     'D': {0:np.array((1,2,3)), 
          1:np.array((1,2,3)), 
          2:np.array((1,2,3)), 
          3:np.array((1,2,3))}}

df= pd.DataFrame(dic)
print(df)
         A    B    C          D
    0  0.9  0.7  0.1  [1, 2, 3]
    1 -0.6 -0.6  2.8  [1, 2, 3]
    2  1.8 -0.1 -0.1  [1, 2, 3]
    3  0.4 -0.1  0.5  [1, 2, 3]

Thanks in advance

Answer 1

Answered by Viktor Kerkez

If the function passed to apply returns multiple values, and the DataFrame you call apply on has the same number of items along the axis (in this case, columns) as the number of values returned, Pandas will build a DataFrame from the return values, using the same labels as the original DataFrame. You can see this if you just do:

>>> def test(row):
...     return [1, 2, 3]
>>> df = pd.DataFrame(np.random.randn(4, 3), columns=list('ABC'))
>>> df.apply(test, axis=1)
   A  B  C
0  1  2  3
1  1  2  3
2  1  2  3
3  1  2  3

And that is why you get the error: you cannot assign a whole DataFrame to a single DataFrame column.

If you return any other number of values, apply returns a plain Series object, which can be assigned:

>>> def test(row):
...     return [1, 2]
>>> df = pd.DataFrame(np.random.randn(4, 3), columns=list('ABC'))
>>> df.apply(test, axis=1)
0    [1, 2]
1    [1, 2]
2    [1, 2]
3    [1, 2]
>>> df['D'] = df.apply(test, axis=1)
>>> df
          A         B         C       D
0  0.333535  0.209745 -0.972413  [1, 2]
1  0.469590  0.107491 -1.248670  [1, 2]
2  0.234444  0.093290 -0.853348  [1, 2]
3  1.021356  0.092704 -0.406727  [1, 2]

I'm not sure why Pandas does this, or why it does it only when the return value is a list or an ndarray, since it won't do it if you return a tuple:

>>> def test(row):
...     return (1, 2, 3)
>>> df = pd.DataFrame(np.random.randn(4, 3), columns=list('ABC'))
>>> df['D'] = df.apply(test, axis=1)
>>> df
          A         B         C          D
0  0.121136  0.541198 -0.281972  (1, 2, 3)
1  0.569091  0.944344  0.861057  (1, 2, 3)
2 -1.742484 -0.077317  0.181656  (1, 2, 3)
3 -1.541244  0.174428  0.660123  (1, 2, 3)
