Pandas Dataframe ValueError: Shape of passed values is (X, ), indices imply (X, Y)
Original question: http://stackoverflow.com/questions/19666904/
Note: this content comes from StackOverFlow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me) and cite the original URL.
Asked by user1367204
I am getting an error and I'm not sure how to fix it.
The following seems to work:
import numpy as np
import pandas

def random(row):
    return [1,2,3,4]
df = pandas.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))
df.apply(func = random, axis = 1)
and my output is:
[1,2,3,4]
[1,2,3,4]
[1,2,3,4]
[1,2,3,4]
However, when I change one of the columns to a value such as 1 or None:
def random(row):
    return [1,2,3,4]
df = pandas.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))
df['E'] = 1
df.apply(func = random, axis = 1)
I get the error:
ValueError: Shape of passed values is (5,), indices imply (5, 5)
I've been wrestling with this for a few days now and nothing seems to work. What is interesting is that when I change
def random(row):
    return [1,2,3,4]
to
def random(row):
    print([1,2,3,4])
everything seems to work normally.
This question is a clearer restatement of an earlier question, which I feel may have been confusing.
My goal is to compute a list for each row and then create a column out of that.
EDIT: I originally started with a dataframe that has one column. I added 4 columns in 4 different apply steps, and then when I try to add another column I get this error.
Accepted answer by Roman Pekar
If your goal is to add a new column to the DataFrame, just write your function so that it returns a scalar value (not a list), something like this:
>>> def random(row):
...     return row.mean()
and then use apply:
>>> df['new'] = df.apply(func = random, axis = 1)
>>> df
          A         B         C         D       new
0  0.201143 -2.345828 -2.186106 -0.784721 -1.278878
1 -0.198460  0.544879  0.554407 -0.161357  0.184867
2  0.269807  1.132344  0.120303 -0.116843  0.351403
3 -1.131396  1.278477  1.567599  0.483912  0.549648
4  0.288147  0.382764 -0.840972  0.838950  0.167222
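For a simple reduction such as the row mean, the same column can be built without a Python-level function. The sketch below is not part of the original answer. It assumes a fresh 5x4 frame like the one in the question and compares the `apply` route with the built-in `DataFrame.mean(axis=1)`. The names `via_apply` and `via_mean` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in data with the same shape as the frame in the question.
df = pd.DataFrame(np.random.randn(5, 4), columns=list("ABCD"))

# A function that returns one scalar per row gives a Series with one value per row.
via_apply = df.apply(lambda row: row.mean(), axis=1)

# The built-in reduction computes the same row means without a per-row Python call.
via_mean = df.mean(axis=1)

# Either Series can be assigned as a new column because it shares the frame's index.
df["new"] = via_mean
```

Both routes should give the same values here. The built-in reduction is usually the more idiomatic choice when one exists for the computation you need.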
I don't know if it is possible for your new column to contain lists, but it is definitely possible for it to contain tuples ((...) instead of [...]):
>>> def random(row):
...     return (1,2,3,4,5)
...
>>> df['new'] = df.apply(func = random, axis = 1)
>>> df
          A         B         C         D              new
0  0.201143 -2.345828 -2.186106 -0.784721  (1, 2, 3, 4, 5)
1 -0.198460  0.544879  0.554407 -0.161357  (1, 2, 3, 4, 5)
2  0.269807  1.132344  0.120303 -0.116843  (1, 2, 3, 4, 5)
3 -1.131396  1.278477  1.567599  0.483912  (1, 2, 3, 4, 5)
4  0.288147  0.382764 -0.840972  0.838950  (1, 2, 3, 4, 5)
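If the column really has to hold lists, as the question asks, one way to sidestep `apply`'s shape inference is to build the values in plain Python and wrap them in a Series aligned to the frame's index. This is a minimal sketch, not part of the original answer. `make_list` is a hypothetical stand-in for the real per-row computation, and the column name `as_list` is made up.

```python
import numpy as np
import pandas as pd

# Same setup as the failing case in the question: four random columns plus a constant one.
df = pd.DataFrame(np.random.randn(5, 4), columns=list("ABCD"))
df["E"] = 1

def make_list(row):
    # Hypothetical placeholder for the real per-row computation.
    return [1, 2, 3, 4]

# Compute one list per row in plain Python, so pandas never tries to
# reshape the results into a 2-D block.
values = [make_list(row) for _, row in df.iterrows()]

# A Series built from a list of lists holds each list as a single object element.
df["as_list"] = pd.Series(values, index=df.index)
```

The resulting column has dtype `object`, with one list per cell. Recent pandas documentation also says that `apply` with `axis=1` returns list-like results as a Series of lists, so the original call may work as written on newer versions. That is worth checking against the version you have installed.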
Answer by KeepLearning
I use the code below, and it works fine:
import numpy as np
import pandas as pd
# your_data and columns are placeholders for your own rows and column names
df = pd.DataFrame(np.array(your_data), columns=columns)
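As a concrete illustration of that construction, the sketch below fills in the placeholders with made-up values. The contents of `your_data` and `columns` are assumptions for the example only. The explicit shape check is an addition, not something this answer prescribes. It guards against a mismatch between the array's width and the number of column names.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values standing in for the placeholders in the answer.
your_data = [[1, 2, 3], [4, 5, 6]]
columns = ["a", "b", "c"]

arr = np.array(your_data)

# Check that the array is 2-D and as wide as the list of column names
# before handing it to the constructor.
if arr.ndim != 2 or arr.shape[1] != len(columns):
    raise ValueError(f"data has shape {arr.shape}, expected {len(columns)} columns")

df = pd.DataFrame(arr, columns=columns)
```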