Original source: http://stackoverflow.com/questions/48000225/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Must have equal len keys and value when setting with an iterable
Asked by user3806649
I have two DataFrames, as follows:
leader:
0 11
1 8
2 5
3 9
4 8
5 6
[6065 rows x 2 columns]
DatasetLabel:
Unnamed: 0 0 1 .... 7 8 9 10 11 12
0 A J .... 1 2 5 NaN NaN NaN
1 B K .... 3 4 NaN NaN NaN NaN
[4095 rows x 14 columns]
In the DatasetLabel DataFrame, columns 0 to 6 hold labels describing the data, and columns 7 to 12 hold indexes that refer to the first column of the leader DataFrame.
I want to create a dataset in which each index in DatasetLabel is replaced by the corresponding value from the leader DataFrame, that is, leader.iloc[index,1].
How can I do this in Python?
The output should look like:
DatasetLabel:
Unnamed: 0 0 1 .... 7 8 9 10 11 12
0 A J .... 8 5 6 NaN NaN NaN
1 B K .... 9 8 NaN NaN NaN NaN
I came up with the following, but I get an error:
for column in DatasetLabel.ix[:,8:13]:
    DatasetLabel[DatasetLabel[column].notnull()]=leader.iloc[DatasetLabel[DatasetLabel[column].notnull()][column].values,1]
Error:
ValueError: Must have equal len keys and value when setting with an iterable
Answered by andrew_reece
You can use apply to index into leader and exchange values with DatasetLabel, although it's not very pretty.
One issue is that Pandas won't let us index with NaN. Converting to str provides a workaround. But that creates a second issue, namely, column 9 is of type float (because NaN is float), so 5 becomes 5.0. Once it's a string, that's "5.0", which will fail to match the index values in leader. We can remove the .0, and then this solution will work - but it's a bit of a hack.
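The float-to-string issue described above can be seen in isolation with a short sketch. The Series name col is a hypothetical stand-in for column 9, not something from the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for column 9: one valid index and one missing value.
col = pd.Series([5, np.nan])

# NaN is a float, so the whole column is stored as float64.
print(col.dtype)  # float64

# Converting to str keeps the trailing ".0" on the valid entry.
as_text = col.astype(str)
print(as_text.iloc[0])  # 5.0

# Splitting on "." and keeping the first piece drops the ".0".
trimmed = as_text.str.split(".").str[0]
print(trimmed.iloc[0])  # 5
```

Only the first element is printed because the string form of the missing value may differ between pandas versions.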
With DatasetLabel as:
Unnamed:0 0 1 7 8 9 10 11 12
0 0 A J 1 2 5.0 NaN NaN NaN
1 1 B K 3 4 NaN NaN NaN NaN
And leader as:
0 1
0 0 11
1 1 8
2 2 5
3 3 9
4 4 8
5 5 6
Then:
cols = ["7","8","9","10","11","12"]
updated = DatasetLabel[cols].apply(
    lambda x: leader.loc[x.astype(str).str.split(".").str[0], 1].values, axis=1)
updated
7 8 9 10 11 12
0 8.0 5.0 6.0 NaN NaN NaN
1 9.0 8.0 NaN NaN NaN NaN
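As an alternative to the string workaround, here is a minimal sketch that uses Series.map for the lookup. It is not the answerer's method. It assumes the sample frames shown above, with string column names in DatasetLabel and integer values in column 0 of leader, and the name lookup is hypothetical. It may be useful on recent pandas versions, which, as far as I know, raise a KeyError when .loc is given a list containing labels that are missing from the index.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the tables above (an assumption about the real frames).
leader = pd.DataFrame({0: range(6), 1: [11, 8, 5, 9, 8, 6]})
DatasetLabel = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "0": ["A", "B"],
    "1": ["J", "K"],
    "7": [1, 3],
    "8": [2, 4],
    "9": [5.0, np.nan],
    "10": [np.nan, np.nan],
    "11": [np.nan, np.nan],
    "12": [np.nan, np.nan],
})

cols = ["7", "8", "9", "10", "11", "12"]

# Hypothetical helper: a Series keyed by leader's first column, holding its second column.
lookup = leader.set_index(0)[1]

# Map every index column through the lookup; NaN entries stay NaN.
updated = DatasetLabel[cols].apply(lambda col: col.map(lookup))
print(updated)
```

Because map leaves missing values alone, no conversion to str is needed in this variant.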
Now we can concat the unmodified columns (which we'll call original) with updated:
original_cols = DatasetLabel.columns[~DatasetLabel.columns.isin(cols)]
original = DatasetLabel[original_cols]
pd.concat([original, updated], axis=1)
Output:
Unnamed:0 0 1 7 8 9 10 11 12
0 0 A J 8.0 5.0 6.0 NaN NaN NaN
1 1 B K 9.0 8.0 NaN NaN NaN NaN
Note: It may be clearer to use concat here, but here's another, cleaner way of merging original and updated, using assign:
DatasetLabel.assign(**updated)
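To compare the two merging approaches side by side, here is a small sketch on toy frames. The frames and the names via_assign and via_concat are hypothetical and much smaller than the real data.

```python
import pandas as pd

# Toy frames: "7" holds indexes in original and looked-up values in updated.
original = pd.DataFrame({"0": ["A", "B"], "7": [1, 3]})
updated = pd.DataFrame({"7": [8, 9]})

# assign overwrites columns that already exist, keeping their position.
via_assign = original.assign(**updated)

# concat needs the old columns dropped first, then appends the new ones at the end.
via_concat = pd.concat(
    [original.drop(columns=updated.columns), updated], axis=1)

print(via_assign.equals(via_concat))  # True
```

The results match here only because "7" is already the last column. In general, assign keeps each replaced column where it was, while concat moves the updated columns to the end.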