Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/37533200/

Can't execute Python Pandas set_value

Tags: python, csv, pandas

Asked by Windtalker

I have a problem with Pandas in Python 3.5.

I read a local CSV file using Pandas. The CSV contains only data, with no header row. Then I assigned the column names using:

df = pd.read_csv(filePath, header=None)
df.columns = ['XXX', 'XXX']  # abbreviated here; 11 columns in total

The CSV has 11 columns: one of them holds strings, the others hold integers.

Then I tried to replace the string column with integer values in a loop, cell by cell:

for i, row in df.iterrows():
    print(i, row['Name'])
    df.set_value(i, 'Name', 123)

The integer 123 is just an example; not every cell in this column will be 123. The print call works fine if I remove set_value, but with

df.set_value(i, 'Name', 123)

I get this error:

Traceback (most recent call last):
  File "D:/xxx/test.py", line 20, in <module>
    df.set_value(i, 'Name', 233)
  File "E:\Users\XXX\Anaconda3\lib\site-packages\pandas\core\frame.py", line 1862, in set_value
    series = self._get_item_cache(col)
  File "E:\Users\XXX\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1351, in _get_item_cache
    res = self._box_item_values(item, values)
  File "E:\Users\XXX\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2334, in _box_item_values
    return self._constructor(values.T, columns=items, index=self.index)
AttributeError: 'BlockManager' object has no attribute 'T'

But if I create a DataFrame manually in code:

df = pd.DataFrame(index=[0, 1, 2], columns=['x', 'y'])
df['x'] = 2
df['y'] = 'BBB'
print(df)
for i, row in df.iterrows():
    df.set_value(i, 'y', 233)


print('\n')
print(df)

it works. Is there something I am missing?

Thanks!

Answered by TheRoman

The cause of the original error:

The Pandas DataFrame set_value(index, col, value) method raises the obscure AttributeError: 'BlockManager' object has no attribute 'T' shown in the question when the DataFrame being modified has duplicate column names.

The error can be reproduced with @Windtalker's code above, where the only change is that both column names are now 'x' rather than 'x' and 'y'.

import pandas as pd
df = pd.DataFrame(index=[0, 1, 2], columns=['x', 'x'])
df['x'] = 2
df['y'] = 'BBB'
print(df)
for i, row in df.iterrows():
    df.set_value(i, 'y', 233)

print('\n')
print(df)

Hopefully this helps someone else diagnose the same issue.
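
As a rough sketch of how one might check for this situation before editing cells, the snippet below tests whether the column labels are unique and lists any repeated names. The frame and the replacement names ('x1', 'x2') are hypothetical and only mirror the reproduction above.

```python
import pandas as pd

# Hypothetical frame with a duplicated column name, mirroring the repro above.
df = pd.DataFrame([[1, 2, "a"], [3, 4, "b"]], columns=["x", "x", "Name"])

# Index.is_unique is False when any column label is repeated.
has_duplicates = not df.columns.is_unique

# Index.duplicated() marks every repeat after the first occurrence.
duplicated_names = df.columns[df.columns.duplicated()].unique().tolist()
print(has_duplicates, duplicated_names)

# One possible fix: give every column a distinct name before modifying cells.
df.columns = ["x1", "x2", "Name"]
print(df.columns.is_unique)
```

In the question, the column list is abbreviated as ['XXX', 'XXX'], so it may be worth running a check like this on the real 11 names.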

Answered by MaxU

Well, now that you have made it a lot clearer, it's easier to answer your question...

Assuming your DF looks like this:

In [164]: df
Out[164]:
    a   b   c   d   e          city
0   6  55   3  48  11          Kiev
1   5  29  42  95  69        Munich
2  53  79  60  80  89        Berlin
3   6  70  87   6  85      New York
4  97  23  94  43  31         Paris
5  15  17  56  34  77  Zaporizhzhia
6  28  35  58  82  33        Warsaw
7  41  93  60  54  21      Hurghada
8  68  23  80  39  66          Bern
9  15  17  30  26  98          Lviv

and you have another DF with city IDs:

In [165]: cities
Out[165]:
              id
city
Warsaw         6
Kiev           0
New York       3
Hurghada       7
Munich         1
Paris          4
Berlin         2
Zaporizhzhia   5
Lviv           9
Bern           8

you can map each city to its city ID like this:

In [168]: df['city_id'] = df['city'].map(cities['id'])

In [169]: df
Out[169]:
    a   b   c   d   e          city  city_id
0   6  55   3  48  11          Kiev        0
1   5  29  42  95  69        Munich        1
2  53  79  60  80  89        Berlin        2
3   6  70  87   6  85      New York        3
4  97  23  94  43  31         Paris        4
5  15  17  56  34  77  Zaporizhzhia        5
6  28  35  58  82  33        Warsaw        6
7  41  93  60  54  21      Hurghada        7
8  68  23  80  39  66          Bern        8
9  15  17  30  26  98          Lviv        9
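
For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same idea. It uses a small hypothetical subset of the data shown above, and 'Oslo' is an invented value added only to show what happens to a city that is missing from the lookup table.

```python
import pandas as pd

# Small hypothetical subset of the frames shown above.
df = pd.DataFrame({"a": [6, 5, 53], "city": ["Kiev", "Munich", "Berlin"]})
cities = pd.DataFrame(
    {"id": [2, 0, 1]},
    index=pd.Index(["Berlin", "Kiev", "Munich"], name="city"),
)

# Series.map with a Series argument looks each value up in that Series' index.
df["city_id"] = df["city"].map(cities["id"])
print(df)

# A value that is absent from the lookup index maps to NaN instead of raising.
missing = pd.Series(["Oslo"]).map(cities["id"])
print(missing)
```

Because unmatched values become NaN silently, checking the result with isna() after mapping can catch cities that are spelled differently in the two frames.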

PS: in 95% of cases when working with Pandas, you don't really need to loop through your DFs to achieve your goals.
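
To illustrate the point, the sketch below replaces a string column with integers in two ways: once in a single vectorized step, and once cell by cell. The frame and the name_to_id lookup are hypothetical. The loop uses the .at accessor because, as far as I know, set_value was deprecated in pandas 0.21 and removed in pandas 1.0, so it is not available in current versions.

```python
import pandas as pd

# Hypothetical data: one integer column and one string column to be replaced.
df = pd.DataFrame({"x": [2, 2, 2], "Name": ["AAA", "BBB", "AAA"]})

# Hypothetical lookup from name to integer code.
name_to_id = {"AAA": 1, "BBB": 2}

# Vectorized: replace the whole column in one step, no explicit loop.
vectorized = df.copy()
vectorized["Name"] = vectorized["Name"].map(name_to_id)

# Cell by cell, for the rare case where a loop is really needed.
looped = df.copy()
# Cast to object first so that integers can be stored in a string column.
looped["Name"] = looped["Name"].astype(object)
for i in looped.index:
    looped.at[i, "Name"] = name_to_id[looped.at[i, "Name"]]

print(vectorized["Name"].tolist())
print(looped["Name"].tolist())
```

Both approaches produce the same values here; the vectorized version is shorter and usually much faster on large frames.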