Original question: http://stackoverflow.com/questions/54415673/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Pandas df.at() raising AttributeError: 'BlockManager' object has no attribute 'T'
Asked by JCm
I have a relatively large dataframe. I'm trying to iterate over each row and update a column based on the value of another column (basically, repeating a lookup until nothing more can be updated).
I have the following:
# df is the large dataframe (1K to 10K+ rows x 51 cols)
has_update = True
while has_update:
    has_update = False
    for_procdf = df.loc[df['Incident Group ID'] == '-']
    for i, row in for_procdf.iterrows():
        # Check if the row's parent ticket id is an existing ticket id in the bigger df
        resultRow = df.loc[df['Ticket ID'] == row['Parent Ticket ID']]
        resultCount = len(resultRow.index)
        if resultCount == 1:
            IncidentGroupID = resultRow.iloc[0]['Incident Group ID']
            if IncidentGroupID != '-':
                df.at[i, "Incident Group ID"] = IncidentGroupID
                has_update = True
When I execute the script, an error occurs with the following traceback:
Traceback (most recent call last):
  File "./sdm.etl.py", line 76, in <module>
    main()
  File "./sdm.etl.py", line 28, in main
    fillIncidentGroupID(sdmdf.df)
  File "./sdm.etl.py", line 47, in fillIncidentGroupID
    df.at[i, "Incident Group ID"] = IncidentGroupID
  File "/usr/local/lib/python3.6/site-packages/pandas/core/indexing.py", line 2159, in __setitem__
    self.obj._set_value(*key, takeable=self._takeable)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/frame.py", line 2580, in _set_value
    series = self._get_item_cache(col)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 2490, in _get_item_cache
    res = self._box_item_values(item, values)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/frame.py", line 3096, in _box_item_values
    return self._constructor(values.T, columns=items, index=self.index)
AttributeError: 'BlockManager' object has no attribute 'T'
However, recreating a similar scenario on a small dataframe raises no error:
>>> qdf = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30], [10, 13, 17]], index=[0,1,2,3], columns=['Ab 1', 'Bc 2', 'Cd 3'])
>>> qdf
   Ab 1  Bc 2  Cd 3
0     0     2     3
1     0     4     1
2    10    20    30
3    10    13    17
>>>
>>> qdf1 = qdf.loc[qdf['Ab 1'] == 0]
>>> qdf1
   Ab 1  Bc 2  Cd 3
0     0     2     3
1     0     4     1
>>>
>>> for i, row in qdf1.iterrows():
...     qdf.at[i, 'Ab 1'] = 10
...
>>>
>>> qdf
   Ab 1  Bc 2  Cd 3
0    10     2     3
1    10     4     1
2    10    20    30
3    10    13    17
What seems to be the problem with my implementation?
Answered by JCm
It turns out that Nihal is right: the error is caused by a duplicate column name. My dataframe was so big that I had accidentally ended up with a duplicate column name. Everything works fine now. A little time away from the code to rest and eat helped me spot the duplicate column. Cheers!
Below are the columns of my dataframe. "RCA Group ID" appears a second time near the end.
['Incident Group ID', 'RCA Group ID', 'Parent Ticket ID', 'Ticket ID', ..., 'RCA Group ID', 'Is Sector Down', 'Relationship Type']
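A minimal sketch of how a repeated column name can be checked for, using a small hypothetical frame (the `demo` name and its values are made up for illustration and are not from the original dataframe). When a label is repeated, selecting it returns a DataFrame rather than a Series, which is the kind of ambiguity a scalar setter such as `df.at` can trip over. Whether `df.at` actually raises depends on the pandas version, so the sketch only detects the duplicate:

```python
import pandas as pd

# Hypothetical one-row frame with 'RCA Group ID' repeated, loosely mirroring the column list above
demo = pd.DataFrame(
    [["-", "r1", "t0", "t1", "r1"]],
    columns=["Incident Group ID", "RCA Group ID", "Parent Ticket ID", "Ticket ID", "RCA Group ID"],
)

# False when any column label occurs more than once
columns_are_unique = demo.columns.is_unique

# Selecting a repeated label yields a DataFrame (two columns here), not a Series
selected = demo["RCA Group ID"]

# Labels that occur more than once (every occurrence after the first is flagged)
duplicated_labels = demo.columns[demo.columns.duplicated()].tolist()

print(columns_are_unique, type(selected).__name__, duplicated_labels)
```

Checking `df.columns.is_unique` right after loading the data is a cheap way to catch this before any row-wise updates run.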
Answered by Serhii Kushchenko
"the error is caused by a duplicate column name"
That was true in my case.
You can use the following function to quickly determine which column names are duplicates.
import pandas as pd
def get_duplicate_cols(df: pd.DataFrame) -> pd.Series:
    # Count each column name and keep only the names that occur more than once
    return pd.Series(df.columns).value_counts()[lambda x: x > 1]
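As a usage sketch, the function can be called on a small hypothetical frame (the `sample` name and its values are assumptions for illustration). The last step, which keeps only the first occurrence of each column name, is one possible way to clean up and is not part of the answer above. Renaming the second column may be more appropriate if both columns hold data you need.

```python
import pandas as pd

def get_duplicate_cols(df: pd.DataFrame) -> pd.Series:
    # Count each column name and keep only the names that occur more than once
    return pd.Series(df.columns).value_counts()[lambda x: x > 1]

# Hypothetical frame with one repeated column name
sample = pd.DataFrame([[1, 2, 3]], columns=["Ticket ID", "RCA Group ID", "RCA Group ID"])

# Series indexed by the duplicated name, with its number of occurrences as the value
dupes = get_duplicate_cols(sample)
print(dupes)

# Possible cleanup (an assumption, not from the answer): keep the first occurrence of each name
deduped = sample.loc[:, ~sample.columns.duplicated()]
print(list(deduped.columns))
```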