pandas TypeError: 'Series' objects are mutable, thus they cannot be hashed

Question

Asked by Mayeul sgc

I know this error is common. I tried several solutions I found and still can't understand what is wrong. I suspect it is due to the mutable form of row and row1, but I can't figure it out.

What am I trying to do? I have two DataFrames. I need to iterate over the rows of the first one and, for each of its rows, iterate through the second one and compare the cell values in certain columns. My code and different attempts:

a=0
b=0
c=0
for row in Correction.iterrows():
    b+=1
    for row1 in dataframe.iterrows():
        c+=1
        a=0
        print('Handling correction '+str(b)+' and deal '+str(c))
        if (Correction.loc[row,['BO Branch Code']]==dataframe.loc[row1,['wings Branch']]
                and Correction.loc[row,['Profit Center']]==dataframe.loc[row1,['Profit Center']]
                and Correction.loc[row,['Back Office']]==dataframe.loc[row1,['Back Office']]
                and Correction.loc[row,['BO System Code']]==dataframe.loc[row1,['BO System Code']]):
            pass  # body omitted in the original question

I also tried

a=0
b=0
c=0
for row in Correction.iterrows():
    b+=1
    for row1 in dataframe.iterrows():
        c+=1
        a=0
        print('Handling correction '+str(b)+' and deal '+str(c))
        if (Correction[row]['BO Branch Code']==dataframe[row1]['wings Branch']
                and Correction[row]['Profit Center']==dataframe[row1]['Profit Center']
                and Correction[row]['Back Office']==dataframe[row1]['Back Office']
                and Correction[row]['BO System Code']==dataframe[row1]['BO System Code']):
            pass  # body omitted in the original question

And

a=0
b=0
c=0
for row in Correction.iterrows():
    b+=1
    for row1 in dataframe.iterrows():
        c+=1
        a=0
        print('Handling correction '+str(b)+' and deal '+str(c))
        if (Correction.loc[row,['BO Branch Code']]==dataframe[row1,['wings Branch']]
                and Correction[row,['Profit Center']]==dataframe[row1,['Profit Center']]
                and Correction[row,['Back Office']]==dataframe[row1,['Back Office']]
                and Correction[row,['BO System Code']]==dataframe[row1,['BO System Code']]):
            pass  # body omitted in the original question

Answer 1

Answered by Mayeul sgc

I found a workaround by changing my for loops to iterate over the index labels. My code is now:

a=0
b=0
c=0
for index in Correction.index:
    b+=1
    for index1 in dataframe.index:
        c+=1
        a=0
        print('Handling correction '+str(b)+' and deal '+str(c))
        # Scalar row and column labels make .loc return single cell values
        if (Correction.loc[index, 'BO Branch Code'] == dataframe.loc[index1, 'wings Branch']
                and Correction.loc[index, 'Profit Center'] == dataframe.loc[index1, 'Profit Center']
                and Correction.loc[index, 'Back Office'] == dataframe.loc[index1, 'Back Office']
                and Correction.loc[index, 'BO System Code'] == dataframe.loc[index1, 'BO System Code']):
            pass  # handle the matching pair here
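Besides looping over index labels, this version also drops the square brackets around the column names, and that matters on its own. `.loc[i, ['col']]` returns a one-element Series labelled by the column name, and comparing two such Series with different labels (here 'BO Branch Code' and 'wings Branch') raises a ValueError, while `.loc[i, 'col']` returns the cell value itself. A small sketch with hypothetical one-row frames:

```python
import pandas as pd

# Hypothetical one-row frames using the question's column names
correction = pd.DataFrame({"BO Branch Code": ["A1"], "Profit Center": [10]})
dataframe = pd.DataFrame({"wings Branch": ["A1"], "Profit Center": [10]})

# A list of columns returns a one-element Series labelled by that column name
left = correction.loc[0, ["BO Branch Code"]]
right = dataframe.loc[0, ["wings Branch"]]
try:
    left == right
    error = None
except ValueError as exc:
    # The two Series carry different index labels, so pandas refuses to compare them
    error = str(exc)
print(error)  # Can only compare identically-labeled Series objects

# A single column label returns the cell value, so each comparison is one boolean
same = (
    correction.loc[0, "BO Branch Code"] == dataframe.loc[0, "wings Branch"]
    and correction.loc[0, "Profit Center"] == dataframe.loc[0, "Profit Center"]
)
print(same)  # True
```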

Answer 2

Answered by Vikash Singh

I think you are iterating over your DataFrame incorrectly:

# iterrows() yields (index, row) pairs; unpack them so that row is a Series
for _, row in Correction.iterrows():
    bo_branch_code = row['BO Branch Code']
    for _, row1 in dataframe.iterrows():
        if row1['wings Branch'] == bo_branch_code:
            pass  # do stuff here
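If you use itertuples() instead, keep in mind that it yields namedtuples rather than Series: fields are read by attribute or by position, string keys such as `row['BO Branch Code']` fail, and column names that are not valid Python identifiers are renamed to positional fields like `_1`. A sketch with a hypothetical frame; the rename-to-underscores workaround is an assumption of this example, not something from the original answer:

```python
import pandas as pd

# Hypothetical frame with the same kind of space-containing column names
correction = pd.DataFrame({"BO Branch Code": ["A1", "B2"], "Profit Center": [10, 20]})

row = next(correction.itertuples())
print(row._fields)  # ('Index', '_1', '_2')

try:
    row["BO Branch Code"]  # namedtuples do not accept string keys
    error_type = None
except TypeError as exc:
    error_type = type(exc).__name__
print(error_type)  # TypeError

# Workaround assumed for this sketch: turn the names into valid identifiers first
safe = correction.rename(columns=lambda name: name.replace(" ", "_"))
codes = [r.BO_Branch_Code for r in safe.itertuples()]
print(codes)  # ['A1', 'B2']
```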

Reference on how to iterate over a DataFrame: https://github.com/vi3k6i5/pandas_basics/blob/master/2.A%20Iterate%20over%20a%20dataframe.ipynb

I timed your index-based approach against the iterrows approach. Here are the results:

import pandas as pd
import numpy as np
import time

df = pd.DataFrame(np.random.randint(0,100,size=(10, 4)), columns=list('ABCD'))

df_2 = pd.DataFrame(np.random.randint(0,100,size=(10, 4)), columns=list('ABCD'))

def test_time():
    for index in df.index:
        for index1 in df_2.index:
            if (df.loc[index, 'A'] == df_2.loc[index1, 'A']):
                continue

def test_time_2():
    for idx, row in df.iterrows():
        a_val = row['A']
        for idy, row_1 in df_2.iterrows():
            if (a_val == row_1['A']):
                continue

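# time.clock() was removed in Python 3.8; time.perf_counter() is the replacement
start = time.perf_counter()
test_time()
end = time.perf_counter()
print(end - start)
# 0.038514999999999855

start = time.perf_counter()
test_time_2()
end = time.perf_counter()
print(end - start)
# 0.009272000000000169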

In short, iterrows was roughly four times faster than your index-based approach in this run.
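A single perf_counter() measurement of a loop this small can easily be skewed by noise. A sketch that repeats each measurement with the standard-library timeit module and keeps the fastest run; `loop_with_loc` and `loop_with_iterrows` are hypothetical renames of test_time and test_time_2, and the repeat counts are arbitrary choices:

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(10, 4)), columns=list("ABCD"))
df_2 = pd.DataFrame(np.random.randint(0, 100, size=(10, 4)), columns=list("ABCD"))


def loop_with_loc():
    # Index-based approach: two scalar .loc lookups per inner iteration
    for index in df.index:
        for index1 in df_2.index:
            if df.loc[index, "A"] == df_2.loc[index1, "A"]:
                continue


def loop_with_iterrows():
    # iterrows approach: one row Series per iteration, read by column name
    for _, row in df.iterrows():
        a_val = row["A"]
        for _, row_1 in df_2.iterrows():
            if a_val == row_1["A"]:
                continue


# Best of 5 repeats, each repeat calling the function 10 times
loc_best = min(timeit.repeat(loop_with_loc, number=10, repeat=5))
iterrows_best = min(timeit.repeat(loop_with_iterrows, number=10, repeat=5))
print(f"loc: {loc_best:.4f}s  iterrows: {iterrows_best:.4f}s")
```

Taking the minimum rather than the mean is the usual timeit convention, since slower runs mostly reflect interference from other processes.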

Reference on good approaches to looping over a DataFrame: "What is the most efficient way to loop through dataframes with pandas?"
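For a pure column-equality match like the one in the question, a Python-level loop may not be needed at all. A sketch of the same four-column matching expressed as an inner join with DataFrame.merge; the sample data is hypothetical, and the column pairing ('BO Branch Code' with 'wings Branch', the other three by name) is taken from the question:

```python
import pandas as pd

# Hypothetical sample data using the question's column names
correction = pd.DataFrame({
    "BO Branch Code": ["A1", "B2", "C3"],
    "Profit Center": [10, 20, 30],
    "Back Office": ["X", "Y", "Z"],
    "BO System Code": ["S1", "S2", "S3"],
})
dataframe = pd.DataFrame({
    "wings Branch": ["B2", "A1", "A1"],
    "Profit Center": [20, 10, 99],
    "Back Office": ["Y", "X", "X"],
    "BO System Code": ["S2", "S1", "S1"],
})

shared = ["Profit Center", "Back Office", "BO System Code"]

# reset_index() keeps each frame's original row label as an "index" column,
# which the suffixes turn into index_correction / index_deal after the join
matches = correction.reset_index().merge(
    dataframe.reset_index(),
    left_on=["BO Branch Code"] + shared,
    right_on=["wings Branch"] + shared,
    suffixes=("_correction", "_deal"),
)
print(matches[["index_correction", "index_deal"]])
```

Each output row is one (correction, deal) pair that agrees on all four columns; here correction row 0 matches deal row 1 and correction row 1 matches deal row 0.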
