
Disclaimer: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/51500889/

Date: 2020-09-14 05:50:11  Source: igfitidea

Python: 'ValueError: can only convert an array of size 1 to a Python scalar' when looping over rows in pd.DataFrame

Tags: python, pandas, dataframe

Asked by parno

I would like to loop over the rows of a DataFrame, in my case to calculate strength ratings for a number of sports teams.


The DataFrame columns 'home_elo' and 'away_elo' contain the pre-match strength ratings (ELO scores) of the teams involved. After a match, they are updated with the value returned by update_elo(a,b,c), which is written into the row of each team's next home or away match (each team has two strength ratings at any point in time, one for home games and one for away games).

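The post never shows update_elo. Judging from how it is called (the away-team call swaps the first two arguments and passes 1 - actual_score), it takes the team's own rating, the opponent's rating and the team's result. A minimal sketch, assuming the standard Elo formula and a hypothetical K-factor of 20; the function body is an illustration, not the asker's code:

```python
def update_elo(rating, opponent_rating, actual_score, k=20.0):
    """Hypothetical Elo update; the original post does not show update_elo.

    rating, opponent_rating: pre-match ratings of the team and its opponent.
    actual_score: 1 for a win, 0.5 for a draw, 0 for a loss (from this team's view).
    k: assumed K-factor controlling how far a single match moves the rating.
    """
    # Expected score of `rating` against `opponent_rating` under the logistic Elo model
    expected = 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))
    # Move the rating towards the observed result
    return rating + k * (actual_score - expected)


# Two equally rated teams: a home win moves home up and away down by k/2
print(update_elo(1500, 1500, 1))      # 1510.0
print(update_elo(1500, 1500, 1 - 1))  # 1490.0
```

Because the two expected scores always sum to 1, the home and away updates cancel out, which matches the way the question's loop calls the function twice per match.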

The respective code snippet looks as follows:


for index in df.index:

    counter = counter + 1
    # Calculation of post-match ELO scores for home and away teams
    if df.at[index,'updated'] == 2: # Update next match ELO scores if not yet updated but pre-match ELO scores available

        try:
            all_home_fixtures = df.date_rank[df['localteam_id'] == df.at[index,'localteam_id']]
            next_home_fixture = all_home_fixtures[all_home_fixtures > df.at[index,'date_rank']].min()
            next_home_index = df[(df['date_rank'] == next_home_fixture) & (df['localteam_id'] == df.at[index,'localteam_id'])].index.item()
        except ValueError:
            print('ERROR 1 at' + str(index))
            df.at[index,'updated'] = 4

        try:
            all_away_fixtures = df.date_rank[df['visitorteam_id'] == df.at[index,'visitorteam_id']]
            next_away_fixture = all_away_fixtures[all_away_fixtures > df.at[index,'date_rank']].min()
            next_away_index = df[(df['date_rank'] == next_away_fixture) & (df['visitorteam_id'] == df.at[index,'visitorteam_id'])].index.item()
        except ValueError:
            print('ERROR 2 at' + str(index))
            df.at[index,'updated'] = 4

        # print('Current: ' + str(df.at[index,'fixture_id']) + '; Followed by: ' + str(next_home_fixture))
        # print('Current date rank: ' + str(df.at[index,'date']) + ' ' + str(df.at[index,'date_rank']) + '; Next home date rank: ' + str(df.at[next_home_index,'date_rank']) + '; Next away date rank: ' + str(df.at[next_away_index,'date_rank']))

        df.at[next_home_index, 'home_elo'] = update_elo(df.at[index,'home_elo'],df.at[index,'away_elo'],df.at[index,'actual_score'])
        df.at[next_away_index, 'away_elo'] = update_elo(df.at[index,'away_elo'],df.at[index,'home_elo'],1 - df.at[index,'actual_score']) # Swap function inputs for away team


        df.at[next_home_index, 'updated'] = df.at[next_home_index, 'updated'] + 1
        df.at[next_away_index, 'updated'] = df.at[next_away_index, 'updated'] + 1

        df.at[index,'updated'] = 3

The code works fine for the first couple of rows. However, I then encounter errors, always for the same rows, even though I cannot see how these rows differ from the others.


  1. If I do not handle the ValueError as shown above, I receive the error message ValueError: can only convert an array of size 1 to a Python scalar for the first time after about 250 rows.
  2. If I do handle the ValueError as shown above, I capture four such errors, two in each of the error-handling blocks (the code works fine otherwise), but the code stops updating any further strength ratings after about 18% of all rows, without throwing any error message.
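One plausible way to hit this exact message with the question's lookup pattern (an assumption; the thread does not confirm what the asker's data looks like) is a team's last home match: there is no later fixture, so .min() of the empty selection is NaN, and since NaN never compares equal, the final filter matches no rows. A small reproduction on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data: team 7 hosts matches at date ranks 1 and 2 only
df = pd.DataFrame({"localteam_id": [7, 7, 8], "date_rank": [1, 2, 3]})

# Same pattern as the question, starting from team 7's LAST home match (rank 2)
all_home_fixtures = df.date_rank[df["localteam_id"] == 7]
next_home_fixture = all_home_fixtures[all_home_fixtures > 2].min()
print(next_home_fixture)  # nan: min() of an empty Series

# NaN never compares equal, so the filter selects no rows at all
matches = df[(df["date_rank"] == next_home_fixture) & (df["localteam_id"] == 7)]
print(len(matches))  # 0

try:
    matches.index.item()
except ValueError as err:
    print(err)  # can only convert an array of size 1 to a Python scalar
```

If this is the cause, the failing rows would be the same on every run, which fits the observation that the errors always occur at the same rows.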

I would very much appreciate it if you could help me (a) understand what causes the errors and (b) handle them.


Since this is my first post on StackOverflow, I am not yet fully aware of the common posting practices of the forum. Please let me know if there is anything I can improve about my post.


Thank you very much!


Accepted answer by sundance

pd.Series.item (and pd.Index.item, which is what .index.item() calls here) requires exactly one element to return a scalar. If:


df[(df['date_rank'] == next_home_fixture) & (df['localteam_id'] == df.at[index,'localteam_id'])]

returns an empty DataFrame (or one with more than one row), then .index.item() will throw a ValueError.

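To avoid the exception rather than catch it, one option is to check for an empty selection before taking a single index label. A minimal sketch, assuming the intent is to skip the update when a team has no later home (or away) fixture; next_fixture_index is a hypothetical helper, not part of the question:

```python
import pandas as pd


def next_fixture_index(df, team_col, team_id, current_rank):
    """Return the index label of the team's next fixture in `team_col`, or None.

    Hypothetical helper: the column names follow the question, but the
    function itself is not from the original post.
    """
    later = df[(df[team_col] == team_id) & (df["date_rank"] > current_rank)]
    if later.empty:
        return None  # no later fixture, e.g. the team's last home/away match
    # idxmin returns the label of the smallest later date_rank (the first one on ties)
    return later["date_rank"].idxmin()


df = pd.DataFrame({"localteam_id": [7, 7, 8], "date_rank": [1, 2, 3]})
print(next_fixture_index(df, "localteam_id", 7, 1))  # 1
print(next_fixture_index(df, "localteam_id", 7, 2))  # None
```

In the loop, the corresponding update would then be skipped when the helper returns None. Using idxmin also avoids the second failure mode of .item(), where two rows share the same date_rank. Note that in the question's code, execution continues after each except block, so next_home_index and next_away_index keep the values from an earlier iteration (or are undefined on the first one), and the new ratings are written into those stale rows; this may be related to the ratings silently stopping to update.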

Answer by Wei Chen

FYI,


You will get a similar error if you apply .item() to a NumPy array.


In that case, you can avoid it with .tolist(), which returns a list of any length instead of requiring exactly one element.

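A small NumPy illustration of the same behaviour: ndarray.item() without arguments only works for a size-1 array, while ndarray.tolist() returns a list of whatever length the array has. Switching to .tolist() changes the return type to a list, so the calling code still has to decide what to do with an empty result.

```python
import numpy as np

one = np.array([42])
empty = np.array([])

print(one.item())  # 42: exactly one element, returned as a Python scalar

try:
    empty.item()
except ValueError as err:
    print(err)  # can only convert an array of size 1 to a Python scalar

# .tolist() never requires a particular size; it returns a (possibly empty) list
print(empty.tolist())  # []
print(one.tolist())    # [42]
```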