pandas ValueError: The truth value of a DataFrame is ambiguous

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/48481223/

Date: 2020-09-14 05:05:55  Source: igfitidea

ValueError: The truth value of a DataFrame is ambiguous

Tags: python, python-3.x, pandas, valueerror

Asked by Ken Wallace

I have a dataframe that looks like this:

        total   downloaded  avg_rating
id          
1        2      2           5.0
2       12     12           4.5
3        1      1           5.0
4        1      1           4.0
5        0      0           0.0

I'm trying to add a new column with the percent difference of two of these columns, but only for rows that do not have a 0 in 'downloaded'.

I'm trying to use a function for this that looks like:

def diff(ratings):
    if ratings[ratings.downloaded > 0]:
        val = (ratings['total'] - ratings['downloaded']) / ratings['downloaded']
    else:
        val = 0
    return val

ratings['Pct Diff'] = diff(ratings)

I'm getting an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-129-729c09bf14e8> in <module>()
      6     return val
      7 
----> 8 ratings['Pct Diff'] = diff(ratings)

<ipython-input-129-729c09bf14e8> in diff(ratings)
      1 def diff(ratings):
----> 2     if ratings[ratings.downloaded > 0]:
      3         val = (ratings['total'] - ratings['downloaded']) / ratings['downloaded']
      4     else:
      5         val = 0

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
    953         raise ValueError("The truth value of a {0} is ambiguous. "
    954                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 955                          .format(self.__class__.__name__))
    956 
    957     __bool__ = __nonzero__

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Can someone please help me understand what this error means?

Also, would this be a good application for an apply function? Can I use conditions in an apply? How would I use it in this case?

Accepted answer by jpp

The reason for your error is that you are attempting a row-wise (vectorised) calculation, but in your function diff() the expression ratings[ratings.downloaded > 0] returns a subset of the dataframe, and placing an if in front of it is ambiguous: pandas cannot reduce a whole DataFrame to a single True or False. The error message reflects this.
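
To make the ambiguity concrete, here is a minimal sketch (the three-row frame is made up for illustration, not the asker's data). It shows that the filtered result is itself a DataFrame, that testing it with if raises the error, and that the reductions named in the error message each give a single boolean:

```python
import pandas as pd

# Illustrative data only (not the asker's real frame).
ratings = pd.DataFrame({'total': [2, 12, 0], 'downloaded': [2, 12, 0]})

# Boolean indexing returns a DataFrame holding the matching rows,
# not a single True/False value.
subset = ratings[ratings.downloaded > 0]

message = None
try:
    if subset:          # pandas refuses to guess what "truthy" means here
        pass
except ValueError as err:
    message = str(err)

# Explicit reductions state which question is actually being asked.
has_rows = not subset.empty                       # did any row match?
any_positive = (ratings.downloaded > 0).any()     # is at least one value > 0?
all_positive = (ratings.downloaded > 0).all()     # is every value > 0?
```

Note that none of these reductions gives a per-row decision, which is why the solution relies on boolean indexing with .loc rather than an if statement.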

You may wish to review the pandas documentation on Indexing and Selecting Data. The solution below first sets the default value 0 for the whole column, then overwrites it only for the rows where 'downloaded' is greater than 0.

import pandas as pd

df = pd.DataFrame([[2, 2, 5.0], [12, 12, 4.5], [10, 5, 3.0],
                   [20, 2, 3.5], [3, 0, 0.0], [0, 0, 0.0]],
                  columns=['total', 'downloaded', 'avg_rating'])

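# Start from a float default so the later assignment keeps a consistent dtype.
df['Pct Diff'] = 0.0
# Overwrite the default only for rows where 'downloaded' is greater than 0.
df.loc[df['downloaded'] > 0, 'Pct Diff'] = (df['total'] - df['downloaded']) / df['total']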

#    total  downloaded  avg_rating  Pct Diff
# 0      2           2         5.0       0.0
# 1     12          12         4.5       0.0
# 2     10           5         3.0       0.5
# 3     20           2         3.5       0.9
# 4      3           0         0.0       0.0
# 5      0           0         0.0       0.0
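
On the question about apply: conditions do work inside a row-wise apply, because each row arrives as a Series and the comparison yields a single boolean. The sketch below is one possible way to write it, using the same sample data and formula as above. The helper name pct_diff_row is hypothetical, and the Series.where variant is an alternative shown for comparison, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame([[2, 2, 5.0], [12, 12, 4.5], [10, 5, 3.0],
                   [20, 2, 3.5], [3, 0, 0.0], [0, 0, 0.0]],
                  columns=['total', 'downloaded', 'avg_rating'])

def pct_diff_row(row):
    # Hypothetical helper: `row` holds the values of one row, so the
    # comparison below is a single boolean and `if` is unambiguous.
    if row['downloaded'] > 0:
        return (row['total'] - row['downloaded']) / row['total']
    return 0.0

# Row-wise apply: readable, but calls the Python function once per row.
via_apply = df.apply(pct_diff_row, axis=1)

# Vectorised alternative: compute the ratio for every row, then keep it
# only where the condition holds and use 0.0 everywhere else.
ratio = (df['total'] - df['downloaded']) / df['total']
via_where = ratio.where(df['downloaded'] > 0, 0.0)
```

Both give the same column as the .loc approach. On large frames the vectorised forms are usually much faster than apply with axis=1, so apply is best kept for logic that is hard to express column-wise.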