Python ValueError: Must pass DataFrame with boolean values only

Question

Asked by Umang Mistry

In this datafile, the United States is broken up into four regions using the "REGION" column.

Create a query that finds the counties that belong to regions 1 or 2, whose name starts with 'Washington', and whose POPESTIMATE2015 was greater than their POPESTIMATE2014.

This function should return a 5x2 DataFrame with the columns = ['STNAME', 'CTYNAME'] and the same index ID as the census_df (sorted ascending by index).

CODE

def answer_eight():
    counties=census_df[census_df['SUMLEV']==50]
    regions = counties[(counties[counties['REGION']==1]) | (counties[counties['REGION']==2])]
    washingtons = regions[regions[regions['COUNTY']].str.startswith("Washington")]
    grew = washingtons[washingtons[washingtons['POPESTIMATE2015']]>washingtons[washingtons['POPESTIMATES2014']]]
    return grew[grew['STNAME'],grew['COUNTY']]

outcome = answer_eight()
assert outcome.shape == (5,2)
assert list (outcome.columns)== ['STNAME','CTYNAME']
print(tabulate(outcome, headers=["index"]+list(outcome.columns),tablefmt="orgtbl"))

ERROR

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-77-546e58ae1c85> in <module>()
      6     return grew[grew['STNAME'],grew['COUNTY']]
      7 
----> 8 outcome = answer_eight()
      9 assert outcome.shape == (5,2)
     10 assert list (outcome.columns)== ['STNAME','CTYNAME']

<ipython-input-77-546e58ae1c85> in answer_eight()
      1 def answer_eight():
      2     counties=census_df[census_df['SUMLEV']==50]
----> 3     regions = counties[(counties[counties['REGION']==1]) | (counties[counties['REGION']==2])]
      4     washingtons = regions[regions[regions['COUNTY']].str.startswith("Washington")]
      5     grew = washingtons[washingtons[washingtons['POPESTIMATE2015']]>washingtons[washingtons['POPESTIMATES2014']]]

/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in __getitem__(self, key)
   1991             return self._getitem_array(key)
   1992         elif isinstance(key, DataFrame):
-> 1993             return self._getitem_frame(key)
   1994         elif is_mi_columns:
   1995             return self._getitem_multilevel(key)

/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in _getitem_frame(self, key)
   2066     def _getitem_frame(self, key):
   2067         if key.values.size and not com.is_bool_dtype(key.values):
-> 2068             raise ValueError('Must pass DataFrame with boolean values only')
   2069         return self.where(key)
   2070 

ValueError: Must pass DataFrame with boolean values only

I am clueless. Where am I going wrong?

Thanks

Answer 1

Accepted answer by EdChum

You're trying to mask your df with another DataFrame instead of a boolean Series, which is wrong. The expression counties[counties['REGION']==1] is already the filtered DataFrame, not a mask, so combining two of those and passing the result back into counties[...] raises the ValueError. When you compare a column or Series with a scalar to produce a boolean mask, pass just that condition to the indexer once; don't index with the result a second time.

def answer_eight():
    counties=census_df[census_df['SUMLEV']==50]
    # wrong: counties[counties['REGION']==1] is a filtered DataFrame, not a boolean mask
    regions = counties[(counties[counties['REGION']==1]) | (counties[counties['REGION']==2])]
    # same mistake: the column is wrapped in regions[...] one time too many
    washingtons = regions[regions[regions['COUNTY']].str.startswith("Washington")]
    # and again: both sides of the comparison are indexed one time too many
    grew = washingtons[washingtons[washingtons['POPESTIMATE2015']]>washingtons[washingtons['POPESTIMATES2014']]]
    # wrong: column selection takes a list of column names, not the columns themselves
    return grew[grew['STNAME'],grew['COUNTY']]

You want:

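def answer_eight():
    # keep county-level rows only
    counties=census_df[census_df['SUMLEV']==50]
    # each condition is a boolean Series; parenthesize each one before combining with |
    regions = counties[(counties['REGION']==1) | (counties['REGION']==2)]
    # the county name is in 'CTYNAME', the column the question's assert expects
    washingtons = regions[regions['CTYNAME'].str.startswith("Washington")]
    grew = washingtons[washingtons['POPESTIMATE2015']>washingtons['POPESTIMATE2014']]
    # select columns with a list of names
    return grew[['STNAME','CTYNAME']]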
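
To see the difference between a boolean mask and an already-filtered frame, here is a minimal sketch on made-up data. The `toy` frame and its rows are assumptions for illustration only; just the column names are borrowed from the question.

```python
import pandas as pd

# Made-up rows; only the column names mimic census_df from the question.
toy = pd.DataFrame({
    "REGION": [1, 1, 2, 3, 2],
    "CTYNAME": ["Washington County", "Adams County", "Washington County",
                "Washington County", "Adams County"],
})

# Comparing a column with a scalar gives a boolean Series, one value per row.
mask = toy["REGION"] == 1

# Indexing with that Series gives the filtered DataFrame.
# This result is data, not a mask, so it should not be fed back into toy[...].
filtered = toy[mask]

print(type(mask).__name__, mask.dtype)
print(type(filtered).__name__, filtered.shape)

# Combine conditions at the mask level. The parentheses are required
# because | binds more tightly than == in Python.
region_mask = (toy["REGION"] == 1) | (toy["REGION"] == 2)
regions = toy[region_mask]
print(regions)
```

The original index labels survive the filtering, which is what the exercise means by "the same index ID as the census_df".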

Answer 2

Answer by gourav chatterjee

def answer_eight():
    # county-level rows only
    df=census_df[census_df['SUMLEV']==50]
    # regions 1 or 2
    df=df[(df['REGION']==1) | (df['REGION']==2)]
    # county name starts with 'Washington'
    df=df[df['CTYNAME'].str.startswith('Washington')]
    # population grew from 2014 to 2015
    df=df[df['POPESTIMATE2015'] > df['POPESTIMATE2014']]
    df=df[['STNAME','CTYNAME']]
    print(df.shape)
    return df.head(5)

Answer 3

Answer by yogs


def answer_eight():
    county = census_df[census_df['SUMLEV']==50]
    req_col = ['STNAME','CTYNAME']

    region = county[(county['REGION']<3) & (county['POPESTIMATE2015']>county['POPESTIMATE2014']) & (county['CTYNAME'].str.startswith('Washington'))]
    region = region[req_col]

    return region
answer_eight()
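
The same filters can also be written as named boolean masks that are combined in one step, which may make each condition easier to check on its own. This is only a sketch: the function name `growing_washingtons` and the `toy` rows are hypothetical, and `isin([1, 2])` is swapped in for the `REGION < 3` comparison used above.

```python
import pandas as pd

def growing_washingtons(df):
    """Return STNAME/CTYNAME for growing 'Washington' counties in regions 1 or 2."""
    # Each line builds one boolean Series aligned with df's index.
    is_county = df["SUMLEV"] == 50
    in_region = df["REGION"].isin([1, 2])
    is_washington = df["CTYNAME"].str.startswith("Washington")
    grew = df["POPESTIMATE2015"] > df["POPESTIMATE2014"]
    # Combine the masks with & and pick the two columns in a single .loc call.
    return df.loc[is_county & in_region & is_washington & grew,
                  ["STNAME", "CTYNAME"]]

# Made-up rows for illustration; not real census figures.
toy = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 50, 50, 50],
    "REGION": [1, 1, 2, 3, 2, 2],
    "STNAME": ["State A", "State A", "State B", "State C", "State B", "State D"],
    "CTYNAME": ["Washington", "Washington County", "Washington County",
                "Washington County", "Adams County", "Washington Parish"],
    "POPESTIMATE2014": [100, 10, 20, 30, 40, 50],
    "POPESTIMATE2015": [110, 11, 19, 31, 41, 55],
})

result = growing_washingtons(toy)
print(result)
```

Boolean indexing keeps rows in their original order, so the result stays sorted by index as long as the input frame is.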
