Building a Pandas column with np.where()

Question

Asked by ajrenold

I'm working through an assignment with Pandas and am using np.where() to add a column with three possible values to a Pandas DataFrame:

fips_df['geog_type'] = np.where(fips_df.fips.str[-3:] != '000', 'county', np.where(fips_df.fips.str[:] == '00000', 'country', 'state'))

The state of the DataFrame after adding the column is like this:

print(fips_df[:5])

    fips         geog_entity fips_prefix geog_type
0  00000       UNITED STATES          00   country
1  01000             ALABAMA          01     state
2  01001  Autauga County, AL          01    county
3  01003  Baldwin County, AL          01    county
4  01005  Barbour County, AL          01    county
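The output above can be reproduced with a minimal, self-contained sketch. The DataFrame literal below is illustrative only: it contains just the five rows printed here, not the full FIPS table, and it applies the same nested np.where() logic as the question.

```python
import numpy as np
import pandas as pd

# Illustrative sample: only the five rows shown above, not the full data set.
fips_df = pd.DataFrame({
    "fips": ["00000", "01000", "01001", "01003", "01005"],
    "geog_entity": ["UNITED STATES", "ALABAMA", "Autauga County, AL",
                    "Baldwin County, AL", "Barbour County, AL"],
})

# Outer np.where: codes that do not end in '000' are counties.
# Inner np.where: of the remaining codes, '00000' is the country,
# everything else is a state.
fips_df["geog_type"] = np.where(
    fips_df["fips"].str[-3:] != "000",
    "county",
    np.where(fips_df["fips"] == "00000", "country", "state"),
)

print(fips_df)
```

On a freshly built frame like this one, fips_df.geog_type and fips_df["geog_type"] return the same column.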

This column construction is tested by two asserts. The first passes and the second fails.

## check the numbers of geog_type
## (Series.iteritems() was removed in pandas 2.0; Series.items() is the equivalent)

assert set(fips_df['geog_type'].value_counts().items()) == set([('state', 51), ('country', 1), ('county', 3143)])

assert set(fips_df.geog_type.value_counts().items()) == set([('state', 51), ('country', 1), ('county', 3143)])

What is the difference between calling columns as fips_df.geog_type and fips_df['geog_type'] that causes my second assert to fail?

Answer 1

Answered by Maxim Egorushkin

As an aside, you can create a new column with much less effort, for example:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame(np.random.uniform(size=10))

In [4]: df
Out[4]: 
          0
0  0.366489
1  0.697744
2  0.570066
3  0.756647
4  0.036149
5  0.817588
6  0.884244
7  0.741609
8  0.628303
9  0.642807

In [5]: categorize = lambda value: "ABC"[int(value > 0.3) + int(value > 0.6)]

In [6]: df["new_col"] = df[0].apply(categorize)

In [7]: df
Out[7]: 
          0 new_col
0  0.366489       B
1  0.697744       C
2  0.570066       B
3  0.756647       C
4  0.036149       A
5  0.817588       C
6  0.884244       C
7  0.741609       C
8  0.628303       C
9  0.642807       C
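For larger frames, the same three-way bucketing can also be done without a per-row apply(). The following is a sketch using np.select, not the approach from the answer; the column name "value" and the sample numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, chosen to include both thresholds.
df = pd.DataFrame({"value": [0.05, 0.30, 0.45, 0.60, 0.95]})

# Conditions are evaluated in order and the first match wins,
# so the stricter threshold has to come first.
conditions = [df["value"] > 0.6, df["value"] > 0.3]
choices = ["C", "B"]

# Rows matching neither condition fall back to the default label.
df["new_col"] = np.select(conditions, choices, default="A")

print(df)
```

The thresholds mirror the lambda above: values at or below 0.3 get "A", values above 0.3 up to and including 0.6 get "B", and the rest get "C".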

Answer 2

Answered by Andy Hayden

It should be the same (and will be most of the time)...

One situation where it isn't is when you already have an attribute or method set with that name (in which case it won't be overridden, and hence the column won't be accessible with dot notation):

In [1]: df = pd.DataFrame([[1, 2] ,[3 ,4]])

In [2]: df.A = 7

In [3]: df.B = lambda: 42

In [4]: df.columns = list('AB')

In [5]: df.A
Out[5]: 7

In [6]: df.B()
Out[6]: 42

In [7]: df['A']
Out[7]: 
0    1
1    3
Name: A
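The same shadowing happens with names that pandas itself defines. The sketch below uses hypothetical column names ("size" and "count") that clash with existing DataFrame members, next to a "geog_type" column that does not clash; it is an illustration, not a diagnosis of the asker's data.

```python
import pandas as pd

# Hypothetical frame: "size" and "count" collide with DataFrame members,
# "geog_type" does not.
df = pd.DataFrame({
    "size": [10, 20],
    "count": [1, 2],
    "geog_type": ["state", "county"],
})

# Dot notation finds the built-in property first: rows * columns, not the column.
print(df.size)

# Dot notation finds the built-in method, not the column.
print(callable(df.count))

# Bracket notation always returns the column itself.
print(df["size"].tolist())
print(df["count"].tolist())

# With no name clash, both notations return the same data.
print(df.geog_type.equals(df["geog_type"]))
```

Because bracket notation never consults attributes, df['name'] is the safer choice in asserts and in any code where column names are not fully under your control.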

Interestingly, dot notation for accessing columns isn't mentioned in the selection syntax.
