Notice: this page is taken from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/14974459/

Time: 2020-09-13 20:39:51  Source: igfitidea

Pandas Column Construction with np.where()

pandas

Asked by ajrenold

I'm working through an assignment with Pandas and am using np.where() to add a column to a Pandas DataFrame with three possible values:

fips_df['geog_type'] = np.where(fips_df.fips.str[-3:] != '000', 'county', np.where(fips_df.fips.str[:] == '00000', 'country', 'state'))

The state of the DataFrame after adding the column is like this:


print(fips_df[:5])

    fips         geog_entity fips_prefix geog_type
0  00000       UNITED STATES          00   country
1  01000             ALABAMA          01     state
2  01001  Autauga County, AL          01    county
3  01003  Baldwin County, AL          01    county
4  01005  Barbour County, AL          01    county
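As a side note, the nested np.where() call above can also be written with np.select(), which takes a list of conditions and a matching list of labels. The sketch below is only an illustration on a made-up five-row frame (the real FIPS table is not reproduced here), and the names is_country and is_state are introduced just for this example:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the FIPS table; the real data has thousands of rows.
fips_df = pd.DataFrame({"fips": ["00000", "01000", "01001", "01003", "02000"]})

is_country = fips_df["fips"] == "00000"
is_state = fips_df["fips"].str[-3:] == "000"

# Conditions are checked in order, so the country row is claimed first
# and does not fall through to the state rule.
fips_df["geog_type"] = np.select(
    [is_country, is_state],
    ["country", "state"],
    default="county",
)

# The original nested np.where() expression, kept for comparison.
nested = np.where(
    fips_df.fips.str[-3:] != "000",
    "county",
    np.where(fips_df.fips == "00000", "country", "state"),
)

print(fips_df)
```

Both forms give the same labels on this sample; np.select() mainly helps readability once there are more than two branches.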

This column construction is tested by two asserts. The first passes and the second fails.


## check the numbers of geog_type

assert set(fips_df['geog_type'].value_counts().items()) == set([('state', 51), ('country', 1), ('county', 3143)])

assert set(fips_df.geog_type.value_counts().items()) == set([('state', 51), ('country', 1), ('county', 3143)])

What is the difference between calling columns as fips_df.geog_type and fips_df['geog_type'] that causes my second assert to fail?


Answered by Maxim Egorushkin

Just in case, you can create a new column with much less effort. E.g.:


In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame(np.random.uniform(size=10))

In [4]: df
Out[4]: 
          0
0  0.366489
1  0.697744
2  0.570066
3  0.756647
4  0.036149
5  0.817588
6  0.884244
7  0.741609
8  0.628303
9  0.642807

In [5]: categorize = lambda value: "ABC"[int(value > 0.3) + int(value > 0.6)]

In [6]: df["new_col"] = df[0].apply(categorize)

In [7]: df
Out[7]: 
          0 new_col
0  0.366489       B
1  0.697744       C
2  0.570066       B
3  0.756647       C
4  0.036149       A
5  0.817588       C
6  0.884244       C
7  0.741609       C
8  0.628303       C
9  0.642807       C
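For larger frames, the same binning can probably be done without a per-row lambda. The following is a minimal sketch, assuming pd.cut() with right-inclusive bins is an acceptable stand-in for the apply() call above; the sample values and the names via_apply and via_cut are made up for illustration:

```python
import numpy as np
import pandas as pd

# Made-up values, including the two thresholds themselves.
df = pd.DataFrame({0: [0.1, 0.3, 0.5, 0.6, 0.9]})

# The per-row version from the answer above.
categorize = lambda value: "ABC"[int(value > 0.3) + int(value > 0.6)]
via_apply = df[0].apply(categorize)

# pd.cut uses right-inclusive bins by default: (-inf, 0.3], (0.3, 0.6], (0.6, inf),
# which matches the strict ">" comparisons in categorize.
via_cut = pd.cut(
    df[0],
    bins=[-np.inf, 0.3, 0.6, np.inf],
    labels=["A", "B", "C"],
).astype(str)

df["new_col"] = via_cut
print(df)
```

Note that pd.cut() returns a categorical column; the astype(str) call is only there to make it directly comparable with the apply() result.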

Answered by Andy Hayden

It should be the same (and will be most of the time)...

One situation where it's not is when an attribute or method with that name already exists (in which case it won't be overridden, so the column won't be accessible with dot notation):

In [1]: df = pd.DataFrame([[1, 2], [3, 4]])

In [2]: df.A = 7

In [3]: df.B = lambda: 42

In [4]: df.columns = list('AB')

In [5]: df.A
Out[5]: 7

In [6]: df.B()
Out[6]: 42

In [7]: df['A']
Out[7]: 
0    1
1    3
Name: A
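The same shadowing also happens with built-in DataFrame methods, not just attributes set by hand. Here is a small sketch, assuming a hypothetical frame whose column happens to be called "count"; the shadowed list is just one possible way to flag risky column names and is not part of pandas itself:

```python
import pandas as pd

# Hypothetical frame: "count" collides with DataFrame.count, "geog_type" does not.
df = pd.DataFrame({"count": [1, 2], "geog_type": ["state", "county"]})

# Bracket access always returns the column.
count_column = df["count"]

# Dot access finds the existing method first, so the column is hidden.
count_attr = df.count

# Column names that already exist as DataFrame attributes are only safe via brackets.
shadowed = [c for c in df.columns if isinstance(c, str) and hasattr(pd.DataFrame, c)]

print(type(count_column).__name__)  # Series
print(callable(count_attr))         # True
print(shadowed)                     # ['count']
```

When a column name cannot be guaranteed to be free of such collisions, bracket access is the safer choice.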

Interestingly, dot notation for accessing columns isn't mentioned in the selection syntax.