Original source: http://stackoverflow.com/questions/14974459/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and distribute it under the same license.
Pandas Column Construction with np.where()
Asked by ajrenold
I'm working through an assignment with Pandas and am using np.where() to add a column to a Pandas DataFrame with three possible values:
fips_df['geog_type'] = np.where(fips_df.fips.str[-3:] != '000', 'county', np.where(fips_df.fips.str[:] == '00000', 'country', 'state'))
The state of the DataFrame after adding the column is like this:
print fips_df[:5]
    fips         geog_entity fips_prefix geog_type
0  00000       UNITED STATES          00   country
1  01000             ALABAMA          01     state
2  01001  Autauga County, AL          01    county
3  01003  Baldwin County, AL          01    county
4  01005  Barbour County, AL          01    county
This column construction is tested by two asserts. The first passes and the second fails.
## check the numbers of geog_type
assert set(fips_df['geog_type'].value_counts().iteritems()) == set([('state', 51), ('country', 1), ('county', 3143)])
assert set(fips_df.geog_type.value_counts().iteritems()) == set([('state', 51), ('country', 1), ('county', 3143)])
What is the difference between calling columns as fips_df.geog_type and fips_df['geog_type'] that causes my second assert to fail?
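For anyone who wants to experiment, here is a minimal sketch that rebuilds only the five rows printed above and applies the same nested np.where() logic. The sample frame is typed in by hand and is not the asker's full data set. On this sample the two access styles agree, so it does not reproduce the failing assert.

```python
import numpy as np
import pandas as pd

# Hand-built sample: only the five rows printed in the question.
fips_df = pd.DataFrame({
    "fips": ["00000", "01000", "01001", "01003", "01005"],
    "geog_entity": ["UNITED STATES", "ALABAMA", "Autauga County, AL",
                    "Baldwin County, AL", "Barbour County, AL"],
})

# Nested np.where: the outer call separates counties from everything else,
# the inner call separates the country row from the state rows.
fips_df["geog_type"] = np.where(
    fips_df.fips.str[-3:] != "000", "county",
    np.where(fips_df.fips == "00000", "country", "state"),
)

# Compare the counts obtained through bracket access and dot access.
by_bracket = fips_df["geog_type"].value_counts().to_dict()
by_dot = fips_df.geog_type.value_counts().to_dict()
print(by_bracket == by_dot)
```

The comparison uses to_dict() rather than iteritems() because Series.iteritems() was removed in pandas 2.0.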
Answered by Maxim Egorushkin
Just in case, you can create a new column with much less effort. E.g.:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame(np.random.uniform(size=10))
In [4]: df
Out[4]:
0
0 0.366489
1 0.697744
2 0.570066
3 0.756647
4 0.036149
5 0.817588
6 0.884244
7 0.741609
8 0.628303
9 0.642807
In [5]: categorize = lambda value: "ABC"[int(value > 0.3) + int(value > 0.6)]
In [6]: df["new_col"] = df[0].apply(categorize)
In [7]: df
Out[7]:
0 new_col
0 0.366489 B
1 0.697744 C
2 0.570066 B
3 0.756647 C
4 0.036149 A
5 0.817588 C
6 0.884244 C
7 0.741609 C
8 0.628303 C
9 0.642807 C
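As a possible alternative to apply(), the same three-way split can be sketched with np.select(), which evaluates the conditions on the whole column at once. This is not part of the original answer. The values below are copied from the output above instead of being drawn at random, so the result is repeatable.

```python
import numpy as np
import pandas as pd

# Values copied from the session above (not re-drawn at random).
values = [0.366489, 0.697744, 0.570066, 0.756647, 0.036149,
          0.817588, 0.884244, 0.741609, 0.628303, 0.642807]
df = pd.DataFrame(values)  # single column, labelled 0 as in the answer

# Conditions are checked in order; the first match wins.
conditions = [df[0] > 0.6, df[0] > 0.3]
choices = ["C", "B"]
df["new_col"] = np.select(conditions, choices, default="A")

# Cross-check against the lambda used in the answer.
categorize = lambda value: "ABC"[int(value > 0.3) + int(value > 0.6)]
print((df["new_col"] == df[0].apply(categorize)).all())
```

Listing the stricter condition first matters here, because np.select() takes the first condition that matches for each row.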
Answered by Andy Hayden
It should be the same (and will be most of the time)...
One situation where it isn't the same is when the DataFrame already has an attribute or method with that name (in which case it won't be overridden, and hence the column won't be accessible with dot notation):
In [1]: df = pd.DataFrame([[1, 2], [3, 4]])
In [2]: df.A = 7
In [3]: df.B = lambda: 42
In [4]: df.columns = list('AB')
In [5]: df.A
Out[5]: 7
In [6]: df.B()
Out[6]: 42
In [7]: df['A']
Out[7]:
0 1
1 3
Name: A
Interestingly, dot notation for accessing columns isn't mentioned in the selection syntax.
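The same shadowing can happen without setting anything by hand, whenever a column name collides with a built-in DataFrame method. Here is a minimal sketch, using a hypothetical frame with a column named "count" (chosen only because DataFrame.count is an existing method):

```python
import pandas as pd

# Hypothetical frame: "count" collides with the DataFrame.count method,
# while "geog_type" does not collide with anything.
df = pd.DataFrame({"count": [10, 20], "geog_type": ["state", "county"]})

# Dot access finds the bound method, not the column.
print(callable(df.count))

# Bracket access always reaches the column.
print(df["count"].tolist())

# With no name collision, both access styles return the same data.
print(df.geog_type.equals(df["geog_type"]))
```

For this reason bracket access is the safer choice in asserts and library code, where column names may not be known in advance.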

