UserWarning: Pandas doesn't allow columns to be created via a new attribute name
Original question: http://stackoverflow.com/questions/52129876/
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Asked by Bushra Ghazal
I am stuck with my pandas script.
I am working with two CSV files (one input file and one output file). I want to take all the rows of two columns, do a calculation on them, and then copy the result to another dataframe (the output file).
The columns are as follows:
'lat', 'long','PHCount', 'latOffset_1', 'longOffset_1','PH_Lat_1', 'PH_Long_1', 'latOffset_2', 'longOffset_2', 'PH_Lat_2', 'PH_Long_2', 'latOffset_3', 'longOffset_3','PH_Lat_3', 'PH_Long_3', 'latOffset_4', 'longOffset_4','PH_Lat_4', 'PH_Long_4'.
I want to take the columns 'lat' and 'latOffset_1', do some calculation, and put the result in another column ('PH_Lat_1') which I have already created.
My function is:

    def calculate_latoffset(latoffset):  # Calculating Lat offset.
        a = (df2['lat'] - (2 * latoffset))
        return a
The main code:

    for i in range(1, 5):
        print(i)
        a = 'PH_lat_%d' % i
        print(a)
        b = 'latOffset_%d' % i
        print(b)
        df2.a = df2.apply(lambda x: calculate_latoffset(x[b]), axis=1)
Since the column names differ only by the suffix (1, 2, 3, 4), I want to call the function calculate_latoffset and calculate all the rows of all the columns (PH_Lat_1, PH_Lat_2, PH_Lat_3, PH_Lat_4) in one go.
When using the above code I am getting this error:
basic_conversion.py:46: UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
df2.a = df2.apply(lambda x: calculate_latoffset(x[b]), axis=1)
Is it possible? Please kindly help.
Answered by AaronDT
Simply use df2['a'] instead of df2.a
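One detail worth spelling out: in the question's loop, a is a variable that holds the column name, so the bracket form should take the variable (df2[a]) rather than the literal string 'a'. A minimal sketch, assuming made-up values; only the column names come from the question:

```python
import pandas as pd

# Hypothetical data; only the column names follow the question.
df2 = pd.DataFrame({"lat": [10.0, 20.0], "latOffset_1": [1.0, 2.0]})

a = "PH_Lat_1"     # target column name, as built inside the loop
b = "latOffset_1"  # source column name

# Bracket indexing with the variable creates (or overwrites)
# the column whose name is stored in `a`.
df2[a] = df2["lat"] - 2 * df2[b]

print(list(df2.columns))
```

With df2['a'] the new column would literally be called "a", which is probably not what the loop intends.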
Answered by Ying Cai
The solution I can think of is to use .loc to get the column. You can try df.loc[:, a] instead of df.a.
Pandas dataframe columns cannot be created using the dot notation, to avoid potential conflicts with the dataframe's own attributes. Hope this helps.
Answered by YaOzI
This is a warning, not an error, so your code can still run through, but it probably does not do what you intended.
Short answer: To create a new column for a DataFrame, never use attribute access. The correct way is to use either [] or .loc indexing:

    >>> df
       a  b
    0  7  6
    1  5  8
    >>> df['c'] = df.a + df.b
    >>> # OR
    >>> df.loc[:, 'c'] = df.a + df.b
    >>> df  # c is a newly added column
       a  b   c
    0  7  6  13
    1  5  8  13
More explanation: Series and DataFrame are core classes and data structures in pandas, and of course they are Python classes too, so there are some minor distinctions in attribute access between a pandas DataFrame and normal Python objects. But it's well documented and can be easily understood. Just a few points to note:
In Python, users may dynamically add data attributes of their own to an instance object using attribute access.
    >>> class Dog(object):
    ...     pass
    >>> dog = Dog()
    >>> vars(dog)
    {}
    >>> superdog = Dog()
    >>> vars(superdog)
    {}
    >>> dog.legs = 'I can run.'
    >>> superdog.wings = 'I can fly.'
    >>> vars(dog)
    {'legs': 'I can run.'}
    >>> vars(superdog)
    {'wings': 'I can fly.'}
In pandas, index and column are closely related to the data structure: you may access an index on a Series, or a column on a DataFrame, as an attribute.
    >>> import pandas as pd
    >>> import numpy as np
    >>> data = np.random.randint(low=0, high=10, size=(2,2))
    >>> df = pd.DataFrame(data, columns=['a', 'b'])
    >>> df
       a  b
    0  7  6
    1  5  8
    >>> vars(df)
    {'_is_copy': None, '_data': BlockManager
    Items: Index(['a', 'b'], dtype='object')
    Axis 1: RangeIndex(start=0, stop=2, step=1)
    IntBlock: slice(0, 2, 1), 2 x 2, dtype: int64, '_item_cache': {}}
But pandas attribute access is mainly a convenience for reading from and modifying an existing element of a Series or column of a DataFrame.
    >>> df.a
    0    7
    1    5
    Name: a, dtype: int64
    >>> df.b = [1, 1]
    >>> df
       a  b
    0  7  1
    1  5  1
And the convenience is a tradeoff against full functionality. E.g. you can create a DataFrame object with the column names ['space bar', '1', 'loc', 'min', 'index'], but you can't access them as attributes, because they are either not valid Python identifiers ('1', 'space bar') or conflict with an existing method or attribute name ('loc', 'min', 'index').

    >>> data = np.random.randint(0, 10, size=(2, 5))
    >>> df_special_col_names = pd.DataFrame(data, columns=['space bar', '1', 'loc', 'min', 'index'])
    >>> df_special_col_names
       space bar  1  loc  min  index
    0          4  4    4    8      9
    1          3  0    1    2      3
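The two conditions above can be checked mechanically. The following is only a rough sketch: attribute_accessible is a hypothetical helper written for this illustration, not part of pandas, and it ignores edge cases such as Python keywords.

```python
import pandas as pd

# Empty frame with the column names from the example, plus an ordinary name "a".
df = pd.DataFrame(columns=["space bar", "1", "loc", "min", "index", "a"])

def attribute_accessible(frame, name):
    """Hypothetical helper: True if the column could be read as frame.<name>.

    The name must be a valid Python identifier and must not collide with
    an attribute or method already defined on the frame's class.
    """
    return name.isidentifier() and not hasattr(type(frame), name)

result = {name: attribute_accessible(df, name) for name in df.columns}
print(result)
```

Only "a" passes both checks here; the other five names need [] or .loc indexing.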
In these cases, .loc, .iloc and [] indexing are the defined way to fully access and operate on the index and columns of Series and DataFrame objects.

    >>> df_special_col_names['space bar']
    0    4
    1    3
    Name: space bar, dtype: int64
    >>> df_special_col_names.loc[:, 'min']
    0    8
    1    2
    Name: min, dtype: int64
    >>> df_special_col_names.iloc[:, 1]
    0    4
    1    0
    Name: 1, dtype: int64
As to the topic, creating a new column for a DataFrame: as you can see, df.c = df.a + df.b just creates a new attribute alongside the core data structure, so starting from version 0.21.0 this behavior raises a UserWarning (silent no more).

    >>> df
       a  b
    0  7  1
    1  5  1
    >>> df.c = df.a + df.b
    __main__:1: UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
    >>> df['d'] = df.a + df.b
    >>> df
       a  b  d
    0  7  1  8
    1  5  1  6
    >>> df.c
    0    8
    1    6
    dtype: int64
    >>> vars(df)
    {'_is_copy': None, '_data': BlockManager
    Items: Index(['a', 'b', 'd'], dtype='object')
    Axis 1: RangeIndex(start=0, stop=2, step=1)
    IntBlock: slice(0, 2, 1), 2 x 2, dtype: int64
    IntBlock: slice(2, 3, 1), 1 x 2, dtype: int64, '_item_cache': {}, 'c': 0    8
    1    6
    dtype: int64}
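The same behavior can be checked in a script rather than at the prompt. This is a small sketch, assuming pandas 0.21.0 or later; it records the warning with the standard warnings module and then confirms which names actually became columns.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"a": [7, 5], "b": [1, 1]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not suppressed
    df.c = df.a + df.b               # attribute assignment: no column is created

warned = any(issubclass(w.category, UserWarning) for w in caught)

df["d"] = df.a + df.b                # bracket assignment: a real column

print(warned)
print(list(df.columns))
```

After this runs, "d" is a column of df, while "c" exists only as a plain Python attribute on the object.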
Finally, back to the Short answer.
Answered by bowei
In df2.apply(lambda x: calculate_latoffset(x[b]), axis=1) you are creating a 5 column dataframe and then trying to assign that value to a single field. Doing df2[a] = calculate_latoffset(df2[b]) instead should deliver the desired output.
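Putting this together with the loop from the question gives something like the sketch below. The data values are made up, and calculate_latoffset is changed here to take the 'lat' column as an explicit argument instead of reading the global df2; that is a variation for illustration, not the asker's original signature. Note also that the question's loop builds 'PH_lat_%d' while the listed columns are spelled 'PH_Lat_%d'; the sketch uses the latter.

```python
import pandas as pd

# Hypothetical input; column names follow the question, values are made up.
data = {"lat": [10.0, 20.0, 30.0]}
for i in range(1, 5):
    data["latOffset_%d" % i] = [float(i)] * 3
df2 = pd.DataFrame(data)

def calculate_latoffset(lat, latoffset):
    """Vectorized offset calculation: works on whole columns at once."""
    return lat - 2 * latoffset

for i in range(1, 5):
    target = "PH_Lat_%d" % i     # column to fill
    source = "latOffset_%d" % i  # column to read
    # Bracket assignment with the variable name; no apply() needed.
    df2[target] = calculate_latoffset(df2["lat"], df2[source])

print(df2[["PH_Lat_1", "PH_Lat_2", "PH_Lat_3", "PH_Lat_4"]])
```

Because the arithmetic operates on whole Series, each iteration fills every row of one column, which is the "in one go" behavior the question asks for.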