Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must also follow the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/44723183/

Time: 2020-08-20 00:23:47  Source: igfitidea

Set value to an entire column of a pandas dataframe

python pandas dataframe

Asked by Ledger Yu

I'm trying to set the entire column of a dataframe to a specific value.

In  [1]: df
Out [1]: 
     issueid   industry
0        001        xxx
1        002        xxx
2        003        xxx
3        004        xxx
4        005        xxx

From what I've seen, loc is the best practice when replacing values in a dataframe (or isn't it?):

In  [2]: df.loc[:,'industry'] = 'yyy'

However, I still received this much talked-about warning message:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

If I do

In  [3]: df['industry'] = 'yyy'

I got the same warning message.
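As a point of comparison, here is a minimal sketch that assumes df is built directly (the issueid strings and the constructor call are assumptions, not the asker's code). On such a standalone frame neither form of assignment should raise SettingWithCopyWarning, which suggests the warning depends on how df was created rather than on the assignment syntax. Warnings are recorded and filtered by class name because pandas releases built around Copy-on-Write may not define that warning at all.

```python
import warnings

import pandas as pd

# Assumed sample frame, built directly rather than filtered from another DataFrame
df = pd.DataFrame({"issueid": ["001", "002", "003", "004", "005"],
                   "industry": ["xxx"] * 5})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning instead of printing it once
    df.loc[:, "industry"] = "yyy"    # the form used in In [2]
    df["industry"] = "zzz"           # the form used in In [3]

# Keep only SettingWithCopyWarning entries, matched by class name
copy_warnings = [w for w in caught if w.category.__name__ == "SettingWithCopyWarning"]
print(len(copy_warnings))  # 0 for a frame that was built directly
print(df["industry"].tolist())
```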

Any ideas? Working with Python 3.5.2 and pandas 0.18.1.

Accepted answer by Alex P. Miller

Python can do unexpected things when new objects are defined from existing ones. You stated in a comment above that your dataframe is defined along the lines of df = df_all.loc[df_all['issueid']==specific_id,:]. In this case, pandas remembers that df was sliced out of df_all and cannot be sure whether df is an independent object or just a stand-in for the rows stored in df_all, so any assignment to df triggers the warning.

To avoid these issues altogether, I often have to remind myself to use the copy module, which explicitly forces objects to be copied in memory so that methods called on the new objects are not applied to the source object. I had the same problem as you, and avoided it using the deepcopy function.

In your case, this should get rid of the warning message:

from copy import deepcopy
df = deepcopy(df_all.loc[df_all['issueid']==specific_id,:])
df['industry'] = 'yyy'
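A minimal sketch to check the deepcopy approach end to end, assuming a hypothetical df_all shaped like the question's frame and a hypothetical specific_id = '003'. It records warnings instead of printing them, so you can verify that no SettingWithCopyWarning is raised and that df_all keeps its original values.

```python
import warnings
from copy import deepcopy

import pandas as pd

# Hypothetical source frame and id, standing in for the asker's data
df_all = pd.DataFrame({"issueid": ["001", "002", "003", "004", "005"],
                       "industry": ["xxx"] * 5})
specific_id = "003"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # deepcopy gives df its own data, detached from df_all
    df = deepcopy(df_all.loc[df_all["issueid"] == specific_id, :])
    df["industry"] = "yyy"

copy_warnings = [w for w in caught if w.category.__name__ == "SettingWithCopyWarning"]
print(df)                           # the single filtered row now has industry 'yyy'
print(df_all["industry"].tolist())  # df_all still holds 'xxx' in every row
```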


EDIT: Also see David M.'s excellent comment on the original answer! The DataFrame's own .copy() method achieves the same thing:

df = df_all.loc[df_all['issueid']==specific_id,:].copy()
df['industry'] = 'yyy'

Answer by Mina HE

You can use the assign method:

df = df.assign(industry='yyy')
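A short sketch of how assign behaves: it returns a new DataFrame rather than modifying df in place, which is why the result is assigned back to df. The sample frame is an assumption that mirrors the question's data.

```python
import pandas as pd

# Assumed sample frame mirroring the question
df = pd.DataFrame({"issueid": ["001", "002", "003"], "industry": ["xxx"] * 3})

result = df.assign(industry="yyy")  # the scalar is broadcast to every row

print(df["industry"].tolist())      # ['xxx', 'xxx', 'xxx']: the original is untouched
print(result["industry"].tolist())  # ['yyy', 'yyy', 'yyy']

df = df.assign(industry="yyy")      # rebind the name to keep the change
```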

Answer by Nwoye CID

df.loc[:,'industry'] = 'yyy'

This does the trick: using ':' as the row selector in .loc applies the assignment to all rows. Hope it helps.
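In .loc[rows, columns], ':' selects every row, so the assignment covers the whole column. A minimal sketch on an assumed sample frame; the same form also creates a column that does not exist yet (the column name region is hypothetical).

```python
import pandas as pd

# Assumed sample frame mirroring the question
df = pd.DataFrame({"issueid": ["001", "002", "003"], "industry": ["xxx"] * 3})

df.loc[:, "industry"] = "yyy"  # ':' = all rows, 'industry' = existing column
df.loc[:, "region"] = "emea"   # hypothetical new column, created on assignment

print(df)
```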

Answer by HH1

You can do:

df['industry'] = 'yyy'

Answer by Daniel González Cortés

Assuming your DataFrame looks like data below, you have to consider whether your data is a string or an integer, because the two are treated differently; in this case, you need to be specific about that.

import pandas as pd

data = [('001','xxx'), ('002','xxx'), ('003','xxx'), ('004','xxx'), ('005','xxx')]

df = pd.DataFrame(data,columns=['issueid', 'industry'])

print("Old DataFrame")
print(df)

df.loc[:,'industry'] = str('yyy')

print("New DataFrame")
print(df)

Now, if you want to put numbers instead of letters, you can assign a list (or array) with one value per row:

list_of_ones = [1,1,1,1,1]
df.loc[:,'industry'] = list_of_ones
print(df)

Or, if you are using NumPy:

import numpy as np
n = len(df)
df.loc[:,'industry'] = np.ones(n)
print(df)
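A scalar number is broadcast to every row just like a string is, so a list or array is only needed when rows should get different values. A minimal sketch on an assumed sample frame; the column names count, weight, rank and bad are hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed sample frame mirroring the question
df = pd.DataFrame({"issueid": ["001", "002", "003"], "industry": ["xxx"] * 3})

df["count"] = 1                  # scalar int: broadcast to every row, no list needed
df["weight"] = np.ones(len(df))  # NumPy array: one value per row (float64)
df["rank"] = [3, 1, 2]           # list: must have exactly len(df) items

try:
    df["bad"] = [1, 2]           # wrong length for a 3-row frame
except ValueError as err:
    print("ValueError:", err)    # the column is not created

print(df)
```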

Answer by John Mutuma

I had a similar issue before, even with the df.loc[:,'industry'] = 'yyy' approach, but once I refreshed the notebook, it ran well.

You may want to try refreshing the cells once you have added df.loc[:,'industry'] = 'yyy'.

Answer by hukai916

It seems to me that:

df1 = df[df['col1']==some_value] gives you an object that pandas still regards as a slice of the parent df, so it cannot be sure whether changes in df1 are meant to be reflected in df. This leads to the warning. Whereas df1 = df[df['col1']==some_value].copy() WILL create a new, independent DataFrame, and changes in df1 will not be reflected in df. The copy() method is recommended if you don't want to make changes to your original df.

Answer by Azim

This lets you add a condition on the rows and then change all the cells of a specific column that correspond to those rows:

df.loc[(df['issueid'] == '001'), 'industry'] = str('yyy')
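Only the rows selected by the boolean mask are changed; all other rows keep their old values. A minimal sketch on an assumed sample frame mirroring the question.

```python
import pandas as pd

# Assumed sample frame mirroring the question
df = pd.DataFrame({"issueid": ["001", "002", "003"], "industry": ["xxx"] * 3})

mask = df["issueid"] == "001"     # boolean Series, True only for the first row
df.loc[mask, "industry"] = "yyy"  # set 'industry' on the masked rows only

print(df["industry"].tolist())    # ['yyy', 'xxx', 'xxx']
```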

Answer by Andy

Change your .loc line to:

df['industry'] = 'yyy'

Example output

>>> df
   issueid industry
0        1      xxx
1        2      xxx
2        3      xxx
3        4      xxx
4        5      xxx
>>> df['industry'] = 'yyy'
>>> df
   issueid industry
0        1      yyy
1        2      yyy
2        3      yyy
3        4      yyy
4        5      yyy