Make new column in Pandas dataframe by adding values from other columns

Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/34023918/

Date: 2020-08-19 14:21:24  Source: igfitidea

Tags: python, python-2.7, pandas

Asked by n00b

I have a dataframe with values like

A B
1 4
2 6
3 9

I need to add a new column by adding values from column A and B, like

A B C
1 4 5
2 6 8
3 9 12

I believe this can be done using a lambda function, but I can't figure out how to do it.

Accepted answer by DeepSpace

Very simple:

df['C'] = df['A'] + df['B']
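
For completeness, here is a minimal runnable sketch of that one-liner using the sample data from the question. It assumes pandas is installed and imported as pd.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 6, 9]})

# Element-wise addition of two columns; rows are aligned on the index.
df["C"] = df["A"] + df["B"]

print(df)
```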

Answer by efajardo

The simplest way would be to use DeepSpace's answer. However, if you really want to use an anonymous function, you can use apply:

df['C'] = df.apply(lambda row: row['A'] + row['B'], axis=1)
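
A small sketch comparing the two approaches on the question's data. With axis=1 the lambda is called once per row in Python, so on large dataframes it is generally slower than the column-wise addition. The variable names via_apply and via_add are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 6, 9]})

# Row-wise: the lambda receives each row as a Series (axis=1).
via_apply = df.apply(lambda row: row["A"] + row["B"], axis=1)

# Vectorized: a single operation over whole columns.
via_add = df["A"] + df["B"]

# Both give the same values for this data.
print(via_apply.tolist() == via_add.tolist())
```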

Answer by Anton Protopopov

You could use the sum function to achieve that, as @EdChum mentioned in the comments:

df['C'] = df[['A', 'B']].sum(axis=1)

In [245]: df
Out[245]: 
   A  B   C
0  1  4   5
1  2  6   8
2  3  9  12
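
One difference from the + operator is worth knowing: sum skips missing values by default (skipna=True), while + propagates them. A minimal sketch, using hypothetical data with one NaN that is not part of the original question:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in column B.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, np.nan, 9.0]})

# With +, a NaN in either operand makes the result NaN for that row.
plus = df["A"] + df["B"]

# With sum(axis=1), NaN is skipped by default, so row 1 sums to 2.0.
summed = df[["A", "B"]].sum(axis=1)

print(plus.tolist())
print(summed.tolist())
```

Which behaviour is right depends on whether a missing value should count as zero or make the total unknown.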

Answer by steveb

As of Pandas version 0.16.0 you can use assign as follows:

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 6, 9]})
df.assign(C=df.A + df.B)

# Out[383]: 
#    A  B   C
# 0  1  4   5
# 1  2  6   8
# 2  3  9  12

You can add multiple columns this way as follows:

df.assign(C = df.A + df.B,
          Diff = df.B - df.A,
          Mult = df.A * df.B)
# Out[379]: 
#    A  B   C  Diff  Mult
# 0  1  4   5     3     4
# 1  2  6   8     4    12
# 2  3  9  12     6    27
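
Note that assign returns a new dataframe and leaves the original untouched, so the result has to be captured. The sketch below also passes a callable instead of a precomputed Series, which assign accepts; the names result and d are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 6, 9]})

# The callable receives the dataframe being assigned to.
result = df.assign(C=lambda d: d["A"] + d["B"])

print(list(df.columns))      # original is unchanged
print(list(result.columns))  # new frame has the extra column
```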

Answer by sparrow

Building a little more on Anton's answer, you can sum all the columns like this:

df['sum'] = df[list(df.columns)].sum(axis=1)

Answer by Manuel Martinez

You could do:

df['C'] = df.sum(axis=1)

If you only want to sum the numeric columns:

df['C'] = df.sum(axis=1, numeric_only=True)
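
A minimal sketch of the numeric_only variant, assuming a dataframe that also holds a text column. The column name "label" is hypothetical and not from the question.

```python
import pandas as pd

# Hypothetical frame with a non-numeric column, "label".
df = pd.DataFrame({"label": ["x", "y", "z"], "A": [1, 2, 3], "B": [4, 6, 9]})

# Only the numeric columns A and B contribute to the row sums.
df["C"] = df.sum(axis=1, numeric_only=True)

print(df)
```

Because this sums every numeric column present, running the assignment a second time would include C itself in the total.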

Answer by tgraybam

I wanted to add a comment responding to the error message n00b was getting but I don't have enough reputation. So my comment is an answer in case it helps anyone...

n00b said:

I get the following warning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

He got this warning because whatever manipulations he did to his dataframe prior to creating df['C'] created a view into the dataframe rather than a copy of it. The warning didn't arise from the simple calculation df['C'] = df['A'] + df['B'] suggested by DeepSpace.

Have a look at the "Returning a view versus a copy" docs.
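
A common way to avoid the warning is to take an explicit copy of the subset before adding columns to it. The sketch below assumes a hypothetical earlier filtering step, since the question does not show what n00b actually did. Whether the warning appears at all depends on the pandas version and on the preceding operations.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 6, 9]})

# Hypothetical earlier step: keep only rows where A > 1.
# .copy() makes the subset an independent dataframe.
subset = df[df["A"] > 1].copy()

# Adding a column to the copy does not touch df.
subset["C"] = subset["A"] + subset["B"]

print(subset)
```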

Answer by firefly

Concerning n00b's comment: "I get the following warning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead"

I was getting the same error. In my case it was because I was trying to perform the column addition on a dataframe that was created like this:

df_b = df[['colA', 'colB', 'colC']]

instead of:

df_c = pd.DataFrame(df, columns=['colA', 'colB', 'colC'])

df_b is a copy of a slice from df, while df_c is a new dataframe. So

df_c['colD'] = df['colA'] + df['colB'] + df['colC']

will add the columns and won't raise any warning. The same holds if .sum(axis=1) is used.
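
As a sketch of both options side by side: the first uses .copy() on the column selection, which is an alternative not mentioned in the answer, and the second uses the constructor form from the answer. The sample values and the extra column "other" are hypothetical.

```python
import pandas as pd

# Hypothetical source frame with an extra column that is not selected.
df = pd.DataFrame({"colA": [1, 2], "colB": [3, 4], "colC": [5, 6], "other": [0, 0]})

# Option 1: explicit copy of the column selection.
df_b = df[["colA", "colB", "colC"]].copy()
df_b["colD"] = df_b.sum(axis=1)

# Option 2: constructor form, as in the answer.
df_c = pd.DataFrame(df, columns=["colA", "colB", "colC"])
df_c["colD"] = df_c["colA"] + df_c["colB"] + df_c["colC"]

print(df_b)
print(df_c)
```

In both cases the new column lands on the derived dataframe only; df keeps its original columns.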

Answer by Roushan

You can do this using loc:

In [37]:  df = pd.DataFrame({"A":[1,2,3],"B":[4,6,9]})

In [38]: df
Out[38]:
   A  B
0  1  4
1  2  6
2  3  9

In [39]: df['C']=df.loc[:,['A','B']].sum(axis=1)

In [40]: df
Out[40]:
   A  B   C
0  1  4   5
1  2  6   8
2  3  9  12

Answer by Rohit Kamboj

You can solve it simply by adding the two columns: df['C'] = df['A'] + df['B']
