Add new column in Pandas DataFrame Python

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/18942506/

Time: 2020-08-19 12:20:40

Tags: python, pandas, dataframe

Asked by Santiago Munez

I have a DataFrame in Pandas, for example:

Col1 Col2
A     1 
B     2
C     3

Now I would like to add one more column named Col3 whose value is based on Col2. As a formula: if Col2 > 1, then Col3 is 0; otherwise it is 1. So, for the example above, the output would be:

Col1 Col2 Col3
A    1    1
B    2    0
C    3    0

Any ideas on how to achieve this?

Accepted answer by Viktor Kerkez

You just do the opposite comparison: Col2 <= 1. This returns a boolean Series with False values for the rows greater than 1 and True values for the others. If you convert it to an integer dtype such as int64, True becomes 1 and False becomes 0:

df['Col3'] = (df['Col2'] <= 1).astype(int)
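
As a quick check, the comparison can be run on the sample data from the question. This is only a minimal sketch: the way df is constructed here is an assumption based on the table shown in the question, and the intermediate name mask is introduced purely for illustration.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed construction)
df = pd.DataFrame({'Col1': ['A', 'B', 'C'], 'Col2': [1, 2, 3]})

# Boolean Series: True where Col2 <= 1, False otherwise
mask = df['Col2'] <= 1

# Casting to int turns True into 1 and False into 0
df['Col3'] = mask.astype(int)

# Col3 now holds [1, 0, 0], matching the output requested in the question
print(df)
```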

If you want a more general solution, where you can assign any number to Col3 depending on the value of Col2, you could do something like:

df['Col3'] = df['Col2'].map(lambda x: 42 if x > 1 else 55)

Or:

df['Col3'] = 0
condition = df['Col2'] > 1
df.loc[condition, 'Col3'] = 42
df.loc[~condition, 'Col3'] = 55
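
The same two-branch assignment can also be sketched in a single vectorized step with NumPy's np.where. This is not part of the original answer; it assumes df is the sample frame from the question and reuses the placeholder values 42 and 55 from above.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumed construction)
df = pd.DataFrame({'Col1': ['A', 'B', 'C'], 'Col2': [1, 2, 3]})

# np.where(condition, value_if_true, value_if_false) is evaluated element-wise
df['Col3'] = np.where(df['Col2'] > 1, 42, 55)

# Col3 now holds [55, 42, 42]
print(df)
```

Compared with map and a lambda, this avoids calling a Python function once per row, which tends to matter on larger frames.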

Answer by Tony Rollett

The easiest way I found to add a column to a DataFrame was to use the "add" function. Here is a snippet of code that also writes the result to a CSV file. Note that the "columns" argument lets you set the name of the column (which here happens to be the same as the name of the np.array that I used as the source of the data).

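import pandas as pd

# Create a pandas DataFrame from the first array
# (FF_maxRSSBasal and the other arrays are assumed to be np.arrays defined earlier)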
df = pd.DataFrame(data = FF_maxRSSBasal, columns=['FF_maxRSSBasal'])
# from here on, we use the trick of creating a new dataframe and then "add"ing it
df2 = pd.DataFrame(data = FF_maxRSSPrism, columns=['FF_maxRSSPrism'])
df = df.add( df2, fill_value=0 )
df2 = pd.DataFrame(data = FF_maxRSSPyramidal, columns=['FF_maxRSSPyramidal'])
df = df.add( df2, fill_value=0 )
df2 = pd.DataFrame(data = deltaFF_strainE22, columns=['deltaFF_strainE22'])
df = df.add( df2, fill_value=0 )
df2 = pd.DataFrame(data = scaled, columns=['scaled'])
df = df.add( df2, fill_value=0 )
df2 = pd.DataFrame(data = deltaFF_orientation, columns=['deltaFF_orientation'])
df = df.add( df2, fill_value=0 )
#print(df)
df.to_csv('FF_data_frame.csv')
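
For comparison, a column can also be attached by plain assignment. DataFrame.add performs element-wise addition after aligning the two frames, so the approach above relies on fill_value=0 standing in for the columns that are missing on one side. The sketch below is not from the original answer; the array names basal and prism and their values are hypothetical stand-ins for the author's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the author's arrays (values made up for illustration)
basal = np.array([1.0, 2.0, 3.0])
prism = np.array([4.0, 5.0, 6.0])

# Start from one array, as in the answer
df = pd.DataFrame(data=basal, columns=['basal'])

# Plain assignment adds the second array as a new column
df['prism'] = prism

# Same values via the "add" approach: the frames are aligned on index and
# columns, and a column missing on one side is treated as 0 (fill_value=0)
df_add = pd.DataFrame(data=basal, columns=['basal']).add(
    pd.DataFrame(data=prism, columns=['prism']), fill_value=0
)

print(df)
print(df_add)
```

Plain assignment leaves the existing columns untouched, whereas the add route goes through alignment and arithmetic, so it only suits numeric data and integer columns may come back as floats.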