Python: add a column with a constant value to a pandas DataFrame

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/24039023/

Date: 2020-08-19 03:52:29  Source: igfitidea

Add column with constant value to pandas dataframe

Tags: python, pandas

Asked by yemu

Given a DataFrame:

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'), index=[1, 2, 3])
df

          A         B         C
1  1.764052  0.400157  0.978738
2  2.240893  1.867558 -0.977278
3  0.950088 -0.151357 -0.103219

What is the simplest way to add a new column containing a constant value, e.g. 0?

          A         B         C  new
1  1.764052  0.400157  0.978738    0
2  2.240893  1.867558 -0.977278    0
3  0.950088 -0.151357 -0.103219    0


This is my solution, but I don't understand why it puts NaN into the 'new' column.

df['new'] = pd.Series([0 for x in range(len(df.index))])

          A         B         C  new
1  1.764052  0.400157  0.978738  0.0
2  2.240893  1.867558 -0.977278  0.0
3  0.950088 -0.151357 -0.103219  NaN

Accepted answer by Phillip Cloud

The reason this puts NaN into the column is that df.index and the Index of your right-hand-side object are different. @zach shows the proper way to assign a new column of zeros. In general, pandas tries to align indices as much as possible. One downside is that when indices are not aligned, you get NaN wherever they aren't aligned. Play around with the reindex and align methods to gain some intuition for how alignment works with objects that have partially, totally, and not-at-all aligned indices. For example, here's how DataFrame.align() works with partially aligned indices:

In [7]: from pandas import DataFrame

In [8]: from numpy.random import randint

In [9]: df = DataFrame({'a': randint(3, size=10)})

In [10]:

In [10]: df
Out[10]:
   a
0  0
1  2
2  0
3  1
4  0
5  0
6  0
7  0
8  0
9  0

In [11]: s = df.a[:5]

In [12]: dfa, sa = df.align(s, axis=0)

In [13]: dfa
Out[13]:
   a
0  0
1  2
2  0
3  1
4  0
5  0
6  0
7  0
8  0
9  0

In [14]: sa
Out[14]:
0     0
1     2
2     0
3     1
4     0
5   NaN
6   NaN
7   NaN
8   NaN
9   NaN
Name: a, dtype: float64
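
Applied to the question's DataFrame, the alignment rule above can be sketched as follows. This is a minimal illustration, not part of the original answer; the variable names (zeros, misaligned, aligned, positional) are made up for the example. It uses assign so each variant starts from the same unmodified frame.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'), index=[1, 2, 3])

# A Series built from a plain list gets the default index 0, 1, 2.
zeros = pd.Series([0] * len(df))

# Column assignment aligns on labels: 1 and 2 match, 3 has no partner.
misaligned = df.assign(new=zeros)

# Fix 1: give the Series the same index as the frame.
aligned = df.assign(new=pd.Series(0, index=df.index))

# Fix 2: drop the index so the values are assigned by position.
positional = df.assign(new=zeros.to_numpy())
```

Only the row labelled 3 ends up as NaN in misaligned, which matches the output shown in the question.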

Answer by cs95

Super simple in-place assignment: df['new'] = 0

For in-place modification, perform direct assignment. Pandas broadcasts the assigned value to every row.

df = pd.DataFrame('x', index=range(4), columns=list('ABC'))
df

   A  B  C
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x

df['new'] = 'y'
# Same as,
# df.loc[:, 'new'] = 'y'
df

   A  B  C new
0  x  x  x   y
1  x  x  x   y
2  x  x  x   y
3  x  x  x   y
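
A scalar carries no index, so there is nothing to align and no NaN can appear, even with a non-default index like the one in the question. A small sketch of that, using a made-up one-column frame:

```python
import pandas as pd

# Non-default index labels, as in the question.
df = pd.DataFrame({'A': [1.5, 2.5, 3.5]}, index=[1, 2, 3])

# The scalar is broadcast to every existing row label.
df['new'] = 0

new_values = df['new'].tolist()
new_index = df.index.tolist()
```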

Note for object columns

If you want to add a column of empty lists, here is my advice:

  • Consider not doing this. object columns are bad news in terms of performance. Rethink how your data is structured.
  • Consider storing your data in a sparse data structure. More information: sparse data structures
  • If you must store a column of lists, ensure not to copy the same reference multiple times.

    # Wrong
    df['new'] = [[]] * len(df)
    # Right
    df['new'] = [[] for _ in range(len(df))]
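
To see why the first form is wrong, here is a short sketch (the column names shared and separate are made up for the illustration). [[]] * n repeats one list object n times, so a change made through any row shows up in all of them:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

# One list object, referenced three times.
df['shared'] = [[]] * len(df)

# A fresh list per row.
df['separate'] = [[] for _ in range(len(df))]

# Append to the first row's list in each column.
df.at[0, 'shared'].append('x')
df.at[0, 'separate'].append('x')

shared_result = df['shared'].tolist()
separate_result = df['separate'].tolist()
```

After the two appends, every row of shared contains 'x', while only the first row of separate does.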
    


Generating a copy: df.assign(new=0)

If you need a copy instead, use DataFrame.assign:

df.assign(new='y')

   A  B  C new
0  x  x  x   y
1  x  x  x   y
2  x  x  x   y
3  x  x  x   y
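
The difference from direct assignment is that the original frame is left untouched. A quick check, with with_new as an illustrative name:

```python
import pandas as pd

df = pd.DataFrame('x', index=range(4), columns=list('ABC'))

# assign returns a new DataFrame; df itself is not modified.
with_new = df.assign(new='y')

original_columns = df.columns.tolist()
copy_columns = with_new.columns.tolist()
```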

And if you need to assign multiple such columns with the same value, this is as simple as:

c = ['new1', 'new2', ...]
df.assign(**dict.fromkeys(c, 'y'))

   A  B  C new1 new2
0  x  x  x    y    y
1  x  x  x    y    y
2  x  x  x    y    y
3  x  x  x    y    y

Multiple column assignment

Finally, if you need to assign multiple columns with different values, you can use assign with a dictionary.

c = {'new1': 'w', 'new2': 'y', 'new3': 'z'}
df.assign(**c)

   A  B  C new1 new2 new3
0  x  x  x    w    y    z
1  x  x  x    w    y    z
2  x  x  x    w    y    z
3  x  x  x    w    y    z

Answer by Grant Shannon

Here is another one-liner using a lambda (creates a column with the constant value 10):

df['newCol'] = df.apply(lambda x: 10, axis=1)

before

df
    A           B           C
1   1.764052    0.400157    0.978738
2   2.240893    1.867558    -0.977278
3   0.950088    -0.151357   -0.103219

after

df
    A           B           C           newCol
1   1.764052    0.400157    0.978738    10
2   2.240893    1.867558    -0.977278   10
3   0.950088    -0.151357   -0.103219   10
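
Note that apply with axis=1 calls the lambda once per row in Python, so on large frames it is slower than assigning the scalar directly. The two approaches give the same values, as this sketch suggests (via_apply and via_scalar are illustrative names):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'), index=[1, 2, 3])

# Row-wise apply: the lambda runs once for each row.
via_apply = df.apply(lambda row: 10, axis=1)

# Scalar assignment: the value is broadcast without a Python-level loop.
via_scalar = df.assign(newCol=10)['newCol']

same_values = via_apply.tolist() == via_scalar.tolist()
```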

Answer by Roko Mijic

Some of these answers might be out of date as of 2020. With modern pandas you can just do:

df['new'] = 0