Python: Add a column with a constant value to a pandas DataFrame

Question

Asked by yemu

Given a DataFrame:

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'), index=[1, 2, 3])
df

          A         B         C
1  1.764052  0.400157  0.978738
2  2.240893  1.867558 -0.977278
3  0.950088 -0.151357 -0.103219

What is the simplest way to add a new column containing a constant value, e.g. 0?

          A         B         C  new
1  1.764052  0.400157  0.978738    0
2  2.240893  1.867558 -0.977278    0
3  0.950088 -0.151357 -0.103219    0

This is my solution, but I don't know why it puts NaN into the 'new' column:

df['new'] = pd.Series([0 for x in range(len(df.index))])

          A         B         C  new
1  1.764052  0.400157  0.978738  0.0
2  2.240893  1.867558 -0.977278  0.0
3  0.950088 -0.151357 -0.103219  NaN

Answer 1

Accepted answer by Phillip Cloud

The reason this puts `NaN` into the column is that `df.index` and the `Index` of your right-hand-side object are different. @zach shows the proper way to assign a new column of zeros. In general, `pandas` tries to do as much alignment of indices as possible. One downside is that when indices are not aligned you get `NaN` wherever they aren't aligned. Play around with the `reindex` and `align` methods to gain some intuition for how alignment works with objects that have partially, totally, and not-at-all aligned indices. For example, here's how `DataFrame.align()` works with partially aligned indices:

In [7]: from pandas import DataFrame

In [8]: from numpy.random import randint

In [9]: df = DataFrame({'a': randint(3, size=10)})

In [10]: df
Out[10]:
   a
0  0
1  2
2  0
3  1
4  0
5  0
6  0
7  0
8  0
9  0

In [11]: s = df.a[:5]

In [12]: dfa, sa = df.align(s, axis=0)

In [13]: dfa
Out[13]:
   a
0  0
1  2
2  0
3  1
4  0
5  0
6  0
7  0
8  0
9  0

In [14]: sa
Out[14]:
0     0
1     2
2     0
3     1
4     0
5   NaN
6   NaN
7   NaN
8   NaN
9   NaN
Name: a, dtype: float64
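Applied to the question's DataFrame, the same alignment rule explains the NaN: a Series built from a plain list gets a default index of 0, 1, 2, while `df.index` is 1, 2, 3, so label 3 finds no match. A minimal sketch, assuming the DataFrame from the question (the column names `misaligned`, `aligned`, and `from_list` are made up for illustration):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'), index=[1, 2, 3])

# Default index (0, 1, 2) only partially overlaps df.index (1, 2, 3),
# so the row labelled 3 has no matching value and becomes NaN.
df['misaligned'] = pd.Series([0, 0, 0])

# Giving the Series the same index as the frame makes every label match.
df['aligned'] = pd.Series([0, 0, 0], index=df.index)

# A plain list carries no index, so it is assigned by position instead.
df['from_list'] = [0, 0, 0]

print(df[['misaligned', 'aligned', 'from_list']])
```

If the right-hand side has to be a Series, passing `index=df.index` is one way to avoid the mismatch; a scalar or a plain list sidesteps alignment entirely.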

Answer 2

Answered by cs95

Super simple in-place assignment: `df['new'] = 0`

For in-place modification, perform direct assignment. pandas broadcasts the assigned value to every row.

df = pd.DataFrame('x', index=range(4), columns=list('ABC'))
df

   A  B  C
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x

df['new'] = 'y'
# Same as,
# df.loc[:, 'new'] = 'y'
df

   A  B  C new
0  x  x  x   y
1  x  x  x   y
2  x  x  x   y
3  x  x  x   y

Note for object columns

If you want to add a column of empty lists, here is my advice:

1. Consider not doing this. `object` columns are bad news in terms of performance. Rethink how your data is structured.
2. Consider storing your data in a sparse data structure. More information: the pandas documentation on sparse data structures.
3. If you must store a column of lists, ensure not to copy the same reference multiple times.
```
# Wrong
df['new'] = [[]] * len(df)
# Right
df['new'] = [[] for _ in range(len(df))]
```
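To see why the first form is wrong, here is a small sketch (the column names `shared` and `separate` are illustrative, not from the original answer). It appends to the list in the first row of each column and then inspects the whole column:

```python
import pandas as pd

df = pd.DataFrame('x', index=range(3), columns=list('AB'))

# [[]] * n repeats one reference n times: every row holds the same list.
df['shared'] = [[]] * len(df)
# The comprehension builds a fresh list for each row.
df['separate'] = [[] for _ in range(len(df))]

# Append to the list stored in the first row of each column.
df['shared'].iloc[0].append(1)
df['separate'].iloc[0].append(1)

print(df['shared'].tolist())    # every row sees the appended value
print(df['separate'].tolist())  # only the first row changed
```

With the shared reference, a change made through one row shows up in all of them, which is rarely what is intended.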

Generating a copy: `df.assign(new=0)`

If you need a copy instead, use DataFrame.assign:

df.assign(new='y')

   A  B  C new
0  x  x  x   y
1  x  x  x   y
2  x  x  x   y
3  x  x  x   y
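The difference from direct assignment is that `assign` leaves the original frame alone. A short sketch to check this, reusing the example frame from this answer (`df2` is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame('x', index=range(4), columns=list('ABC'))

# assign returns a new DataFrame with the extra column...
df2 = df.assign(new='y')

# ...and leaves the original frame untouched.
print(list(df.columns))   # ['A', 'B', 'C']
print(list(df2.columns))  # ['A', 'B', 'C', 'new']
```

This makes `assign` convenient in method chains, where each step is expected to return a new object.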

And if you need to assign multiple such columns with the same value, this is as simple as:

c = ['new1', 'new2', ...]
df.assign(**dict.fromkeys(c, 'y'))

   A  B  C new1 new2
0  x  x  x    y    y
1  x  x  x    y    y
2  x  x  x    y    y
3  x  x  x    y    y

Multiple column assignment

Finally, if you need to assign multiple columns with different values, you can use `assign` with a dictionary.

c = {'new1': 'w', 'new2': 'y', 'new3': 'z'}
df.assign(**c)

   A  B  C new1 new2 new3
0  x  x  x    w    y    z
1  x  x  x    w    y    z
2  x  x  x    w    y    z
3  x  x  x    w    y    z

Answer 3

Answered by Grant Shannon

Here is another one-liner using a lambda (creates a column with the constant value 10):

df['newCol'] = df.apply(lambda x: 10, axis=1)

before

df
          A         B         C
1  1.764052  0.400157  0.978738
2  2.240893  1.867558 -0.977278
3  0.950088 -0.151357 -0.103219

after

df
          A         B         C  newCol
1  1.764052  0.400157  0.978738      10
2  2.240893  1.867558 -0.977278      10
3  0.950088 -0.151357 -0.103219      10
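Note that `apply` with `axis=1` calls the lambda once per row, whereas a scalar assignment lets pandas broadcast the value itself. The sketch below, assuming the question's DataFrame (`via_apply` is an illustrative name), checks that both routes give the same column:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'), index=[1, 2, 3])

# Row-wise apply: the lambda is called once for every row.
via_apply = df.apply(lambda row: 10, axis=1)

# Scalar assignment: pandas broadcasts the value to every row.
df['newCol'] = 10

# Both approaches yield the same values on the same index.
print(via_apply.tolist() == df['newCol'].tolist())
print(via_apply.index.equals(df.index))
```

For a plain constant the scalar form is the simpler choice; `apply` becomes worthwhile only when the value actually depends on the row.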

Answer 4

Answered by Roko Mijic

Some of these answers might be out of date as of 2020. With modern pandas you can just do:

df['new'] = 0
