Python Pandas: update multiple columns at once

Disclaimer: this page is a Chinese/English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/37400246/

Date: 2020-08-19 19:21:00  Source: igfitidea


Tags: python, pandas, dataframe

Asked by flyingmeatball

I'm trying to update a couple of fields at once: I have two data sources and I'm trying to reconcile them. I know I could do some ugly merging and then delete columns, but I was expecting the code below to work:


import numpy as np
import pandas as pd

df = pd.DataFrame([['A','B','C',np.nan,np.nan,np.nan],
                   ['D','E','F',np.nan,np.nan,np.nan],
                   [np.nan,np.nan,np.nan,'a','b','d'],
                   [np.nan,np.nan,np.nan,'d','e','f']],
                  columns=['Col1','Col2','Col3','col1_v2','col2_v2','col3_v2'])

print(df)

 Col1 Col2 Col3 col1_v2 col2_v2 col3_v2
0    A    B    C     NaN     NaN     NaN
1    D    E    F     NaN     NaN     NaN
2  NaN  NaN  NaN       a       b       d
3  NaN  NaN  NaN       d       e       f

# update: copy the *_v2 values into Col1..Col3 wherever Col1 is missing
df.loc[df['Col1'].isnull(), ['Col1', 'Col2', 'Col3']] = df[['col1_v2', 'col2_v2', 'col3_v2']]

print(df)

 Col1 Col2 Col3 col1_v2 col2_v2 col3_v2
0    A    B    C     NaN     NaN     NaN
1    D    E    F     NaN     NaN     NaN
2  NaN  NaN  NaN       a       b       d
3  NaN  NaN  NaN       d       e       f
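To see why nothing changed, here is a minimal sketch that approximates the alignment step pandas applies to a DataFrame on the right-hand side of a `.loc` assignment (it uses `reindex` to mimic the behaviour and is not pandas' internal code path; `rows`, `target_cols` and `rhs` are illustrative names). The right-hand frame is matched to the target by row label and by column label, and because it has no columns named `Col1`, `Col2` or `Col3`, every cell comes back as NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['A', 'B', 'C', np.nan, np.nan, np.nan],
                   ['D', 'E', 'F', np.nan, np.nan, np.nan],
                   [np.nan, np.nan, np.nan, 'a', 'b', 'd'],
                   [np.nan, np.nan, np.nan, 'd', 'e', 'f']],
                  columns=['Col1', 'Col2', 'Col3', 'col1_v2', 'col2_v2', 'col3_v2'])

mask = df['Col1'].isnull()
rows = df.index[mask.to_numpy()]           # illustrative name: labels 2 and 3
target_cols = ['Col1', 'Col2', 'Col3']     # illustrative name: columns being set
rhs = df[['col1_v2', 'col2_v2', 'col3_v2']]

# Approximation of the label alignment: rows are matched by index label,
# columns by column label. None of rhs's column labels match the targets.
aligned = rhs.reindex(index=rows, columns=target_cols)
print(aligned)  # a 2x3 frame in which every cell is NaN
```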

My desired output would be:


 Col1 Col2 Col3 col1_v2 col2_v2 col3_v2
0    A    B    C     NaN     NaN     NaN
1    D    E    F     NaN     NaN     NaN
2    a    b    d       a       b       d
3    d    e    f       d       e       f

I'm betting it has to do with updating/setting on a slice, but I always use .loc to update values, just not on multiple columns at once.


I feel like there's an easy way to do this that I'm just missing; any thoughts/suggestions would be welcome!


Edit, to reflect the solution below: Thanks for the comment on the indexes. However, I have a question about this as it relates to series. If I wanted to update an individual series in a similar manner, I could do something like this:


df.loc[df['Col1'].isnull(), ['Col1']] = df['col1_v2']

print(df)

  Col1 Col2 Col3 col1_v2 col2_v2 col3_v2
0    A    B    C     NaN     NaN     NaN
1    D    E    F     NaN     NaN     NaN
2    a  NaN  NaN       a       b       d
3    d  NaN  NaN       d       e       f

Note that I didn't account for the indexes here: I filtered down to a 2x1 selection and set it equal to a 4x1 series, yet pandas handled it correctly. Thoughts? I'm trying to better understand something I've used for a while, but I guess I don't have a full grasp of the underlying mechanism/rule.
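A Series carries no column labels, so when it is assigned to the selected rows of a column, pandas aligns it on the row index only: the 4-row `df['col1_v2']` is effectively reindexed to the two selected rows (labels 2 and 3) and its other rows are ignored. A minimal sketch of that behaviour, using a cut-down two-column frame for brevity and `reindex` as an approximation of the alignment step:

```python
import numpy as np
import pandas as pd

# Cut-down version of the question's frame (only the columns needed here).
df = pd.DataFrame({'Col1': ['A', 'D', np.nan, np.nan],
                   'col1_v2': [np.nan, np.nan, 'a', 'd']})

mask = df['Col1'].isnull()

# Approximation of the alignment: the Series is matched to the selected
# rows by index label, so only its entries for labels 2 and 3 are used.
aligned = df['col1_v2'].reindex(df.index[mask.to_numpy()])
print(aligned.tolist())  # ['a', 'd']

# The real assignment relies on the same label matching.
df.loc[mask, 'Col1'] = df['col1_v2']
print(df['Col1'].tolist())  # ['A', 'D', 'a', 'd']
```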


Accepted answer by piRSquared

You want to replace


print(df.loc[df['Col1'].isnull(), ['Col1', 'Col2', 'Col3']])

  Col1 Col2 Col3
2  NaN  NaN  NaN
3  NaN  NaN  NaN

With:


replace_with_this = df.loc[df['Col1'].isnull(), ['col1_v2', 'col2_v2', 'col3_v2']]
print(replace_with_this)

  col1_v2 col2_v2 col3_v2
2       a       b       d
3       d       e       f

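Seems reasonable. However, when you do the assignment, you need to account for index alignment, which includes columns: pandas matches the right-hand side to the target by row index and by column label, and since none of 'col1_v2', 'col2_v2', 'col3_v2' matches 'Col1', 'Col2', 'Col3', every target cell is set to NaN.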
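Because alignment matches columns by label, the assignment also works once the labels agree. The sketch below illustrates that point by renaming the source columns to the target names before assigning; it is a different technique from the positional fix in this answer, and `targets`, `sources` and `replacement` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['A', 'B', 'C', np.nan, np.nan, np.nan],
                   ['D', 'E', 'F', np.nan, np.nan, np.nan],
                   [np.nan, np.nan, np.nan, 'a', 'b', 'd'],
                   [np.nan, np.nan, np.nan, 'd', 'e', 'f']],
                  columns=['Col1', 'Col2', 'Col3', 'col1_v2', 'col2_v2', 'col3_v2'])

mask = df['Col1'].isnull()
targets = ['Col1', 'Col2', 'Col3']            # illustrative: columns to fill
sources = ['col1_v2', 'col2_v2', 'col3_v2']   # illustrative: columns to copy from

# Rename the source columns to the target names so that label alignment
# pairs col1_v2 -> Col1, col2_v2 -> Col2 and col3_v2 -> Col3.
replacement = df.loc[mask, sources].rename(columns=dict(zip(sources, targets)))
df.loc[mask, targets] = replacement
print(df)
```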


So, this should work:


df.loc[df['Col1'].isnull(), ['Col1', 'Col2', 'Col3']] = replace_with_this.values

print(df)

  Col1 Col2 Col3 col1_v2 col2_v2 col3_v2
0    A    B    C     NaN     NaN     NaN
1    D    E    F     NaN     NaN     NaN
2    a    b    d       a       b       d
3    d    e    f       d       e       f

I accounted for the columns by using .values at the end. This strips the column information (and the index) from the replace_with_this dataframe, so the values are simply assigned in the appropriate positions.
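`.values` still works, but the pandas documentation recommends `DataFrame.to_numpy()` (added in pandas 0.24) for getting a plain NumPy array. A self-contained sketch of the same positional fix using it; note that with labels stripped, correctness depends on the source rows being selected with the same mask, and therefore in the same order, as the target rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['A', 'B', 'C', np.nan, np.nan, np.nan],
                   ['D', 'E', 'F', np.nan, np.nan, np.nan],
                   [np.nan, np.nan, np.nan, 'a', 'b', 'd'],
                   [np.nan, np.nan, np.nan, 'd', 'e', 'f']],
                  columns=['Col1', 'Col2', 'Col3', 'col1_v2', 'col2_v2', 'col3_v2'])

mask = df['Col1'].isnull()
replace_with_this = df.loc[mask, ['col1_v2', 'col2_v2', 'col3_v2']]

# to_numpy() returns a bare 2x3 array with no index or column labels,
# so pandas assigns it by position instead of aligning on labels.
df.loc[mask, ['Col1', 'Col2', 'Col3']] = replace_with_this.to_numpy()
print(df)
```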


Answer by jdg

In the "take the hill" spirit, I offer the below solution which yields the requested result.


I realize this is not exactly what you are after, as I am not slicing the df in the reasonable (but non-functional) way you propose.


# Does not work when indexing on np.nan, so fill with some arbitrary value.
df = df.fillna('AAA')

# mask to determine which rows to update
mask = df['Col1'] == 'AAA'

# dict mapping each column to update to its source column
mp = {'Col1': 'col1_v2', 'Col2': 'col2_v2', 'Col3': 'col3_v2'}

# update
for k in mp:
    df.loc[mask, k] = df[mp.get(k)]

# swap the arbitrary values back to np.nan
df = df.replace('AAA', np.nan)
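The placeholder is only needed if the mask is built with `==`: NaN never compares equal to anything, including itself, so `df['Col1'] == np.nan` is False on every row. A minimal sketch of the same per-column loop with an `isna()` mask instead, assuming a reasonably recent pandas (whether it sidesteps the `ValueError` shown below may depend on the pandas version in use):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['A', 'B', 'C', np.nan, np.nan, np.nan],
                   ['D', 'E', 'F', np.nan, np.nan, np.nan],
                   [np.nan, np.nan, np.nan, 'a', 'b', 'd'],
                   [np.nan, np.nan, np.nan, 'd', 'e', 'f']],
                  columns=['Col1', 'Col2', 'Col3', 'col1_v2', 'col2_v2', 'col3_v2'])

# NaN is never equal to itself, so an == comparison finds no missing values.
nan_eq = (df['Col1'] == np.nan).any()
print(nan_eq)  # False

# isna() detects missing values directly; compute the mask once, before the
# loop, so updating Col1 does not change which rows get updated.
mask = df['Col1'].isna()

mp = {'Col1': 'col1_v2', 'Col2': 'col2_v2', 'Col3': 'col3_v2'}
for target, source in mp.items():
    # A Series right-hand side is aligned on the row index.
    df.loc[mask, target] = df[source]

print(df)
```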

Output:


Col1    Col2    Col3    col1_v2     col2_v2     col3_v2
A       B       C       NaN         NaN         NaN
D       E       F       NaN         NaN         NaN
a       b       d       a           b           d
d       e       f       d           e           f

Below is the error I get if I do not replace the NaNs. I'm going to research exactly where that error stems from.


ValueError: array is not broadcastable to correct shape