Python Pandas: update multiple columns at once

Question

Asked by flyingmeatball

I'm trying to update a couple fields at once - I have two data sources and I'm trying to reconcile them. I know I could do some ugly merging and then delete columns, but was expecting this code below to work:

import numpy as np
import pandas as pd

df = pd.DataFrame([['A', 'B', 'C', np.nan, np.nan, np.nan],
                   ['D', 'E', 'F', np.nan, np.nan, np.nan],
                   [np.nan, np.nan, np.nan, 'a', 'b', 'd'],
                   [np.nan, np.nan, np.nan, 'd', 'e', 'f']],
                  columns=['Col1', 'Col2', 'Col3', 'col1_v2', 'col2_v2', 'col3_v2'])

print(df)

  Col1 Col2 Col3 col1_v2 col2_v2 col3_v2
0    A    B    C     NaN     NaN     NaN
1    D    E    F     NaN     NaN     NaN
2  NaN  NaN  NaN       a       b       d
3  NaN  NaN  NaN       d       e       f

# update
df.loc[df['Col1'].isnull(), ['Col1', 'Col2', 'Col3']] = df[['col1_v2', 'col2_v2', 'col3_v2']]

print(df)

  Col1 Col2 Col3 col1_v2 col2_v2 col3_v2
0    A    B    C     NaN     NaN     NaN
1    D    E    F     NaN     NaN     NaN
2  NaN  NaN  NaN       a       b       d
3  NaN  NaN  NaN       d       e       f

My desired output would be:

  Col1 Col2 Col3 col1_v2 col2_v2 col3_v2
0    A    B    C     NaN     NaN     NaN
1    D    E    F     NaN     NaN     NaN
2    a    b    d       a       b       d
3    d    e    f       d       e       f

I'm betting it has to do with updating/setting on a slice, but I always use .loc to update values, just not on multiple columns at once.

I feel like there's an easy way to do this that I'm just missing, any thoughts/suggestions would be welcome!

Edit to reflect solution below: Thanks for the comment on the indexes. However, I have a question about how this relates to Series. If I wanted to update an individual Series in a similar manner, I could do something like this:

df.loc[df['Col1'].isnull(),['Col1']] = df['col1_v2']

print(df)

  Col1 Col2 Col3 col1_v2 col2_v2 col3_v2
0    A    B    C     NaN     NaN     NaN
1    D    E    F     NaN     NaN     NaN
2    a  NaN  NaN       a       b       d
3    d  NaN  NaN       d       e       f

Note that I didn't account for the indexes here: I filtered to a 2x1 selection and set it equal to a 4x1 Series, yet it was handled correctly. Thoughts? I'm trying to better understand something I've used for a while, but I guess I don't have a full grasp of the underlying mechanism/rule.

Answer 1

Accepted answer by piRSquared

You want to replace:

print(df.loc[df['Col1'].isnull(), ['Col1', 'Col2', 'Col3']])

  Col1 Col2 Col3
2  NaN  NaN  NaN
3  NaN  NaN  NaN

With:

replace_with_this = df.loc[df['Col1'].isnull(), ['col1_v2', 'col2_v2', 'col3_v2']]
print(replace_with_this)

  col1_v2 col2_v2 col3_v2
2       a       b       d
3       d       e       f

Seems reasonable. However, when you do the assignment, you need to account for index alignment, which includes columns.
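
To make the alignment point concrete, here is a minimal sketch that imitates it with an explicit reindex. This only illustrates the rule and is not the pandas internals; the names target, source, rows, aligned_frame and aligned_series are assumptions introduced for the example.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame(
    [['A', 'B', 'C', np.nan, np.nan, np.nan],
     ['D', 'E', 'F', np.nan, np.nan, np.nan],
     [np.nan, np.nan, np.nan, 'a', 'b', 'd'],
     [np.nan, np.nan, np.nan, 'd', 'e', 'f']],
    columns=['Col1', 'Col2', 'Col3', 'col1_v2', 'col2_v2', 'col3_v2'])

target = ['Col1', 'Col2', 'Col3']
source = ['col1_v2', 'col2_v2', 'col3_v2']
rows = df.loc[df['Col1'].isnull()].index  # row labels being assigned to

# DataFrame on the right-hand side: matched on row AND column labels.
# None of the source column names appear in `target`, so nothing lines up.
aligned_frame = df[source].reindex(index=rows, columns=target)
print(aligned_frame)

# Series on the right-hand side: matched on row labels only.
# The 4-element Series is cut down to the selected rows.
aligned_series = df['col1_v2'].reindex(rows)
print(aligned_series)
```

The first result contains only NaN, which matches the unchanged frame in the question. The second shows why the single-column assignment in the question's edit works without any extra handling: a Series carries no column labels, so only its row index has to match.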

So, this should work:

df.loc[df['Col1'].isnull(), ['Col1', 'Col2', 'Col3']] = replace_with_this.values

print(df)

  Col1 Col2 Col3 col1_v2 col2_v2 col3_v2
0    A    B    C     NaN     NaN     NaN
1    D    E    F     NaN     NaN     NaN
2    a    b    d       a       b       d
3    d    e    f       d       e       f

I accounted for columns by using .values at the end. This stripped the column information from the replace_with_this dataframe and just used the values in the appropriate positions.
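
As a hedged sketch of two ways to get past the column alignment: the first drops the labels with to_numpy(), the method form of .values, and the second keeps the labels but renames them to match the target columns. The variable names (mask, target, source, by_position, by_label) are assumptions made for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['A', 'B', 'C', np.nan, np.nan, np.nan],
     ['D', 'E', 'F', np.nan, np.nan, np.nan],
     [np.nan, np.nan, np.nan, 'a', 'b', 'd'],
     [np.nan, np.nan, np.nan, 'd', 'e', 'f']],
    columns=['Col1', 'Col2', 'Col3', 'col1_v2', 'col2_v2', 'col3_v2'])

mask = df['Col1'].isnull()
target = ['Col1', 'Col2', 'Col3']
source = ['col1_v2', 'col2_v2', 'col3_v2']

# Option 1: strip the labels, so values are placed purely by position.
# The right-hand side must then have exactly the shape of the selection.
by_position = df.copy()
by_position.loc[mask, target] = by_position.loc[mask, source].to_numpy()

# Option 2: keep the labels, but rename the source columns so that
# they match the target columns and alignment can do the work.
by_label = df.copy()
by_label.loc[mask, target] = by_label[source].rename(columns=dict(zip(source, target)))

print(by_position[target])
print(by_label[target])
```

The positional version depends on the row and column order of both sides agreeing, while the renamed version depends only on labels, so it tolerates a right-hand side that has more rows than the selection.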

Answer 2

Answer by jdg

In the "take the hill" spirit, I offer the below solution which yields the requested result.

I realize this is not exactly what you are after, as I am not slicing the df (in the reasonable, but non-functional, way in which you propose).

# Does not work when indexing on np.nan, so I fill with some arbitrary value.
df = df.fillna('AAA')

# Mask to determine which rows to update
mask = df['Col1'] == 'AAA'

# Dict with key-value pairs for the columns to be updated (target: source)
mp = {'Col1': 'col1_v2', 'Col2': 'col2_v2', 'Col3': 'col3_v2'}

# Update
for k in mp:
    df.loc[mask, k] = df[mp[k]]

# Swap the arbitrary values back to np.nan
df = df.replace('AAA', np.nan)

Output:

Col1    Col2    Col3    col1_v2     col2_v2     col3_v2
A       B       C       NaN         NaN         NaN
D       E       F       NaN         NaN         NaN
a       b       d       a           b           d
d       e       f       d           e           f

The error I get if I do not replace the NaNs is below. I'm going to research exactly where that error stems from.

ValueError: array is not broadcastable to correct shape
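
For comparison, a minimal sketch of the same column-by-column loop that builds the mask with isnull() instead of a placeholder value. It assumes the data from the question; the loop variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['A', 'B', 'C', np.nan, np.nan, np.nan],
     ['D', 'E', 'F', np.nan, np.nan, np.nan],
     [np.nan, np.nan, np.nan, 'a', 'b', 'd'],
     [np.nan, np.nan, np.nan, 'd', 'e', 'f']],
    columns=['Col1', 'Col2', 'Col3', 'col1_v2', 'col2_v2', 'col3_v2'])

# Rows to update: those where Col1 is missing.
mask = df['Col1'].isnull()

# Target column -> source column.
mp = {'Col1': 'col1_v2', 'Col2': 'col2_v2', 'Col3': 'col3_v2'}

for target_col, source_col in mp.items():
    # A Series on the right-hand side is aligned on the row index,
    # so only the rows selected by `mask` pick up new values.
    df.loc[mask, target_col] = df[source_col]

print(df)
```

Because each assignment has a Series on the right-hand side, column labels never enter into the alignment, and the NaN values in the other columns are left as they are.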
