pandas: How to merge/combine columns in pandas?

Question

Asked by mati

I have an example DataFrame with 4 columns:

import numpy as np
import pandas as pd

data = {'A': ['a', 'b', 'c', 'd', 'e', 'f'],
    'B': [42, 52, np.nan, np.nan, np.nan, np.nan],
    'C': [np.nan, np.nan, 31, 2, np.nan, np.nan],
    'D': [np.nan, np.nan, np.nan, np.nan, 62, 70]}
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

    A   B       C       D
0   a   42.0    NaN     NaN
1   b   52.0    NaN     NaN
2   c   NaN     31.0    NaN
3   d   NaN     2.0     NaN
4   e   NaN     NaN     62.0
5   f   NaN     NaN     70.0

I would now like to merge/combine columns B, C, and D into a new column E, as in this example:

data2 = {'A': ['a', 'b', 'c', 'd', 'e', 'f'],
    'E': [42, 52, 31, 2, 62, 70]}
df2 = pd.DataFrame(data2, columns = ['A', 'E'])

    A   E
0   a   42
1   b   52
2   c   31
3   d   2
4   e   62
5   f   70

I found a quite similar question here, but it appends the merged columns B, C, and D to the end of column A:

0      a
1      b
2      c
3      d
4      e
5      f
6     42
7     52
8     31
9      2
10    62
11    70
dtype: object

Thanks for help.

Answer 1

Accepted answer by Zero

Option 1
Using assign and drop

In [644]: cols = ['B', 'C', 'D']

In [645]: df.assign(E=df[cols].sum(axis=1)).drop(columns=cols)
Out[645]:
   A     E
0  a  42.0
1  b  52.0
2  c  31.0
3  d   2.0
4  e  62.0
5  f  70.0
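Note that sum(axis=1) skips NaN, so a row where B, C and D are all NaN gets 0 rather than NaN. A minimal sketch on a small hypothetical frame (row z has no value in any of the three columns), assuming you want such rows to stay NaN, uses the min_count parameter of DataFrame.sum:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 'z' has no value in B, C or D
df = pd.DataFrame({'A': ['a', 'c', 'z'],
                   'B': [42, np.nan, np.nan],
                   'C': [np.nan, 31, np.nan],
                   'D': [np.nan, np.nan, np.nan]})
cols = ['B', 'C', 'D']

default_sum = df[cols].sum(axis=1)              # all-NaN row becomes 0.0
strict_sum = df[cols].sum(axis=1, min_count=1)  # needs >= 1 non-NaN value, else NaN

out = df.assign(E=strict_sum).drop(columns=cols)
print(default_sum.tolist())
print(out)
```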

Option 2
Using assignment and drop

In [648]: df['E'] = df[cols].sum(axis=1)

In [649]: df = df.drop(columns=cols)

In [650]: df
Out[650]:
   A     E
0  a  42.0
1  b  52.0
2  c  31.0
3  d   2.0
4  e  62.0
5  f  70.0

Option 3 (lately, this is the one I like best)
Using groupby

In [660]: df.groupby(np.where(df.columns == 'A', 'A', 'E'), axis=1).first()  # or sum/max/min; groupby(axis=1) is deprecated since pandas 2.1
Out[660]:
   A     E
0  a  42.0
1  b  52.0
2  c  31.0
3  d   2.0
4  e  62.0
5  f  70.0

In [661]: df.columns == 'A'
Out[661]: array([ True, False, False, False])

In [662]: np.where(df.columns == 'A', 'A', 'E')
Out[662]: array(['A', 'E', 'E', 'E'], dtype='<U1')
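Because groupby(..., axis=1) is deprecated in recent pandas (2.1+), here is a hedged sketch of the same idea that transposes instead: group the rows of df.T by the same labels, take the first non-null value per group, and transpose back. The transpose round-trip leaves object dtype, so infer_objects() is used to restore a float column; this is a sketch, not necessarily the answerer's preferred form.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'B': [42, 52, np.nan, np.nan, np.nan, np.nan],
                   'C': [np.nan, np.nan, 31, 2, np.nan, np.nan],
                   'D': [np.nan, np.nan, np.nan, np.nan, 62, 70]})

# Label each column: 'A' stays 'A', every other column is grouped under 'E'
labels = np.where(df.columns == 'A', 'A', 'E')

# Transpose so the columns become rows, group those rows, take the first
# non-null value in each group, then transpose back.
out = df.T.groupby(labels).first().T

# The transpose round-trip leaves object dtype; let pandas re-infer types
out = out.infer_objects()
print(out)
```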

Answer 2

Answer by Zero

The question as written asks for merge/combine rather than sum, so I'm posting this to help anyone who finds this answer while looking for help on coalescing with combine_first, which can be a bit tricky.

df2 = pd.concat([df["A"], 
             df["B"].combine_first(df["C"]).combine_first(df["D"])], 
            axis=1)
df2.rename(columns={"B":"E"}, inplace=True)
   A     E
0  a  42.0
1  b  52.0
2  c  31.0
3  d   2.0
4  e  62.0
5  f  70.0

What's tricky about it? In this case, nothing. But suppose you were pulling the B, C and D values from different DataFrames in which the labels a, b, c, d, e, f are all present, but not necessarily in the same order. combine_first() aligns on the index, so you would need to call set_index() on each of your df references:

df2 = pd.concat([df.set_index("A", drop=False)["A"],
                 df.set_index("A")["B"]
                   .combine_first(df.set_index("A")["C"])
                   .combine_first(df.set_index("A")["D"])
                   .astype(int)],
                axis=1).reset_index(drop=True)
df2.rename(columns={"B": "E"}, inplace=True)

   A   E
0  a  42
1  b  52
2  c  31
3  d   2
4  e  62
5  f  70
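A small sketch of the scenario described above, with B, C and D coming from three separate (hypothetical) frames whose rows are in different orders. It also folds the combine_first chain with functools.reduce, a convenience assumed here for when there are many columns; earlier series take priority wherever several have a value.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Hypothetical inputs: B, C and D live in separate frames with different row orders
frames = [
    pd.DataFrame({'A': list('abcdef'), 'B': [42, 52, np.nan, np.nan, np.nan, np.nan]}),
    pd.DataFrame({'A': list('fedcba'), 'C': [np.nan, np.nan, 2, 31, np.nan, np.nan]}),
    pd.DataFrame({'A': list('efabcd'), 'D': [62, 70, np.nan, np.nan, np.nan, np.nan]}),
]

# Index each value column by its label so combine_first matches rows by label, not position
series = [f.set_index('A').iloc[:, 0] for f in frames]

# Fold left to right: B.combine_first(C), then .combine_first(D)
E = reduce(lambda acc, s: acc.combine_first(s), series)

df2 = (E.sort_index()
        .astype(int)
        .rename('E')
        .rename_axis('A')
        .reset_index())
print(df2)
```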

Answer 3

Answer by jezrael

Use difference to get the column names other than A, and then take the sum or max:

cols = df.columns.difference(['A'])
df['E'] = df[cols].sum(axis=1).astype(int)
# df['E'] = df[cols].max(axis=1).astype(int)
df = df.drop(cols, axis=1)
print (df)
   A   E
0  a  42
1  b  52
2  c  31
3  d   2
4  e  62
5  f  70
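sum and max give the same E only when each row has exactly one non-null value among B, C and D. A quick sketch on a hypothetical two-row frame, where row a has both B=42 and D=10, shows how they diverge:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 'a' has two non-null values (B=42 and D=10)
df = pd.DataFrame({'A': ['a', 'b'],
                   'B': [42, 52],
                   'C': [np.nan, np.nan],
                   'D': [10, np.nan]})
cols = df.columns.difference(['A'])  # every column except A

summed = df[cols].sum(axis=1)   # adds every non-null value in the row
largest = df[cols].max(axis=1)  # keeps only the largest non-null value
print(summed.tolist(), largest.tolist())
```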

If a row can contain more than one value, join them into a string instead:

data = {'A': ['a', 'b', 'c', 'd', 'e', 'f'],
    'B': [42, 52, np.nan, np.nan, np.nan, np.nan],  
    'C': [np.nan, np.nan, 31, 2, np.nan, np.nan],
    'D': [10, np.nan, np.nan, np.nan, 62, 70]}
df = pd.DataFrame(data, columns = ['A', 'B', 'C', 'D'])

print (df)
   A     B     C     D
0  a  42.0   NaN  10.0
1  b  52.0   NaN   NaN
2  c   NaN  31.0   NaN
3  d   NaN   2.0   NaN
4  e   NaN   NaN  62.0
5  f   NaN   NaN  70.0

cols = df.columns.difference(['A'])
df['E'] = df[cols].apply(lambda x: ', '.join(x.dropna().astype(int).astype(str)), axis=1)
df = df.drop(cols, axis=1)
print (df)
   A       E
0  a  42, 10
1  b      52
2  c      31
3  d       2
4  e      62
5  f      70
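An alternative sketch (a swapped-in technique, not part of the answer above) reshapes to long format with melt, drops the NaN values, and joins the remaining values per label with groupby. It assumes the labels in A are unique, since they are used as group keys:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'B': [42, 52, np.nan, np.nan, np.nan, np.nan],
                   'C': [np.nan, np.nan, 31, 2, np.nan, np.nan],
                   'D': [10, np.nan, np.nan, np.nan, 62, 70]})
cols = df.columns.difference(['A'])

# One row per (label, source column, value); NaN cells are dropped
long = df.melt(id_vars='A', value_vars=list(cols)).dropna(subset=['value'])

# Join the surviving values per label, keeping labels in first-seen order
E = (long.groupby('A', sort=False)['value']
         .agg(lambda v: ', '.join(v.astype(int).astype(str))))

# Labels with no non-null value at all would map to NaN here
df['E'] = df['A'].map(E)
df = df.drop(columns=cols)
print(df)
```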

Answer 4

Answer by jpp

You can also use ffill with iloc:

df['E'] = df.iloc[:, 1:].ffill(axis=1).iloc[:, -1].astype(int)
df = df.iloc[:, [0, -1]]

print(df)

   A   E
0  a  42
1  b  52
2  c  31
3  d   2
4  e  62
5  f  70
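The iloc[:, -1] step works because forward-filling along axis=1 carries each value rightwards, so the last column ends up holding the last non-null value in the row. A minimal sketch on a hypothetical frame where row a has two values (B=42, D=10) shows that ffill picks the last one, while bfill with iloc[:, 0] picks the first:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 'a' has two non-null values, B=42 and D=10
df = pd.DataFrame({'A': ['a', 'b', 'c'],
                   'B': [42, 52, np.nan],
                   'C': [np.nan, np.nan, 31],
                   'D': [10, np.nan, np.nan]})
vals = df.iloc[:, 1:]

# ffill along the row carries values rightwards: last column = LAST non-null value
last_non_null = vals.ffill(axis=1).iloc[:, -1]
# bfill along the row carries values leftwards: first column = FIRST non-null value
first_non_null = vals.bfill(axis=1).iloc[:, 0]

print(last_non_null.tolist(), first_non_null.tolist())
```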
