Original source: http://stackoverflow.com/questions/41693000/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to merge two rows in a dataframe pandas
Asked by Carmen
I have a dataframe with two rows and I'd like to merge the two rows into one row. The df looks as follows:
      PC  Rating CY  Rating PY  HT
0  DE101        NaN         AA  GV
0  DE101        AA+        NaN  GV
I have tried to create two separate dataframes and combine them with df.merge(df2), without success. The result should be the following:
      PC  Rating CY  Rating PY  HT
0  DE101        AA+         AA  GV
Any ideas? Thanks in advance. Could df.update be a possible solution?
EDIT:
df.head(1).combine_first(df.tail(1))
This works for the example above. However, for columns containing numerical values, this approach doesn't yield the desired output, e.g. for
      PC  Rating CY  Rating PY  HT  MV1  MV2
0  DE101        NaN         AA  GV    0   20
0  DE101        AA+        NaN  GV   10    0
The output should be:
      PC  Rating CY  Rating PY  HT  MV1  MV2
0  DE101        AA+         AA  GV   10   20
The formula above doesn't sum the values in the last two columns; instead it takes the values from the first row of the dataframe:
      PC  Rating CY  Rating PY  HT  MV1  MV2
0  DE101        AA+         AA  GV    0   20
How could this problem be fixed?
Accepted answer by Nickil Maveli
You can use the DF.combine_first() method after splitting the DF into two parts: the null values in the first part are replaced with the finite values from the other part, while its other finite values are left untouched:
df.head(1).combine_first(df.tail(1))
# In practice this is the same as → df.head(1).fillna(df.tail(1))
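To try this out, here is a minimal, self-contained sketch. The frame is rebuilt by hand from the printout in the question (the construction code and the variable name `merged` are assumptions, not part of the original post), with both rows sharing the index label 0:

```python
import numpy as np
import pandas as pd

# Rebuild the two-row frame from the question; both rows carry index label 0.
df = pd.DataFrame(
    {
        "PC": ["DE101", "DE101"],
        "Rating CY": [np.nan, "AA+"],
        "Rating PY": ["AA", np.nan],
        "HT": ["GV", "GV"],
    },
    index=[0, 0],
)

# combine_first keeps the caller's non-null values and fills its NaNs
# from the argument, aligning on index and column labels.
merged = df.head(1).combine_first(df.tail(1))
print(merged)
```

Note that this only works because both one-row slices share the same index label; with different labels the rows would not line up.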
In case there are columns of mixed datatypes, you can partition the DF into its constituent dtype columns, perform the appropriate operation on each part, and then chain the results back together:
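# np.object was removed in NumPy 1.24, so split on "number" instead
obj_df = df.select_dtypes(exclude="number")
num_df = df.select_dtypes(include="number")
# Fill NaNs in the text columns, sum the numeric columns, then join on the shared index
obj_df.head(1).combine_first(obj_df.tail(1)).join(num_df.head(1).add(num_df.tail(1)))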
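As a runnable illustration of the dtype-splitting idea, the sketch below rebuilds the second example frame from the question by hand (the construction code and the names `text_part`, `num_part`, and `merged` are assumptions made for the example). It selects on `"number"` rather than on the object dtype, which is one way to split the columns and may need adjusting for other dtypes:

```python
import numpy as np
import pandas as pd

# Second example from the question: text ratings plus two numeric columns.
df = pd.DataFrame(
    {
        "PC": ["DE101", "DE101"],
        "Rating CY": [np.nan, "AA+"],
        "Rating PY": ["AA", np.nan],
        "HT": ["GV", "GV"],
        "MV1": [0, 10],
        "MV2": [20, 0],
    },
    index=[0, 0],
)

# Split the columns by dtype.
num_df = df.select_dtypes(include="number")
obj_df = df.select_dtypes(exclude="number")

# Text columns: fill the first row's NaNs from the second row.
text_part = obj_df.head(1).combine_first(obj_df.tail(1))
# Numeric columns: add the two rows (they align on index label 0).
num_part = num_df.head(1).add(num_df.tail(1))

# Join both one-row frames on their shared index.
merged = text_part.join(num_part)
print(merged)
```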
Answer by Zero
You could use max with a transpose, like this:
In [2103]: df.max().to_frame().T
Out[2103]:
      PC  Rating CY  Rating PY  HT  MV1  MV2
0  DE101        AA+         AA  GV   10   20