pandas.concat of multiple data frames using only common columns

Disclaimer: This page is an English rendering of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/39862654/



Tags: python, pandas, dataframe

Asked by VM1

I have multiple pandas data frame objects cost1, cost2, cost3, ....

  1. They have different column names (and different numbers of columns), but some columns are common to all of them.
  2. The number of columns in each data frame is fairly large, so picking out the common columns by hand would be painful.

How can I append the rows from all of these data frames into one single data frame while keeping only the elements from the common columns?

As of now, I have

frames = [cost1, cost2, cost3, ...]

new_combined = pd.concat(frames, ignore_index=True)

This obviously contains columns which are not common across all of the data frames.
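
To make the problem concrete, here is a minimal sketch (the toy frames below are hypothetical, not data from the question): with the default outer join, pd.concat returns the union of all columns and fills the ones a frame lacks with NaN.

import pandas as pd

# Two toy frames: columns 'a' and 'b' are shared, 'c' and 'd' are not
cost1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
cost2 = pd.DataFrame({'a': [7, 8], 'b': [9, 0], 'd': [1, 2]})

combined = pd.concat([cost1, cost2], ignore_index=True)
print(combined.columns.tolist())  # ['a', 'b', 'c', 'd'] -- union of columns, NaN in 'c' and 'd'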

Accepted answer by Ami Tavory

You can find the common columns with Python's set.intersection:

common_cols = list(set.intersection(*(set(df.columns) for df in frames)))

To concatenate using only the common columns, you can use

pd.concat([df[common_cols] for df in frames], ignore_index=True)
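
One caveat, added here rather than taken from the answer above: set.intersection does not preserve column order, so the common columns may come out in arbitrary order. If you want the result to follow the column order of the first frame, a small variation on the same idea is:

# Order-preserving variant: keep the column order of frames[0]
common_cols = [c for c in frames[0].columns if all(c in df.columns for df in frames[1:])]
new_combined = pd.concat([df[common_cols] for df in frames], ignore_index=True)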

Answer by Alok Nayak

For future readers: the functionality above is built into pandas itself. pd.concat will keep only the common columns if you pass the join='inner' argument, e.g.

pd.concat(frames, join='inner', ignore_index=True)
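
As a quick, self-contained check (again using hypothetical toy frames rather than data from the question), join='inner' keeps exactly the columns shared by every frame:

import pandas as pd

cost1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
cost2 = pd.DataFrame({'a': [7, 8], 'b': [9, 0], 'd': [1, 2]})
frames = [cost1, cost2]

new_combined = pd.concat(frames, join='inner', ignore_index=True)
print(new_combined.columns.tolist())  # ['a', 'b'] -- only the common columns survive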