pandas.concat of multiple data frames using only common columns
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/39862654/
Asked by VM1
I have multiple pandas data frame objects cost1, cost2, cost3 ....
- They have different column names (and different numbers of columns), but some columns are common to all of them.
- The number of columns in each data frame is fairly large, so picking out the common columns by hand would be painful.
How can I append the rows from all of these data frames into a single data frame while keeping only the columns whose names are common to all of them?
So far I have:
frames = [cost1, cost2, cost3...]
new_combined = pd.concat(frames, ignore_index=True)
This obviously also contains columns that are not common to all of the data frames.
Accepted answer by Ami Tavory
You can find the common columns with Python's set.intersection:
common_cols = list(set.intersection(*(set(df.columns) for df in frames)))
To concatenate using only the common columns, you can use:
pd.concat([df[common_cols] for df in frames], ignore_index=True)
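As a rough end-to-end sketch of this approach: the three small frames below are made-up stand-ins for cost1, cost2 and cost3 (their column names and values are assumptions, not from the question). Because a Python set has no guaranteed order, this variant also rebuilds the column list in the order of the first frame, which is a small addition to the answer's one-liner.

```python
import pandas as pd

# Hypothetical sample data standing in for cost1, cost2, cost3
cost1 = pd.DataFrame({"id": [1, 2], "price": [10.0, 20.0], "region": ["N", "S"]})
cost2 = pd.DataFrame({"id": [3], "price": [30.0], "vendor": ["acme"]})
cost3 = pd.DataFrame({"price": [40.0], "id": [4], "notes": ["rush"]})
frames = [cost1, cost2, cost3]

# Column names present in every frame
shared = set.intersection(*(set(df.columns) for df in frames))

# Keep the order in which those columns appear in the first frame
common_cols = [col for col in frames[0].columns if col in shared]

# Select only the common columns from each frame, then stack the rows
new_combined = pd.concat([df[common_cols] for df in frames], ignore_index=True)
print(new_combined)
```

Selecting df[common_cols] from every frame also puts the columns in the same order everywhere, so it does not matter that cost3 lists price before id.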
Answer by Alok Nayak
For future readers: pandas can do this by itself. If you pass the join='inner' argument to pd.concat, it concatenates the data frames while keeping only the common columns, e.g.
pd.concat(frames, join='inner', ignore_index=True)
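To see the effect of join='inner', the sketch below compares it with the default join='outer' on a few made-up frames (the sample data is an assumption used only for illustration).

```python
import pandas as pd

# Hypothetical sample data standing in for cost1, cost2, cost3
cost1 = pd.DataFrame({"id": [1, 2], "price": [10.0, 20.0], "region": ["N", "S"]})
cost2 = pd.DataFrame({"id": [3], "price": [30.0], "vendor": ["acme"]})
cost3 = pd.DataFrame({"price": [40.0], "id": [4], "notes": ["rush"]})
frames = [cost1, cost2, cost3]

# Default (join='outer'): union of all columns, missing cells become NaN
outer = pd.concat(frames, ignore_index=True)

# join='inner': only the columns present in every frame are kept
inner = pd.concat(frames, join="inner", ignore_index=True)

print(sorted(outer.columns))
print(sorted(inner.columns))
```

With the default outer join, the rows coming from cost2 and cost3 have NaN in the region column, which is the behaviour the question wanted to avoid.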