Pandas combine multiple csv files
Source: Stack Overflow, http://stackoverflow.com/questions/49111093/
License: CC BY-SA 4.0. You are free to use and share this content, but you must attribute it to the original authors on Stack Overflow.
Asked by warrenfitzhenry
I have multiple csv files that I would like to combine into one df.
They are all in this general format, with two index columns:
                                        1      2
CU0112-005287-7  Output Energy, (Wh/h)  0.064  0.066
CU0112-005287-7  Lights (Wh)            0      0

                                        1      2
CU0112-001885-L  Output Energy, (Wh/h)  1.33   1.317
CU0112-001885-L  Lights (Wh)            1.33   1.317

and so on...
The combined df would be:
                                        1      2
CU0112-005287-7  Output Energy, (Wh/h)  0.064  0.066
CU0112-005287-7  Lights (Wh)            0      0
CU0112-001885-L  Output Energy, (Wh/h)  1.33   1.317
CU0112-001885-L  Lights (Wh)            1.33   1.317

I am trying this code:
import os
import pandas as pd
import glob
files = glob.glob(r'2017-12-05\Aggregated\*.csv')  # folder which contains all the csv files
df = pd.merge([pd.read_csv(f, index_col=[0, 1]) for f in files], how='outer')
df.to_csv(r'\merged.csv')
But I am getting this error:
TypeError: merge() takes at least 2 arguments (2 given)
Answer by jezrael
Answer by Yayati Sule
You can try the following. I made some changes to the DataFrame combining logic:
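import os
import glob
from functools import reduce  # reduce is not a builtin in Python 3
import pandas as pd
files = glob.glob(r'2017-12-05\Aggregated\*.csv')  # folder which contains all the csv files
# assumes every file has a column or index level named 'id' to merge on
df = reduce(lambda df1, df2: pd.merge(df1, df2, on='id', how='outer'), [pd.read_csv(f, index_col=[0, 1]) for f in files])
df.to_csv(r'\merged.csv')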
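To make the reduce-based merge above easier to follow, here is a minimal sketch on small in-memory frames. The frames, their values, and the shared `id` column are made up for illustration and are not taken from the question's files. `pd.merge` joins exactly two frames at a time, so `functools.reduce` is used to fold the list pairwise.

```python
from functools import reduce

import pandas as pd

# Hypothetical frames that share an 'id' key column (illustrative data only).
frames = [
    pd.DataFrame({"id": ["a", "b"], "x": [1, 2]}),
    pd.DataFrame({"id": ["b", "c"], "y": [3, 4]}),
    pd.DataFrame({"id": ["a", "c"], "z": [5, 6]}),
]

# Fold the list pairwise: merge(merge(frames[0], frames[1]), frames[2]).
merged = reduce(
    lambda left, right: pd.merge(left, right, on="id", how="outer"),
    frames,
)

print(merged)
```

An outer merge lines up rows by key and places the other columns side by side, filling gaps with NaN. When every file has the same column names, as in the question, stacking rows with `pd.concat` is probably the closer fit.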
Answer by Billy Bonaros
A simple way:
Create a list with the names of the csv files:
from os import listdir
import pandas as pd
# keep only the .csv files in the current working directory
files = listdir()
csvs = []
for file in files:
    if file.endswith(".csv"):
        csvs.append(file)
Concatenate the csv files:
# start from an empty frame and append each file's rows in turn
data = pd.DataFrame()
for i in csvs:
    table = pd.read_csv(i)
    data = pd.concat([data, table])
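Calling `pd.concat` inside a loop copies the accumulated data on every pass, so a common alternative is to read all files into a list and concatenate once. The sketch below does that with the question's two-column index. The in-memory CSV text stands in for real files, and its header row (two empty index labels followed by `1,2`) is an assumption about the file layout.

```python
import io

import pandas as pd

# In-memory stand-ins for two of the CSV files. The values come from the
# question; the header layout is an assumption, not a known fact.
csv_a = io.StringIO(
    ',,1,2\n'
    'CU0112-005287-7,"Output Energy, (Wh/h)",0.064,0.066\n'
    'CU0112-005287-7,Lights (Wh),0,0\n'
)
csv_b = io.StringIO(
    ',,1,2\n'
    'CU0112-001885-L,"Output Energy, (Wh/h)",1.33,1.317\n'
    'CU0112-001885-L,Lights (Wh),1.33,1.317\n'
)

# Read each source with the first two columns as the index, then stack the
# rows with a single concat call instead of growing a frame in a loop.
frames = [pd.read_csv(f, index_col=[0, 1]) for f in (csv_a, csv_b)]
combined = pd.concat(frames)

print(combined)
```

With real files, the tuple `(csv_a, csv_b)` would be replaced by a list of paths, for example the result of `glob.glob`.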