Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/16265831/

Date: 2020-08-18 22:09:36  Source: igfitidea

Merging two CSV files using Python

Tags: python, csv, dictionary, merge, key

Asked by Rex

OK, I have read several threads here on Stack Overflow. I thought this would be fairly easy for me to do, but I find that I still do not have a very good grasp of Python. I tried the example located at "How to combine 2 csv files with common column value, but both files have different number of lines", and that was helpful, but I still do not have the results that I was hoping to achieve.

Essentially, I have 2 csv files with a common first column. I would like to merge the two, i.e.:

filea.csv

title,stage,jan,feb
darn,3.001,0.421,0.532
ok,2.829,1.036,0.751
three,1.115,1.146,2.921

fileb.csv

title,mar,apr,may,jun,
darn,0.631,1.321,0.951,1.751
ok,1.001,0.247,2.456,0.3216
three,0.285,1.283,0.924,956

output.csv (not the one I am getting but what I want)

title,stage,jan,feb,mar,apr,may,jun
darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751
ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216
three,1.115,1.146,2.921,0.285,1.283,0.924,956

output.csv (the output that I actually got)

title,feb,may
ok,0.751,2.456
three,2.921,0.924
darn,0.532,0.951

The code I was trying:

'''
testing merging of 2 csv files
'''
import csv
import array
import os

with open('Z:\Desktop\test\filea.csv') as f:
    r = csv.reader(f, delimiter=',')
    dict1 = {row[0]: row[3] for row in r}

with open('Z:\Desktop\test\fileb.csv') as f:
    r = csv.reader(f, delimiter=',')
    #dict2 = {row[0]: row[3] for row in r}
    dict2 = {row[0:3] for row in r}

print str(dict1)
print str(dict2)

keys = set(dict1.keys() + dict2.keys())
with open('Z:\Desktop\test\output.csv', 'wb') as f:
    w = csv.writer(f, delimiter=',')
    w.writerows([[key, dict1.get(key, "''"), dict2.get(key, "''")] for key in keys])

Any help is greatly appreciated.

Answered by Xymostech

You need to store all of the extra columns of each row in your dictionaries, not just one of them:

dict1 = {row[0]: row[1:] for row in r}
...
dict2 = {row[0]: row[1:] for row in r}

Then, since the values in the dictionaries are lists, you need to just concatenate the lists together:

w.writerows([[key] + dict1.get(key, []) + dict2.get(key, []) for key in keys])
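
Putting the two changes above together, the whole script might look like the following in Python 3. This is only a sketch, not the code from the question: the helper names read_rows_by_key and merge_csv are made up for illustration, and the key order (first file's keys, then any keys found only in the second file) is an assumption about what is wanted.

```python
import csv


def read_rows_by_key(path):
    """Map each row's first field to the list of its remaining fields."""
    with open(path, newline="") as f:
        # Skip completely empty lines; the header row is stored like any other row.
        return {row[0]: row[1:] for row in csv.reader(f) if row}


def merge_csv(path_a, path_b, path_out):
    """Write each key followed by its fields from path_a, then from path_b."""
    rows_a = read_rows_by_key(path_a)
    rows_b = read_rows_by_key(path_b)
    # Keep the key order of the first file, then add keys found only in the second.
    keys = list(rows_a) + [k for k in rows_b if k not in rows_a]
    with open(path_out, "w", newline="") as f:
        writer = csv.writer(f)
        for key in keys:
            writer.writerow([key] + rows_a.get(key, []) + rows_b.get(key, []))
```

Calling merge_csv("filea.csv", "fileb.csv", "output.csv") would then write the merged file. Two caveats: a key that is missing from one file produces a shorter row, so its values will not line up with the header, and the trailing comma in the header of fileb.csv is carried through as an empty last field.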

Answered by DSM

When I'm working with csv files, I often use the pandas library. It makes things like this very easy. For example:

import pandas as pd

a = pd.read_csv("filea.csv")
b = pd.read_csv("fileb.csv")
b = b.dropna(axis=1)
merged = a.merge(b, on='title')
merged.to_csv("output.csv", index=False)


Some explanation follows. First, we read in the csv files:

>>> a = pd.read_csv("filea.csv")
>>> b = pd.read_csv("fileb.csv")
>>> a
   title  stage    jan    feb
0   darn  3.001  0.421  0.532
1     ok  2.829  1.036  0.751
2  three  1.115  1.146  2.921
>>> b
   title    mar    apr    may       jun  Unnamed: 5
0   darn  0.631  1.321  0.951    1.7510         NaN
1     ok  1.001  0.247  2.456    0.3216         NaN
2  three  0.285  1.283  0.924  956.0000         NaN

and we see there's an extra column of data (note that the first line of fileb.csv -- title,mar,apr,may,jun, -- has an extra comma at the end). We can get rid of that easily enough:

>>> b = b.dropna(axis=1)
>>> b
   title    mar    apr    may       jun
0   darn  0.631  1.321  0.951    1.7510
1     ok  1.001  0.247  2.456    0.3216
2  three  0.285  1.283  0.924  956.0000
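
One thing to watch for: dropna(axis=1) drops every column that contains any missing value, so a genuine data column with a single blank cell would disappear too. If that is a concern, passing how="all" drops only columns that are entirely empty. The sketch below uses made-up in-memory data (not the files from the question) in which the "ok" row has no jun value:

```python
import io

import pandas as pd

# Made-up stand-in for fileb.csv: the header ends with a comma,
# and the "ok" row is missing its jun value.
raw = "title,mar,jun,\ndarn,0.631,1.751\nok,1.001,\n"
b = pd.read_csv(io.StringIO(raw))

# Drops every column containing at least one NaN (jun and the unnamed column).
strict = b.dropna(axis=1)

# Drops only columns where every value is NaN (just the unnamed column).
lenient = b.dropna(axis=1, how="all")
```

Here strict keeps only title and mar, while lenient also keeps jun with a NaN in the "ok" row.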

Now we can merge a and b on the title column:

>>> merged = a.merge(b, on='title')
>>> merged
   title  stage    jan    feb    mar    apr    may       jun
0   darn  3.001  0.421  0.532  0.631  1.321  0.951    1.7510
1     ok  2.829  1.036  0.751  1.001  0.247  2.456    0.3216
2  three  1.115  1.146  2.921  0.285  1.283  0.924  956.0000
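
By default merge performs an inner join, so a title that appears in only one file is left out of the result. That makes no difference for the sample data, where both files have the same three titles, but if the files can have different rows, how="outer" keeps every title and fills the gaps with NaN. A small sketch with made-up data:

```python
import io

import pandas as pd

# Made-up data: "darn" is only in the first table, "three" only in the second.
a = pd.read_csv(io.StringIO("title,jan\ndarn,0.421\nok,1.036\n"))
b = pd.read_csv(io.StringIO("title,mar\nok,1.001\nthree,0.285\n"))

inner = a.merge(b, on="title")               # only titles present in both
outer = a.merge(b, on="title", how="outer")  # every title; gaps become NaN
```

With this data, inner contains only the "ok" row, while outer contains all three titles.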

and finally write this out:

>>> merged.to_csv("output.csv", index=False)

producing:

title,stage,jan,feb,mar,apr,may,jun
darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751
ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216
three,1.115,1.146,2.921,0.285,1.283,0.924,956.0
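
Note that the last value comes out as 956.0 rather than the 956 in the desired output, because pandas parsed the jun column as floating point. If the values should be copied through exactly as written, one option is to read every column as text with dtype=str. This is a sketch with made-up in-memory data, not a change to the answer above:

```python
import io

import pandas as pd

# Read every column as text so values are written back exactly as they were read.
a = pd.read_csv(io.StringIO("title,feb\nthree,2.921\n"), dtype=str)
b = pd.read_csv(io.StringIO("title,jun\nthree,956\n"), dtype=str)

merged = a.merge(b, on="title")

# With no path argument, to_csv returns the CSV text as a string.
csv_text = merged.to_csv(index=False)
```

The trade-off is that the columns are no longer numeric, so any arithmetic on them would need an explicit conversion first.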