Merging two CSV files using Python

Question

Asked by Rex

OK, I have read several threads here on Stack Overflow. I thought this would be fairly easy to do, but I find that I still do not have a very good grasp of Python. I tried the example in "How to combine 2 csv files with common column value, but both files have different number of lines", and that was helpful, but I still do not get the results I was hoping for.

Essentially, I have 2 CSV files with a common first column, and I would like to merge the two. For example:

filea.csv

title,stage,jan,feb
darn,3.001,0.421,0.532
ok,2.829,1.036,0.751
three,1.115,1.146,2.921

fileb.csv

title,mar,apr,may,jun,
darn,0.631,1.321,0.951,1.751
ok,1.001,0.247,2.456,0.3216
three,0.285,1.283,0.924,956

output.csv (not the one I am getting but what I want)

title,stage,jan,feb,mar,apr,may,jun
darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751
ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216
three,1.115,1.146,2.921,0.285,1.283,0.924,956

output.csv (the output that I actually got)

title,feb,may
ok,0.751,2.456
three,2.921,0.924
darn,0.532,0.951

The code I was trying:

'''
testing merging of 2 csv files
'''
import csv
import array
import os

with open('Z:\Desktop\test\filea.csv') as f:
    r = csv.reader(f, delimiter=',')
    dict1 = {row[0]: row[3] for row in r}

with open('Z:\Desktop\test\fileb.csv') as f:
    r = csv.reader(f, delimiter=',')
    #dict2 = {row[0]: row[3] for row in r}
    dict2 = {row[0:3] for row in r}

print str(dict1)
print str(dict2)

keys = set(dict1.keys() + dict2.keys())
with open('Z:\Desktop\test\output.csv', 'wb') as f:
    w = csv.writer(f, delimiter=',')
    w.writerows([[key, dict1.get(key, "''"), dict2.get(key, "''")] for key in keys])

Any help is greatly appreciated.

Answer 1

Answered by Xymostech

You need to store all of the remaining columns of each row in your dictionary, not just one of them:

dict1 = {row[0]: row[1:] for row in r}
...
dict2 = {row[0]: row[1:] for row in r}

Then, since the values in the dictionaries are lists, you need to just concatenate the lists together:

w.writerows([[key] + dict1.get(key, []) + dict2.get(key, []) for key in keys])
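Putting the two changes above together, a complete script might look like the following. This is only a minimal sketch under a few assumptions, not the answerer's exact code: it targets Python 3 (the question's code is Python 2), the helper names read_rows and merge_csv are made up for illustration, rows are written in first-seen order rather than in set order, and trailing empty fields (such as the one produced by the stray comma at the end of the fileb.csv header) are dropped.

```python
import csv


def read_rows(path):
    """Map each row's first field to the list of its remaining fields."""
    rows = {}
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            # Drop trailing empty fields caused by a stray trailing comma.
            while len(row) > 1 and row[-1] == "":
                row.pop()
            rows[row[0]] = row[1:]
    return rows


def merge_csv(path_a, path_b, out_path):
    """Hypothetical helper: concatenate the columns of rows sharing a first field."""
    dict1 = read_rows(path_a)
    dict2 = read_rows(path_b)
    # Keys in first-seen order: all keys from file A, then keys found only in file B.
    keys = list(dict1) + [k for k in dict2 if k not in dict1]
    with open(out_path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerows([key] + dict1.get(key, []) + dict2.get(key, []) for key in keys)
```

Calling merge_csv("filea.csv", "fileb.csv", "output.csv") on the sample files should give the desired output from the question. Note that the header row is treated like any other row, and that a key missing from one file yields a shorter row whose columns no longer line up with the header.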

Answer 2

Answered by DSM

When I'm working with csv files, I often use the pandas library. It makes things like this very easy. For example:

import pandas as pd

a = pd.read_csv("filea.csv")
b = pd.read_csv("fileb.csv")
b = b.dropna(axis=1)
merged = a.merge(b, on='title')
merged.to_csv("output.csv", index=False)

Some explanation follows. First, we read in the csv files:

>>> a = pd.read_csv("filea.csv")
>>> b = pd.read_csv("fileb.csv")
>>> a
   title  stage    jan    feb
0   darn  3.001  0.421  0.532
1     ok  2.829  1.036  0.751
2  three  1.115  1.146  2.921
>>> b
   title    mar    apr    may       jun  Unnamed: 5
0   darn  0.631  1.321  0.951    1.7510         NaN
1     ok  1.001  0.247  2.456    0.3216         NaN
2  three  0.285  1.283  0.924  956.0000         NaN

and we see there's an extra column of data (note that the first line of fileb.csv, "title,mar,apr,may,jun,", has an extra comma at the end). We can get rid of that easily enough:

>>> b = b.dropna(axis=1)
>>> b
   title    mar    apr    may       jun
0   darn  0.631  1.321  0.951    1.7510
1     ok  1.001  0.247  2.456    0.3216
2  three  0.285  1.283  0.924  956.0000

Now we can merge a and b on the title column:

>>> merged = a.merge(b, on='title')
>>> merged
   title  stage    jan    feb    mar    apr    may       jun
0   darn  3.001  0.421  0.532  0.631  1.321  0.951    1.7510
1     ok  2.829  1.036  0.751  1.001  0.247  2.456    0.3216
2  three  1.115  1.146  2.921  0.285  1.283  0.924  956.0000

and finally write this out:

>>> merged.to_csv("output.csv", index=False)

producing:

title,stage,jan,feb,mar,apr,may,jun
darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751
ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216
three,1.115,1.146,2.921,0.285,1.283,0.924,956.0
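For readers who cannot install pandas, the join on title can be approximated with the standard library. The sketch below is an illustration only, not part of the answer above: the function name inner_join_on is hypothetical, it keeps only rows whose key appears in both files (which matches the default behaviour of merge), it assumes keys are unique within the second file and that the two files share no column names other than the key, and it writes values back as the original strings rather than parsed numbers.

```python
import csv


def inner_join_on(path_a, path_b, out_path, key="title"):
    """Hypothetical helper: join two CSV files on a shared key column."""
    with open(path_a, newline="") as fa, open(path_b, newline="") as fb:
        reader_a = csv.DictReader(fa)
        reader_b = csv.DictReader(fb)
        # Ignore unnamed columns, such as the one created by a trailing comma.
        fields_a = [c for c in reader_a.fieldnames if c]
        fields_b = [c for c in reader_b.fieldnames if c and c != key]
        # Index the second file by key; a repeated key keeps only the last row.
        rows_b = {row[key]: row for row in reader_b}
        merged = []
        for row in reader_a:
            match = rows_b.get(row[key])
            if match is None:
                continue  # inner join: skip keys missing from the second file
            out = {c: row[c] for c in fields_a}
            out.update({c: match[c] for c in fields_b})
            merged.append(out)
    with open(out_path, "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=fields_a + fields_b)
        w.writeheader()
        w.writerows(merged)
```

Because the values are never converted to floats, the last field of the final row stays 956 here instead of becoming 956.0 as in the pandas output.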
