Python: comparing specific columns in two CSV files

Question

Asked by coder999

Say that I have two CSV files (file1 and file2) with contents as shown below:

file1:

fred,43,Male,"23,45",blue,"1, bedrock avenue"

file2:

fred,39,Male,"23,45",blue,"1, bedrock avenue"

I would like to compare these two CSV records to see if columns 0,2,3,4, and 5 are the same. I don't care about column 1.

What's the most pythonic way of doing this?

EDIT:

Some example code would be appreciated.

EDIT2:

Please note the embedded commas need to be handled correctly.

Answer 1

Accepted answer by Elalfer

I suppose the best way is to use Python's csv library: http://docs.python.org/library/csv.html.

UPDATE (example added):

import csv

# newline='' lets the csv module handle line endings itself (Python 3)
with open('data1.csv', newline='') as f1, open('data2.csv', newline='') as f2:
    row1 = next(csv.reader(f1, delimiter=',', quotechar='"'))
    row2 = next(csv.reader(f2, delimiter=',', quotechar='"'))

# Compare column 0 and columns 2 onward; column 1 is ignored
if row1[0] == row2[0] and row1[2:] == row2[2:]:
    print("eq")
else:
    print("different")
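
To illustrate why the csv module is suggested here, the following sketch parses the two sample records from the question out of in-memory strings (via io.StringIO, an assumption made so the example runs without files on disk) and contrasts that with a naive split on commas. The variable names are illustrative only.

```python
import csv
import io

# The sample records from the question, held in memory instead of on disk.
record1 = 'fred,43,Male,"23,45",blue,"1, bedrock avenue"\n'
record2 = 'fred,39,Male,"23,45",blue,"1, bedrock avenue"\n'

# A naive split breaks the quoted fields apart at their embedded commas.
naive_fields = record1.strip().split(",")

# csv.reader honours the quotes and keeps each quoted field whole.
row1 = next(csv.reader(io.StringIO(record1)))
row2 = next(csv.reader(io.StringIO(record2)))

print(len(naive_fields), len(row1))   # 8 6
print(row1[3], "|", row1[5])          # 23,45 | 1, bedrock avenue

# Same comparison as above: column 0 plus columns 2 onward.
same = row1[0] == row2[0] and row1[2:] == row2[2:]
print(same)                           # True
```

The naive split yields eight pieces instead of six fields, so index-based comparisons on it would look at the wrong data.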

Answer 2

Answered by Santiago Alessandri

I would read both records, eliminate column 1, and then compare what's left. (This works in Python 3.)

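import csv

def records_match(path1, path2):
    # Read the first record of each file
    with open(path1, newline="") as f1, open(path2, newline="") as f2:
        r1 = next(csv.reader(f1))
        r2 = next(csv.reader(f2))
    # Drop column 1 from both records, then compare the rest
    r1.pop(1)
    r2.pop(1)
    return r1 == r2

print(records_match("file1.csv", "file2.csv"))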
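
The same idea can be generalized to any set of ignored columns. The sketch below is one possible variant, not part of the original answer: drop_columns and first_rows_match are hypothetical helper names, and the records are read from io.StringIO objects so the example runs without files on disk.

```python
import csv
import io

def drop_columns(row, ignored):
    """Return a copy of row without the column indexes listed in ignored."""
    return [value for index, value in enumerate(row) if index not in ignored]

def first_rows_match(source1, source2, ignored=frozenset({1})):
    """Compare the first record of two CSV sources, skipping ignored columns."""
    row1 = next(csv.reader(source1))
    row2 = next(csv.reader(source2))
    return drop_columns(row1, ignored) == drop_columns(row2, ignored)

# The sample records from the question.
file1 = io.StringIO('fred,43,Male,"23,45",blue,"1, bedrock avenue"\n')
file2 = io.StringIO('fred,39,Male,"23,45",blue,"1, bedrock avenue"\n')

print(first_rows_match(file1, file2))  # True
```

Unlike pop, this builds new lists, so the original rows stay intact if they are needed later.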

Answer 3

Answered by ulidtko

>>> import csv
>>> csv1 = csv.reader(open("file1.csv", "r"))
>>> csv2 = csv.reader(open("file2.csv", "r"))
>>> while True:
...   try:
...     line1 = next(csv1)
...     line2 = next(csv2)
...     equal = (line1[0]==line2[0] and line1[2]==line2[2] and line1[3]==line2[3] and line1[4]==line2[4] and line1[5]==line2[5])
...     print(equal)
...   except StopIteration:
...     break
...
True

Update

3 years later, I think I'd rather write it this way.

import csv

interesting_cols = [0, 2, 3, 4, 5]

with open("file1.csv", 'r') as file1,\
     open("file2.csv", 'r') as file2:

    reader1, reader2 = csv.reader(file1), csv.reader(file2)

    for line1, line2 in zip(reader1, reader2):
        equal = all(x == y
            for n, (x, y) in enumerate(zip(line1, line2))
            if n in interesting_cols
        )
        print(equal)
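
If the goal is to find which rows differ rather than print a flag per row, one possible variant uses operator.itemgetter to pull out the columns of interest. This is only a sketch: differing_rows is a hypothetical helper name, the second sample row (wilma) is invented for illustration, and the data is read from io.StringIO so the example runs without files on disk.

```python
import csv
import io
from operator import itemgetter

def differing_rows(source1, source2, columns=(0, 2, 3, 4, 5)):
    """Return the 0-based indexes of row pairs that differ in the chosen columns."""
    pick = itemgetter(*columns)
    reader1, reader2 = csv.reader(source1), csv.reader(source2)
    return [
        index
        for index, (row1, row2) in enumerate(zip(reader1, reader2))
        if pick(row1) != pick(row2)
    ]

# Row 0 differs only in column 1 (ignored); row 1 differs in column 4.
data1 = ('fred,43,Male,"23,45",blue,"1, bedrock avenue"\n'
         'wilma,41,Female,"10,20",red,"2, bedrock avenue"\n')
data2 = ('fred,39,Male,"23,45",blue,"1, bedrock avenue"\n'
         'wilma,41,Female,"10,20",green,"2, bedrock avenue"\n')

print(differing_rows(io.StringIO(data1), io.StringIO(data2)))  # [1]
```

As with the zip-based loop above, iteration stops at the end of the shorter file, so extra trailing rows in one file go unreported, and a row with fewer columns than requested raises IndexError.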

Answer 4

Answered by Daniel

# Import the required module
import pandas as pd

# Load both CSV files
df_TrainSet = pd.read_csv('../data/ldp_TrainSet.csv')
df_DataSet = pd.read_csv('../data/ldp_DataSet.csv')

# First test: columns of df_TrainSet that are missing from df_DataSet
print([c for c in df_TrainSet if c not in df_DataSet.columns])

# Second test: columns of df_DataSet that are missing from df_TrainSet
print([c for c in df_DataSet if c not in df_TrainSet.columns])

This example checks whether every column name in each CSV file is also present in the other file. Note that it compares column names only, not the values stored in those columns.
