Sorting a CSV in Python

Question

Asked by Pranab

I assumed sorting a CSV file on multiple text/numeric fields using Python would be a problem that was already solved. But I can't find any example code anywhere, except for specific code focusing on sorting date fields.

How would one go about sorting a relatively large CSV file (tens of thousands of lines) on multiple fields, in order?

Python code samples would be appreciated.

Answer 1

Accepted answer by Robert Rossney

Here's Alex's answer, reworked to support column data types:

import csv
import operator

def sort_csv(csv_filename, types, sort_key_columns):
    """Sort (and rewrite) a csv file.
    types:  data types (conversion functions) for each column in the file
    sort_key_columns: column numbers of columns to sort by"""
    data = []
    # Python 3: open csv files in text mode with newline=''
    # (the original used 'rb'/'wb', which only works on Python 2)
    with open(csv_filename, newline='') as f:
        for row in csv.reader(f):
            # convert() is not a built-in; see the definition below
            data.append(convert(types, row))
    data.sort(key=operator.itemgetter(*sort_key_columns))
    with open(csv_filename, 'w', newline='') as f:
        csv.writer(f).writerows(data)

Edit:

I did a stupid. I was playing with various things in IDLE and wrote a convert function a couple of days ago. I forgot I'd written it, and I haven't closed IDLE in a good long while, so when I wrote the above, I thought convert was a built-in function. Sadly no.

Here's my implementation, though John Machin's is nicer:

def convert(types, values):
    return [t(v) for t, v in zip(types, values)]

Usage:

import datetime

def date(s):
    # strptime is a method of the datetime class, not of the datetime module
    return datetime.datetime.strptime(s, '%m/%d/%y')

>>> convert((int, date, str), ('1', '2/15/09', 'z'))
[1, datetime.datetime(2009, 2, 15, 0, 0), 'z']
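
Putting the pieces together, here is a minimal sketch for Python 3. The names sort_rows, sort_csv_file, and parse_date are hypothetical and not part of the answer above. One deliberate difference, labeled as an assumption about what is wanted: the converted values are used only as the sort key, and the original strings are written back, so a date column keeps its input format instead of being rewritten as str(datetime).

```python
import csv
import operator
from datetime import datetime

def convert(types, values):
    """Apply one conversion function per column."""
    return [t(v) for t, v in zip(types, values)]

def parse_date(s):
    # Hypothetical helper: parses dates such as '2/15/09'.
    return datetime.strptime(s, '%m/%d/%y')

def sort_rows(rows, types, sort_key_columns):
    """Return rows ordered by their converted key columns.

    The rows themselves are left as lists of strings.
    """
    key_of = operator.itemgetter(*sort_key_columns)
    return sorted(rows, key=lambda row: key_of(convert(types, row)))

def sort_csv_file(path, types, sort_key_columns):
    """Sort a csv file in place (assumes no header row)."""
    with open(path, newline='') as f:
        rows = list(csv.reader(f))
    rows = sort_rows(rows, types, sort_key_columns)
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)

rows = [['10', '2/15/09', 'z'], ['9', '2/15/09', 'a'], ['10', '1/3/09', 'b']]
# Sort numerically on column 0, then chronologically on column 1.
print(sort_rows(rows, (int, parse_date, str), (0, 1)))
# [['9', '2/15/09', 'a'], ['10', '1/3/09', 'b'], ['10', '2/15/09', 'z']]
```

If the file has a header row, it would need to be split off before sorting and written back first; that step is left out of this sketch.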

Answer 2

Answer by Alex Martelli

Python's sort works in-memory only; however, tens of thousands of lines should fit in memory easily on a modern machine. So:

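import csv
import operator  # needed for operator.itemgetter below

def sortcsvbymanyfields(csvfilename, themanyfieldscolumnnumbers):
  # Python 3: text mode with newline='' (the original used 'rb'/'wb' for Python 2)
  with open(csvfilename, newline='') as f:
    readit = csv.reader(f)
    thedata = list(readit)
  # itemgetter(*cols) builds a tuple key, so rows are ordered field by field
  thedata.sort(key=operator.itemgetter(*themanyfieldscolumnnumbers))
  with open(csvfilename, 'w', newline='') as f:
    writeit = csv.writer(f)
    writeit.writerows(thedata)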
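
One caveat worth illustrating: csv.reader returns every field as a string, so the sort above compares text. A small sketch with made-up rows shows the difference between sorting a numeric column as text and as numbers:

```python
import operator

# Made-up sample rows; column 0 holds numbers stored as text.
rows = [['9', 'b'], ['10', 'a'], ['100', 'c']]

# String comparison: '10' < '100' < '9'.
as_text = sorted(rows, key=operator.itemgetter(0))
print(as_text)     # [['10', 'a'], ['100', 'c'], ['9', 'b']]

# Numeric comparison after converting the key.
as_numbers = sorted(rows, key=lambda row: int(row[0]))
print(as_numbers)  # [['9', 'b'], ['10', 'a'], ['100', 'c']]
```

This is the gap that per-column conversion functions are meant to close.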

Answer 3

Answer by John Machin

Here's the convert() that's missing from Robert's fix of Alex's answer:

>>> def convert(convert_funcs, seq):
...    return [
...        item if func is None else func(item)
...        for func, item in zip(convert_funcs, seq)
...        ]
...
>>> convert(
...     (None, float, lambda x: x.strip().lower()),
...     [" text ", "123.45", " TEXT "]
...     )
[' text ', 123.45, 'text']
>>>

I've changed the name of the first arg to highlight that the per-column functions can do whatever you need, not merely type coercion. None is used to indicate no conversion.
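
As a rough illustration of how this convert() can feed a sort key, the sketch below sorts case-insensitively on the first column and numerically on the second. The sample rows and the normalize helper are invented for the example.

```python
def convert(convert_funcs, seq):
    return [
        item if func is None else func(item)
        for func, item in zip(convert_funcs, seq)
    ]

def normalize(text):
    # Hypothetical helper: ignore surrounding spaces and letter case.
    return text.strip().lower()

rows = [[' Pear ', '10', 'x'], ['apple', '9', 'y'], ['APPLE ', '2', 'z']]
funcs = (normalize, float, None)  # None leaves the third column untouched

# Use the first two converted columns as the key; the rows keep their original text.
ordered = sorted(rows, key=lambda row: convert(funcs, row)[:2])
print(ordered)
# [['APPLE ', '2', 'z'], ['apple', '9', 'y'], [' Pear ', '10', 'x']]
```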

Answer 4

Answer by telliott99

You bring up 3 issues:

file size
csv data
sorting on multiple fields

Here is a solution for the third part. The csv data itself can be parsed in a more sophisticated way than the simple split used here.

>>> data = 'a,b,c\nb,b,a\nb,c,a\n'
>>> lines = [e.split(',') for e in data.strip().split('\n')]
>>> lines
[['a', 'b', 'c'], ['b', 'b', 'a'], ['b', 'c', 'a']]
>>> def f(e):
...     field_order = [2,1]
...     return [e[i] for i in field_order]
... 
>>> sorted(lines, key=f)
[['b', 'b', 'a'], ['b', 'c', 'a'], ['a', 'b', 'c']]

Edited to use a list comprehension; a generator did not work as I had expected it to.
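
For comparison, here is a sketch of the same multi-field sort with the parsing done by the csv module, which copes with quoted fields that contain commas. The sample data is made up. In Python 3, a generator used as a sort key raises TypeError because generator objects cannot be ordered, so the key has to be a list, a tuple, or the tuple that operator.itemgetter returns.

```python
import csv
import io
import operator

# Made-up data; the first row has a quoted field containing a comma.
data = 'a,"b,1",c\nb,b,a\nb,c,a\n'
lines = list(csv.reader(io.StringIO(data)))
print(lines[0])  # ['a', 'b,1', 'c']

field_order = [2, 1]  # sort on the third column, then on the second

# Two equivalent keys: an explicit tuple, and operator.itemgetter.
by_tuple = sorted(lines, key=lambda e: tuple(e[i] for i in field_order))
by_itemgetter = sorted(lines, key=operator.itemgetter(*field_order))
print(by_itemgetter)
# [['b', 'b', 'a'], ['b', 'c', 'a'], ['a', 'b,1', 'c']]
```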
