Sorting CSV in Python
Source: http://stackoverflow.com/questions/2089036/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and keep it under the same CC BY-SA license.
Asked by Pranab
I assumed sorting a CSV file on multiple text/numeric fields using Python would be a problem that was already solved. But I can't find any example code anywhere, except for specific code focusing on sorting date fields.
How would one go about sorting a relatively large CSV file (tens of thousands of lines) on multiple fields, in order?
Python code samples would be appreciated.
Accepted answer by Robert Rossney
Here's Alex's answer, reworked to support column data types:
import csv
import operator

def sort_csv(csv_filename, types, sort_key_columns):
    """Sort (and rewrite) a csv file.

    types: data types (conversion functions) for each column in the file
    sort_key_columns: column numbers of columns to sort by"""
    data = []
    # newline='' lets the csv module handle line endings itself (Python 3)
    with open(csv_filename, newline='') as f:
        for row in csv.reader(f):
            data.append(convert(types, row))
    data.sort(key=operator.itemgetter(*sort_key_columns))
    with open(csv_filename, 'w', newline='') as f:
        csv.writer(f).writerows(data)
Edit:
I did something stupid. I was playing with various things in IDLE and wrote a convert function a couple of days ago. I forgot I'd written it, and I haven't closed IDLE in a good long while, so when I wrote the above, I thought convert was a built-in function. Sadly, no.
Here's my implementation, though John Machin's is nicer:
def convert(types, values):
    return [t(v) for t, v in zip(types, values)]
Usage:
import datetime

def date(s):
    # strptime lives on the datetime class inside the datetime module
    return datetime.datetime.strptime(s, '%m/%d/%y')

>>> convert((int, date, str), ('1', '2/15/09', 'z'))
[1, datetime.datetime(2009, 2, 15, 0, 0), 'z']
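Putting the pieces of this answer together, an end-to-end run might look like the sketch below. It assumes Python 3 (text-mode files opened with newline=''); the sample file contents and the temporary path are made up for illustration. Note that the rows are written back in their converted form, so the date column comes out in datetime's default string representation rather than the original m/d/y format.

```python
import csv
import datetime
import operator
import os
import tempfile


def convert(types, values):
    # Apply one conversion function per column
    return [t(v) for t, v in zip(types, values)]


def date(s):
    return datetime.datetime.strptime(s, '%m/%d/%y')


def sort_csv(csv_filename, types, sort_key_columns):
    with open(csv_filename, newline='') as f:
        data = [convert(types, row) for row in csv.reader(f)]
    data.sort(key=operator.itemgetter(*sort_key_columns))
    with open(csv_filename, 'w', newline='') as f:
        csv.writer(f).writerows(data)


# Hypothetical sample file: an int column, a date column, a text column
path = os.path.join(tempfile.mkdtemp(), 'sample.csv')
with open(path, 'w', newline='') as f:
    f.write('10,2/15/09,z\r\n9,1/1/10,a\r\n9,12/31/08,b\r\n')

# Sort by column 0 (as int), then by column 1 (as date)
sort_csv(path, (int, date, str), (0, 1))

with open(path, newline='') as f:
    sorted_rows = list(csv.reader(f))
print(sorted_rows)
```

If the original text of each field has to survive the rewrite, one option is to keep the raw row alongside the converted one and write the raw row back.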
Answer by Alex Martelli
Python's sort works in-memory only; however, tens of thousands of lines should fit in memory easily on a modern machine. So:
import csv
import operator

def sortcsvbymanyfields(csvfilename, themanyfieldscolumnnumbers):
    # newline='' lets the csv module handle line endings itself (Python 3)
    with open(csvfilename, newline='') as f:
        readit = csv.reader(f)
        thedata = list(readit)
    thedata.sort(key=operator.itemgetter(*themanyfieldscolumnnumbers))
    with open(csvfilename, 'w', newline='') as f:
        writeit = csv.writer(f)
        writeit.writerows(thedata)
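One thing to keep in mind with this version: csv.reader yields every field as a string, so numeric columns are compared as text. A small sketch of the difference, using made-up in-memory rows instead of a file:

```python
import operator

# Hypothetical rows, as csv.reader would return them (all strings)
rows = [['10', 'b'], ['9', 'a'], ['100', 'c']]

# Text comparison: '10' < '100' < '9'
as_text = sorted(rows, key=operator.itemgetter(0))

# Numeric comparison: 9 < 10 < 100
as_numbers = sorted(rows, key=lambda row: int(row[0]))

print(as_text)     # [['10', 'b'], ['100', 'c'], ['9', 'a']]
print(as_numbers)  # [['9', 'a'], ['10', 'b'], ['100', 'c']]
```

This is why a per-column conversion step matters when the sort fields are a mix of text and numbers.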
Answer by John Machin
Here's the convert() that's missing from Robert's fix of Alex's answer:
>>> def convert(convert_funcs, seq):
...     return [
...         item if func is None else func(item)
...         for func, item in zip(convert_funcs, seq)
...     ]
...
>>> convert(
...     (None, float, lambda x: x.strip().lower()),
...     [" text ", "123.45", " TEXT "]
... )
[' text ', 123.45, 'text']
I've changed the name of the first arg to highlight that the per-column function can do whatever you need, not merely type coercion. None is used to indicate no conversion.
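Because the per-column functions can normalize values as well as coerce types, one possible use is to apply convert() only inside the sort key, so that the rows themselves are left exactly as they were read. A minimal sketch of that idea; sort_rows and the sample rows are hypothetical names and data, not part of the original answers:

```python
import operator


def convert(convert_funcs, seq):
    return [
        item if func is None else func(item)
        for func, item in zip(convert_funcs, seq)
    ]


def sort_rows(rows, convert_funcs, sort_key_columns):
    """Return rows sorted by the converted values of the chosen columns.

    The rows are not modified; conversion is used for comparison only.
    """
    pick = operator.itemgetter(*sort_key_columns)
    return sorted(rows, key=lambda row: pick(convert(convert_funcs, row)))


rows = [
    ['x', '10.5', ' Beta '],
    ['y', '2.25', 'alpha'],
    ['z', '2.25', ' ALPHA'],
]
funcs = (None, float, lambda s: s.strip().lower())

# Sort by column 1 as a float, then column 2 ignoring case and padding
result = sort_rows(rows, funcs, (1, 2))
print(result)
```

Rows 'y' and 'z' have equal keys here, and since Python's sort is stable they keep their original relative order.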
Answer by telliott99
You bring up 3 issues:
- file size
- csv data
- sorting on multiple fields
Here is a solution for the third part. You can handle csv data in a more sophisticated way.
>>> data = 'a,b,c\nb,b,a\nb,c,a\n'
>>> lines = [e.split(',') for e in data.strip().split('\n')]
>>> lines
[['a', 'b', 'c'], ['b', 'b', 'a'], ['b', 'c', 'a']]
>>> def f(e):
...     field_order = [2,1]
...     return [e[i] for i in field_order]
...
>>> sorted(lines, key=f)
[['b', 'b', 'a'], ['b', 'c', 'a'], ['a', 'b', 'c']]
Edited to use a list comprehension; a generator does not work as I had expected it to.
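A likely reason the generator version misbehaves is that generator objects do not compare by their contents; in Python 3, comparing two of them with < raises TypeError. A tuple or list key compares element by element, which is what a multi-field sort needs. A minimal sketch on the same data; by_fields is a hypothetical helper name:

```python
lines = [['a', 'b', 'c'], ['b', 'b', 'a'], ['b', 'c', 'a']]


def by_fields(field_order):
    """Build a key function that compares rows on the given columns, in order."""
    def key(row):
        return tuple(row[i] for i in field_order)
    return key


# Sort on column 2 first, then column 1
result = sorted(lines, key=by_fields([2, 1]))
print(result)

# A bare generator expression as the key cannot be ordered in Python 3
try:
    sorted(lines, key=lambda row: (row[i] for i in [2, 1]))
    generator_key_failed = False
except TypeError:
    generator_key_failed = True
print(generator_key_failed)
```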