Original source: http://stackoverflow.com/questions/37193157/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Apply GZIP compression to a CSV in Python Pandas
Asked by user2752159
I am trying to write a dataframe to a gzipped csv in python pandas, using the following:
import pandas as pd
import datetime
import csv
import gzip
# Get data (with previous connection and script variables)
df = pd.read_sql_query(script, conn)
# Create today's date, to append to file
todaysdatestring = str(datetime.datetime.today().strftime('%Y%m%d'))
print(todaysdatestring)
# Create csv with gzip compression
df.to_csv('foo-%s.csv.gz' % todaysdatestring,
          sep='|',
          header=True,
          index=False,
          quoting=csv.QUOTE_ALL,
          compression='gzip',
          quotechar='"',
          doublequote=True,
          line_terminator='\n')
This just creates a csv called 'foo-YYYYMMDD.csv.gz', not an actual gzip archive.
I've also tried adding this:
# Turn to_csv statement into a variable
d = df.to_csv('foo-%s.csv.gz' % todaysdatestring,
              sep='|',
              header=True,
              index=False,
              quoting=csv.QUOTE_ALL,
              compression='gzip',
              quotechar='"',
              doublequote=True,
              line_terminator='\n')
# Write above variable to gzip
with gzip.open('foo-%s.csv.gz' % todaysdatestring, 'wb') as output:
    output.write(d)
Which fails as well. Any ideas?
Answered by root
Using df.to_csv() with the keyword argument compression='gzip' should produce a gzip archive. I tested it using the same keyword arguments as you, and it worked.
You may need to upgrade pandas: gzip compression in to_csv was not implemented until version 0.17.1. Earlier versions do not raise an error when you pass compression='gzip'; they just produce a regular CSV. You can check which version of pandas you have by looking at the output of pd.__version__.
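Both points can be checked directly. The following is a minimal sketch, not part of the original answer: it prints the installed version and tests whether a written file starts with the two-byte gzip magic number (0x1f 0x8b). The helper name is_gzip_file, the sample data, and the file name are made up for illustration.

```python
import os
import tempfile

import pandas as pd


def is_gzip_file(path):
    """Return True if the file starts with the gzip magic number 0x1f 0x8b."""
    with open(path, "rb") as f:
        return f.read(2) == b"\x1f\x8b"


print(pd.__version__)  # installed pandas version

# Hypothetical sample data, written to a temporary directory.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
path = os.path.join(tempfile.mkdtemp(), "sample.csv.gz")
df.to_csv(path, sep="|", index=False, compression="gzip")

# True when compression='gzip' took effect,
# False when the file was written as plain text.
print(is_gzip_file(path))
```

If the check prints False even though the file name ends in .gz, the compression argument was most likely ignored, which matches the behavior described above for older versions.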
Answered by Ioannis Nasios
This is easy to do with pandas:
import pandas as pd
Write a pandas DataFrame to disk as a gzip-compressed CSV:
df.to_csv('dfsavename.csv.gz', compression='gzip')
Read it back from disk:
df = pd.read_csv('dfsavename.csv.gz', compression='gzip')
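To see the two calls working together, here is a small round-trip sketch. The sample DataFrame and the temporary directory are assumptions added for illustration; only the to_csv and read_csv calls come from the answer.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
path = os.path.join(tempfile.mkdtemp(), "dfsavename.csv.gz")

# index=False keeps the row index out of the file, so the
# frame read back has the same columns as the original.
df.to_csv(path, index=False, compression="gzip")
restored = pd.read_csv(path, compression="gzip")

print(restored.equals(df))
```

Without index=False, the index is written as an extra unnamed column and shows up as "Unnamed: 0" when the file is read back.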
Answered by piRSquared
From the documentation:
import gzip
content = b"Lots of content here"  # bytes, because the file is opened in binary mode
with gzip.open('file.txt.gz', 'wb') as f:
    f.write(content)
With pandas:
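import csv
import gzip
# With no path argument, to_csv returns the CSV as a string
content = df.to_csv(
    sep='|',
    header=True,
    index=False,
    quoting=csv.QUOTE_ALL,
    quotechar='"',
    doublequote=True,
    lineterminator='\n')  # spelled line_terminator before pandas 1.5
with gzip.open('foo-%s.csv.gz' % todaysdatestring, 'wb') as f:
    # Binary mode expects bytes, so encode the text on Python 3
    f.write(content.encode('utf-8'))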
The trick here is that to_csv returns the CSV as text if you don't pass it a filename. You then pass that text to the write method of the gzip file.
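The same idea also works entirely in memory. This sketch, which assumes Python 3 and uses made-up sample data, compresses the returned text with gzip.compress instead of writing through a file handle:

```python
import gzip

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# No path is given, so to_csv returns the CSV as a str.
csv_text = df.to_csv(sep="|", index=False)

# gzip works on bytes, so the text is encoded first.
compressed = gzip.compress(csv_text.encode("utf-8"))

# Decompressing gives back the original CSV text.
print(gzip.decompress(compressed).decode("utf-8") == csv_text)
```

This can be handy when the compressed bytes are going somewhere other than a local file, such as an upload, though that use is outside what the answer covers.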
Answered by Alexander
with gzip.open('foo-%s.csv.gz' % todaysdatestring, 'wb') as f:
    # to_csv returns text here; encode it because the file is opened in binary mode
    f.write(df.to_csv(sep='|', index=False, quoting=csv.QUOTE_ALL).encode('utf-8'))
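A variation on this, sketched under the assumption of Python 3 and a reasonably recent pandas, is to open the gzip file in text mode and hand the open handle to to_csv, so the whole CSV never has to be built as one string. The sample data and file location are made up for illustration.

```python
import csv
import gzip
import os
import tempfile

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
path = os.path.join(tempfile.mkdtemp(), "foo.csv.gz")

# Text mode ('wt') accepts str; newline='' leaves the line
# endings chosen by to_csv untouched.
with gzip.open(path, "wt", encoding="utf-8", newline="") as f:
    df.to_csv(f, sep="|", index=False, quoting=csv.QUOTE_ALL)

# Read the file back through gzip to inspect the contents.
with gzip.open(path, "rt", encoding="utf-8") as f:
    print(f.read())
```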