将 GZIP 压缩应用于 Python Pandas 中的 CSV

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/37193157/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-19 19:01:12  来源:igfitidea点击:

Apply GZIP compression to a CSV in Python Pandas

pythoncsvpandasgzipexport-to-csv

提问by user2752159

I am trying to write a dataframe to a gzipped csv in python pandas, using the following:

我正在尝试使用以下命令将数据帧写入 python pandas 中的 gzipped csv:

import pandas as pd
import datetime
import csv
import gzip

# Get data (with previous connection and script variables)
df = pd.read_sql_query(script, conn)

# Create today's date, to append to file
todaysdatestring = str(datetime.datetime.today().strftime('%Y%m%d'))
print todaysdatestring

# Create csv with gzip compression
df.to_csv('foo-%s.csv.gz' % todaysdatestring,
      sep='|',
      header=True,
      index=False,
      quoting=csv.QUOTE_ALL,
      compression='gzip',
      quotechar='"',
      doublequote=True,
      line_terminator='\n')

This just creates a csv called 'foo-YYYYMMDD.csv.gz', not an actual gzip archive.

这只会创建一个名为“foo-YYYYMMDD.csv.gz”的 csv,而不是实际的 gzip 存档。

I've also tried adding this:

我也试过添加这个:

#Turn to_csv statement into a variable
d = df.to_csv('foo-%s.csv.gz' % todaysdatestring,
      sep='|',
      header=True,
      index=False,
      quoting=csv.QUOTE_ALL,
      compression='gzip',
      quotechar='"',
      doublequote=True,
      line_terminator='\n')

# Write above variable to gzip
 with gzip.open('foo-%s.csv.gz' % todaysdatestring, 'wb') as output:
   output.write(d)

Which fails as well. Any ideas?

这也失败了。有任何想法吗?

回答by root

Using df.to_csv()with the keyword argument compression='gzip'should produce a gzip archive. I tested it using same keyword arguments as you, and it worked.

使用df.to_csv()与关键字参数compression='gzip'应该产生一个gzip压缩文件。我使用与您相同的关键字参数对其进行了测试,并且有效。

You may need to upgrade pandas, as gzip was not implemented until version 0.17.1, but trying to use it on prior versions will not raise an error, and just produce a regular csv. You can determine your current version of pandas by looking at the output of pd.__version__.

您可能需要升级 pandas,因为 gzip 直到版本 0.17.1 才实现,但是尝试在以前的版本上使用它不会引发错误,并且只会生成一个常规的 csv。您可以通过查看pd.__version__.

回答by Ioannis Nasios

It is done very easily with pandas

用熊猫很容易完成

import pandas as pd

Writea pandas dataframe to disc as gunzip compressed csv

Pandas 数据帧作为 gunzip 压缩的 csv写入光盘

df.to_csv('dfsavename.csv.gz', compression='gzip')

Readfrom disc

从光盘读取

df = pd.read_csv('dfsavename.csv.gz', compression='gzip')

回答by piRSquared

From documentation

文档

import gzip
content = "Lots of content here"
with gzip.open('file.txt.gz', 'wb') as f:
    f.write(content)

with pandas

pandas

import gzip


content = df.to_csv(
      sep='|',
      header=True,
      index=False,
      quoting=csv.QUOTE_ALL,
      quotechar='"',
      doublequote=True,
      line_terminator='\n')

with gzip.open('foo-%s.csv.gz' % todaysdatestring, 'wb') as f:
    f.write(content)

The trick here being that to_csvoutputs text if you don't pass it a filename. Then you just redirect that text to gzip's writemethod.

这里的技巧是,to_csv如果您不传递文件名,则输出文本。然后您只需将该文本重定向到gzip'swrite方法。

回答by Alexander

with gzip.open('foo-%s.csv.gz' % todaysdatestring, 'wb') as f:
    f.write(df.to_csv(sep='|', index=False, quoting=csv.QUOTE_ALL))