Write to StringIO object using Pandas ExcelWriter?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/28058563/

Time: 2020-09-13 22:51:40  Source: igfitidea

Tags: python, excel, pandas, stringio, xlsxwriter

Asked by A User

I can pass a StringIO object to pd.to_csv() just fine:

io = StringIO.StringIO()
pd.DataFrame().to_csv(io)
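
For readers on Python 3, the same idea can be sketched with io.StringIO, which replaces the Python 2 StringIO module for text data. The sample data frame and the names csv_buffer and csv_text are assumptions made for illustration only.

```python
import io

import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# io.StringIO is the Python 3 counterpart of StringIO.StringIO for text.
csv_buffer = io.StringIO()
df.to_csv(csv_buffer, index=False)

# The CSV text is now available without touching the file system.
csv_text = csv_buffer.getvalue()
```

CSV is a text format, so a text buffer is enough here; the Excel formats discussed below are binary and need a bytes buffer on Python 3.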

But when using the excel writer, I am having a lot more trouble.

io = StringIO.StringIO()
writer = pd.ExcelWriter(io)
pd.DataFrame().to_excel(writer, "sheet name")
writer.save()

This raises an

AttributeError: StringIO instance has no attribute 'rfind'
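
This error is consistent with a path helper being applied to the buffer, since rfind is a string method. As a rough illustration (an assumption about the cause, not a trace of the pandas source), the standard library sketch below shows a path function accepting a filename but rejecting an in-memory buffer. The filename report.xlsx is hypothetical.

```python
import io
import os

buffer = io.BytesIO()

# A real path has an extension that path helpers can inspect.
extension = os.path.splitext("report.xlsx")[1]

# An in-memory buffer is not a path, so the same helper rejects it.
# Python 3 raises TypeError here; the Python 2 traceback in the question
# surfaced as an AttributeError about 'rfind' instead.
try:
    os.path.splitext(buffer)
    rejected = False
except TypeError:
    rejected = True
```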

I'm trying to create an ExcelWriter object without calling pd.ExcelWriter() but am having some trouble. This is what I've tried so far:

from xlsxwriter.workbook import Workbook
writer = Workbook(io)
pd.DataFrame().to_excel(writer,"sheet name")
writer.save()

But now I am getting an AttributeError: 'Workbook' object has no attribute 'write_cells'

How can I save a pandas dataframe in Excel format to a StringIO object?

Answered by jmcnamara

Pandas expects a file path to be passed to the ExcelWriter constructors, although each of the writer engines supports StringIO. Perhaps that should be raised as a bug/feature request in Pandas.

In the meantime, here is a workaround example using the Pandas xlsxwriter engine:

import pandas as pd
import StringIO

io = StringIO.StringIO()

# Use a temp filename to keep pandas happy.
writer = pd.ExcelWriter('temp.xlsx', engine='xlsxwriter')

# Set the filename/file handle in the xlsxwriter.workbook object.
writer.book.filename = io

# Write the data frame to the StringIO object.
pd.DataFrame().to_excel(writer, sheet_name='Sheet1')
writer.save()
xlsx_data = io.getvalue()

Update: As of Pandas 0.17 it is now possible to do this more directly:

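# Note, Python 2 example. For Python 3 use: output = io.BytesIO().
output = StringIO.StringIO()

# Use the StringIO object as the filehandle.
writer = pd.ExcelWriter(output, engine='xlsxwriter')

# Write the data frame to the StringIO object and fetch the bytes.
pd.DataFrame().to_excel(writer, sheet_name='Sheet1')
writer.save()
xlsx_data = output.getvalue()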

See also "Saving the Dataframe output to a string" in the XlsxWriter docs.
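
A minimal Python 3 sketch of the direct approach follows, assuming a recent pandas with the xlsxwriter package installed. The sample data and the zipfile check at the end are additions for illustration; the check only confirms that the buffer holds an .xlsx archive.

```python
import io
import zipfile

import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# An .xlsx file is binary, so Python 3 needs BytesIO rather than StringIO.
output = io.BytesIO()

# Leaving the with block finalizes the workbook into the buffer.
with pd.ExcelWriter(output, engine="xlsxwriter") as writer:
    df.to_excel(writer, sheet_name="Sheet1", index=False)

xlsx_data = output.getvalue()

# An .xlsx file is a zip archive, so it can be inspected with zipfile.
with zipfile.ZipFile(io.BytesIO(xlsx_data)) as archive:
    has_workbook = "xl/workbook.xml" in archive.namelist()
```

The context manager is used here instead of writer.save(), which newer pandas releases no longer provide.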

Answered by clockwatcher

Glancing at the pandas.io.excel source, it looks like this shouldn't be too much of a problem if you don't mind using xlwt as your writer. The other engines may not be all that difficult either, but xlwt jumps out as easy since its save method takes a stream or a file path.

You need to initially pass in a filename just to make pandas happy as it checks the filename extension against the engine to make sure it's a supported format. But in the case of the xlwt engine, it just stuffs the filename into the object's path attribute and then uses it in the save method. If you change the path attribute to your stream, it'll happily save to that stream when you call the save method.

Here's an example:

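# Python 2 example using the xlwt engine.
import pandas as pd
import StringIO
import base64

df = pd.DataFrame.from_csv('http://moz.com/top500/domains/csv')
xlwt_writer = pd.io.excel.get_writer('xlwt')

# Pass a placeholder filename so the extension check passes.
my_writer = xlwt_writer('whatever.xls')

# Swap the stored path for an in-memory stream before saving.
xl_out = StringIO.StringIO()
my_writer.path = xl_out
df.to_excel(my_writer)
my_writer.save()

# Print the workbook bytes as base64 text.
print base64.b64encode(xl_out.getvalue())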
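
The final base64 step of the example above looks slightly different on Python 3, where b64encode returns bytes. The sketch below uses placeholder bytes in place of a real workbook, so it runs with the standard library alone; the buffer contents are an assumption for illustration.

```python
import base64
import io

# Placeholder bytes standing in for the workbook contents held in xl_out.
xl_out = io.BytesIO(b"example workbook bytes")

# b64encode returns bytes on Python 3; decode to get printable text.
encoded = base64.b64encode(xl_out.getvalue()).decode("ascii")

# Decoding restores the original bytes exactly.
decoded = base64.b64decode(encoded)
```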

That's the quick, easy and slightly dirty way to do it. BTW, a cleaner way to do it is to subclass ExcelWriter (or one of its existing subclasses, e.g. _XlwtWriter), but honestly there's so little involved in updating the path attribute that I opted to show you the easy way rather than go the slightly longer route.

Answered by Deano

For those not using xlsxwriter as their engine= for to_excel, here is a solution that uses openpyxl in memory:

in_memory_file = StringIO.StringIO()
xlw = pd.ExcelWriter('temp.xlsx', engine='openpyxl')
# ... do many .to_excel() thingies
xlw.book.save(in_memory_file)
# if you want to read it or stream to a client, don't forget this
in_memory_file.seek(0)

Explanation: the ExcelWriter wrapper class exposes the engine's individual workbook through the .book property. For openpyxl you can then use the Workbook.save method as usual!
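
On Python 3 with a recent pandas, a similar in-memory result can be sketched by passing a BytesIO buffer straight to ExcelWriter with the openpyxl engine, assuming openpyxl is installed. This is a variation on the approach above, not the same .book.save call, and the sample data and read-back step are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

in_memory_file = io.BytesIO()
with pd.ExcelWriter(in_memory_file, engine="openpyxl") as xlw:
    df.to_excel(xlw, sheet_name="Sheet1", index=False)

# Rewind before reading the buffer or streaming it to a client.
in_memory_file.seek(0)
round_trip = pd.read_excel(in_memory_file, sheet_name="Sheet1", engine="openpyxl")
```

Reading the buffer back with read_excel is a quick way to confirm the workbook was written completely.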
