Pandas dataframe to_csv - split into multiple output files
Original question: http://stackoverflow.com/questions/44502306/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by PlagTag
What is the best/easiest way to split a very large data frame (50GB) into multiple outputs (horizontally)?
I thought about doing something like:
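stepsize = int(1e8)  # rows per output file
# len(df) counts rows; df.size would count rows * columns
for id, start in enumerate(range(0, len(df), stepsize)):
    end = start + stepsize  # exclusive end, so no row is skipped
    # .iloc replaces .ix, which was removed in pandas 1.0
    df.iloc[start:end].to_csv('/data/bs_' + str(id) + '.csv.out')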
But I bet there is a smarter solution out there?
Answered by Gautam Shahi
Use {id} in the filename, otherwise it will not work. The placeholder was missing id, and an empty {} combined with format(id=id) raises an error.
import numpy as np

for id, df_i in enumerate(np.array_split(df, number_of_chunks)):
    df_i.to_csv('/data/bs_{id}.csv'.format(id=id))
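The difference between a named and an empty placeholder can be checked with plain str.format, without pandas. This is only a small sketch: the path is the one from the question, the value 3 is an arbitrary example, and nothing is written to disk.

```python
# Minimal sketch: how str.format treats named vs. empty placeholders.
template_named = '/data/bs_{id}.csv'
template_empty = '/data/bs_{}.csv'

# A named placeholder is filled by the keyword argument of the same name.
named_result = template_named.format(id=3)

# An empty placeholder expects a positional argument.
positional_result = template_empty.format(3)

# An empty placeholder combined with only a keyword argument fails,
# because there is no positional argument to fill it.
try:
    template_empty.format(id=3)
    keyword_error = None
except IndexError as exc:
    keyword_error = exc

print(named_result)
print(positional_result)
print(type(keyword_error).__name__)
```

So either '{id}'.format(id=id) or '{}'.format(id) should work, but mixing the empty placeholder with the keyword argument does not.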
Answered by PlagTag
OK, this answer brought me to a satisfying solution using
numpy.array_split(object, number_of_chunks)
number_of_chunks = 10
[df_i.to_csv('/data/bs_{id}.csv'.format(id=id)) for id, df_i in enumerate(np.array_split(df, number_of_chunks))]
or as a for loop:
for id, df_i in enumerate(np.array_split(df, number_of_chunks)):
    # with a positional argument, the name inside {} may be omitted
    df_i.to_csv('/data/bs_{}.csv'.format(id))
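For anyone who wants to try the idea on a small frame first, here is a self-contained sketch. It is a variation, not the exact code above: write_chunks is a hypothetical helper name, the row positions are split with np.array_split and then selected with .iloc, the files go to a temporary directory instead of /data, and index=False is an assumption made so the files can be read back unchanged.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def write_chunks(df, out_dir, number_of_chunks):
    """Write df to out_dir as number_of_chunks CSV files, split by rows.

    Hypothetical helper for illustration; not part of pandas or NumPy.
    """
    paths = []
    # Split the row positions rather than the frame itself, then select
    # the rows of each chunk by position with .iloc.
    positions = np.array_split(np.arange(len(df)), number_of_chunks)
    for i, rows in enumerate(positions):
        path = os.path.join(out_dir, 'bs_{}.csv'.format(i))
        df.iloc[rows].to_csv(path, index=False)
        paths.append(path)
    return paths


# Small stand-in for the large frame from the question.
df = pd.DataFrame({'a': range(10), 'b': range(10, 20)})

with tempfile.TemporaryDirectory() as out_dir:
    paths = write_chunks(df, out_dir, number_of_chunks=3)
    # Read the chunks back to check sizes and that no row was lost.
    chunk_lengths = [len(pd.read_csv(p)) for p in paths]
    restored = pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)

print(chunk_lengths)
print(restored.equals(df))
```

Unlike numpy.split, array_split does not require the number of rows to divide evenly, so the chunks here may differ in length by one row.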

