Original question: http://stackoverflow.com/questions/44502306/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Pandas dataframe to_csv - split into multiple output files
Asked by PlagTag
What is the best/easiest way to split a very large data frame (50 GB) into multiple outputs (horizontally)?
I thought about doing something like:
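stepsize = int(1e8)
# len(df) counts rows; df.size would count rows * columns
for id, i in enumerate(range(0, len(df), stepsize)):
    start = i
    end = i + stepsize  # iloc slices exclude the end position, so no row is lost
    # .ix was removed from pandas; .iloc selects rows by position
    df.iloc[start:end].to_csv('/data/bs_' + str(id) + '.csv.out')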
But I bet there is a smarter solution out there?
Answered by Gautam Shahi
Use id inside the filename placeholder, otherwise it will not work. You left out id, and without it, the call raises an error.
import numpy as np

for id, df_i in enumerate(np.array_split(df, number_of_chunks)):
    df_i.to_csv('/data/bs_{id}.csv'.format(id=id))
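To illustrate why the placeholder name matters, here is a minimal sketch of how str.format matches placeholders to arguments. The chunk number 3 is just an example value, and nothing is written to disk.

```python
# Named placeholder: the keyword argument id=... fills {id}.
named = '/data/bs_{id}.csv'.format(id=3)

# Positional placeholder: a bare {} is filled by a positional argument.
positional = '/data/bs_{}.csv'.format(3)

# Mixing the two fails: {} looks for positional argument 0,
# but only the keyword argument id= was passed.
try:
    '/data/bs_{}.csv'.format(id=3)
    mixed_error = None
except IndexError as exc:
    mixed_error = exc

print(named)
print(positional)
print(type(mixed_error).__name__)
```

Either the named form or the positional form works on its own; the error only appears when a bare {} is combined with a keyword-only call.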
Answered by PlagTag
OK, this answer brought me to a satisfying solution using
numpy.array_split(object, number_of_chunks)
number_of_chunks = 10
[df_i.to_csv('/data/bs_{id}.csv'.format(id=id)) for id, df_i in enumerate(np.array_split(df, number_of_chunks))]
or as a for loop:
for id, df_i in enumerate(np.array_split(df, number_of_chunks)):
    # the placeholder must be named {id} to match the id= keyword argument;
    # a bare {} with format(id=id) raises an IndexError
    df_i.to_csv('/data/bs_{id}.csv'.format(id=id))
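For anyone who wants to try the idea end to end, the following is a rough, self-contained sketch rather than the exact method from the answers. It assumes a small demo frame and a temporary output directory, and it slices rows with iloc instead of calling np.array_split on the DataFrame. The helper name split_to_csv and its prefix parameter are made up for illustration. Unlike np.array_split, this version may write fewer files than requested when the row count does not divide evenly.

```python
import math
import os
import tempfile

import pandas as pd


def split_to_csv(df, number_of_chunks, out_dir, prefix="bs_"):
    """Write df to at most number_of_chunks CSV files of similar row count.

    Hypothetical helper for illustration; returns the list of written paths.
    """
    # Round up so that every row lands in some chunk.
    rows_per_chunk = max(1, math.ceil(len(df) / number_of_chunks))
    paths = []
    for chunk_id, start in enumerate(range(0, len(df), rows_per_chunk)):
        name = "{prefix}{id}.csv".format(prefix=prefix, id=chunk_id)
        path = os.path.join(out_dir, name)
        # iloc slices by position and excludes the end, so chunks do not overlap.
        df.iloc[start:start + rows_per_chunk].to_csv(path, index=False)
        paths.append(path)
    return paths


# Small demo frame standing in for the real (much larger) data.
demo_df = pd.DataFrame({"a": range(10), "b": range(10, 20)})
out_dir = tempfile.mkdtemp()
paths = split_to_csv(demo_df, number_of_chunks=3, out_dir=out_dir)
print(paths)
```

Passing index=False keeps the row index out of the files; drop it if the index carries information you need when reading the chunks back.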