Pandas dataframe to_csv - split into multiple output files

Notice: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/44502306/

Time: 2020-09-14 03:46:25  Source: igfitidea

python pandas

Asked by PlagTag

What is the best/easiest way to split a very large data frame (50GB) into multiple output files (horizontally, i.e. by rows)?

I thought about doing something like:

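stepsize = int(1e8)  # rows per output file
# len(df) is the row count; df.size would count rows * columns
for id, i in enumerate(range(0, len(df), stepsize)):
    start = i
    end = i + stepsize  # iloc excludes the end position, so no row is lost
    # .ix has been removed from pandas; .iloc slices by row position
    df.iloc[start:end].to_csv('/data/bs_' + str(id) + '.csv.out')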

But I bet there is a smarter solution out there?

Answered by Gautam Shahi

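Use `id` inside the placeholder in the filename, otherwise it will not work. A bare `{}` combined with the keyword argument `id=id` leaves the placeholder without a positional value to fill, so the call raises an IndexError.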

import numpy as np

for id, df_i in enumerate(np.array_split(df, number_of_chunks)):
    df_i.to_csv('/data/bs_{id}.csv'.format(id=id))
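To make the point about the placeholder concrete, here is a minimal sketch of how `str.format` treats named and positional fields. The variable names (`chunk_id`, `named`, `positional`, `mismatch_error`) are made up for illustration and do not appear in the answers.

```python
chunk_id = 3

# Named placeholder filled by a keyword argument: works.
named = '/data/bs_{id}.csv'.format(id=chunk_id)

# Bare placeholder filled by a positional argument: works.
positional = '/data/bs_{}.csv'.format(chunk_id)

# Bare placeholder with only a keyword argument: there is no
# positional value for {} to consume, so format() raises IndexError.
try:
    '/data/bs_{}.csv'.format(id=chunk_id)
    mismatch_error = None
except IndexError as exc:
    mismatch_error = exc

print(named)                          # /data/bs_3.csv
print(positional)                     # /data/bs_3.csv
print(type(mismatch_error).__name__)  # IndexError
```

Either working form is fine; what matters is that the placeholder style and the argument style match.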

Answered by PlagTag

OK, this answer led me to a satisfying solution using

numpy.array_split(object, number_of_chunks)

import numpy as np

number_of_chunks = 10
[df_i.to_csv('/data/bs_{id}.csv'.format(id=id)) for id, df_i in enumerate(np.array_split(df, number_of_chunks))]
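As a side note on how `numpy.array_split` sizes the pieces: unlike `numpy.split`, it accepts a length that does not divide evenly and makes the leading pieces one element longer. The sketch below uses a plain NumPy array as a stand-in for row positions (the names `rows`, `pieces` and `sizes` are illustrative only), so it says nothing about DataFrame-specific behavior.

```python
import numpy as np

rows = np.arange(10)              # stand-in for 10 row positions
pieces = np.array_split(rows, 3)  # 10 is not divisible by 3

# The first (10 % 3) = 1 piece gets one extra element.
sizes = [len(piece) for piece in pieces]
print(sizes)  # [4, 3, 3]
```

So with `number_of_chunks = 10`, the output files differ in length by at most one row.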

The same split-and-write step as a for loop:

for id, df_i in enumerate(np.array_split(df, number_of_chunks)):
    # the `id` inside {} may be omitted when the value is passed positionally
    df_i.to_csv('/data/bs_{}.csv'.format(id))
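For comparison, the same idea can be sketched with pandas alone by slicing row positions with `iloc`. This is only an illustration under assumptions: the helper name `write_in_chunks`, its `prefix` parameter, the small `demo` frame and the temporary output directory are all hypothetical, and the chunk boundaries are computed with integer arithmetic rather than taken from `numpy.array_split`, so the individual chunk sizes can differ from the ones NumPy would produce.

```python
import os
import tempfile

import pandas as pd


def write_in_chunks(df, out_dir, number_of_chunks, prefix="bs_"):
    """Write df to number_of_chunks CSV files with near-equal row counts."""
    n_rows = len(df)
    # Row boundaries 0 = b0 <= b1 <= ... <= bn = n_rows (hypothetical scheme).
    bounds = [n_rows * k // number_of_chunks for k in range(number_of_chunks + 1)]
    paths = []
    for chunk_id, (start, stop) in enumerate(zip(bounds[:-1], bounds[1:])):
        path = os.path.join(out_dir, "{}{}.csv".format(prefix, chunk_id))
        # iloc slices by position and excludes `stop`, so chunks never overlap.
        df.iloc[start:stop].to_csv(path)
        paths.append(path)
    return paths


# Small demonstration frame instead of the 50GB one from the question.
demo = pd.DataFrame({"a": range(10), "b": range(10, 20)})
with tempfile.TemporaryDirectory() as tmp:
    paths = write_in_chunks(demo, tmp, 3)
    row_counts = [len(pd.read_csv(p, index_col=0)) for p in paths]

print(len(paths), sum(row_counts))  # 3 10
```

Each slice is written straight to disk, and the row counts across the files add up to the length of the original frame.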