Pandas DataFrame to_csv - split into multiple output files

Question

Asked by PlagTag

What is the best/easiest way to split a very large data frame (50 GB) into multiple output files horizontally (i.e. by rows)?

I thought about doing something like:

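stepsize = int(1e8)  # rows per output file
# len(df) counts rows; df.size would count rows * columns
for id, start in enumerate(range(0, len(df), stepsize)):
    end = start + stepsize  # iloc slices are end-exclusive, so no row is skipped
    # df.ix was removed in pandas 1.0; iloc selects by row position
    df.iloc[start:end].to_csv('/data/bs_' + str(id) + '.csv.out')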
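The fixed-size approach from the question can be wrapped in a small function. This is a minimal sketch, assuming the whole frame already fits in memory; split_to_csv, rows_per_file and out_dir are hypothetical names, and the usage writes to a temporary directory so it runs anywhere.

```python
from __future__ import annotations

import os
import tempfile

import pandas as pd


def split_to_csv(df: pd.DataFrame, rows_per_file: int, out_dir: str) -> list[str]:
    """Write df to consecutive CSV files of at most rows_per_file rows each.

    Hypothetical helper for illustration; returns the paths it wrote.
    """
    paths = []
    # range(0, len(df), rows_per_file) yields the first row position of each
    # chunk; len(df) counts rows, whereas df.size counts rows * columns.
    for chunk_id, start in enumerate(range(0, len(df), rows_per_file)):
        path = os.path.join(out_dir, f"bs_{chunk_id}.csv")
        # iloc slices are end-exclusive, so chunks neither overlap nor skip
        # rows; the last chunk is simply shorter when the sizes do not divide.
        df.iloc[start:start + rows_per_file].to_csv(path)
        paths.append(path)
    return paths


# Usage: split a 25-row frame into files of 10, 10 and 5 rows.
with tempfile.TemporaryDirectory() as tmp:
    written = split_to_csv(pd.DataFrame({"x": range(25)}), 10, tmp)
    print(len(written))  # → 3
```

Fixing the row count per file keeps every output file the same size no matter how large the frame is, whereas np.array_split fixes the number of files instead.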

But I bet there is a smarter solution out there.

Answer 1

Answered by Gautam Shahi

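Put id inside the placeholder in the filename, otherwise it will not work: '/data/bs_{}.csv'.format(id=id) passes id only as a keyword argument, so the empty {} placeholder has no positional argument to fill and Python raises an IndexError.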

import numpy as np

for id, df_i in enumerate(np.array_split(df, number_of_chunks)):
    df_i.to_csv('/data/bs_{id}.csv'.format(id=id))
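On a DataFrame, np.array_split works by calling swapaxes internally, and pandas 2.1 deprecated DataFrame.swapaxes, so recent versions may print a FutureWarning for the loop above. A minimal pandas-only sketch, assuming only row-wise splitting is needed: the hypothetical split_into_n reproduces array_split's sizing rule with positional iloc slices.

```python
from __future__ import annotations

from itertools import accumulate

import pandas as pd


def split_into_n(df: pd.DataFrame, n: int) -> list[pd.DataFrame]:
    """Split df row-wise into n pieces whose lengths differ by at most one.

    Hypothetical helper: follows np.array_split's sizing rule (the first
    len(df) % n pieces get one extra row) but uses only positional iloc
    slicing, so DataFrame.swapaxes is never involved.
    """
    base, extra = divmod(len(df), n)
    sizes = [base + 1 if i < extra else base for i in range(n)]
    # Running totals give the slice boundaries 0, s0, s0 + s1, ..., len(df).
    bounds = [0, *accumulate(sizes)]
    return [df.iloc[bounds[i]:bounds[i + 1]] for i in range(n)]


sample = pd.DataFrame({"x": range(10)})
print([len(piece) for piece in split_into_n(sample, 3)])  # → [4, 3, 3]
```

It can stand in for np.array_split(df, number_of_chunks) in the loop; each piece is still a DataFrame with its original index, so to_csv writes it unchanged.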

Answer 2

Answered by PlagTag

OK, this answer brought me to a satisfying solution using

numpy.array_split(object, number_of_chunks)

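import numpy as np
number_of_chunks = 10
# to_csv returns None when given a path, so the list is built only for its side effects
[df_i.to_csv('/data/bs_{id}.csv'.format(id=id)) for id, df_i in enumerate(np.array_split(df, number_of_chunks))]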

or as a for loop:

for id, df_i in enumerate(np.array_split(df, number_of_chunks)):
    # an empty {} placeholder takes a positional argument; .format(id=id)
    # would raise IndexError here, so pass id positionally
    df_i.to_csv('/data/bs_{}.csv'.format(id))
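np.array_split, unlike np.split, accepts a number of sections that does not divide the length evenly. A small NumPy example of its sizing rule (for length l and n sections, the first l % n pieces get l // n + 1 elements and the rest get l // n) shows how many rows each CSV file would receive:

```python
import numpy as np

# np.array_split allows a section count that does not divide the length;
# np.split(np.arange(10), 3) would raise ValueError instead.
# Length 10, 3 sections: 10 % 3 = 1 piece of 10 // 3 + 1 = 4, then pieces of 3.
pieces = np.array_split(np.arange(10), 3)
print([p.tolist() for p in pieces])  # → [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```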