Is Pandas concat an in-place function?

Note: this page mirrors a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/16982936/

Date: 2020-09-13 20:53:21  Source: igfitidea

Tags: python, pandas

Asked by James Bond

I guess this question needs some insight into the implementation of concat.

Say I have 30 files, 1 GB each, and I can use at most 32 GB of memory. I loaded the files into a list of DataFrames called 'list_of_pieces'. This list_of_pieces should be about 30 GB in size, right?

If I do 'pd.concat(list_of_pieces)', does concat allocate another 30 GB (or maybe 10 GB or 15 GB) on the heap to do its work, or does it run the concatenation 'in-place' without allocating new memory?

Does anyone know?

Thanks!

Answered by Jeff

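The answer is no, this is not an in-place operation. np.concatenate is used under the hood, so the result is a newly allocated array holding a full copy of the data, while the original pieces stay in memory for as long as list_of_pieces references them. See here: Concatenate Numpy arrays without copying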
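
A small-scale way to check this claim: the sketch below uses toy data standing in for the 1 GB pieces (concat_is_copy is a hypothetical helper name) and calls np.shares_memory to confirm that the concatenated result does not reuse any input buffer, and that it stores the same number of bytes again.

```python
import numpy as np
import pandas as pd


def concat_is_copy(pieces):
    """Concatenate `pieces` and report whether the result shares memory with any input."""
    result = pd.concat(pieces, ignore_index=True)
    result_values = result.to_numpy()
    # np.shares_memory does an exact overlap check between two arrays.
    shares = any(np.shares_memory(result_values, p.to_numpy()) for p in pieces)
    return result, shares


# Three small float64 pieces (hypothetical toy data, not the question's files).
pieces = [
    pd.DataFrame(np.full((3, 2), float(i)), columns=["a", "b"])
    for i in range(3)
]
result, shares = concat_is_copy(pieces)

input_bytes = sum(p.to_numpy().nbytes for p in pieces)
print(shares)                                   # False: a new buffer was allocated
print(result.to_numpy().nbytes == input_bytes)  # True: the same data is stored a second time
```

Because the copy is complete, with the numbers from the question the 30 pieces (about 30 GB) and the result (about 30 GB) would both be alive at the same time unless list_of_pieces is released, roughly 60 GB at peak, which is more than the 32 GB available.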

A better approach to the problem is to write each of these pieces to an HDFStore table; see here for the docs: http://pandas.pydata.org/pandas-docs/dev/io.html#hdf5-pytables, and here for some recipes: http://pandas.pydata.org/pandas-docs/dev/cookbook.html#hdfstore

Then you can select whatever portions you need (or even the whole set), by query or even by row number.
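
The write-then-select workflow described above might look roughly like the following sketch. It assumes the optional PyTables package (tables) is installed, which pandas' HDFStore requires; the key name "pieces", the helper functions, and the toy data are hypothetical. Each piece is appended to one table on disk, and only the requested rows are read back, either by position (start/stop) or by a where query.

```python
import os
import tempfile

import numpy as np
import pandas as pd

KEY = "pieces"  # hypothetical key name for the on-disk table


def write_pieces(path, pieces):
    """Append each piece to a single HDF5 table instead of concatenating in RAM."""
    with pd.HDFStore(path, mode="w") as store:
        for piece in pieces:
            # format="table" allows appending and querying; data_columns=True
            # makes every column usable in `where` expressions.
            store.append(KEY, piece, format="table", data_columns=True)


def select_rows(path, start, stop):
    """Read only rows [start, stop) by row number."""
    with pd.HDFStore(path, mode="r") as store:
        return store.select(KEY, start=start, stop=stop)


def select_where(path, where):
    """Read only the rows matching a query string such as "x > 9"."""
    with pd.HDFStore(path, mode="r") as store:
        return store.select(KEY, where=where)


# Small stand-ins for the 30 x 1 GB pieces (hypothetical data).
pieces = [
    pd.DataFrame(
        {"x": np.arange(i * 4, i * 4 + 4), "y": np.arange(i * 4, i * 4 + 4) * 0.5},
        index=np.arange(i * 4, i * 4 + 4),
    )
    for i in range(3)
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pieces.h5")
    write_pieces(path, pieces)
    middle = select_rows(path, 4, 8)      # rows 4..7 by position
    large_x = select_where(path, "x > 9")  # rows where column x exceeds 9
```

Only the selected rows are materialized as a DataFrame, so peak memory depends on the size of the selection rather than on the whole dataset.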

Certain types of operations can even be done while the data is on disk; see https://github.com/pydata/pandas/issues/3202?source=cc and http://pytables.github.io/usersguide/libref/expr_class.html#
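
The links above point at PyTables expression evaluation. A different and simpler pattern, swapped in here because it needs only pandas, is iterating over an HDFStore table in chunks with select(..., chunksize=...), so that an aggregate is computed while only one chunk is in memory at a time. This sketch again assumes PyTables is installed; chunked_sum and the toy data are hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def chunked_sum(path, key, column, chunksize):
    """Sum one column of an HDFStore table chunk by chunk (hypothetical helper)."""
    total = 0
    with pd.HDFStore(path, mode="r") as store:
        # With chunksize set, select() returns an iterator of DataFrames;
        # columns=[column] limits each chunk to the column being summed.
        for chunk in store.select(key, columns=[column], chunksize=chunksize):
            total += chunk[column].sum()
    return total


# Hypothetical toy data written as a queryable table.
df = pd.DataFrame({"x": np.arange(10), "y": np.ones(10)})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.h5")
    df.to_hdf(path, key="data", format="table")
    total = chunked_sum(path, "data", "x", chunksize=3)
```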