Python TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"

Question

Asked by Petr Petrov

I have a big dataframe that I try to read in chunks and then concat back together. I use

df2 = pd.read_csv('et_users.csv', header=None, names=names2, chunksize=100000)
for chunk in df2:
    chunk['ID'] = chunk.ID.map(rep.set_index('member_id')['panel_mm_id'])

df2 = pd.concat(chunk, ignore_index=True)

But it returns an error

TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"

How can I fix that?

Answer 1

Accepted answer by EdChum

IIUC you want the following:

df2 = pd.read_csv('et_users.csv', header=None, names=names2, chunksize=100000)
chunks=[]
for chunk in df2:
    chunk['ID'] = chunk.ID.map(rep.set_index('member_id')['panel_mm_id'])
    chunks.append(chunk)

df2 = pd.concat(chunks, ignore_index=True)

You need to append each chunk to a list and then use concat to concatenate them all. Also, I think ignore_index may not be necessary, but I may be wrong.
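To make the list-then-concat pattern easy to try without the original files, here is a minimal sketch. The in-memory CSV text, the column names, the contents of `rep`, and the `id_map` variable are all assumptions made up for illustration; only the overall shape follows the answer above. It also builds the result with and without ignore_index so the two can be compared.

```python
import io

import pandas as pd

# Hypothetical stand-in for 'et_users.csv': six rows, no header line.
csv_text = "1,a\n2,b\n3,c\n4,d\n5,e\n6,f\n"

# Hypothetical stand-in for `rep`: maps member_id to panel_mm_id.
rep = pd.DataFrame({"member_id": [1, 2, 3, 4, 5, 6],
                    "panel_mm_id": [10, 20, 30, 40, 50, 60]})
id_map = rep.set_index("member_id")["panel_mm_id"]  # built once, reused per chunk

reader = pd.read_csv(io.StringIO(csv_text), header=None,
                     names=["ID", "name"], chunksize=2)

chunks = []
for chunk in reader:
    chunk["ID"] = chunk["ID"].map(id_map)  # same mapping step as in the question
    chunks.append(chunk)                   # collect every processed chunk

with_reset = pd.concat(chunks, ignore_index=True)
without_reset = pd.concat(chunks)

print(with_reset)
print(list(without_reset.index))
```

With a default index, read_csv appears to keep numbering rows continuously across chunks, so in this sketch both results should end up with the same index. ignore_index matters more when the chunks carry overlapping or otherwise meaningful index labels.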

Answer 2

Answered by Nikhil VJ

I was getting the same issue, and just realised that we have to pass the (multiple!) dataframes as a LIST in the first argument instead of as multiple arguments!

Reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html

import pandas as pd

a = pd.DataFrame()
b = pd.DataFrame()
c = pd.concat(a, b)  # errors out:
# TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"

c = pd.concat([a, b])  # works.

If the processing action doesn't require ALL the data to be present, then there is no reason to keep saving all the chunks to an external array and process everything only after the chunking loop is over: that defeats the whole purpose of chunking. We use chunksize because we want to do the processing on each chunk and free up the memory for the next chunk.

In terms of OP's code, they need to create another empty dataframe and concat the chunks into it.

df3 = pd.DataFrame() # create empty df for collecting chunks
df2 = pd.read_csv('et_users.csv', header=None, names=names2, chunksize=100000)
for chunk in df2:
    chunk['ID'] = chunk.ID.map(rep.set_index('member_id')['panel_mm_id'])
    df3 = pd.concat([df3,chunk], ignore_index=True)

print(df3)

However, I'd like to reiterate that chunking was invented precisely to avoid building up all the rows of the entire CSV into a single DataFrame, as that is what causes out-of-memory errors when dealing with large CSVs. We don't want to just shift the error down the road from the pd.read_csv() line to the pd.concat() line. We need to craft ways to finish off the bulk of our data processing inside the chunking loop. In my own use case I'm eliminating most of the rows using a df query and concatenating only the few required rows, so the final df is much smaller than the original csv.
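A minimal sketch of that filter-inside-the-loop idea follows. The CSV contents, the `score` column, and the threshold of 80 are hypothetical; the answer does not show its actual query. Each chunk is reduced before it is stored, so only the surviving rows are ever held together.

```python
import io

import pandas as pd

# Hypothetical data: the 'score' column and the threshold are illustrative only.
csv_text = "id,score\n1,5\n2,95\n3,12\n4,88\n5,7\n6,91\n"

reader = pd.read_csv(io.StringIO(csv_text), chunksize=2)

kept = []
for chunk in reader:
    # Reduce each chunk before storing it, so only the needed rows stay in memory.
    kept.append(chunk.query("score > 80"))

# One concat at the end, over the already-filtered pieces.
result = pd.concat(kept, ignore_index=True)
print(result)
```

Collecting the filtered pieces in a list and calling concat once also avoids re-copying the growing result on every iteration, which is what happens when concat is called inside the loop.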

Answer 3

Answered by Dimanjan

The last line must be in the following format:

df2=pd.concat([df1,df2,df3,df4,...], ignore_index=True)

The thing is, the dataframes to be concatenated need to be passed as a list/tuple.
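As a small illustration of the point above, the following sketch passes the same frames as a list, as a tuple, and as a generator, and then reproduces the error by passing a bare DataFrame. The frames and the column name `x` are made up for the example.

```python
import pandas as pd

# Hypothetical frames, just to have something to concatenate.
df1 = pd.DataFrame({"x": [1, 2]})
df2 = pd.DataFrame({"x": [3]})
df3 = pd.DataFrame({"x": [4, 5]})

from_list = pd.concat([df1, df2, df3], ignore_index=True)
from_tuple = pd.concat((df1, df2, df3), ignore_index=True)
from_generator = pd.concat((df for df in (df1, df2, df3)), ignore_index=True)

# Passing a single DataFrame instead of a collection raises the TypeError.
try:
    pd.concat(df1)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

All three collection forms should give the same result; only the bare DataFrame is rejected.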

Answer 4

Answered by acm151130

As the others said, you need to pass it in as a list. Also, it may help to make sure it's a DataFrame prior to using concat.

i.e.

chunks = pd.DataFrame(chunks)
df2 = pd.concat([chunks], ignore_index=True)
