pandas: initialize an empty DataFrame and append rows

Question

Asked by alvas

Unlike creating an empty dataframe and populating its rows later, I have a great many dataframes that need to be concatenated.

If there were only two data frames, I can do this:

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))

df1.append(df2, ignore_index=True)

Imagine I have millions of df that need to be appended/concatenated, one each time I read a new file into a DataFrame object.

But when I tried to initialize an empty dataframe and then add the new dataframes in a loop:

import os
import pandas as pd

alldf = pd.DataFrame(columns=list('AB'))
for filename in os.listdir(indir):
    df = pd.read_csv(indir + filename, delimiter=' ')
    alldf.append(df, ignore_index=True)

This would return an empty alldf with only the header row, e.g.

alldf = pd.DataFrame(columns=list('AB'))
df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
for df in [df1, df2]:
    alldf.append(df, ignore_index=True)

Answer 1

Answered by philshem

pd.concat() over a list of dataframes is probably the way to go, especially for clean CSVs. But in case you suspect your CSVs are either dirty or could get parsed by read_csv() with mixed types between files, you may want to explicitly create each dataframe in a loop.
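
To make the mixed-types concern concrete, here is a minimal sketch. The two in-memory "files" and their contents are made up for illustration, and forcing types through the `dtype` argument of `read_csv()` is one possible way to keep columns consistent, not necessarily what the answer had in mind.

```python
import io

import pandas as pd

# Two hypothetical space-delimited "files"; the second has a stray word in column B.
clean_text = "A B\n1 2\n3 4\n"
dirty_text = "A B\n5 six\n7 8\n"

# Default inference: B is read as integers in one file and as text in the other.
inferred_clean = pd.read_csv(io.StringIO(clean_text), delimiter=" ")
inferred_dirty = pd.read_csv(io.StringIO(dirty_text), delimiter=" ")
print(inferred_clean["B"].dtype, inferred_dirty["B"].dtype)

# Forcing one dtype per column at read time keeps every frame consistent.
dfs = [
    pd.read_csv(io.StringIO(text), delimiter=" ", dtype={"A": "int64", "B": str})
    for text in (clean_text, dirty_text)
]
combined = pd.concat(dfs, ignore_index=True)
print(combined)
```

With the types pinned, column B holds strings in every frame, so any cleanup (for example converting to numbers) can be done once on the combined result.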

You can initialize a dataframe for the first file, and then start each subsequent file with an empty dataframe based on the first.

df2 = pd.DataFrame(data=None, columns=df1.columns, index=df1.index)

This takes the structure of dataframe df1 but no data, and creates df2. If you want to force data types on columns, you can do it to df1 when it is created, before its structure is copied.
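
A small sketch of what that constructor call produces may help. The sample `df1` is borrowed from the question; the zero-row slice at the end is an alternative added here for comparison, on the assumption that a frame with no rows at all is sometimes what is wanted.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"))

# The answer's approach: same columns and index as df1, but every cell is missing.
template = pd.DataFrame(data=None, columns=df1.columns, index=df1.index)
print(template.shape)               # (2, 2)
print(template.isna().all().all())  # True

# Alternative (an assumption about intent): a zero-row slice keeps columns and dtypes.
empty = df1.iloc[0:0]
print(empty.shape)                  # (0, 2)
print(empty.dtypes)
```

Because the index of df1 is passed along, the copy keeps df1's row labels with missing values in every cell, whereas the slice carries over the column dtypes and has no rows.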

Answer 2

Answered by alvas

Following @DSM's comment, this works:

import os
import pandas as pd

dfs = []
for filename in os.listdir(indir):
    df = pd.read_csv(indir + filename, delimiter=' ')
    dfs.append(df)  # list.append: collect each frame, concatenate once at the end

alldf = pd.concat(dfs)
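
For a version that runs without any files on disk, the sketch below swaps the directory listing for two in-memory CSV strings (made-up data matching the question's df1 and df2). It also illustrates why the loop in the question left alldf empty: DataFrame.append returned a new frame rather than modifying alldf in place, and the return value was discarded. That method was deprecated in pandas 1.4 and removed in 2.0, so pd.concat is the call to use either way.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the space-delimited files in `indir`.
sources = [
    io.StringIO("A B\n1 2\n3 4\n"),
    io.StringIO("A B\n5 6\n7 8\n"),
]

# Collect the frames in a plain Python list first.
dfs = []
for src in sources:
    dfs.append(pd.read_csv(src, delimiter=" "))

# pd.concat returns a new frame, so the result has to be assigned.
alldf = pd.concat(dfs, ignore_index=True)
print(alldf)
#    A  B
# 0  1  2
# 1  3  4
# 2  5  6
# 3  7  8
```

Concatenating once at the end also avoids re-copying the accumulated rows on every iteration, which matters when the number of files is large.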
