Original question: http://stackoverflow.com/questions/43175865/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
Initializing an empty DataFrame and appending rows
Question by alvas
Unlike creating an empty DataFrame and populating rows later, I have many, many DataFrames that need to be concatenated.
If there were only two DataFrames, I could do this:
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
df1.append(df2, ignore_index=True)  # pandas < 2.0 only; returns a new DataFrame
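DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0. On current pandas, a minimal equivalent of the two-frame case uses pd.concat, which, like append, returns a new DataFrame and leaves its inputs unchanged:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list("AB"))

# pd.concat takes a list of frames and returns a NEW DataFrame;
# ignore_index=True renumbers the rows 0..n-1, as append(..., ignore_index=True) did.
combined = pd.concat([df1, df2], ignore_index=True)

print(combined)
print(len(df1))  # 2: df1 itself is not modified
```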
Imagine I have millions of df objects that need to be appended/concatenated, one each time I read a new file into a DataFrame object.
But when I tried to initialize an empty DataFrame and then add the new DataFrames in a loop:
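import os
import pandas as pd

alldf = pd.DataFrame(columns=list('AB'))
for filename in os.listdir(indir):
    df = pd.read_csv(os.path.join(indir, filename), delimiter=' ')
    # append() returns a new DataFrame (pandas < 2.0); the result is discarded here
    alldf.append(df, ignore_index=True)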
This would return an empty alldf with only the header row, e.g.:
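import pandas as pd

alldf = pd.DataFrame(columns=list('AB'))
df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
for df in [df1, df2]:
    alldf.append(df, ignore_index=True)  # pandas < 2.0; result discarded, alldf stays empty
print(alldf)  # Empty DataFrame, Columns: [A, B], Index: []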
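The loop above leaves alldf empty because each call builds a new DataFrame that is never assigned back. A minimal sketch of the smallest fix that keeps the loop shape, using pd.concat and reassigning the result (starting from a copy of the first frame rather than an empty one is a choice made here, not something from the question). Note that every iteration re-copies all rows collected so far, so with many files this grows quadratically:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list("AB"))

alldf = None
for df in [df1, df2]:
    if alldf is None:
        alldf = df.copy()  # first frame: start from a copy of it
    else:
        # concat returns a NEW DataFrame; assigning it back is the step
        # the original loop was missing. Each call copies all rows so far.
        alldf = pd.concat([alldf, df], ignore_index=True)

print(alldf.shape)  # (4, 2)
```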
Answer by philshem
pd.concat() over a list of DataFrames is probably the way to go, especially for clean CSVs. But in case you suspect your CSVs are dirty, or that read_csv() could infer mixed types between files, you may want to explicitly create each DataFrame in a loop.
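One way to "explicitly create each DataFrame" for dirty input, sketched under assumptions: the two file names and contents are hypothetical and held in memory via io.StringIO, and the cleaning rule (read everything as text, then convert with pd.to_numeric(errors="coerce") so unparseable values become NaN) is an illustration, not something the answer specifies:

```python
import io

import pandas as pd

# Hypothetical space-delimited "files"; the second has a stray non-numeric value.
raw_files = {
    "a.txt": "A B\n1 2\n3 4\n",
    "b.txt": "A B\n5 oops\n7 8\n",
}

frames = []
for name, text in raw_files.items():
    # Read every column as raw text so no file gets its own surprising inferred dtype...
    df = pd.read_csv(io.StringIO(text), delimiter=" ", dtype=object)
    # ...then convert explicitly; values that cannot be parsed become NaN.
    df = df.apply(pd.to_numeric, errors="coerce")
    frames.append(df)

alldf = pd.concat(frames, ignore_index=True)
print(alldf.dtypes)
```

Without the explicit conversion, column B would be read as int64 from the first file and as object from the second, and the concatenated column would end up as object.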
You can initialize a DataFrame from the first file, and then start each subsequent file with an empty DataFrame based on the first.
df2 = pd.DataFrame(data=None, columns=df1.columns, index=df1.index)
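This takes the structure of DataFrame df1 (its column labels and index) but none of its data, and creates df2. If you want to force data types on the columns, you can set them on df1 when it is created, before its structure is copied; note, though, that this constructor copies only the labels and the index, so df2's columns start out as object dtype filled with NaN rather than inheriting df1's dtypes.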
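A small check of what the structure-copying constructor produces, with a hypothetical df1. The zero-row slice df1.iloc[0:0] is an alternative swapped in here (not part of the answer) for when you want an empty frame that keeps df1's column dtypes:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [0.5, 1.5]})  # hypothetical first file

# The answer's approach: same labels and index, every cell NaN
df2 = pd.DataFrame(data=None, columns=df1.columns, index=df1.index)
print(df2.dtypes)  # both columns are object: df1's int64/float64 are not carried over

# Alternative: a zero-row slice keeps the columns and their dtypes, with no rows
empty_like_df1 = df1.iloc[0:0]
print(empty_like_df1.dtypes)  # A int64, B float64
```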
Answer by alvas
From @DSM's comment, this works:
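import os
import pandas as pd

dfs = []
for filename in os.listdir(indir):
    df = pd.read_csv(os.path.join(indir, filename), delimiter=' ')
    dfs.append(df)  # list.append mutates the Python list in place
alldf = pd.concat(dfs, ignore_index=True)  # one concat at the end copies each row once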
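A self-contained sketch of the same collect-then-concat pattern, assuming a throwaway temporary directory with two hypothetical sample files in place of the real indir. Here the list of frames is built with a comprehension, and pd.concat joins it once:

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as indir:
    # Hypothetical sample input: two small space-delimited files
    samples = {"part1.txt": "A B\n1 2\n3 4\n", "part2.txt": "A B\n5 6\n7 8\n"}
    for name, text in samples.items():
        with open(os.path.join(indir, name), "w") as fh:
            fh.write(text)

    # Read every file into its own DataFrame; sorted() gives a deterministic order
    dfs = [
        pd.read_csv(os.path.join(indir, f), delimiter=" ")
        for f in sorted(os.listdir(indir))
    ]
    alldf = pd.concat(dfs, ignore_index=True)

print(alldf)
```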