pandas: creating a pandas DataFrame from multiple files

Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/10545957/

Time: 2020-09-13 15:43:09  Source: igfitidea

creating pandas data frame from multiple files

python pandas

Asked by Abhi

I am trying to create a pandas DataFrame, and it works fine for a single file. Now I need to build it from multiple files that have the same data structure, so instead of a single file name I have a list of file names from which I would like to create the DataFrame.

I'm not sure how to append to the current DataFrame in pandas, or whether there is a way for pandas to pull a list of files into a DataFrame.

Answered by zach

The pandas concat command is your friend here. Let's say you have all your files in a directory, targetdir. You can:

  1. make a list of the files
  2. load them as pandas dataframes
  3. and concatenate them together

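```python
import os
import pandas as pd

# directory that holds the input files (placeholder name from the text above)
targetdir = "targetdir"

# list the files, keeping the directory in each path
# (os.listdir returns bare file names, not full paths)
filelist = [os.path.join(targetdir, name) for name in sorted(os.listdir(targetdir))]
# read them into pandas (read_table assumes tab-separated text by default)
df_list = [pd.read_table(path) for path in filelist]
# concatenate them together
big_df = pd.concat(df_list)
```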
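
One detail that may matter here: by default pd.concat keeps each file's own row index, so the combined frame can end up with repeated index labels. Below is a minimal sketch that starts from an explicit list of file names, as in the question, and renumbers the rows. It assumes CSV input; the helper name `frames_from_files` and the two sample files are made up for illustration.

```python
import os
import tempfile

import pandas as pd


def frames_from_files(paths):
    """Hypothetical helper: read each CSV path and stack the rows."""
    df_list = [pd.read_csv(path) for path in paths]
    # ignore_index=True renumbers rows 0..n-1 instead of repeating each file's index
    return pd.concat(df_list, ignore_index=True)


# Illustrative data only: two small files with the same columns.
tmpdir = tempfile.mkdtemp()
paths = []
for name, rows in [("a.csv", "x,y\n1,2\n3,4\n"), ("b.csv", "x,y\n5,6\n")]:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as handle:
        handle.write(rows)
    paths.append(path)

big_df = frames_from_files(paths)
print(big_df)
```

If the original per-file index carries meaning, leaving out `ignore_index=True` keeps it.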

Answered by mrdevlar

Potentially horribly inefficient, but...

Why not use read_csv to build two (or more) dataframes, then use join to put them together?
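
A rough sketch of that idea, with made-up column names and in-memory text standing in for real files. Note that join lines rows up by index and places columns side by side, so it suits files that hold different columns for the same rows; for files with identical columns that should be stacked row-wise, concat is probably the better fit.

```python
import io

import pandas as pd

# Illustrative in-memory "files"; real code would pass file paths to read_csv.
left_csv = io.StringIO("id,height\n1,170\n2,180\n")
right_csv = io.StringIO("id,weight\n1,65\n2,80\n")

left = pd.read_csv(left_csv, index_col="id")
right = pd.read_csv(right_csv, index_col="id")

# join matches rows by index label and puts the columns side by side
joined = left.join(right)
print(joined)
```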

That said, it would be easier to answer your question if you provided some data or some of the code you've used so far.

Answered by Jose Blanca

I might try to concatenate the files before feeding them to pandas. If you're on Linux or Mac you could use cat; otherwise a very simple Python function could do the job for you.
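
As a sketch of what such a Python function might look like, assuming every file starts with the same one-line header (a plain cat would copy that header into the middle of the data). The helper name `concat_text_files` and the sample files are hypothetical.

```python
import os
import tempfile

import pandas as pd


def concat_text_files(paths, out_path):
    """Hypothetical helper: write all input files into one, keeping a single header."""
    with open(out_path, "w") as out:
        for position, path in enumerate(paths):
            with open(path) as src:
                header = src.readline()
                if position == 0:
                    out.write(header)  # keep the header from the first file only
                for line in src:
                    # make sure every row ends with a newline before the next file starts
                    out.write(line if line.endswith("\n") else line + "\n")


# Illustrative data only; the second file deliberately lacks a trailing newline.
tmpdir = tempfile.mkdtemp()
paths = []
for name, text in [("a.csv", "x,y\n1,2\n"), ("b.csv", "x,y\n3,4")]:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as handle:
        handle.write(text)
    paths.append(path)

combined = os.path.join(tmpdir, "combined.csv")
concat_text_files(paths, combined)
big_df = pd.read_csv(combined)
print(big_df)
```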

Answered by nitin

Are these files in CSV format? If so, you could use read_csv. http://pandas.sourceforge.net/io.html

Once you have read the files and saved them in two dataframes, you could merge the two dataframes or add additional columns to one of them (assuming a common index). Pandas should be able to fill in missing rows.
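
Both options can be sketched roughly as follows, using two small made-up frames that share only some index labels. Where a row is missing from one frame, pandas fills the gap with NaN.

```python
import pandas as pd

# Illustrative frames sharing some, but not all, index labels.
first = pd.DataFrame({"a": [1, 2, 3]}, index=["r1", "r2", "r3"])
second = pd.DataFrame({"b": [10, 30]}, index=["r1", "r3"])

# Option 1: merge on the index, keeping every row from both frames
merged = pd.merge(first, second, left_index=True, right_index=True, how="outer")

# Option 2: add a column to a copy of the first frame; values align by index label
extended = first.copy()
extended["b"] = second["b"]

print(merged)
print(extended)
```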