Pandas: reading CSV files with a partial wildcard

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/49898742/

Time: 2020-09-14 05:29:20  Source: igfitidea

Pandas reading csv files with partial wildcard

python pandas

Asked by Kvothe

I'm trying to write a script that imports a file, then does something with the file and outputs the result into another file.

df = pd.read_csv('somefile2018.csv')

The above code works perfectly fine. However, I'd like to avoid hardcoding the file name in the code.

The script will be run in a folder (directory) that contains script.py and several csv files.

I've tried the following:

somefile_path = glob.glob('somefile*.csv')
df = pd.read_csv(somefile_path)

But I get the following error:

ValueError: Invalid file path or buffer object type: <class 'list'>

Answered by James

glob.glob returns a list, not a string. The read_csv function takes a single path string as the input to find the file. Try this:

from glob import glob
import pandas as pd

for f in glob('somefile*.csv'):
    df = pd.read_csv(f)  # f is a single path string
    ...
    # the rest of your script
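
If exactly one file is expected to match, as in the question, a small helper can pick that single path and fail loudly otherwise. This is only a minimal sketch: the helper name read_single_csv and its error handling are assumptions made for illustration, not part of the original answer.

```python
import glob

import pandas as pd


def read_single_csv(pattern):
    """Read the one CSV file matching `pattern` (hypothetical helper)."""
    matches = glob.glob(pattern)  # always a list, possibly empty
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one file matching {pattern!r}, found {len(matches)}"
        )
    # pass a single path string to read_csv, not the list itself
    return pd.read_csv(matches[0])
```

It could then be called as df = read_single_csv('somefile*.csv').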

Answered by iDrwish

You can get the list of the CSV files in the current working directory and loop over them.

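import os
from os import listdir
from os.path import isfile, join

import pandas as pd

mypath = os.getcwd()

# keep only regular files whose name ends with .csv
csvfiles = [f for f in listdir(mypath) if isfile(join(mypath, f)) and f.endswith('.csv')]

for f in csvfiles:
    df = pd.read_csv(join(mypath, f))
    # the rest of your script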
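
The same directory scan can also be written with pathlib from the standard library. The sketch below is an alternative, not what the answer itself uses; the function name read_csvs_in_dir and the choice to return a dict keyed by file name are assumptions for illustration.

```python
from pathlib import Path

import pandas as pd


def read_csvs_in_dir(directory="."):
    """Return {file name: DataFrame} for every *.csv file in `directory`."""
    frames = {}
    # sorted() makes the processing order deterministic
    for path in sorted(Path(directory).glob("*.csv")):
        if path.is_file():
            frames[path.name] = pd.read_csv(path)
    return frames
```

Keeping the file name as the key makes it possible to tell afterwards which DataFrame came from which file.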

Answered by pleicht17

To read all of the files that follow a certain pattern, so long as they share the same schema, use this function:

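import glob
import pandas as pd

def pd_read_pattern(pattern):
    files = glob.glob(pattern)
    if not files:
        # nothing matched: return an empty frame, as the append-based loop did
        return pd.DataFrame()

    # DataFrame.append was removed in pandas 2.0; concatenate once instead
    return pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

df = pd_read_pattern('somefile*.csv')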

This will work with either an absolute or relative path.

Answered by Boud

Loop over each file and build a list of DataFrames, then assemble them together using pd.concat.
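
A minimal sketch of that approach follows. It assumes all matching files share the same columns; the helper name concat_csvs and the decision to raise when nothing matches are assumptions for illustration, not part of the original answer.

```python
import glob

import pandas as pd


def concat_csvs(pattern):
    """Read every file matching `pattern` and stack the rows (hypothetical helper)."""
    # one DataFrame per matching file, in sorted file-name order
    frames = [pd.read_csv(f) for f in sorted(glob.glob(pattern))]
    if not frames:
        raise FileNotFoundError(f"no files match {pattern!r}")
    # ignore_index=True renumbers the rows 0..n-1 in the combined frame
    return pd.concat(frames, ignore_index=True)
```

Calling pd.concat once on the whole list avoids copying the growing DataFrame on every loop iteration.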
