Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/50337843/

Date: 2020-09-14 05:33:45  Source: igfitidea

Extract file name from read_csv - Python

Tags: python, string, pandas

Asked by JD2775

I have a script that currently reads raw data from a .csv file and performs some pandas data analysis on it. At the moment the .csv file is hardcoded and is read in like this:

data = pd.read_csv('test.csv',sep="|", names=col)

I want to change 2 things:

  1. I want to turn this into a loop so that it iterates over a directory of .csv files and runs the pandas analysis further down in the script on each one.

  2. I want to take each .csv file name, strip the '.csv', and store the result in another list variable; let's call it 'new_table_list'.

I think I need something like the code below, at least for the 1st point (though I know this isn't completely correct). I am not sure how to address the 2nd point.

Any help is appreciated

import os 

path = '\test\test\csvfiles'
table_list = []

for filename in os.listdir(path):
    if filename.endswith('.csv'):
        table_list.append(file)
data = pd.read_csv(table_list,sep="|", names=col)

Accepted answer by Yuvraj Jaiswal

There are many ways to do it:

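table_list, new_table_list = [], []
for filename in os.listdir(path):
    if filename.endswith('.csv'):
        # join with path so the file is found from any working directory
        table_list.append(pd.read_csv(os.path.join(path, filename), sep="|"))
        # keep the text before the first dot, e.g. 'test.csv' -> 'test'
        new_table_list.append(filename.split(".")[0])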

One more

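table_list, new_table_list = [], []
for filename in os.listdir(path):
    if filename.endswith('.csv'):
        table_list.append(pd.read_csv(os.path.join(path, filename), sep="|"))
        # drop the last four characters, i.e. the '.csv' suffix
        new_table_list.append(filename[:-4])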

And many more.

As @barmar pointed out, it is better to include the path with each filename passed to read_csv, to avoid any issues related to where the files are located relative to the script.
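
A minimal sketch of that advice, assuming the pipe separator from the question. The helper name `load_csv_dir` is hypothetical, and `os.path.splitext` is swapped in for the slicing shown above: it strips only the final extension, so a name such as `sales.2020.csv` keeps its inner dot.

```python
import os
import pandas as pd


def load_csv_dir(path, sep="|"):
    """Return (names, frames) for every .csv file directly inside `path`.

    `load_csv_dir` is a hypothetical helper, not part of pandas.
    """
    names, frames = [], []
    # sorted() makes the order deterministic; os.listdir order is arbitrary
    for filename in sorted(os.listdir(path)):
        stem, ext = os.path.splitext(filename)
        if ext.lower() != ".csv":
            continue
        # Build the full path so the read works from any working directory
        frames.append(pd.read_csv(os.path.join(path, filename), sep=sep))
        names.append(stem)
    return names, frames
```

Calling `names, frames = load_csv_dir(path)` would give the equivalent of `new_table_list` and `table_list`, kept in matching order.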

Answer by Paulo Scardine

You can try something like this:

import glob

data = {}
for filename in glob.glob('/path/to/csvfiles/*.csv'):
    data[filename[:-4]] = pd.read_csv(filename, sep="|", names=col)

Then data.keys() gives the file names without the ".csv" part (each key includes the directory, because glob returns the path it matched), and data.values() holds one pandas DataFrame per file.
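
If the keys should be bare names without the directory, one possible variation is sketched below. The helper name `read_csvs_by_name` and its parameters are assumptions for illustration; it combines `os.path.basename` with `os.path.splitext` instead of slicing.

```python
import glob
import os
import pandas as pd


def read_csvs_by_name(directory, sep="|", names=None):
    """Map each bare file name (no directory, no extension) to its DataFrame.

    Hypothetical helper; `names` is passed through to pandas.read_csv.
    """
    data = {}
    for filename in glob.glob(os.path.join(directory, "*.csv")):
        # basename drops the directory, splitext drops the final extension
        key = os.path.splitext(os.path.basename(filename))[0]
        data[key] = pd.read_csv(filename, sep=sep, names=names)
    return data
```

With this, `list(read_csvs_by_name('/path/to/csvfiles', names=col))` would be the list of table names, assuming `col` is the column list from the question.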

Answer by piRSquared

I'd start with using pathlib.

from pathlib import Path

And then leverage the stem attribute and the glob method.

Let's make an import function.

def read_csv(f):
    # read the file passed in as f (pandas accepts a Path here)
    return pd.read_csv(f, sep="|")

The most generic approach would be to store in a dictionary.

# raw string so that \t is not read as a tab character
p = Path(r'\test\test\csvfiles')
dod = {f.stem: read_csv(f) for f in p.glob('*.csv')}

And you can also use pd.concat to turn that into a single dataframe.

df = pd.concat(dod)
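
When pd.concat receives a dictionary, the keys (here the file stems) become the outer level of the resulting index, so each row stays traceable to its file. A small sketch of that behaviour follows; the helper name `concat_csv_dir` and the level names "table" and "row" are assumptions chosen for illustration.

```python
from pathlib import Path
import pandas as pd


def concat_csv_dir(directory, sep="|"):
    """Read every *.csv in `directory` and stack them into one DataFrame.

    Hypothetical helper. pd.concat raises ValueError if no file matches.
    """
    # sorted() gives a deterministic order; glob order is arbitrary
    files = sorted(Path(directory).glob("*.csv"))
    frames = {f.stem: pd.read_csv(f, sep=sep) for f in files}
    df = pd.concat(frames)
    # Name the two index levels: the file stem, then the row within that file
    df.index.names = ["table", "row"]
    return df
```

From there, `df.xs('test', level='table')` would pull back the rows of a single file, and `df.reset_index(level='table')` would turn the stem into an ordinary column.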

Answer by Ziyad Moraished

To get the list of CSV files in the directory, use glob; it is easier than os.

from glob import glob

# csvs will contain the paths of all files ending with .csv, as a list
# (raw string so that \t in the path is not read as a tab character)
csvs = glob(r'you\dir\to\csvs_folder\*.csv')

# remove the trailing .csv (four characters) from the CSV file names
new_table_list = [csv[:-4] for csv in csvs]

# read the CSVs as dataframes
dfs = [pd.read_csv(csv, sep="|", names=col) for csv in csvs]

# concatenate all dataframes into a single dataframe
df = pd.concat(dfs, ignore_index=True)

Answer by Joe

You can try this:

import os
path = 'your path'
all_csv_files = [f for f in os.listdir(path) if f.endswith('.csv')]
for f in all_csv_files:
    data = pd.read_csv(os.path.join(path, f), sep="|", names=col)

# list without .csv
files = [f[:-4] for f in all_csv_files]