pandas: extracting the file name from read_csv - Python

Question

Asked by JD2775

I have a script that currently reads raw data from a .csv file and performs some pandas data analysis on it. At the moment the .csv file name is hardcoded and the file is read like this:

data = pd.read_csv('test.csv',sep="|", names=col)

I want to change 2 things:

1. I want to turn this into a loop, so that it goes through a directory of .csv files and runs the pandas analysis further down in the script on each one.
2. I want to take each .csv file name, strip the '.csv', and store the result in another list variable, let's call it 'new_table_list'.

I think I need something like the code below, at least for the 1st point (though I know this isn't completely correct). I am not sure how to address the 2nd point.

Any help is appreciated

import os 

path = '\test\test\csvfiles'
table_list = []

for filename in os.listdir(path):
    if filename.endswith('.csv'):
        table_list.append(file)
data = pd.read_csv(table_list,sep="|", names=col)

Answer 1

Accepted answer by Yuvraj Jaiswal

There are many ways to do it. One option:

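for filename in os.listdir(path):
    if filename.endswith('.csv'):
        # join with path so the file is found whatever the working directory is
        table_list.append(pd.read_csv(os.path.join(path, filename), sep="|"))
        # note: split(".")[0] cuts at the first dot, so "a.b.csv" becomes "a"
        new_table_list.append(filename.split(".")[0])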

Another option:

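for filename in os.listdir(path):
    if filename.endswith('.csv'):
        # join with path so the file is found whatever the working directory is
        table_list.append(pd.read_csv(os.path.join(path, filename), sep="|"))
        # drop the last four characters, i.e. the ".csv" suffix
        new_table_list.append(filename[:-4])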

There are many more.

As @barmar pointed out, it is better to join path with each filename before reading it (for example with os.path.join), to avoid any issues related to where the files are located relative to the script.
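
A minimal sketch of that advice, assuming a hypothetical helper named load_csv_dir and throwaway demo files that are not part of the original answer. It joins the directory onto each file name before reading, and uses os.path.splitext instead of split(".") so that only the final extension is removed.

```python
import os
import tempfile

import pandas as pd


def load_csv_dir(path, sep="|"):
    """Read every .csv file in `path`; return (frames, names without extension)."""
    frames, names = [], []
    for filename in sorted(os.listdir(path)):
        if filename.endswith(".csv"):
            # Join directory and file name so the read does not depend on
            # the current working directory.
            frames.append(pd.read_csv(os.path.join(path, filename), sep=sep))
            # splitext removes only the final extension, so "sales.2020.csv"
            # becomes "sales.2020" rather than "sales".
            names.append(os.path.splitext(filename)[0])
    return frames, names


# Demonstration with temporary files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("orders.csv", "sales.2020.csv", "notes.txt"):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("a|b\n1|2\n")
    table_list, new_table_list = load_csv_dir(tmp)

print(new_table_list)
```

The file notes.txt is skipped by the endswith check, so only the two CSV files are read.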

Answer 2

Answer by Paulo Scardine

You can try something like this:

import glob

data = {}
for filename in glob.glob('/path/to/csvfiles/*.csv'):
    data[filename[:-4]] = pd.read_csv(filename, sep="|", names=col)

Then data.keys() gives the file paths as returned by glob (directory included) without the ".csv" part, and data.values() gives one pandas DataFrame for each file.
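
If the keys should be bare file names rather than paths, one possible variation is sketched below. The helper name load_by_name and the demo files are assumptions made for illustration; the directory part is stripped with os.path.basename before the extension is removed.

```python
import glob
import os
import tempfile

import pandas as pd


def load_by_name(directory, sep="|"):
    """Return {file name without directory or extension: DataFrame}."""
    data = {}
    for filepath in glob.glob(os.path.join(directory, "*.csv")):
        # glob returns the directory as part of each match, so strip it
        # before removing the extension.
        name = os.path.splitext(os.path.basename(filepath))[0]
        data[name] = pd.read_csv(filepath, sep=sep)
    return data


# Demonstration with temporary files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("alpha.csv", "beta.csv"):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("x|y\n1|2\n3|4\n")
    data = load_by_name(tmp)

print(sorted(data))
```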

Answer 3

Answer by piRSquared

I'd start with using pathlib.

from pathlib import Path

And then leverage the stem attribute and the glob method.

Let's make an import function.

def read_csv(f):
    # read the file passed in as f
    return pd.read_csv(f, sep="|")

The most generic approach would be to store in a dictionary.

p = Path(r'\test\test\csvfiles')  # raw string, so \t is not read as a tab
dod = {f.stem: read_csv(f) for f in p.glob('*.csv')}

You can also use pd.concat to turn that into a single dataframe.

df = pd.concat(dod)
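
To illustrate what concatenating a dictionary produces, here is a small sketch that uses in-memory stand-ins for the per-file dataframes. The keys, values, and the level name "table" are hypothetical. The dictionary keys become the outer level of the resulting index, so the rows from each file stay identifiable.

```python
import pandas as pd

# Stand-ins for the per-file DataFrames (hypothetical names and values).
dod = {
    "orders": pd.DataFrame({"a": [1, 2]}),
    "sales": pd.DataFrame({"a": [3]}),
}

# Concatenating a dict uses its keys as the outer level of the index;
# `names` labels that level.
df = pd.concat(dod, names=["table"])

# Select the rows that came from one file by its key.
orders = df.loc["orders"]

# Or turn the key into an ordinary column.
flat = df.reset_index(level="table")

print(flat)
```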

Answer 4

Answer by Ziyad Moraished

To get the list of CSV files in the directory, use glob; it is easier than os.

from glob import glob 

# csvs will contain the paths of all files ending in .csv, as a list
csvs = glob(r'your\dir\to\csvs_folder\*.csv')  # raw string keeps the backslashes literal

# remove the trailing .csv (four characters) from the CSV file names
new_table_list = [csv[:-4] for csv in csvs]

# read csvs as dataframes
dfs = [pd.read_csv(csv, sep="|", names=col) for csv in csvs]

#concatenate all dataframes into a single dataframe
df = pd.concat(dfs, ignore_index=True)
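
With ignore_index=True the combined dataframe no longer records which file each row came from. One way to keep that information, sketched here with hypothetical file names and a hypothetical column called source, is to tag each dataframe with its file stem before concatenating. The sketch uses pathlib rather than glob purely for the stem attribute.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Demonstration with temporary files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "north.csv").write_text("id|qty\n1|10\n2|20\n")
    (folder / "south.csv").write_text("id|qty\n3|30\n")

    # Read each file and add a column holding the file name without ".csv".
    dfs = [
        pd.read_csv(csv, sep="|").assign(source=csv.stem)
        for csv in sorted(folder.glob("*.csv"))
    ]

# Row labels are renumbered, but the source column still identifies the file.
df = pd.concat(dfs, ignore_index=True)

print(df)
```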

Answer 5

Answer by Joe

You can try this:

import os
path = 'your path'
all_csv_files = [f for f in os.listdir(path) if f.endswith('.csv')]
for f in all_csv_files:
    data = pd.read_csv(os.path.join(path, f), sep="|", names=col)

# list without .csv
files = [f[:-4] for f in all_csv_files]
