Python Pandas: Add a Filename Column When Combining CSV Files

Question

Asked by specmer

My Python code in the example below works correctly: it combines a directory of CSV files and matches the headers. However, I want to take it a step further: how do I add a column containing the filename of the CSV each row came from?

import pandas as pd
import glob

globbed_files = glob.glob("*.csv")  # list of all CSV files in the current directory

data = []  # pd.concat takes a list of DataFrames as an argument
for csv in globbed_files:
    frame = pd.read_csv(csv)
    data.append(frame)

bigframe = pd.concat(data, ignore_index=True)  # don't let pandas try to align row indexes
bigframe.to_csv("Pandas_output2.csv")

Answer 1

Answered by Mike Müller

This should work:

import os

for csv in globbed_files:
    frame = pd.read_csv(csv)
    frame['filename'] = os.path.basename(csv)
    data.append(frame)

frame['filename'] creates a new column named filename, and os.path.basename() turns a path like /a/d/c.txt into the filename c.txt.
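
For anyone who wants to try the whole thing end to end, here is a minimal sketch that wraps the loop in a function and runs it on two throwaway files. The function name combine_with_filename, the temporary file names, and the sample data are all made up for illustration; sorting the glob results is an extra choice made only so the row order is predictable.

```python
import glob
import os
import tempfile

import pandas as pd


def combine_with_filename(pattern):
    """Read every CSV matching `pattern` and tag each row with its source file."""
    frames = []
    for path in sorted(glob.glob(pattern)):  # sorted only for a stable row order
        frame = pd.read_csv(path)
        # Store just the file name (e.g. "a.csv"), not the full path.
        frame["filename"] = os.path.basename(path)
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Demo with two temporary files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"x": [1, 2]}).to_csv(os.path.join(tmp, "a.csv"), index=False)
    pd.DataFrame({"x": [3]}).to_csv(os.path.join(tmp, "b.csv"), index=False)
    combined = combine_with_filename(os.path.join(tmp, "*.csv"))

print(combined)
```

Each row of combined carries the name of the file it was read from, so the source can still be recovered after the frames are concatenated.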

Answer 2

Answered by Daniel Butler

Mike's answer above works perfectly. In case any googlers run into the following error:

>>> TypeError: cannot concatenate object of type "<type 'str'>"; 
    only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid

It's possibly because the separator is not correct. I was using a custom CSV file whose separator was ^. Because of that, I needed to pass the separator in the pd.read_csv call.

import os

for csv in globbed_files:
    frame = pd.read_csv(csv, sep='^')
    frame['filename'] = os.path.basename(csv)
    data.append(frame)
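
As a rough illustration of why the separator matters, the sketch below writes a small caret-delimited file and reads it twice. The file name custom.csv and its contents are invented for the demo. It does not reproduce the TypeError above; it only shows how a wrong separator changes what read_csv returns.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "custom.csv")  # hypothetical file name
    with open(path, "w") as handle:
        handle.write("a^b\n1^2\n3^4\n")

    # The default separator is a comma; the file has none, so each line
    # is read as a single field.
    wrong = pd.read_csv(path)

    # Passing the real separator splits the fields as intended.
    right = pd.read_csv(path, sep="^")
    right["filename"] = os.path.basename(path)

print(list(wrong.columns))
print(list(right.columns))
```

With the default separator the whole header line ends up as one column name, which is a quick way to spot the problem before concatenating.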
