Original source: http://stackoverflow.com/questions/41857659/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Python Pandas add Filename Column CSV
Asked by specmer
My Python code works correctly in the example below. It combines a directory of CSV files and matches the headers. However, I want to take it a step further: how do I add a column that holds the filename of the CSV each row came from?
import pandas as pd
import glob
globbed_files = glob.glob("*.csv")  # creates a list of all CSV files
data = []  # pd.concat takes a list of DataFrames as an argument
for csv in globbed_files:
    frame = pd.read_csv(csv)
    data.append(frame)
bigframe = pd.concat(data, ignore_index=True)  # don't want pandas to try to align row indexes
bigframe.to_csv("Pandas_output2.csv")
Answered by Mike Müller
This should work:
import os
for csv in globbed_files:
    frame = pd.read_csv(csv)
    frame['filename'] = os.path.basename(csv)  # tag every row with its source file
    data.append(frame)
frame['filename'] creates a new column named filename, and os.path.basename() turns a path like /a/d/c.txt into the filename c.txt.
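To make the answer above easy to try in isolation, here is a minimal self-contained sketch. It is not part of the original answer: the temporary directory, the file names first.csv and second.csv, and the column x are made-up sample data, and DataFrame.assign is used as an alternative way of adding the filename column.

```python
import glob
import os
import tempfile

import pandas as pd

# os.path.basename keeps only the final component of a path.
print(os.path.basename("/a/d/c.txt"))  # c.txt

# Build two small throwaway CSV files (hypothetical sample data).
workdir = tempfile.mkdtemp()
pd.DataFrame({"x": [1, 2]}).to_csv(os.path.join(workdir, "first.csv"), index=False)
pd.DataFrame({"x": [3]}).to_csv(os.path.join(workdir, "second.csv"), index=False)

# Sort the paths so the row order of the result is predictable.
csv_paths = sorted(glob.glob(os.path.join(workdir, "*.csv")))

# Read each file and tag its rows with the base name of the file.
frames = [
    pd.read_csv(path).assign(filename=os.path.basename(path))
    for path in csv_paths
]
bigframe = pd.concat(frames, ignore_index=True)
print(bigframe)
```

Every row of bigframe should then carry the name of the file it was read from, which is the same result the loop in the answer produces.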
Answered by Daniel Butler
Mike's answer above works perfectly. In case any googlers run into the following error:
>>> TypeError: cannot concatenate object of type "<type 'str'>";
only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
It's possibly because the separator is not correct. I was using a custom CSV file whose separator was ^. Because of that, I needed to pass the separator in the pd.read_csv call.
import os
for csv in globbed_files:
    frame = pd.read_csv(csv, sep='^')  # '^' is the separator used by this custom file
    frame['filename'] = os.path.basename(csv)
    data.append(frame)
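As a small illustration of why the separator matters, the sketch below writes a made-up ^-delimited file (caret.csv, with hypothetical columns a and b) and reads it twice. It does not try to reproduce the TypeError quoted above. It only shows how the parsed columns differ when sep is left at its default.

```python
import os
import tempfile

import pandas as pd

# Write a tiny '^'-delimited file (hypothetical sample data).
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "caret.csv")
with open(path, "w") as handle:
    handle.write("a^b\n1^2\n3^4\n")

# With the default comma separator, each line is parsed as a single field.
wrong = pd.read_csv(path)
# With sep='^', the two columns are split as intended.
right = pd.read_csv(path, sep="^")

print(list(wrong.columns))  # ['a^b']
print(list(right.columns))  # ['a', 'b']
```

Checking frame.columns right after pd.read_csv is a quick way to spot a wrong separator before concatenating.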