Read multiple *.txt files into Pandas Dataframe with filename as column header

Notice: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/26415906/

Date: 2020-09-13 22:35:18  Source: igfitidea

Tags: python-2.7, csv, text, pandas, dataframe

Asked by edesz

I am trying to import a set of *.txt files. I need to import the files into successive columns of a Pandas DataFrame in Python.

Requirements and Background information:

  1. Each file has one column of numbers
  2. No headers are present in the files
  3. Positive and negative integers are possible
  4. The size of all the *.txt files is the same
  5. The columns of the DataFrame must have the name of file (without extension) as the header
  6. The number of files is not known ahead of time

Here is one sample *.txt file. All the others have the same format.

16
54
-314
1
15
4
153
86
4
64
373
3
434
31
93
53
873
43
11
533
46

Here is my attempt:

import pandas as pd
import os
import glob

# Step 1: get a list of all txt files in target directory
my_dir = "C:\\Python27\\Files\\"
filelist = []
filesList = []
os.chdir( my_dir )

# Step 2: Build up list of files:
for files in glob.glob("*.txt"):
    fileName, fileExtension = os.path.splitext(files)
    filelist.append(fileName) #filename without extension
    filesList.append(files) #filename with extension

# Step 3: Build up DataFrame:
df = pd.DataFrame()
for ijk in filelist:
    frame = pd.read_csv(filesList[ijk])
    df = df.append(frame)
print df

Steps 1 and 2 work. I am having problems with step 3. I get the following error message:

Traceback (most recent call last):
  File "C:\Python27\TextFile.py", line 26, in <module>
    frame = pd.read_csv(filesList[ijk])
TypeError: list indices must be integers, not str

Question: Is there a better way to load these *.txt files into a Pandas dataframe? Why does read_csv not accept strings for file names?

Answered by CT Zhu

You can read them into multiple dataframes and concat them together afterwards. Suppose you have two of those files, containing the data shown.

In [6]:
filelist = ['val1.txt', 'val2.txt']
print pd.concat([pd.read_csv(item, names=[item[:-4]]) for item in filelist], axis=1)
    val1  val2
0     16    16
1     54    54
2   -314  -314
3      1     1
4     15    15
5      4     4
6    153   153
7     86    86
8      4     4
9     64    64
10   373   373
11     3     3
12   434   434
13    31    31
14    93    93
15    53    53
16   873   873
17    43    43
18    11    11
19   533   533
20    46    46
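
As a hedged sketch of how this approach might extend to a number of files that is not known ahead of time: the helper below, `load_txt_columns` (a hypothetical name, not part of the original answer), collects the files with glob, derives each column name with os.path.splitext instead of item[:-4], and concatenates along axis=1. It is written for Python 3 and assumes every file holds a single headerless column of integers, as described in the question.

```python
import glob
import os

import pandas as pd


def load_txt_columns(directory):
    """Read every *.txt file in `directory` into one column of a DataFrame.

    Hypothetical helper for illustration; assumes one headerless
    column of numbers per file.
    """
    # glob returns files in arbitrary order, so sort for stable columns.
    paths = sorted(glob.glob(os.path.join(directory, "*.txt")))

    columns = []
    for path in paths:
        # Column label: file name without directory or extension.
        name = os.path.splitext(os.path.basename(path))[0]
        # header=None because the files have no header row;
        # names=[...] supplies the column label instead.
        columns.append(pd.read_csv(path, header=None, names=[name]))

    if not columns:
        # pd.concat raises on an empty list, so return an empty frame.
        return pd.DataFrame()

    # axis=1 places the frames side by side rather than stacking rows.
    return pd.concat(columns, axis=1)
```

Calling `load_txt_columns(my_dir)` with the directory from the question should then give one column per file, labelled by file name.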

Answered by Kracit

You're very close. ijk is already the filename, so you don't need to index into the list:

# Step 3: Build up DataFrame:
df = pd.DataFrame()
for ijk in filelist:
    frame = pd.read_csv(ijk)
    df = df.append(frame)
print df
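
To illustrate why the original loop raised a TypeError, here is a minimal Python 3 sketch. The lists `names` and `paths` are stand-ins for filelist and filesList from the question, with made-up file names; no files are actually read.

```python
# Stand-ins for the question's two lists (hypothetical file names).
names = ["val1", "val2"]           # like filelist: no extension
paths = ["val1.txt", "val2.txt"]   # like filesList: with extension

# Iterating over a list yields its elements (strings), not positions.
for item in names:
    assert isinstance(item, str)

# Using one of those strings as a list index reproduces the error.
try:
    paths["val1"]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Pairing the lists keeps each column label next to its file path.
pairs = list(zip(names, paths))
```

Note that in the question's code filelist holds the names without the extension, so the value passed to read_csv presumably needs to come from filesList (or have ".txt" added back). Also, DataFrame.append stacks rows rather than adding columns, and it was removed in pandas 2.0, so on current pandas versions pd.concat is the usual replacement.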

In the future, please provide working code exactly as is. You import with from pandas import * yet then refer to pandas as pd, which implies the import was actually import pandas as pd.

You also want to be careful with variable names. files is actually a single file path, and filelist and filesList cannot be told apart by their names alone. It also seems like a bad idea to keep personal documents in your python directory.
