Reading multiple CSV files into a Python Pandas DataFrame

Question

Asked by user892627

The general use case behind the question is to read multiple CSV log files from a target directory into a single Python Pandas DataFrame for quick-turnaround statistical analysis and charting. The idea behind using Pandas rather than MySQL is to run this data import (or append) and statistical analysis periodically throughout the day.

The script below attempts to read all of the CSV files (which share the same layout) into a single Pandas DataFrame and adds a year column associated with each file read.

The problem is that the script now reads only the very last file in the directory, rather than all files in the target directory as intended.

# Assemble all of the data files into a single DataFrame & add a year field
# 2010 is the last available year
years = range(1880, 2011)

for year in years:
    path ='C:\Documents and Settings\Foo\My Documents\pydata-book\pydata-book-master`\ch02\names\yob%d.txt' % year
    frame = pd.read_csv(path, names=columns)

    frame['year'] = year
    pieces.append(frame)

# Concatenates everything into a single Dataframe
names = pd.concat(pieces, ignore_index=True)

# Expected row total should be 1690784
names
<class 'pandas.core.frame.DataFrame'>
Int64Index: 33838 entries, 0 to 33837
Data columns:
name      33838  non-null values
sex       33838  non-null values
births    33838  non-null values
year      33838  non-null values
dtypes: int64(2), object(2)

# Start aggregating the data at the year & gender level using groupby or pivot
total_births = names.pivot_table('births', rows='year', cols='sex', aggfunc=sum)
# Prints pivot table
total_births.tail()

Out[35]:
sex         F        M
year
2010  1759010  1898382

Answer 1

Answered by Greg Reda

The append method on a DataFrame instance does not behave like the append method on a list. DataFrame.append() does not operate in place; it returns a new object, so the result must be assigned back to a name.

years = range(1880, 2011)

names = pd.DataFrame()
for year in years:
    # Raw string, so that backslash sequences such as \n in \names are not read as escapes
    path = r'C:\Documents and Settings\Foo\My Documents\pydata-book\pydata-book-master`\ch02\names\yob%d.txt' % year
    frame = pd.read_csv(path, names=columns)

    frame['year'] = year
    names = names.append(frame, ignore_index=True)

Or you can use concat, which takes the frames as a list:

years = range(1880, 2011)

names = pd.DataFrame()
for year in years:
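    # Raw string, so that backslash sequences such as \n in \names are not read as escapes
    path = r'C:\Documents and Settings\Foo\My Documents\pydata-book\pydata-book-master`\ch02\names\yob%d.txt' % year
    frame = pd.read_csv(path, names=columns)

    frame['year'] = year
    # pd.concat expects the frames as a single list and returns a new DataFrame
    names = pd.concat([names, frame], ignore_index=True)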
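
The difference between the two append methods can be made concrete with a small sketch. This is an illustration under stated assumptions, not the answerer's code: the two tiny frames and their values are invented. It also shows a common variant of the loops above, in which the frames are collected in a plain list (whose append does work in place) and pd.concat is called once at the end. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions only the concat form runs.

```python
import pandas as pd

# Hypothetical stand-ins for two yearly files; the values are made up.
frame_1880 = pd.DataFrame({"name": ["Mary", "John"], "sex": ["F", "M"], "births": [10, 12]})
frame_1881 = pd.DataFrame({"name": ["Anna"], "sex": ["F"], "births": [7]})

# list.append mutates the list in place and returns None.
pieces = []
for year, frame in [(1880, frame_1880), (1881, frame_1881)]:
    frame = frame.copy()      # work on a copy so the inputs stay untouched
    frame["year"] = year
    pieces.append(frame)

# pd.concat takes the frames as one list and returns a new DataFrame.
names = pd.concat(pieces, ignore_index=True)

print(len(pieces))                    # 2
print(len(names))                     # 3
print("year" in frame_1880.columns)   # False
```

Concatenating once at the end also avoids copying the growing DataFrame on every iteration, which matters when there are many files.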

Answer 2

Answered by cromastro

I could not get either of the above answers to work. The first answer was close, but the blank line between the second and third lines after the for statement wasn't right. I used the code snippet below in Canopy. Also, for those who are interested, this problem came from an example in "Python for Data Analysis" (an enjoyable book so far otherwise).

import pandas as pd

years = range(1880,2011)
columns = ['name','sex','births']
names = pd.DataFrame()

for year in years:
    path = 'C:/PythonData/pydata-book-master/pydata-book-master/ch02/names/yob%d.txt' % year
    frame = pd.read_csv(path, names=columns)
    frame['year'] = year
    names = names.append(frame,ignore_index=True)
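
For the general use case in the question, where the files are discovered in a directory rather than generated from a known range of years, a glob-based variant is one option. The following is a minimal sketch, assuming the file names follow the yobYYYY.txt pattern used above. The temporary directory, the two sample files, and the helper name read_yearly_files are hypothetical and exist only to make the example runnable.

```python
import re
import tempfile
from pathlib import Path

import pandas as pd

columns = ["name", "sex", "births"]


def read_yearly_files(directory):
    """Read every yobYYYY.txt file in `directory` into one DataFrame."""
    pieces = []
    for path in sorted(Path(directory).glob("yob*.txt")):
        match = re.fullmatch(r"yob(\d{4})\.txt", path.name)
        if match is None:
            continue  # skip files that do not follow the naming pattern
        frame = pd.read_csv(path, names=columns)
        frame["year"] = int(match.group(1))  # year taken from the file name
        pieces.append(frame)
    if not pieces:
        raise ValueError("no yobYYYY.txt files found in %s" % directory)
    return pd.concat(pieces, ignore_index=True)


# Build two tiny sample files (made-up data) so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "yob1880.txt").write_text("Mary,F,10\nJohn,M,12\n")
    Path(tmp, "yob1881.txt").write_text("Anna,F,7\n")
    names = read_yearly_files(tmp)

print(names["year"].unique())
print(len(names))
```

Using pathlib with forward slashes or Path objects also sidesteps the backslash-escape problem that plain Windows path strings can run into.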

Answer 3

Answered by user3290447

Remove the blank line between:

    frame = pd.read_csv(path, names=columns)

and

    frame['year'] = year

so that it reads:

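    for year in years:
        # Raw string, so that backslash sequences such as \n in \names are not read as escapes
        path = r'C:\Documents and Settings\Foo\My Documents\pydata-book\pydata-book-master`\ch02\names\yob%d.txt' % year
        frame = pd.read_csv(path, names=columns)
        frame['year'] = year
        # pd.concat takes the frames as a single list and returns a new DataFrame
        names = pd.concat([names, frame], ignore_index=True)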
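
Whichever loop is used, it is worth confirming that every file actually made it into the result, since the symptom in the question (33838 rows instead of the expected 1690784) is easy to miss. One possible check is sketched below with made-up data. The helper name missing_years is hypothetical and not part of pandas.

```python
import pandas as pd


def missing_years(names, years):
    """Return the years in `years` that have no rows in `names`."""
    loaded = set(names["year"].unique().tolist())
    return [year for year in years if year not in loaded]


# Made-up frame that mimics the bug: only the last year was loaded.
only_last = pd.DataFrame(
    {"name": ["Anna"], "sex": ["F"], "births": [7], "year": [1882]}
)
print(missing_years(only_last, range(1880, 1883)))   # [1880, 1881]

# Made-up frame in which every year is present.
complete = pd.DataFrame({
    "name": ["Mary", "John", "Anna"],
    "sex": ["F", "M", "F"],
    "births": [10, 12, 7],
    "year": [1880, 1881, 1882],
})
print(missing_years(complete, range(1880, 1883)))    # []
```

An empty list means each year in the range contributed at least one row to the combined DataFrame.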
