pandas: How to read a large text file in Python?
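Notice: this page is a Chinese-English parallel translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/18602226/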
How to read a large text file in Python?
Asked by beginagain
I am using Enthought Canopy (a bundle of Python library packages, e.g. NumPy, Pandas, etc.) for data analysis. I am trying to read a text file and create a DataFrame from it. The text file has 1180598 rows and 18 columns, and every column contains numbers. I wrote the following code to read the file and name the columns:
from pandas import DataFrame, read_csv
import matplotlib.pyplot as plt
import pandas as pd
print('Pandas Version ' + pd.__version__)
Pandas Version 0.12.0
location=r'C:\UMAIR\Directed Studies\US-101 Data\Main Data\US-101-Main-Data\vehicle-trajectory-data50am-0805am\tra.txt'
df=read_csv(location, names=['Vehicle ID','Frame ID','Total Frames','Global Time','Local X','Local Y','Global X','Global Y','Vehicle Length','Vehicle Width','Vehicle Class','Vehicle Velocity','Vehicle Acceleration','Lane Identification','Preceding Vehicle','Following Vehicle','Spacing','Headway'])
df
Out[41]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1180598 entries, 0 to 1180597
Data columns (total 18 columns):
Vehicle ID 1180598 non-null values
Frame ID 0 non-null values
Total Frames 0 non-null values
Global Time 0 non-null values
Local X 0 non-null values
Local Y 0 non-null values
Global X 0 non-null values
Global Y 0 non-null values
Vehicle Length 0 non-null values
Vehicle Width 0 non-null values
Vehicle Class 0 non-null values
Vehicle Velocity 0 non-null values
Vehicle Acceleration 0 non-null values
Lane Identification 0 non-null values
Preceding Vehicle 0 non-null values
Following Vehicle 0 non-null values
Spacing 0 non-null values
Headway 0 non-null values
dtypes: float64(17), object(1)
As you can see from Out[41], only the first column was populated; the other 17 columns have no non-null values, so each line of the file was read as a single field. What should I do to let pandas know that my file has 18 columns, so that it is read the way it is meant to be?
Accepted answer by elyase
This will import your dataset correctly (here names is the list of 18 column names from the question):
df = pd.read_csv(location, names=names, header=None, delim_whitespace=True)
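The effect of the separator can be reproduced on a tiny in-memory sample. The sketch below is only an illustration under stated assumptions: the three sample rows and the shortened four-name column list are made up, and it passes sep=r"\s+" instead of delim_whitespace=True, since the two are equivalent for this purpose and delim_whitespace is deprecated in recent pandas releases.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the whitespace-separated file.
sample = (
    "2 13 437 1118846980200\n"
    "2 14 437 1118846980300\n"
    "2 15 437 1118846980400\n"
)

# Shortened, illustrative column list (the real file has 18 columns).
names = ["Vehicle ID", "Frame ID", "Total Frames", "Global Time"]

# Default comma separator: each whole line lands in the first column,
# and the remaining named columns are filled with NaN.
bad = pd.read_csv(io.StringIO(sample), names=names, header=None)

# Whitespace separator: every field gets its own column.
good = pd.read_csv(io.StringIO(sample), names=names, header=None, sep=r"\s+")

# Count non-null values per column, as in the question's Out[41].
print(bad.notna().sum().tolist())   # -> [3, 0, 0, 0]
print(good.notna().sum().tolist())  # -> [3, 3, 3, 3]
```

The first count mirrors the output in the question (one populated column, the rest empty), while the second shows all columns filled once the parser splits on whitespace.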

