Python: Reading data with Pandas (.dat file)

Question

Asked by KcFnMi

How do I read the following two-column data from a .dat file with Pandas?

TIME                      XGSM
2004 006 01 00 01 37 600  1
2004 006 01 00 02 32 800  5
2004 006 01 00 03 28 000  8
2004 006 01 00 04 23 200  11
2004 006 01 00 05 18 400  17

The column separator is (at least) two spaces.

I tried:

df = pd.read_table("test.dat", sep=r"\s+", usecols=['TIME', 'XGSM'])
print(df)

But it prints


Answer 1

Accepted answer by jezrael

You can pass column positions (rather than names) to the usecols parameter and supply the column names yourself:

import pandas as pd
from io import StringIO  # pandas.compat.StringIO is not available in current pandas

temp = u"""TIME             XGSM
2004 006 01 00 01 37 600  1
2004 006 01 00 02 32 800  5
2004 006 01 00 03 28 000  8
2004 006 01 00 04 23 200  11
2004 006 01 00 05 18 400  17"""
# after testing, replace StringIO(temp) with the filename
df = pd.read_csv(StringIO(temp),
                 sep=r"\s+",            # split on any run of whitespace
                 skiprows=1,            # skip the two-field header line
                 usecols=[0, 7],        # keep the first and the last field
                 names=['TIME', 'XGSM'])

print(df)
   TIME  XGSM
0  2004     1
1  2004     5
2  2004     8
3  2004    11
4  2004    17
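
To see why positions 0 and 7 are used here, it may help to count the fields each separator produces. The following is a minimal sketch using only the standard `re` module on lines copied from the question; the variable names are illustrative and not part of the original answer.

```python
import re

# Sample lines copied from the question's .dat file.
lines = [
    "TIME                      XGSM",
    "2004 006 01 00 01 37 600  1",
    "2004 006 01 00 05 18 400  17",
]

# Splitting on any run of whitespace, as sep=r"\s+" does.
any_whitespace = [len(re.split(r"\s+", line)) for line in lines]

# Splitting only on runs of two or more whitespace characters.
two_or_more = [len(re.split(r"\s{2,}", line)) for line in lines]

print(any_whitespace)  # [2, 8, 8]
print(two_or_more)     # [2, 2, 2]
```

With `\s+` the header has 2 fields but every data row has 8, so the header names cannot be matched to the data by name. That is why the header is skipped and the first and last fields are selected by position.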

Edit:

You can also use a regex separator that matches two or more spaces. Add engine='python' as well, because otherwise pandas emits this warning:

ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.

import pandas as pd
from io import StringIO  # pandas.compat.StringIO is not available in current pandas

temp = u"""TIME              XGSM
2004 006 01 00 01 37 600   1
2004 006 01 00 02 32 800   5
2004 006 01 00 03 28 000   8
2004 006 01 00 04 23 200   11
2004 006 01 00 05 18 400   17"""
# after testing, replace StringIO(temp) with the filename
# r'\s{2,}' matches two or more whitespace characters, so TIME stays one column
df = pd.read_csv(StringIO(temp), sep=r'\s{2,}', engine='python')

print(df)
                       TIME  XGSM
0  2004 006 01 00 01 37 600     1
1  2004 006 01 00 02 32 800     5
2  2004 006 01 00 03 28 000     8
3  2004 006 01 00 04 23 200    11
4  2004 006 01 00 05 18 400    17
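
As a rough sketch of the "replace StringIO(temp) with the filename" step, the example below writes the sample to a temporary file standing in for test.dat and reads it back with the same separator. The temporary path and the placeholder column names `t0` to `t6` are assumptions for illustration; the question does not say what the individual TIME fields mean.

```python
import os
import tempfile

import pandas as pd

sample = """TIME                      XGSM
2004 006 01 00 01 37 600  1
2004 006 01 00 02 32 800  5
2004 006 01 00 03 28 000  8
2004 006 01 00 04 23 200  11
2004 006 01 00 05 18 400  17
"""

# Write the sample to a temporary file that stands in for test.dat.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.dat")
    with open(path, "w") as handle:
        handle.write(sample)
    # Same call as above, but with a file path instead of StringIO.
    df = pd.read_csv(path, sep=r"\s{2,}", engine="python")

# Optionally split the TIME string into its whitespace-separated parts.
# The names t0..t6 are placeholders, not documented field names.
parts = df["TIME"].str.split(expand=True)
parts.columns = [f"t{i}" for i in range(parts.shape[1])]

print(df.shape)
print(parts.head())
```

The split parts are strings, so they would still need converting (for example with `astype(int)`) before any arithmetic.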

Answer 2

Answer by Psidom

You could also try pd.read_fwf() (read a table of fixed-width formatted lines into a DataFrame):

import pandas as pd
from io import StringIO

pd.read_fwf(StringIO("""TIME                      XGSM
2004 006 01 00 01 37 600  1
2004 006 01 00 02 32 800  5
2004 006 01 00 03 28 000  8
2004 006 01 00 04 23 200  11
2004 006 01 00 05 18 400  17"""), usecols = ["TIME", "XGSM"])

#   TIME    XGSM
#0  2004    1
#1  2004    5
#2  2004    8
#3  2004    11
#4  2004    17
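
Note that with inferred column boundaries, TIME holds only the first field (the year). If the whole timestamp is wanted in one column, one option is to pass explicit boundaries through the `colspecs` parameter of `read_fwf`. The sketch below assumes the layout of the sample above, with TIME in characters 0 to 23 and XGSM starting at character 26; these offsets were measured from the sample and would need adjusting for a file with a different layout.

```python
from io import StringIO

import pandas as pd

sample = """TIME                      XGSM
2004 006 01 00 01 37 600  1
2004 006 01 00 02 32 800  5
2004 006 01 00 03 28 000  8
2004 006 01 00 04 23 200  11
2004 006 01 00 05 18 400  17"""

# Half-open (start, end) character ranges; assumed from the sample layout.
colspecs = [(0, 24), (26, 30)]

df = pd.read_fwf(StringIO(sample), colspecs=colspecs)

print(df)
```

Here TIME is read as a single string per row and XGSM as integers.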
