Read data (.dat file) with Pandas
This page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/41025416/
Asked by KcFnMi
How do I read the following two-column data from a .dat file with Pandas?
TIME                      XGSM
2004 006 01 00 01 37 600  1
2004 006 01 00 02 32 800  5
2004 006 01 00 03 28 000  8
2004 006 01 00 04 23 200  11
2004 006 01 00 05 18 400  17
The column separator is at least 2 spaces.
I tried:
df = pd.read_table("test.dat", sep="\s+", usecols=['TIME', 'XGSM'])
print df
But it prints:
TIME XGSM
2004 6
2004 6
2004 6
2004 6
2004 6
Accepted answer by jezrael
You can pass the positions of the columns you want to the usecols parameter:
import pandas as pd
from io import StringIO  # pandas.compat no longer provides StringIO in current pandas

temp = u"""TIME                      XGSM
2004 006 01 00 01 37 600  1
2004 006 01 00 02 32 800  5
2004 006 01 00 03 28 000  8
2004 006 01 00 04 23 200  11
2004 006 01 00 05 18 400  17"""
# after testing, replace StringIO(temp) with the filename
df = pd.read_csv(StringIO(temp),
                 sep=r"\s+",        # split on any run of whitespace
                 skiprows=1,        # skip the two-name header row
                 usecols=[0, 7],    # keep the first and last of the 8 fields
                 names=['TIME', 'XGSM'])
print(df)
TIME XGSM
0 2004 1
1 2004 5
2 2004 8
3 2004 11
4 2004 17
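The comment about swapping StringIO(temp) for a filename can be sketched as follows. This is only an illustration: the temporary directory and the name test.dat are assumptions made so the example can run anywhere, and the sample rows are rebuilt from the question's data.

```python
import os
import tempfile

import pandas as pd

# Sample rows from the question; sep=r"\s+" does not care how wide the gaps are.
sample = "\n".join([
    "TIME  XGSM",
    "2004 006 01 00 01 37 600  1",
    "2004 006 01 00 02 32 800  5",
    "2004 006 01 00 03 28 000  8",
    "2004 006 01 00 04 23 200  11",
    "2004 006 01 00 05 18 400  17",
]) + "\n"

# Hypothetical location: a throwaway directory stands in for the real data folder.
with tempfile.TemporaryDirectory() as tmp_dir:
    dat_path = os.path.join(tmp_dir, "test.dat")
    with open(dat_path, "w") as handle:
        handle.write(sample)

    # Same call as in the answer, with the path in place of StringIO(temp).
    df_from_file = pd.read_csv(
        dat_path,
        sep=r"\s+",
        skiprows=1,
        usecols=[0, 7],
        names=["TIME", "XGSM"],
    )

print(df_from_file)
```

Reading from a path or from a StringIO object goes through the same parser, so the options do not need to change.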
Edit:
You can also use a regex separator that matches 2 or more whitespace characters, r'\s{2,}'. Add engine='python' as well, because otherwise pandas emits this warning:
ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
import pandas as pd
from io import StringIO  # pandas.compat no longer provides StringIO in current pandas

temp = u"""TIME                      XGSM
2004 006 01 00 01 37 600  1
2004 006 01 00 02 32 800  5
2004 006 01 00 03 28 000  8
2004 006 01 00 04 23 200  11
2004 006 01 00 05 18 400  17"""
# after testing, replace StringIO(temp) with the filename
# single spaces stay inside TIME; only runs of 2+ whitespace characters split columns
df = pd.read_csv(StringIO(temp), sep=r'\s{2,}', engine='python')
print(df)
                       TIME  XGSM
0  2004 006 01 00 01 37 600     1
1  2004 006 01 00 02 32 800     5
2  2004 006 01 00 03 28 000     8
3  2004 006 01 00 04 23 200    11
4  2004 006 01 00 05 18 400    17
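As a rough check of the point about engine='python', the sketch below records warnings during the call and confirms that the regex separator yields exactly two columns. The helper name read_two_space and the way the sample text is rebuilt are illustrative assumptions, not part of the original answer.

```python
import warnings
from io import StringIO

import pandas as pd
from pandas.errors import ParserWarning

# Rebuild the sample with a gap of at least two spaces between the columns.
rows = [
    ("2004 006 01 00 01 37 600", 1),
    ("2004 006 01 00 02 32 800", 5),
    ("2004 006 01 00 03 28 000", 8),
    ("2004 006 01 00 04 23 200", 11),
    ("2004 006 01 00 05 18 400", 17),
]
lines = ["TIME".ljust(26) + "XGSM"]
lines += [stamp.ljust(26) + str(value) for stamp, value in rows]
sample = "\n".join(lines)


def read_two_space(text, **kwargs):
    """Hypothetical helper: parse with a 2+ whitespace separator and
    return the frame plus any ParserWarning records that were raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        frame = pd.read_csv(StringIO(text), sep=r"\s{2,}", **kwargs)
    parser_warnings = [w for w in caught if issubclass(w.category, ParserWarning)]
    return frame, parser_warnings


df_regex, regex_warnings = read_two_space(sample, engine="python")
print(df_regex.columns.tolist())
print(len(regex_warnings))
```

Here TIME stays a single string column, so any further splitting into its individual fields would be a separate step.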
Answer by Psidom
You could also try pd.read_fwf(), which reads a table of fixed-width formatted lines into a DataFrame:
import pandas as pd
from io import StringIO

pd.read_fwf(StringIO("""TIME                      XGSM
2004 006 01 00 01 37 600  1
2004 006 01 00 02 32 800  5
2004 006 01 00 03 28 000  8
2004 006 01 00 04 23 200  11
2004 006 01 00 05 18 400  17"""), usecols=["TIME", "XGSM"])
# TIME XGSM
#0 2004 1
#1 2004 5
#2 2004 8
#3 2004 11
#4 2004 17
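If you would rather keep the whole timestamp in one column, read_fwf also accepts explicit column boundaries through colspecs. The sketch below is only an illustration: the character positions (0 to 24 for TIME, 26 to 30 for XGSM) are assumptions that match the sample as laid out here, and a real file may need different offsets.

```python
from io import StringIO

import pandas as pd

# Rebuild the sample so that XGSM always starts at character 26.
rows = [
    ("2004 006 01 00 01 37 600", 1),
    ("2004 006 01 00 02 32 800", 5),
    ("2004 006 01 00 03 28 000", 8),
    ("2004 006 01 00 04 23 200", 11),
    ("2004 006 01 00 05 18 400", 17),
]
lines = ["TIME".ljust(26) + "XGSM"]
lines += [stamp.ljust(26) + str(value) for stamp, value in rows]
sample = "\n".join(lines)

# Assumed layout: half-open [start, end) character ranges for each column.
colspecs = [(0, 24), (26, 30)]

df_fwf = pd.read_fwf(StringIO(sample), colspecs=colspecs)
print(df_fwf)
```

With explicit colspecs the column names still come from the header row, and the TIME values are kept as full strings instead of being cut at the first space.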