Python: Load CSV to Pandas MultiIndex DataFrame
Original URL: http://stackoverflow.com/questions/19103624/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): StackOverflow
Question by Handloomweaver
I have a 719 MB CSV file that looks like this:
from, to, dep, freq, arr, code, mode (header row)
RGBOXFD,RGBPADTON,127,0,27,99999,2
RGBOXFD,RGBPADTON,127,0,33,99999,2
RGBOXFD,RGBRDLEY,127,0,1425,99999,2
RGBOXFD,RGBCHOLSEY,127,0,52,99999,2
RGBOXFD,RGBMDNHEAD,127,0,91,99999,2
RGBDIDCOTP,RGBPADTON,127,0,46,99999,2
RGBDIDCOTP,RGBPADTON,127,0,3,99999,2
RGBDIDCOTP,RGBCHOLSEY,127,0,61,99999,2
RGBDIDCOTP,RGBRDLEY,127,0,1430,99999,2
RGBDIDCOTP,RGBPADTON,127,0,115,99999,2
and so on...
I want to load it into a pandas DataFrame. I know there is a method for loading from CSV:
import pandas as pd
r = pd.DataFrame.from_csv('test_data2.csv')  # deprecated in pandas 0.21, removed in 1.0
But I specifically want to load it as a 'MultiIndex' DataFrame where from and to are the index levels, so that I end up with:
                       dep  freq   arr   code  mode
RGBOXFD    RGBPADTON   127     0    27  99999     2
           RGBRDLEY    127     0    33  99999     2
           RGBCHOLSEY  127     0  1425  99999     2
           RGBMDNHEAD  127     0  1525  99999     2
etc. I'm not sure how to do that.
Accepted answer by DSM
You could use pd.read_csv:
>>> import pandas as pd
>>> df = pd.read_csv("test_data2.csv", index_col=[0,1], skipinitialspace=True)
>>> df
                       dep  freq   arr   code  mode
from       to
RGBOXFD    RGBPADTON   127     0    27  99999     2
           RGBPADTON   127     0    33  99999     2
           RGBRDLEY    127     0  1425  99999     2
           RGBCHOLSEY  127     0    52  99999     2
           RGBMDNHEAD  127     0    91  99999     2
RGBDIDCOTP RGBPADTON   127     0    46  99999     2
           RGBPADTON   127     0     3  99999     2
           RGBCHOLSEY  127     0    61  99999     2
           RGBRDLEY    127     0  1430  99999     2
           RGBPADTON   127     0   115  99999     2
where I've used skipinitialspace=True to get rid of those annoying spaces after the commas in the header row (without it, the labels would come out as ' to', ' dep', and so on).
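To see both arguments at work without the 719 MB file, here is a minimal sketch that inlines a few rows from the question through io.StringIO (CSV_TEXT is just illustration data, not part of either answer). It checks the resulting index levels and column names, then does two lookups on the MultiIndex with .loc.

```python
import io

import pandas as pd

# A few rows copied from the question, inlined so the sketch runs without
# test_data2.csv. Note the space after each comma in the header row.
CSV_TEXT = """from, to, dep, freq, arr, code, mode
RGBOXFD,RGBPADTON,127,0,27,99999,2
RGBOXFD,RGBRDLEY,127,0,1425,99999,2
RGBDIDCOTP,RGBPADTON,127,0,46,99999,2
RGBDIDCOTP,RGBCHOLSEY,127,0,61,99999,2
"""

# index_col=[0, 1] turns the first two columns into a two-level MultiIndex;
# skipinitialspace=True drops the space that follows each delimiter.
df = pd.read_csv(io.StringIO(CSV_TEXT), index_col=[0, 1], skipinitialspace=True)

print(list(df.index.names))  # ['from', 'to']
print(list(df.columns))      # ['dep', 'freq', 'arr', 'code', 'mode']

# Selecting on the outer level returns the rows for one origin, indexed by 'to'.
print(df.loc["RGBOXFD"])

# A full (from, to) tuple plus a column name selects a single value here.
print(df.loc[("RGBDIDCOTP", "RGBCHOLSEY"), "arr"])  # 61
```

Selecting with only the outer label drops that level, so the result of df.loc["RGBOXFD"] is indexed by to alone.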
Answer by 7stud
from_csv() works similarly:
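import pandas as pd

# DataFrame.from_csv was deprecated in pandas 0.21 and removed in 1.0,
# so this only runs on older pandas versions.
df = pd.DataFrame.from_csv(
    'data.txt',
    index_col=[0, 1],
)
print(df)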
--output:--
                       dep  freq   arr   code  mode
from       to
RGBOXFD    RGBPADTON   127     0    27  99999     2
           RGBPADTON   127     0    33  99999     2
           RGBRDLEY    127     0  1425  99999     2
           RGBCHOLSEY  127     0    52  99999     2
           RGBMDNHEAD  127     0    91  99999     2
RGBDIDCOTP RGBPADTON   127     0    46  99999     2
           RGBPADTON   127     0     3  99999     2
           RGBCHOLSEY  127     0    61  99999     2
           RGBRDLEY    127     0  1430  99999     2
           RGBPADTON   127     0   115  99999     2
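One difference worth knowing about: as far as I know, DataFrame.from_csv had no skipinitialspace parameter, so with this header the spaces after the commas most likely stay in the labels even though they are invisible in the printed output. Because from_csv is gone from current pandas, the sketch below reproduces the effect with read_csv minus the flag, then strips the labels after loading. The inline CSV_TEXT sample is illustrative only.

```python
import io

import pandas as pd

# Same header layout as in the question: a space follows every comma.
CSV_TEXT = """from, to, dep, freq, arr, code, mode
RGBOXFD,RGBPADTON,127,0,27,99999,2
RGBDIDCOTP,RGBPADTON,127,0,46,99999,2
"""

# Without skipinitialspace, the spaces become part of the labels.
raw = pd.read_csv(io.StringIO(CSV_TEXT), index_col=[0, 1])
print(list(raw.index.names))  # ['from', ' to']
print(list(raw.columns))      # [' dep', ' freq', ' arr', ' code', ' mode']

# Stripping the labels after loading is one way to repair them.
raw.columns = raw.columns.str.strip()
raw.index.names = [name.strip() for name in raw.index.names]
print(list(raw.columns))      # ['dep', 'freq', 'arr', 'code', 'mode']
```

Passing skipinitialspace=True up front, as in the accepted answer, avoids the cleanup step entirely.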
From this discussion, https://github.com/pydata/pandas/issues/4916, it looks like read_csv() was implemented to allow you to set more options, which makes from_csv() superfluous.
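Since from_csv was deprecated in pandas 0.21 and removed in 1.0, read_csv is what works today. A minimal sketch, again assuming a small inline sample in place of the real file, compares two read_csv routes to the same MultiIndex frame: naming the index columns directly in index_col, or loading flat and calling set_index afterwards.

```python
import io

import pandas as pd

# Illustrative sample in the question's format (not the real 719 MB file).
CSV_TEXT = """from, to, dep, freq, arr, code, mode
RGBOXFD,RGBPADTON,127,0,27,99999,2
RGBOXFD,RGBRDLEY,127,0,1425,99999,2
RGBDIDCOTP,RGBPADTON,127,0,46,99999,2
"""

# Route 1: build the MultiIndex while parsing, picking index columns by name.
by_name = pd.read_csv(
    io.StringIO(CSV_TEXT), index_col=["from", "to"], skipinitialspace=True
)

# Route 2: parse with a default RangeIndex, then promote two columns afterwards.
flat = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True)
via_set_index = flat.set_index(["from", "to"])

# Both routes should yield the same frame.
print(by_name.equals(via_set_index))  # True
```

Passing index_col is the one-step version; set_index is handy when you want to clean or filter the columns before deciding which ones become the index.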