Python: Load a CSV into a Pandas MultiIndex DataFrame

Question

Asked by Handloomweaver

I have a 719 MB CSV file that looks like:

from, to, dep, freq, arr, code, mode   (header row)
RGBOXFD,RGBPADTON,127,0,27,99999,2
RGBOXFD,RGBPADTON,127,0,33,99999,2
RGBOXFD,RGBRDLEY,127,0,1425,99999,2
RGBOXFD,RGBCHOLSEY,127,0,52,99999,2
RGBOXFD,RGBMDNHEAD,127,0,91,99999,2
RGBDIDCOTP,RGBPADTON,127,0,46,99999,2
RGBDIDCOTP,RGBPADTON,127,0,3,99999,2
RGBDIDCOTP,RGBCHOLSEY,127,0,61,99999,2
RGBDIDCOTP,RGBRDLEY,127,0,1430,99999,2
RGBDIDCOTP,RGBPADTON,127,0,115,99999,2
and so on...

I want to load it into a pandas DataFrame. I know there is a from_csv method:

 r = pd.DataFrame.from_csv('test_data2.csv')

But I specifically want to load it as a MultiIndex DataFrame where from and to are the index levels, so that I end up with:

                   dep, freq, arr, code, mode
RGBOXFD RGBPADTON  127     0   27  99999    2
        RGBRDLEY   127     0   33  99999    2
        RGBCHOLSEY 127     0 1425  99999    2
        RGBMDNHEAD 127     0 1525  99999    2

and so on. I'm not sure how to do that.

Answer 1

Accepted answer by DSM

You could use pd.read_csv:

>>> df = pd.read_csv("test_data2.csv", index_col=[0,1], skipinitialspace=True)
>>> df
                       dep  freq   arr   code  mode
from       to                                      
RGBOXFD    RGBPADTON   127     0    27  99999     2
           RGBPADTON   127     0    33  99999     2
           RGBRDLEY    127     0  1425  99999     2
           RGBCHOLSEY  127     0    52  99999     2
           RGBMDNHEAD  127     0    91  99999     2
RGBDIDCOTP RGBPADTON   127     0    46  99999     2
           RGBPADTON   127     0     3  99999     2
           RGBCHOLSEY  127     0    61  99999     2
           RGBRDLEY    127     0  1430  99999     2
           RGBPADTON   127     0   115  99999     2

where I've used skipinitialspace=True to get rid of those annoying spaces in the header row.
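
As a small, hedged illustration of the answer above: the sketch below reads a few rows copied from the question out of an in-memory string (the io.StringIO buffer stands in for test_data2.csv, which is an assumption made so the example is self-contained). It shows what the header looks like without skipinitialspace, then builds the MultiIndex by column name instead of position and selects rows by key. Sorting the index before selecting is a precaution, not something the original answer requires.

```python
import io

import pandas as pd

# A few rows copied from the question; note the space after each comma in the header.
csv_text = """from, to, dep, freq, arr, code, mode
RGBOXFD,RGBPADTON,127,0,27,99999,2
RGBOXFD,RGBPADTON,127,0,33,99999,2
RGBOXFD,RGBRDLEY,127,0,1425,99999,2
RGBDIDCOTP,RGBPADTON,127,0,46,99999,2
"""

# Without skipinitialspace, every header name after the first keeps its leading space.
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))

# With skipinitialspace=True the names are clean, so the index columns
# can be given by name rather than by position.
df = pd.read_csv(
    io.StringIO(csv_text),
    index_col=["from", "to"],
    skipinitialspace=True,
)

# Sort the index so key-based lookups do not depend on the file's row order.
df = df.sort_index()

# The (from, to) pairs repeat in this data, so one key can match several rows.
selected = df.loc[("RGBOXFD", "RGBPADTON"), :]
print(selected)
```

Because the same from/to pair occurs more than once, the resulting index is not unique; a lookup on a full key returns every matching row rather than a single one.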

Answer 2

Answered by 7stud

from_csv() works similarly:

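import pandas as pd

# Note: DataFrame.from_csv was deprecated in pandas 0.21 and removed in 1.0,
# so this call only runs on older pandas versions.
df = pd.DataFrame.from_csv(
    'data.txt',
    index_col=[0, 1]
)

print(df)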

--output:--
                        dep   freq   arr   code   mode
from        to                                        
RGBOXFD    RGBPADTON    127      0    27  99999      2
           RGBPADTON    127      0    33  99999      2
           RGBRDLEY     127      0  1425  99999      2
           RGBCHOLSEY   127      0    52  99999      2
           RGBMDNHEAD   127      0    91  99999      2
RGBDIDCOTP RGBPADTON    127      0    46  99999      2
           RGBPADTON    127      0     3  99999      2
           RGBCHOLSEY   127      0    61  99999      2
           RGBRDLEY     127      0  1430  99999      2
           RGBPADTON    127      0   115  99999      2

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_csv.html#pandas.DataFrame.from_csv

From this discussion:

https://github.com/pydata/pandas/issues/4916

it looks like read_csv() was implemented to allow you to set more options, which makes from_csv() superfluous.
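
For readers on a current pandas release, where DataFrame.from_csv is no longer available, the call above can be approximated with read_csv. This is a minimal sketch, not the answer author's code: the in-memory sample replaces data.txt (an assumption, so the example runs on its own), and skipinitialspace=True is added to clean the header names. from_csv also defaulted to parse_dates=True, which is left out here because the index holds station codes rather than dates.

```python
import io

import pandas as pd

# Two rows copied from the question, held in memory in place of data.txt.
sample = io.StringIO(
    "from, to, dep, freq, arr, code, mode\n"
    "RGBOXFD,RGBPADTON,127,0,27,99999,2\n"
    "RGBDIDCOTP,RGBCHOLSEY,127,0,61,99999,2\n"
)

# read_csv counterpart of DataFrame.from_csv('data.txt', index_col=[0, 1]).
df = pd.read_csv(sample, index_col=[0, 1], skipinitialspace=True)

print(df)
```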
