How to write/read a Pandas DataFrame with MultiIndex to/from an ASCII file?

Question

Asked by dailyglen

I want to be able to create a Pandas DataFrame with MultiIndexes for both the row and the column index, and read it from an ASCII text file. My data looks like:

import numpy as np
from pandas import DataFrame, MultiIndex

col_indx = MultiIndex.from_tuples([('A',  'B',  'C'), ('A',  'B',  'C2'), ('A',  'B',  'C3'),
                                   ('A',  'B2', 'C'), ('A',  'B2', 'C2'), ('A',  'B2', 'C3'),
                                   ('A',  'B3', 'C'), ('A',  'B3', 'C2'), ('A',  'B3', 'C3'),
                                   ('A2', 'B',  'C'), ('A2', 'B',  'C2'), ('A2', 'B',  'C3'),
                                   ('A2', 'B2', 'C'), ('A2', 'B2', 'C2'), ('A2', 'B2', 'C3'),
                                   ('A2', 'B3', 'C'), ('A2', 'B3', 'C2'), ('A2', 'B3', 'C3')],
                                   names=['one', 'two', 'three'])
row_indx = MultiIndex.from_tuples([(0,  'North', 'M'),
                                   (1,  'East',  'F'),
                                   (2,  'West',  'M'),
                                   (3,  'South', 'M'),
                                   (4,  'South', 'F'),
                                   (5,  'West',  'F'),
                                   (6,  'North', 'M'),
                                   (7,  'North', 'M'),
                                   (8,  'East',  'F'),
                                   (9,  'South', 'M')],
                                   names=['n', 'location', 'sex'])
size = len(row_indx), len(col_indx)
data = np.random.randint(0, 10, size)   # random integers in [0, 10)
df = DataFrame(data, index=row_indx, columns=col_indx)
print(df)

I've tried df.to_csv() and read_csv(), but they don't keep the index.

I was thinking of maybe creating a new format using extra delimiters. For example, a row of ---------------- could mark the end of the column indexes and a | could mark the end of each row index, so the file would look like this:

one            | A   A   A   A   A   A   A   A   A  A2  A2  A2  A2  A2  A2  A2  A2  A2
two            | B   B   B  B2  B2  B2  B3  B3  B3   B   B   B  B2  B2  B2  B3  B3  B3
three          | C  C2  C3   C  C2  C3   C  C2  C3   C  C2  C3   C  C2  C3   C  C2  C3
--------------------------------------------------------------------------------------
n location sex :                                                                      
0 North    M   | 2   3   9   1   0   6   5   9   5   9   4   4   0   9   6   2   6   1
1 East     F   | 6   2   9   2   7   0   0   3   7   4   8   1   3   2   1   7   7   5
2 West     M   | 5   8   9   7   6   0   3   0   2   5   0   3   9   6   7   3   4   9
3 South    M   | 6   2   3   6   4   0   4   0   1   9   3   6   2   1   0   6   9   3
4 South    F   | 9   6   0   0   6   1   7   0   8   1   7   6   2   0   8   1   5   3
5 West     F   | 7   9   7   8   2   0   4   3   8   9   0   3   4   9   2   5   1   7
6 North    M   | 3   3   5   7   9   4   2   6   3   2   7   5   5   5   6   4   2   9
7 North    M   | 7   4   8   6   8   4   5   7   9   0   2   9   1   9   7   9   5   6
8 East     F   | 1   6   5   3   6   4   6   9   6   9   2   4   2   9   8   4   2   4
9 South    M   | 9   6   6   1   3   1   3   5   7   4   8   6   7   7   8   9   2   3
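
As far as I know, pandas has no reader for this exact layout, so going down this route would mean a small hand-rolled writer and reader. The following is only a minimal sketch under stated assumptions: the helpers make_example_frame, write_delimited and read_delimited are hypothetical names, labels contain no spaces or | characters, the data are integers, and every row label is read back as a string (so the n level returns as '0', '1', ...).

```python
import numpy as np
import pandas as pd


def make_example_frame():
    # Same labels as the question's example, built more compactly.
    cols = pd.MultiIndex.from_product(
        [["A", "A2"], ["B", "B2", "B3"], ["C", "C2", "C3"]],
        names=["one", "two", "three"])
    rows = pd.MultiIndex.from_arrays(
        [list(range(10)),
         ["North", "East", "West", "South", "South",
          "West", "North", "North", "East", "South"],
         ["M", "F", "M", "M", "F", "F", "M", "M", "F", "M"]],
        names=["n", "location", "sex"])
    data = np.random.default_rng(0).integers(0, 10, size=(len(rows), len(cols)))
    return pd.DataFrame(data, index=rows, columns=cols)


def write_delimited(df):
    """Hypothetical writer: '|' ends each row label, a dashed line ends the column index."""
    row_labels = [" ".join(str(v) for v in key) for key in df.index]
    names_line = " ".join(str(n) for n in df.index.names) + " :"
    width = max([len(names_line)] + [len(s) for s in row_labels]
                + [len(str(n)) for n in df.columns.names])

    def cells(values):
        # Right-align every cell in a 3-character field, as in the layout above.
        return " ".join(str(v).rjust(3) for v in values)

    out = []
    for level, name in enumerate(df.columns.names):
        labels = df.columns.get_level_values(level)
        out.append(f"{str(name).ljust(width)} | {cells(labels)}")
    out.append("-" * len(out[0]))  # end of the column index
    out.append(names_line)         # names of the row-index levels
    for label, row in zip(row_labels, df.to_numpy()):
        out.append(f"{label.ljust(width)} | {cells(row)}")
    return "\n".join(out) + "\n"


def read_delimited(text):
    """Hypothetical reader for write_delimited's output (integer data assumed)."""
    lines = [line for line in text.splitlines() if line.strip()]
    # The all-dash line separates the column-index block from the data block.
    sep_at = next(i for i, line in enumerate(lines) if set(line.strip()) == {"-"})
    col_names, col_levels = [], []
    for line in lines[:sep_at]:
        label, values = line.split("|")
        col_names.append(label.strip())
        col_levels.append(values.split())
    columns = pd.MultiIndex.from_arrays(col_levels, names=col_names)
    index_names = lines[sep_at + 1].rstrip().rstrip(":").split()
    keys, rows = [], []
    for line in lines[sep_at + 2:]:
        label, values = line.split("|")
        keys.append(tuple(label.split()))  # every row label comes back as a string
        rows.append([int(v) for v in values.split()])
    index = pd.MultiIndex.from_tuples(keys, names=index_names)
    return pd.DataFrame(rows, index=index, columns=columns)


df = make_example_frame()
text = write_delimited(df)
print(text)
back = read_delimited(text)
```

Because the reader only splits on |, on the dashed line and on whitespace, it should also accept the hand-written sample above; restoring non-string index dtypes would be an extra step left to the caller.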

Does Pandas have a way to write/read DataFrames to/from ASCII files with MultiIndexes?

Answer 1

Answered by diliop

Not sure which version of pandas you are using, but with 0.7.3 you can export your DataFrame to a TSV file and retain the indices by doing this:

df.to_csv('mydf.tsv', sep='\t')

You need to export to TSV rather than CSV because the column headers are written as tuples such as ('A', 'B', 'C'), which contain commas. This should solve the first part of your question.

The second part is trickier because, as far as I can tell, you need to know beforehand what your DataFrame should contain. In particular, you need to know:

which columns in your TSV represent the row MultiIndex, and
that the rest of the columns should also be converted to a MultiIndex.

To illustrate this, let's read the TSV file we saved above back into a new DataFrame:

In [1]: t_df = read_table('mydf.tsv', index_col=[0,1,2])
In [2]: all(t_df.index == df.index)
Out[2]: True

So we managed to read mydf.tsv into a DataFrame that has the same row index as the original df. But:

In [3]: all(t_df.columns == df.columns)
Out[3]: False

The reason is that pandas (as far as I can tell) has no way of parsing the header row correctly into a MultiIndex. As I mentioned above, if you know beforehand that your TSV file header represents a MultiIndex, then you can do the following to fix this:

In [4]: from ast import literal_eval
In [5]: t_df.columns = MultiIndex.from_tuples(t_df.columns.map(literal_eval).tolist(), 
                                              names=['one','two','three'])
In [6]: all(t_df.columns == df.columns)
Out[6]: True
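
In more recent pandas releases the literal_eval step should no longer be needed: to_csv writes each column level on its own header row (followed by a row of index names), and read_csv can rebuild both MultiIndexes when header= and index_col= list the corresponding row and column positions. A minimal sketch, assuming a recent pandas version rather than 0.7.3; make_example_frame and roundtrip_tsv are hypothetical helper names, and an in-memory buffer stands in for mydf.tsv.

```python
import io

import numpy as np
import pandas as pd


def make_example_frame():
    # Same labels as the question's example, built more compactly.
    cols = pd.MultiIndex.from_product(
        [["A", "A2"], ["B", "B2", "B3"], ["C", "C2", "C3"]],
        names=["one", "two", "three"])
    rows = pd.MultiIndex.from_arrays(
        [list(range(10)),
         ["North", "East", "West", "South", "South",
          "West", "North", "North", "East", "South"],
         ["M", "F", "M", "M", "F", "F", "M", "M", "F", "M"]],
        names=["n", "location", "sex"])
    data = np.random.default_rng(0).integers(0, 10, size=(len(rows), len(cols)))
    return pd.DataFrame(data, index=rows, columns=cols)


def roundtrip_tsv(df):
    """Write df as TSV and read it back, rebuilding both MultiIndexes."""
    buf = io.StringIO()
    df.to_csv(buf, sep="\t")  # one header row per column level, then the index names
    buf.seek(0)
    return pd.read_csv(
        buf,
        sep="\t",
        header=list(range(df.columns.nlevels)),   # rows holding the column levels
        index_col=list(range(df.index.nlevels)),  # columns holding the row levels
    )


df = make_example_frame()
t_df = roundtrip_tsv(df)
print(t_df.index.equals(df.index), t_df.columns.equals(df.columns))
```

With the levels on separate header rows the labels are no longer tuples, so a plain comma-separated file should work as well; sep="\t" is kept only to mirror the answer above.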

Answer 2

Answered by Andy Hayden

You can change the print options using set_option:

display.multi_sparse:
: boolean
Default True, "sparsify" MultiIndex display
(don't display repeated elements in outer levels within groups)

Now the DataFrame will be printed as desired:

In [11]: pd.set_option('multi_sparse', False)

In [12]: df
Out[12]: 
one             A   A   A   A   A   A   A   A   A  A2  A2  A2  A2  A2  A2  A2  A2  A2
two             B   B   B  B2  B2  B2  B3  B3  B3   B   B   B  B2  B2  B2  B3  B3  B3
three           C  C2  C3   C  C2  C3   C  C2  C3   C  C2  C3   C  C2  C3   C  C2  C3
n location sex                                                                       
0 North    M    2   1   6   4   6   4   7   1   1   0   4   3   9   2   0   0   6   4
1 East     F    3   5   5   6   4   8   0   3   2   3   9   8   1   6   7   4   7   2
2 West     M    7   9   3   5   0   1   2   8   1   6   0   7   9   9   3   2   2   4
3 South    M    1   0   0   3   5   7   7   0   9   3   0   3   3   6   8   3   6   1
4 South    F    8   0   0   7   3   8   0   8   0   5   5   6   0   0   0   1   8   7
5 West     F    6   5   9   4   7   2   5   6   1   2   9   4   7   5   5   4   3   6
6 North    M    3   3   0   1   1   3   6   3   8   6   4   1   0   5   5   5   4   9
7 North    M    0   4   9   8   5   7   7   0   5   8   4   1   5   7   6   3   6   8
8 East     F    5   6   2   7   0   6   2   7   1   2   0   5   6   1   4   8   0   3
9 South    M    1   2   0   6   9   7   5   3   3   8   7   6   0   5   4   3   5   9

Note: in older pandas versions this was pd.set_printoptions(multi_sparse=False).
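
To get this non-sparse text into an ASCII file rather than onto the screen, DataFrame.to_string accepts a sparsify argument; left as None it falls back to display.multi_sparse, so passing sparsify=False should give the same repeated labels without touching the global option. A sketch assuming a recent pandas version; make_example_frame and to_ascii are hypothetical helper names. The result is meant for people to read; parsing it back is the separate problem covered in Answer 1.

```python
import numpy as np
import pandas as pd


def make_example_frame():
    # Same labels as the question's example, built more compactly.
    cols = pd.MultiIndex.from_product(
        [["A", "A2"], ["B", "B2", "B3"], ["C", "C2", "C3"]],
        names=["one", "two", "three"])
    rows = pd.MultiIndex.from_arrays(
        [list(range(10)),
         ["North", "East", "West", "South", "South",
          "West", "North", "North", "East", "South"],
         ["M", "F", "M", "M", "F", "F", "M", "M", "F", "M"]],
        names=["n", "location", "sex"])
    data = np.random.default_rng(0).integers(0, 10, size=(len(rows), len(cols)))
    return pd.DataFrame(data, index=rows, columns=cols)


def to_ascii(df, path=None):
    """Render df with every MultiIndex label repeated; optionally save it as text."""
    # sparsify=False repeats outer-level labels; sparsify=None would defer to
    # the display.multi_sparse option described above.
    text = df.to_string(sparsify=False)
    if path is not None:
        with open(path, "w") as fh:
            fh.write(text + "\n")
    return text


print(to_ascii(make_example_frame()))
```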
