Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors. Original question: http://stackoverflow.com/questions/11041411/

How to write/read a Pandas DataFrame with MultiIndex from/to an ASCII file?

Tags: python, pandas

Asked by dailyglen

I want to be able to create a Pandas DataFrame with MultiIndexes for the row and column indexes and read it from an ASCII text file. My data looks like:

import numpy as np
from pandas import DataFrame, MultiIndex
col_indx = MultiIndex.from_tuples([('A',  'B',  'C'), ('A',  'B',  'C2'), ('A',  'B',  'C3'),
                                   ('A',  'B2', 'C'), ('A',  'B2', 'C2'), ('A',  'B2', 'C3'), 
                                   ('A',  'B3', 'C'), ('A',  'B3', 'C2'), ('A',  'B3', 'C3'), 
                                   ('A2', 'B',  'C'), ('A2', 'B',  'C2'), ('A2', 'B',  'C3'), 
                                   ('A2', 'B2', 'C'), ('A2', 'B2', 'C2'), ('A2', 'B2', 'C3'), 
                                   ('A2', 'B3', 'C'), ('A2', 'B3', 'C2'), ('A2', 'B3', 'C3')], 
                                   names=['one','two','three']) 
row_indx = MultiIndex.from_tuples([(0,  'North', 'M'), 
                                   (1,  'East',  'F'), 
                                   (2,  'West',  'M'), 
                                   (3,  'South', 'M'), 
                                   (4,  'South', 'F'), 
                                   (5,  'West',  'F'), 
                                   (6,  'North', 'M'), 
                                   (7,  'North', 'M'), 
                                   (8,  'East',  'F'), 
                                   (9,  'South', 'M')], 
                                   names=['n', 'location', 'sex'])
size = len(row_indx), len(col_indx)  # (rows, columns) = (10, 18)
data = np.random.randint(0, 10, size)  # random integers in [0, 10)
df = DataFrame(data, index=row_indx, columns=col_indx)
print(df)

I've tried df.to_csv() and read_csv() but they don't keep the index.

I was thinking of maybe creating a new format using extra delimiters. For example, using a row of ---------------- to mark the end of the column indexes and a | to mark the end of a row index. So it would look like this:

one            | A   A   A   A   A   A   A   A   A  A2  A2  A2  A2  A2  A2  A2  A2  A2
two            | B   B   B  B2  B2  B2  B3  B3  B3   B   B   B  B2  B2  B2  B3  B3  B3
three          | C  C2  C3   C  C2  C3   C  C2  C3   C  C2  C3   C  C2  C3   C  C2  C3
--------------------------------------------------------------------------------------
n location sex :                                                                      
0 North    M   | 2   3   9   1   0   6   5   9   5   9   4   4   0   9   6   2   6   1
1 East     F   | 6   2   9   2   7   0   0   3   7   4   8   1   3   2   1   7   7   5
2 West     M   | 5   8   9   7   6   0   3   0   2   5   0   3   9   6   7   3   4   9
3 South    M   | 6   2   3   6   4   0   4   0   1   9   3   6   2   1   0   6   9   3
4 South    F   | 9   6   0   0   6   1   7   0   8   1   7   6   2   0   8   1   5   3
5 West     F   | 7   9   7   8   2   0   4   3   8   9   0   3   4   9   2   5   1   7
6 North    M   | 3   3   5   7   9   4   2   6   3   2   7   5   5   5   6   4   2   9
7 North    M   | 7   4   8   6   8   4   5   7   9   0   2   9   1   9   7   9   5   6
8 East     F   | 1   6   5   3   6   4   6   9   6   9   2   4   2   9   8   4   2   4
9 South    M   | 9   6   6   1   3   1   3   5   7   4   8   6   7   7   8   9   2   3

Does Pandas have a way to write/read DataFrames to/from ASCII files with MultiIndexes?

Answered by diliop

Not sure which version of pandas you are using, but with 0.7.3 you can export your DataFrame to a TSV file and retain the indices by doing this:

df.to_csv('mydf.tsv', sep='\t')

You need to export to TSV rather than CSV because the column headers have , characters in them. This should solve the first part of your question.

The second part is a bit trickier since, as far as I can tell, you need to know beforehand what you want your DataFrame to contain. In particular, you need to know:

  1. Which columns in your TSV represent the row MultiIndex
  2. That the rest of the columns should also be converted to a MultiIndex

To illustrate this, let's read the TSV file we saved above back into a new DataFrame:

In [1]: t_df = read_table('mydf.tsv', index_col=[0,1,2])
In [2]: all(t_df.index == df.index)
Out[2]: True

So we managed to read mydf.tsv into a DataFrame that has the same row index as the original df. But:

In [3]: all(t_df.columns == df.columns)
Out[3]: False

The reason is that pandas (as far as I can tell) has no way of parsing the header row correctly into a MultiIndex. As I mentioned above, if you know beforehand that your TSV file header represents a MultiIndex, then you can do the following to fix this:

In [4]: from ast import literal_eval
In [5]: t_df.columns = MultiIndex.from_tuples(t_df.columns.map(literal_eval).tolist(), 
                                              names=['one','two','three'])
In [6]: all(t_df.columns == df.columns)
Out[6]: True
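
For readers on a more recent pandas release, the header-parsing workaround above may not be needed: read_csv accepts a list of row numbers for header, which builds the column MultiIndex directly. The following is a minimal sketch under that assumption, not the answerer's original method. It uses a smaller stand-in frame and an in-memory buffer (io.StringIO) instead of mydf.tsv; both are chosen here only for illustration.

```python
import io

import numpy as np
import pandas as pd

# Small stand-in for the question's frame (labels and sizes are illustrative).
col_indx = pd.MultiIndex.from_tuples(
    [('A', 'B', 'C'), ('A', 'B', 'C2'), ('A2', 'B', 'C'), ('A2', 'B2', 'C')],
    names=['one', 'two', 'three'])
row_indx = pd.MultiIndex.from_tuples(
    [(0, 'North', 'M'), (1, 'East', 'F'), (2, 'West', 'M')],
    names=['n', 'location', 'sex'])
data = np.arange(12).reshape(len(row_indx), len(col_indx))
df = pd.DataFrame(data, index=row_indx, columns=col_indx)

# Write tab-separated text to an in-memory buffer instead of a file on disk.
buf = io.StringIO()
df.to_csv(buf, sep='\t')
buf.seek(0)

# header=[0, 1, 2]: the first three lines hold the column levels.
# index_col=[0, 1, 2]: the first three fields of each row hold the row levels.
df2 = pd.read_csv(buf, sep='\t', header=[0, 1, 2], index_col=[0, 1, 2])

print(df2.columns.names)
print(df2.index.names)
```

With this approach the caller still has to know how many header rows and index columns the file contains, which matches the two points listed in the answer.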

Answered by Andy Hayden

You can change the print options using set_option:

display.multi_sparse:
: boolean
   Default True, "sparsify" MultiIndex display
   (don't display repeated elements in outer levels within groups)

Now the DataFrame will be printed as desired:

In [11]: pd.set_option('multi_sparse', False)

In [12]: df
Out[12]: 
one             A   A   A   A   A   A   A   A   A  A2  A2  A2  A2  A2  A2  A2  A2  A2
two             B   B   B  B2  B2  B2  B3  B3  B3   B   B   B  B2  B2  B2  B3  B3  B3
three           C  C2  C3   C  C2  C3   C  C2  C3   C  C2  C3   C  C2  C3   C  C2  C3
n location sex                                                                       
0 North    M    2   1   6   4   6   4   7   1   1   0   4   3   9   2   0   0   6   4
1 East     F    3   5   5   6   4   8   0   3   2   3   9   8   1   6   7   4   7   2
2 West     M    7   9   3   5   0   1   2   8   1   6   0   7   9   9   3   2   2   4
3 South    M    1   0   0   3   5   7   7   0   9   3   0   3   3   6   8   3   6   1
4 South    F    8   0   0   7   3   8   0   8   0   5   5   6   0   0   0   1   8   7
5 West     F    6   5   9   4   7   2   5   6   1   2   9   4   7   5   5   4   3   6
6 North    M    3   3   0   1   1   3   6   3   8   6   4   1   0   5   5   5   4   9
7 North    M    0   4   9   8   5   7   7   0   5   8   4   1   5   7   6   3   6   8
8 East     F    5   6   2   7   0   6   2   7   1   2   0   5   6   1   4   8   0   3
9 South    M    1   2   0   6   9   7   5   3   3   8   7   6   0   5   4   3   5   9

Note: in older pandas versions this was pd.set_printoptions(multi_sparse=False).
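
To change the option only temporarily, pd.option_context can be used instead of set_option. This is a small sketch with a toy two-level frame (its labels are made up for illustration) rather than the frame from the question, and it assumes the option is still at its default of True beforehand. It uses the full option name display.multi_sparse.

```python
import pandas as pd

# Toy frame with a two-level column MultiIndex (labels are illustrative).
cols = pd.MultiIndex.from_tuples([('A', 'B'), ('A', 'B2')], names=['one', 'two'])
df = pd.DataFrame([[1, 2], [3, 4]], columns=cols)

# Default display: a repeated outer label is shown once per group.
sparse = df.to_string()

# Inside the context manager every label is repeated; the previous
# setting is restored automatically on exit.
with pd.option_context('display.multi_sparse', False):
    dense = df.to_string()

print(sparse)
print(dense)
```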
