Python: Construct pandas DataFrame from list of tuples of (row,col,values)

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/19961490/

Date: 2020-08-19 15:04:53  Source: igfitidea

Construct pandas DataFrame from list of tuples of (row,col,values)

python, python-2.7, pandas, pivot

Asked by gt6989b

I have a list of tuples like

data = [
    ('r1', 'c1', avg11, stdev11),
    ('r1', 'c2', avg12, stdev12),
    ('r2', 'c1', avg21, stdev21),
    ('r2', 'c2', avg22, stdev22)
]

and I would like to put them into a pandas DataFrame with rows named by the first column and columns named by the 2nd column. It seems the way to take care of the row names is something like pandas.DataFrame([x[1:] for x in data], index = [x[0] for x in data]), but how do I take care of the columns to get a 2x2 matrix (the output from the previous set is 3x4)? Is there a more intelligent way of taking care of row labels as well, instead of explicitly omitting them?

EDIT: It seems I will need 2 DataFrames - one for averages and one for standard deviations, is that correct? Or can I store a list of values in each "cell"?

Accepted answer by Roman Pekar

You can pivot your DataFrame after creating it:

>>> df = pd.DataFrame(data)
>>> df.pivot(index=0, columns=1, values=2)
# avg DataFrame
1      c1     c2
0               
r1  avg11  avg12
r2  avg21  avg22
>>> df.pivot(index=0, columns=1, values=3)
# stdev DataFrame
1        c1       c2
0                   
r1  stdev11  stdev12
r2  stdev21  stdev22
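
For a version that can be run as-is, here is a minimal sketch of the same pivot. The numeric values and the column names 'row', 'col', 'avg' and 'stdev' are assumptions made for illustration; they stand in for the avg11, stdev11, ... placeholders in the question.

```python
import pandas as pd

# Hypothetical numbers standing in for avg11, stdev11, ... from the question.
data = [
    ('r1', 'c1', 1.0, 0.1),
    ('r1', 'c2', 2.0, 0.2),
    ('r2', 'c1', 3.0, 0.3),
    ('r2', 'c2', 4.0, 0.4),
]

# Naming the columns up front avoids pivoting on the positional labels 0..3.
df = pd.DataFrame(data, columns=['row', 'col', 'avg', 'stdev'])

# One 2x2 frame per quantity: rows from 'row', columns from 'col'.
avg = df.pivot(index='row', columns='col', values='avg')
stdev = df.pivot(index='row', columns='col', values='stdev')

print(avg)
print(stdev)
```

Note that pivot raises a ValueError when the same (row, col) pair appears more than once; if the real data can contain duplicates, pivot_table with an aggregation function may be a better fit.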

Answer by ely

I submit that it is better to leave your data stacked as it is:

df = pandas.DataFrame(data, columns=['R_Number', 'C_Number', 'Avg', 'Std'])

# Possibly also this if these can always be the indexes:
# df = df.set_index(['R_Number', 'C_Number'])

Then it's a bit more intuitive to say

df.set_index(['R_Number', 'C_Number']).Avg.unstack(level=1)

This way the code itself states which quantity you are reshaping: the averages, or the standard deviations. With pivot on unnamed columns, by contrast, which semantic entity you are reshaping depends purely on column-order convention.
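
As a possible answer to the question's EDIT about needing two DataFrames, the sketch below unstacks both value columns at once, which yields a single DataFrame with two-level column labels. The numeric values are assumptions standing in for the placeholders in the question; the column names follow the ones used in this answer.

```python
import pandas as pd

# Hypothetical numbers standing in for avg11, stdev11, ... from the question.
data = [
    ('r1', 'c1', 1.0, 0.1),
    ('r1', 'c2', 2.0, 0.2),
    ('r2', 'c1', 3.0, 0.3),
    ('r2', 'c2', 4.0, 0.4),
]

df = pd.DataFrame(data, columns=['R_Number', 'C_Number', 'Avg', 'Std'])

# Unstacking without selecting a column keeps both quantities; the columns
# become a MultiIndex such as ('Avg', 'c1'), ('Avg', 'c2'), ('Std', 'c1'), ...
wide = df.set_index(['R_Number', 'C_Number']).unstack(level=1)
print(wide)

# Selecting the top level recovers each 2x2 matrix on its own.
avg = wide['Avg']
std = wide['Std']
print(avg)
print(std)
```

Keeping one wide frame and slicing it by the top column level avoids storing a list in each cell, which is generally awkward to work with in pandas.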

Answer by Martin Thoma

This is what I expected to see when I came to this question:

#!/usr/bin/env python

import pandas as pd


df = pd.DataFrame([(1, 2, 3, 4),
                   (5, 6, 7, 8),
                   (9, 0, 1, 2),
                   (3, 4, 5, 6)],
                  columns=list('abcd'),
                  index=['India', 'France', 'England', 'Germany'])
print(df)

gives

         a  b  c  d
India    1  2  3  4
France   5  6  7  8
England  9  0  1  2
Germany  3  4  5  6