Python 熊猫数据透视表到数据框
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/42708193/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
pandas pivot table to data frame
提问by alma123
I have a dataframe (df) that looks like this:
我有一个如下所示的数据框 (df):
+---------+-------+------------+----------+
| subject | pills | date | strength |
+---------+-------+------------+----------+
| 1 | 4 | 10/10/2012 | 250 |
| 1 | 4 | 10/11/2012 | 250 |
| 1 | 2 | 10/12/2012 | 500 |
| 2 | 1 | 1/6/2014 | 1000 |
| 2 | 1 | 1/7/2014 | 250 |
| 2 | 1 | 1/7/2014 | 500 |
| 2 | 3 | 1/8/2014 | 250 |
+---------+-------+------------+----------+
When I use reshape in R, I get what I want:
当我在 R 中使用 reshape 时,我得到了我想要的:
reshape(df, idvar = c("subject","date"), timevar = 'strength', direction = "wide")
+---------+------------+--------------+--------------+---------------+
| subject | date | strength.250 | strength.500 | strength.1000 |
+---------+------------+--------------+--------------+---------------+
| 1 | 10/10/2012 | 4 | NA | NA |
| 1 | 10/11/2012 | 4 | NA | NA |
| 1 | 10/12/2012 | NA | 2 | NA |
| 2 | 1/6/2014 | NA | NA | 1 |
| 2 | 1/7/2014 | 1 | 1 | NA |
| 2 | 1/8/2014 | 3 | NA | NA |
+---------+------------+--------------+--------------+---------------+
Using pandas:
使用熊猫:
df.pivot_table(df, index=['subject','date'],columns='strength')
+---------+------------+-------+----+-----+
| | | pills |
+---------+------------+-------+----+-----+
| | strength | 250 | 500| 1000|
+---------+------------+-------+----+-----+
| subject | date | | | |
+---------+------------+-------+----+-----+
| 1 | 10/10/2012 | 4 | NA | NA |
| | 10/11/2012 | 4 | NA | NA |
| | 10/12/2012 | NA | 2 | NA |
+---------+------------+-------+----+-----+
| 2 | 1/6/2014 | NA | NA | 1 |
| | 1/7/2014 | 1 | 1 | NA |
| | 1/8/2014 | 3 | NA | NA |
+---------+------------+-------+----+-----+
How do I get exactly the same output as in R with pandas? I only want 1 header.
如何使用 Pandas 获得与 R 中完全相同的输出?我只想要 1 个标题。
回答by DYZ
After pivoting, convert the dataframe to records and then back to dataframe:
旋转后,将数据帧转换为记录,然后再转换回数据帧:
flattened = pd.DataFrame(pivoted.to_records())
# subject date ('pills', 250) ('pills', 500) ('pills', 1000)
#0 1 10/10/2012 4.0 NaN NaN
#1 1 10/11/2012 4.0 NaN NaN
#2 1 10/12/2012 NaN 2.0 NaN
#3 2 1/6/2014 NaN NaN 1.0
#4 2 1/7/2014 1.0 1.0 NaN
#5 2 1/8/2014 3.0 NaN NaN
You can now "repair" the column names, if you want:
如果需要,您现在可以“修复”列名称:
flattened.columns = [hdr.replace("('pills', ", "strength.").replace(")", "") \
for hdr in flattened.columns]
flattened
# subject date strength.250 strength.500 strength.1000
#0 1 10/10/2012 4.0 NaN NaN
#1 1 10/11/2012 4.0 NaN NaN
#2 1 10/12/2012 NaN 2.0 NaN
#3 2 1/6/2014 NaN NaN 1.0
#4 2 1/7/2014 1.0 1.0 NaN
#5 2 1/8/2014 3.0 NaN NaN
It's awkward, but it works.
这很尴尬,但它有效。