Source: Stack Overflow, http://stackoverflow.com/questions/25618650/
License: CC BY-SA 4.0. You are free to use and share this content, but you must attribute it to the original authors on Stack Overflow.
R dcast equivalent in python pandas
Asked by Adriano Almeida
I am trying to reproduce the following R commands in Python:
test <- data.frame(convert_me=c('Convert1','Convert2','Convert3'),
values=rnorm(3,45, 12), age_col=c('23','33','44'))
test
library(reshape2)
t <- dcast(test, values ~ convert_me+age_col, length )
t
That is, this:
convert_me    values  age_col
  Convert1  21.71502       23
  Convert2  58.35506       33
  Convert3  60.41639       44
becomes this:
  values  Convert2_33  Convert1_23  Convert3_44
21.71502            0            1            0
58.35506            1            0            0
60.41639            0            0            1
I know that with dummy variables I can turn the values of a column into column names, but is there a way to combine two columns into the new column names easily, as R does?
Accepted answer by joris
You can use the crosstab function for this:
In [14]: pd.crosstab(index=df['values'], columns=[df['convert_me'], df['age_col']])
Out[14]:
convert_me  Convert1  Convert2  Convert3
age_col           23        33        44
values
21.71502           1         0         0
58.35506           0         1         0
60.41639           0         0         1
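The answer assumes that a DataFrame named df already exists. A minimal sketch of how it could be built from the question's sample data, so that the crosstab call above runs on its own (the numbers are hard-coded from the question instead of being drawn with rnorm, and the name counts is only illustrative):

```python
import pandas as pd

# Sample data copied from the question; fixed values replace R's rnorm draw.
df = pd.DataFrame({
    "convert_me": ["Convert1", "Convert2", "Convert3"],
    "values": [21.71502, 58.35506, 60.41639],
    "age_col": ["23", "33", "44"],
})

# Count the rows for each (values, convert_me, age_col) combination.
counts = pd.crosstab(index=df["values"],
                     columns=[df["convert_me"], df["age_col"]])
print(counts)
```

Combinations that never occur are filled with 0 by crosstab, so no fillna step is needed here.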
or pivot_table (with len as the aggregating function, but here you have to fill the NaNs with zeros manually using fillna):
In [18]: df.pivot_table(index=['values'], columns=['age_col', 'convert_me'], aggfunc=len).fillna(0)
Out[18]:
age_col           23        33        44
convert_me  Convert1  Convert2  Convert3
values
21.71502           1         0         0
58.35506           0         1         0
60.41639           0         0         1
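As a possible variant of the pivot_table approach, the sketch below counts rows by summing an explicit column of ones and passes fill_value=0 so that the separate fillna call is not needed. The helper column n and the name table are assumptions made for this illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "convert_me": ["Convert1", "Convert2", "Convert3"],
    "values": [21.71502, 58.35506, 60.41639],
    "age_col": ["23", "33", "44"],
})

# `n` is a hypothetical helper column of ones; summing it counts the rows
# that fall into each cell of the pivoted table.
table = df.assign(n=1).pivot_table(
    index="values",
    columns=["age_col", "convert_me"],
    values="n",
    aggfunc="sum",
    fill_value=0,  # cells with no matching rows become 0 instead of NaN
)
print(table)
```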
See here for the docs on this: http://pandas.pydata.org/pandas-docs/stable/reshaping.html#pivot-tables-and-cross-tabulations
Reshaping functions such as these return a multi-level (hierarchical) index, in this case on the columns. If you want to flatten it into a single level, as in the R output, you can do:
In [15]: df_cross = pd.crosstab(index=df['values'], columns=[df['convert_me'], df['age_col']])
In [16]: df_cross.columns = ["{0}_{1}".format(l1, l2) for l1, l2 in df_cross.columns]
In [17]: df_cross
Out[17]:
          Convert1_23  Convert2_33  Convert3_44
values
21.71502            1            0            0
58.35506            0            1            0
60.41639            0            0            1
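Putting the steps above together, here is a self-contained sketch that goes from the question's sample data to a flat table shaped like the R dcast output. The final reset_index call is an addition that is not in the original answer; it turns values back into an ordinary column. The name flat is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "convert_me": ["Convert1", "Convert2", "Convert3"],
    "values": [21.71502, 58.35506, 60.41639],
    "age_col": ["23", "33", "44"],
})

df_cross = pd.crosstab(index=df["values"],
                       columns=[df["convert_me"], df["age_col"]])

# Join the two column levels with "_"; this relies on both levels holding strings.
df_cross.columns = ["_".join(levels) for levels in df_cross.columns]

# Move the `values` index back into a regular column, as in the R result.
flat = df_cross.reset_index()
print(flat)
```

If one of the column levels were numeric, the "{0}_{1}".format(l1, l2) form used in the answer would be the safer choice, since str.join only accepts strings.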
Answer by Keiku
We can use the pd.get_dummies function. As of pandas 0.22.0 (the current version at the time of writing), pd.get_dummies is the common way to one-hot encode a DataFrame.
import pandas as pd
df_dummies = pd.get_dummies(
df[['convert_me', 'age_col']].apply(lambda x: '_'.join(x.astype(str)), axis=1),
prefix_sep='')
df = pd.concat([df["values"], df_dummies], axis=1)
# Out[39]:
# values Convert1_23 Convert2_33 Convert3_44
# 0 21.71502 1 0 0
# 1 58.35506 0 1 0
# 2 60.41639 0 0 1
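A self-contained sketch of the same idea follows. One point worth noting: since pandas 2.0, get_dummies returns boolean columns by default, so the 0/1 output shown above needs dtype=int on recent versions. The names labels and result are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "convert_me": ["Convert1", "Convert2", "Convert3"],
    "values": [21.71502, 58.35506, 60.41639],
    "age_col": ["23", "33", "44"],
})

# Build the combined label for each row, e.g. "Convert1_23".
labels = df["convert_me"].astype(str) + "_" + df["age_col"].astype(str)

# dtype=int forces 0/1 integers; without it, pandas >= 2.0 yields True/False.
dummies = pd.get_dummies(labels, dtype=int)

result = pd.concat([df["values"], dummies], axis=1)
print(result)
```

Unlike the crosstab approach, this keeps one output row per input row, so duplicate values entries are not aggregated into counts.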

