pandas python: R dcast equivalent in pandas

Question

Question by Adriano Almeida

I am trying to do the equivalent of the following R commands in Python:

test <- data.frame(convert_me=c('Convert1','Convert2','Convert3'),
                   values=rnorm(3,45, 12), age_col=c('23','33','44'))
test

library(reshape2)
t <- dcast(test, values ~ convert_me+age_col, length  )
t

That is, this:

convert_me   values     age_col
Convert1     21.71502      23
Convert2     58.35506      33
Convert3     60.41639      44

becomes this:

values     Convert2_33 Convert1_23 Convert3_44
21.71502          0           1           0
58.35506          1           0           0
60.41639          0           0           1

I know that with dummy variables I can turn a column's values into column names, but is there an easy way to combine several columns like this, as R does?
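The Python answers below work on a DataFrame named df with the same columns as the R data frame test. A minimal sketch for building it, assuming the three numbers printed in the question (rnorm draws random values, so yours will differ) and keeping age_col as strings, as in the R code:

```python
import pandas as pd

# Same columns as the R data frame `test`; the numbers are the ones
# printed in the question, since rnorm() output is random.
df = pd.DataFrame({
    "convert_me": ["Convert1", "Convert2", "Convert3"],
    "values": [21.71502, 58.35506, 60.41639],
    "age_col": ["23", "33", "44"],  # strings, as in c('23','33','44')
})
print(df)
```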

Answer 1

Accepted answer by joris

You can use the crosstab function for this:

In [14]: pd.crosstab(index=df['values'], columns=[df['convert_me'], df['age_col']])
Out[14]: 
convert_me  Convert1  Convert2  Convert3
age_col           23        33        44
values                                  
21.71502           1         0         0
58.35506           0         1         0
60.41639           0         0         1
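crosstab counts how many rows fall into each combination, which matches what dcast does with length as the aggregation function. A small sketch, assuming one hypothetical extra row that repeats Convert1/23 with the same value, shows that cell counting 2 rather than 1:

```python
import pandas as pd

# Question data plus one hypothetical duplicate row (Convert1, 23, 21.71502).
df = pd.DataFrame({
    "convert_me": ["Convert1", "Convert2", "Convert3", "Convert1"],
    "values": [21.71502, 58.35506, 60.41639, 21.71502],
    "age_col": ["23", "33", "44", "23"],
})

# One row per distinct `values`, one column per (convert_me, age_col) pair,
# each cell holding the number of matching rows.
counts = pd.crosstab(index=df["values"], columns=[df["convert_me"], df["age_col"]])
print(counts)
```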

or pivot_table (with len as the aggregating function; here you have to replace the NaNs with zeros manually using fillna):

In [18]: df.pivot_table(index=['values'], columns=['age_col', 'convert_me'], aggfunc=len).fillna(0)
Out[18]: 
age_col           23        33        44
convert_me  Convert1  Convert2  Convert3
values                                  
21.71502           1         0         0
58.35506           0         1         0
60.41639           0         0         1
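The pivot_table call above amounts to a grouped count followed by an unstack. A sketch of the same computation written out with groupby, assuming df holds the question's data; unstack's fill_value=0 fills the empty combinations directly, so no separate fillna step is needed (pivot_table itself also accepts a fill_value argument):

```python
import pandas as pd

df = pd.DataFrame({
    "convert_me": ["Convert1", "Convert2", "Convert3"],
    "values": [21.71502, 58.35506, 60.41639],
    "age_col": ["23", "33", "44"],
})

# Count rows per (values, age_col, convert_me) combination, then move the
# last two levels into the columns; missing combinations become 0.
counts = (
    df.groupby(["values", "age_col", "convert_me"])
      .size()
      .unstack(["age_col", "convert_me"], fill_value=0)
)
print(counts)
```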

See here for the docs on this: http://pandas.pydata.org/pandas-docs/stable/reshaping.html#pivot-tables-and-cross-tabulations

Many pandas reshaping functions return a multi-level (hierarchical) index, in this case on the columns. If you want to collapse it into a single level, as in R, you can do:

In [15]: df_cross = pd.crosstab(index=df['values'], columns=[df['convert_me'], df['age_col']])

In [16]: df_cross.columns = ["{0}_{1}".format(l1, l2) for l1, l2 in df_cross.columns]

In [17]: df_cross
Out[17]: 
          Convert1_23  Convert2_33  Convert3_44
values                                         
21.71502            1            0            0
58.35506            0            1            0
60.41639            0            0            1
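dcast also returns values as an ordinary column rather than as the row index. A sketch that flattens the column levels and then moves values out of the index with reset_index, assuming both column levels hold strings (as age_col does here) so that "_".join can combine them:

```python
import pandas as pd

df = pd.DataFrame({
    "convert_me": ["Convert1", "Convert2", "Convert3"],
    "values": [21.71502, 58.35506, 60.41639],
    "age_col": ["23", "33", "44"],
})

df_cross = pd.crosstab(index=df["values"], columns=[df["convert_me"], df["age_col"]])

# Each column label is a tuple such as ("Convert1", "23"); joining with "_"
# gives dcast-style names. This requires string levels, hence age_col as str.
df_cross.columns = ["_".join(col) for col in df_cross.columns]

# Turn the `values` index back into a regular column, as in the R output.
wide = df_cross.reset_index()
print(wide)
```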

Answer 2

Answer by Keiku

We can use the pd.get_dummies function. In pandas 0.22.0 (current at the time of writing), pd.get_dummies is the usual way to one-hot encode a DataFrame.

import pandas as pd

df_dummies = pd.get_dummies(
    df[['convert_me', 'age_col']].apply(lambda x: '_'.join(x.astype(str)), axis=1),
    prefix_sep='')
df = pd.concat([df["values"], df_dummies], axis=1)
# Out[39]:
#      values  Convert1_23  Convert2_33  Convert3_44
# 0  21.71502            1            0            0
# 1  58.35506            0            1            0
# 2  60.41639            0            0            1
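Two differences from dcast are worth noting: get_dummies produces one row per input row (dcast with length produces one row per distinct values and counts repeats), and since pandas 2.0 its default output dtype is bool, so the columns print as True/False. A sketch addressing both, assuming df holds the question's data; dtype=int restores 0/1 columns and a groupby sum collapses repeated values:

```python
import pandas as pd

df = pd.DataFrame({
    "convert_me": ["Convert1", "Convert2", "Convert3"],
    "values": [21.71502, 58.35506, 60.41639],
    "age_col": ["23", "33", "44"],
})

# One combined label per row, e.g. "Convert1_23".
combo = df["convert_me"].astype(str) + "_" + df["age_col"].astype(str)

# dtype=int gives 0/1 indicator columns instead of bool.
dummies = pd.get_dummies(combo, dtype=int)

# Sum indicators per distinct `values`, mirroring dcast's length counts,
# then bring `values` back as a regular column.
wide = dummies.groupby(df["values"]).sum().reset_index()
print(wide)
```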
