Reversing 'one-hot' encoding in Pandas
Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/38334296/
Asked by Peadar Coyle
Problem statement: I want to go from this data frame, which is basically one-hot encoded.
In [2]: pd.DataFrame({"monkey":[0,1,0],"rabbit":[1,0,0],"fox":[0,0,1]})
Out[2]:
   fox  monkey  rabbit
0    0       0       1
1    0       1       0
2    1       0       0
To this one, which is 'reverse' one-hot encoded:
In [3]: pd.DataFrame({"animal":["rabbit","monkey","fox"]})
Out[3]:
   animal
0  rabbit
1  monkey
2     fox
I imagine there's some sort of clever use of apply or zip to do this, but I'm not sure how. Can anyone help?
I've not had much success using indexing etc. to try to solve this problem.
Accepted answer by PYOak
I would use apply to decode the columns:
In [2]: animals = pd.DataFrame({"monkey":[0,1,0,0,0],"rabbit":[1,0,0,0,0],"fox":[0,0,1,0,0]})
In [3]: def get_animal(row):
   ...:     for c in animals.columns:
   ...:         if row[c]==1:
   ...:             return c
In [4]: animals.apply(get_animal, axis=1)
Out[4]:
0    rabbit
1    monkey
2       fox
3      None
4      None
dtype: object
Answer by MaxU
UPDATE: I think ayhan is right and it should be:
df.idxmax(axis=1)
Demo:
In [40]: s = pd.Series(['dog', 'cat', 'dog', 'bird', 'fox', 'dog'])
In [41]: s
Out[41]:
0     dog
1     cat
2     dog
3    bird
4     fox
5     dog
dtype: object
In [42]: pd.get_dummies(s)
Out[42]:
   bird  cat  dog  fox
0   0.0  0.0  1.0  0.0
1   0.0  1.0  0.0  0.0
2   0.0  0.0  1.0  0.0
3   1.0  0.0  0.0  0.0
4   0.0  0.0  0.0  1.0
5   0.0  0.0  1.0  0.0
In [43]: pd.get_dummies(s).idxmax(1)
Out[43]:
0     dog
1     cat
2     dog
3    bird
4     fox
5     dog
dtype: object
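One caveat with idxmax(axis=1) is that a row containing only zeros still gets a label, because the maximum of an all-zero row is found in its first column. A minimal sketch of one way to guard against that, assuming the five-row animals frame from the accepted answer (the variable name decoded is only illustrative):

```python
import pandas as pd

# Five-row frame from the accepted answer; rows 3 and 4 have no label set
animals = pd.DataFrame({"monkey": [0, 1, 0, 0, 0],
                        "rabbit": [1, 0, 0, 0, 0],
                        "fox":    [0, 0, 1, 0, 0]})

# idxmax picks the column holding the row maximum; where() then blanks out
# the rows in which no column is set, leaving a missing value there
decoded = animals.idxmax(axis=1).where(animals.any(axis=1))
print(decoded)
```

Rows without any 1 come back as missing values rather than as a misleading column name.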
OLD answer (most probably incorrect):
Try this:
In [504]: df.idxmax().reset_index().rename(columns={'index':'animal', 0:'idx'})
Out[504]:
   animal  idx
0     fox    2
1  monkey    1
2  rabbit    0
Data:
In [505]: df
Out[505]:
   fox  monkey  rabbit
0    0       0       1
1    0       1       0
2    1       0       0
3    0       0       0
4    0       0       0
Answer by piRSquared
Answer by Sudharshann D
This works with both single and multiple labels.
We can use advanced indexing to tackle this problem.
import pandas as pd
# Keys are listed alphabetically so the column order matches the output below
df = pd.DataFrame({"cat":[0,0,0,0,1], "fox":[1,0,1,0,0],
                   "monkey":[1,1,0,1,0], "rabbit":[1,1,1,1,0]})
df['tags'] = ''  # create an empty column
for col_name in df.columns:
    # append the column name to 'tags' wherever that column is 1
    df.loc[df[col_name]==1, 'tags'] = df['tags'] + ' ' + col_name
print(df)
And the result is:
   cat  fox  monkey  rabbit                tags
0    0    1       1       1   fox monkey rabbit
1    0    0       1       1       monkey rabbit
2    0    1       0       1          fox rabbit
3    0    0       1       1       monkey rabbit
4    1    0       0       0                 cat
Explanation: We iterate over the columns of the dataframe.
df.loc[selection criteria, columns to write value] = value
df.loc[df[col_name]==1, 'tags'] = df['tags'] + ' ' + col_name
The above line finds all the rows where df[col_name] == 1, selects the column 'tags', and sets it to the RHS value, which is df['tags'] + ' ' + col_name.
Note: the original version of this answer used .ix, which has been deprecated since Pandas v0.20 and was removed in v1.0. Use .loc or .iloc instead, as appropriate.
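The multi-label decoding can also be written without mutating the frame inside a loop. The following is a minimal sketch, not the answerer's method: it assumes the same four one-hot columns and joins the names of the set columns row by row (the name tags is only illustrative).

```python
import pandas as pd

df = pd.DataFrame({"cat":    [0, 0, 0, 0, 1],
                   "fox":    [1, 0, 1, 0, 0],
                   "monkey": [1, 1, 0, 1, 0],
                   "rabbit": [1, 1, 1, 1, 0]})

# For each row, keep the column names whose value is 1 and join them
# with a single space (no leading space, unlike the loop version)
tags = df.apply(lambda row: ' '.join(row.index[row == 1]), axis=1)
print(tags)
```

Because the result is a separate Series, the original one-hot columns stay untouched and can be assigned back with df['tags'] = tags if needed.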
Answer by Merlin
Try this:
import numpy as np
import pandas as pd
df = pd.DataFrame({"monkey":[0,1,0,1,0],"rabbit":[1,0,0,0,0],"fox":[0,0,1,0,0], "cat":[0,0,0,0,1]})
df
   cat  fox  monkey  rabbit
0    0    0       0       1
1    0    0       1       0
2    0    1       0       0
3    0    0       1       0
4    1    0       0       0
# np.where puts the column name wherever the value is 1 and '' elsewhere; the empty strings are then dropped
pd.DataFrame([x for x in np.where(df == 1, df.columns, '').flatten().tolist() if len(x) > 0], columns=["animal"])
   animal
0  rabbit
1  monkey
2     fox
3  monkey
4     cat
Answer by conflicted_user
You could try using melt(). This method also works when you have multiple OHE labels for a row.
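import pandas as pd
# Your OHE dataframe
df = pd.DataFrame({"monkey":[0,1,0],"rabbit":[1,0,0],"fox":[0,0,1]})
# Melting; var_name is passed as a plain string, since a one-element list is not accepted by all pandas versions
mel = df.melt(var_name='animal', value_name='value')
# This gives you the result; rows come out grouped by column, not in the original row order
mel[mel.value == 1].reset_index(drop=True)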
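Because melt() stacks the columns one after another, the filtered result follows column order rather than the original row order. A minimal sketch of one way to recover the row order, assuming a pandas version in which melt() accepts ignore_index (1.1 or later); the name decoded is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({"monkey": [0, 1, 0], "rabbit": [1, 0, 0], "fox": [0, 0, 1]})

# ignore_index=False keeps the original row labels on the melted frame
mel = df.melt(var_name='animal', value_name='value', ignore_index=False)

# Keep the set labels, then sort by the original row label; mergesort is
# stable, so multiple labels on one row would stay in column order
decoded = mel.loc[mel['value'] == 1, 'animal'].sort_index(kind='mergesort')
print(decoded)
```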
Answer by Shakeeb Pasha
This can be achieved with a simple apply on the dataframe:
# function to get the name of the column whose value is 1, for each row in the dataframe
# (assumes every row has at least one 1; otherwise the [0] lookup raises IndexError)
def get_animal(row):
    return row.index[row.apply(lambda x: x == 1)][0]
# prepare an animal column
df['animal'] = df.apply(get_animal, axis=1)
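The function above fails on a row that contains no 1 at all. A minimal sketch of a more defensive variant, assuming the five-row frame from the accepted answer; the helper name get_animal_or_none is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"monkey": [0, 1, 0, 0, 0],
                   "rabbit": [1, 0, 0, 0, 0],
                   "fox":    [0, 0, 1, 0, 0]})

def get_animal_or_none(row):
    # Names of the columns set to 1 in this row
    hits = row.index[row == 1]
    # Return the first match, or None when the row has no label
    return hits[0] if len(hits) else None

# Compute the labels first, then attach them, so that the new column
# is never scanned as if it were a one-hot column
result = df.apply(get_animal_or_none, axis=1)
df['animal'] = result
print(df)
```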