Original URL: http://stackoverflow.com/questions/29942167/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Transposing one column in python pandas with the simplest index possible
Question by izhako
I have the following data (data_current):
import pandas as pd
import numpy as np
data_current=pd.DataFrame({'medicine':['green tea','fried tomatoes','meditation','meditation'],'disease':['acne','hypertension', 'cancer','lupus']})
data_current
What I would like to do is transpose one of the columns: instead of having multiple rows with the same medicine and different diseases, I want one row per medicine with several disease columns. It is also important to keep the index as simple as possible (i.e. 0, 1, 2, ...); I don't want to make 'medicine' the index, because I will later merge this frame on some other key.
So, I need to get data_needed:
data_needed=pd.DataFrame({'medicine':['green tea','fried tomatoes','meditation'],'disease_1':['acne','hypertension','cancer'], 'disease_2':[np.nan,np.nan,'lupus']})
data_needed
Answer by neqtor
I think you want a pivot table. See the reshaping documentation for more information: http://pandas.pydata.org/pandas-docs/stable/reshaping.html
Do you find the output from this acceptable?
data_current.pivot(index='medicine', columns='disease', values='disease')
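To see what this pivot produces, and how the simple 0, 1, 2 index the question asks for could be restored, here is a minimal sketch. The reset_index() call and clearing columns.name are additions of mine, not part of neqtor's answer; wide is an illustrative name.

```python
import pandas as pd

data_current = pd.DataFrame({
    'medicine': ['green tea', 'fried tomatoes', 'meditation', 'meditation'],
    'disease': ['acne', 'hypertension', 'cancer', 'lupus'],
})

# One column per distinct disease; each cell holds the disease name or NaN.
wide = data_current.pivot(index='medicine', columns='disease', values='disease')

# Move 'medicine' out of the index so the result has a plain 0, 1, 2 index.
wide = wide.reset_index()
wide.columns.name = None  # drop the leftover 'disease' axis label
print(wide)
```

Note that this yields one column per distinct disease (four here) rather than the disease_1/disease_2 layout of data_needed, and pivot raises a ValueError if the same medicine/disease pair appears more than once.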
Answer by Zero
Here's one way to achieve the output.
First, groupby on medicine and collect each group's disease values as a list:
In [368]: md = (data_current.groupby('medicine')
                .apply(lambda x: x['disease'].tolist())
                .reset_index())
In [369]: md
Out[369]:
         medicine                0
0  fried tomatoes   [hypertension]
1       green tea           [acne]
2      meditation  [cancer, lupus]
Then convert the lists in column 0 into separate columns:
In [370]: dval = pd.DataFrame(md[0].tolist())
In [371]: dval
Out[371]:
              0      1
0  hypertension   None
1          acne   None
2        cancer  lupus
Now drop the list column from md and concat md with dval:
In [372]: md = md.drop(0, axis=1)
In [373]: data_final = pd.concat([md, dval], axis=1)
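pd.concat(..., axis=1) pairs rows by index label rather than by position, so this step works because md and dval both carry the default 0, 1, 2 index. A small sketch with stand-in frames (left and right are illustrative names, not from the answer) shows what happens when the labels disagree:

```python
import pandas as pd

left = pd.DataFrame({'medicine': ['fried tomatoes', 'green tea', 'meditation']})
right = pd.DataFrame([['hypertension', None], ['acne', None], ['cancer', 'lupus']])

# Both frames use the default RangeIndex 0, 1, 2, so rows pair up as intended.
aligned = pd.concat([left, right], axis=1)

# If one side carries different labels, concat aligns on the union of labels
# and fills the gaps with NaN instead of pairing rows by position.
shifted = pd.concat([left, right.set_axis([1, 2, 3])], axis=1)
print(aligned)
print(shifted)
```

If the indexes might differ, calling reset_index(drop=True) on both frames before concatenating makes the positional pairing explicit.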
Finally, rename the columns as needed:
In [374]: data_final.columns = ['medicine', 'disease_1', 'disease_2']
In [375]: data_final
Out[375]:
         medicine     disease_1 disease_2
0  fried tomatoes  hypertension      None
1       green tea          acne      None
2      meditation        cancer     lupus
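For reference, here is a self-contained sketch of the same steps in one place. It is a variant, not Zero's exact code: it uses groupby(...)['disease'].apply(list) instead of apply with a lambda, and passes sort=False so the medicines keep their order of first appearance, as in data_needed. The names lists and wide are illustrative.

```python
import pandas as pd

data_current = pd.DataFrame({
    'medicine': ['green tea', 'fried tomatoes', 'meditation', 'meditation'],
    'disease': ['acne', 'hypertension', 'cancer', 'lupus'],
})

# One list of diseases per medicine; sort=False keeps first-appearance order.
lists = data_current.groupby('medicine', sort=False)['disease'].apply(list)

# Expand the lists into columns; shorter lists are padded with None.
wide = pd.DataFrame(lists.tolist(), index=lists.index)
wide.columns = ['disease_%d' % (i + 1) for i in range(wide.shape[1])]

# Turn 'medicine' back into an ordinary column, leaving the default 0, 1, 2 index.
data_final = wide.reset_index()
print(data_final)
```

Naming the columns from wide.shape[1] means the sketch adapts if some medicine has more than two diseases.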
Answer by fixxxer
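dc = data_current.copy()  # copy so that data_current itself is not modified
# 'diseases' is not defined in the original answer; assumed here to be the distinct disease names
diseases = dc['disease'].unique()
dc['disease_header'] = dc['disease'].replace(
    dict(zip(diseases,
             map(lambda v: 'diseases_%d' % v, range(len(diseases))))))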
This will give us:
In [548]: dc
Out[548]:
        disease        medicine disease_header
0          acne       green tea     diseases_0
1  hypertension  fried tomatoes     diseases_1
2        cancer      meditation     diseases_2
3         lupus      meditation     diseases_3
Finally, we can pivot:
In [547]: dc.pivot(columns='disease_header', index='medicine', values='disease').reset_index()
Out[547]:
disease_header        medicine diseases_0    diseases_1 diseases_2 diseases_3
0               fried tomatoes        NaN  hypertension        NaN        NaN
1                    green tea       acne           NaN        NaN        NaN
2                   meditation        NaN           NaN     cancer      lupus
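The header-then-pivot idea above creates one column per distinct disease. A minimal sketch of a swapped-in variant, assuming the goal is exactly the disease_1/disease_2 layout of data_needed: number each disease within its own medicine using groupby(...).cumcount() instead of across all diseases, then pivot the same way. The name position is illustrative.

```python
import pandas as pd

data_current = pd.DataFrame({
    'medicine': ['green tea', 'fried tomatoes', 'meditation', 'meditation'],
    'disease': ['acne', 'hypertension', 'cancer', 'lupus'],
})

dc = data_current.copy()
# Number each disease within its medicine group: 1, 2, ... in order of appearance.
position = dc.groupby('medicine').cumcount() + 1
dc['disease_header'] = 'disease_' + position.astype(str)

# Same pivot as above; it now yields only as many columns as the largest group.
data_needed = (dc.pivot(index='medicine', columns='disease_header', values='disease')
               .reset_index())
data_needed.columns.name = None  # drop the leftover 'disease_header' axis label
print(data_needed)
```

Because the headers are strings, pivot sorts them lexicographically, so a medicine with ten or more diseases would put disease_10 before disease_2; pivoting on the integer position and renaming the columns afterwards avoids that.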

