Transpose one column in python pandas with the simplest index possible

Question

Asked by izhako

I have the following data (data_current):

import pandas as pd
import numpy as np

data_current=pd.DataFrame({'medicine':['green tea','fried tomatoes','meditation','meditation'],'disease':['acne','hypertension', 'cancer','lupus']})
data_current

What I would like to do is transpose one of the columns, so that instead of multiple rows with the same medicine and different diseases, I have one row per medicine with several columns for its diseases. It is also important to keep the index as simple as possible (0, 1, 2, ...): I don't want to make 'medicine' the index, because I will later merge on some other key. So I need to get data_needed:

data_needed=pd.DataFrame({'medicine':['green tea','fried tomatoes','meditation'],'disease_1':['acne','hypertension','cancer'], 'disease_2':[np.nan,np.nan,'lupus']})
data_needed

Answer 1

Answered by neqtor

I'm thinking you want a pivot table. Check this link for more information --> http://pandas.pydata.org/pandas-docs/stable/reshaping.html

Do you find the output from this acceptable?

data_current.pivot(index='medicine', columns='disease', values='disease')
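
Note that a pivot keyed on the disease names gives one column per distinct disease, not the disease_1, disease_2 layout asked for. A minimal sketch of one way to get numbered columns and a plain 0, 1, 2 index, assuming a helper column called n (a name made up for this example) built with groupby().cumcount():

```python
import pandas as pd

data_current = pd.DataFrame({
    'medicine': ['green tea', 'fried tomatoes', 'meditation', 'meditation'],
    'disease': ['acne', 'hypertension', 'cancer', 'lupus'],
})

# Number each disease within its medicine: 1, 2, ...
# ('n' is a hypothetical helper column, not part of the original data)
numbered = data_current.assign(n=data_current.groupby('medicine').cumcount() + 1)

# One row per medicine, one column per position within the group
wide = numbered.pivot(index='medicine', columns='n', values='disease')
wide.columns = ['disease_%d' % i for i in wide.columns]

# Back to a default integer index, with 'medicine' as an ordinary column
wide = wide.reset_index()
print(wide)
```

Medicines with fewer diseases than the largest group get missing values in the trailing columns, and the rows come out sorted by medicine name rather than in their original order.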

Answer 2

Answered by Zero

Here's one way to achieve the output.

First, groupby on medicine and collect the disease values as a list:

In [368]: md = (data_current.groupby('medicine')
                            .apply(lambda x: x['disease'].tolist())
                            .reset_index())

In [369]: md
Out[369]:
         medicine                0
0  fried tomatoes   [hypertension]
1       green tea           [acne]
2      meditation  [cancer, lupus]

Then convert the lists in that column into separate columns:

In [370]: dval = pd.DataFrame(md[0].tolist())

In [371]: dval
Out[371]:
              0      1
0  hypertension   None
1          acne   None
2        cancer  lupus

Now you can concat md with dval:

In [372]: md = md.drop(0, axis=1)

In [373]: data_final = pd.concat([md, dval], axis=1)

And, rename the columns as you want.

In [374]: data_final.columns = ['medicine', 'disease_1', 'disease_2']

In [375]: data_final
Out[375]:
         medicine     disease_1 disease_2
0  fried tomatoes  hypertension      None
1       green tea          acne      None
2      meditation        cancer     lupus
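
The steps above can be folded into one helper. This is only a sketch: spread_column is a name invented here, and it uses .agg(list) in place of the .apply(lambda ...) call, which should produce the same lists.

```python
import pandas as pd

def spread_column(df, key, value):
    """Spread the `value` entries of each `key` group across numbered columns.

    Hypothetical helper, written for illustration only.
    """
    # Step 1: collect the values of each group into a list
    lists = df.groupby(key)[value].agg(list)
    # Step 2: expand the lists into columns; shorter lists are padded with missing values
    wide = pd.DataFrame(lists.tolist())
    # Step 3: rename the columns to <value>_1, <value>_2, ...
    wide.columns = ['%s_%d' % (value, i + 1) for i in range(wide.shape[1])]
    # Step 4: put the key back as an ordinary first column (index stays 0, 1, 2, ...)
    wide.insert(0, key, lists.index.tolist())
    return wide

data_current = pd.DataFrame({
    'medicine': ['green tea', 'fried tomatoes', 'meditation', 'meditation'],
    'disease': ['acne', 'hypertension', 'cancer', 'lupus'],
})

result = spread_column(data_current, 'medicine', 'disease')
print(result)
```

Because the column names are generated from the widest group, nothing needs to be renamed by hand when a medicine has more than two diseases.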

Answer 3

Answered by fixxxer

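dc = data_current
diseases = dc['disease'].tolist()  # the snippet relies on this list; the column is named 'disease'
dc['disease_header'] = dc['disease'].replace(
                       dict(zip(diseases,
                                map(lambda v: 'diseases_%d' % v, range(len(diseases)))
                           )))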

This will give us:

In [548]: dc
Out[548]: 
        disease        medicine disease_header
0          acne       green tea     diseases_0
1  hypertension  fried tomatoes     diseases_1
2        cancer      meditation     diseases_2
3         lupus      meditation     diseases_3

And, finally we can pivot:

In [547]: dc.pivot(columns='disease_header', index='medicine', values='disease').reset_index()
Out[547]: 
disease_header        medicine diseases_0    diseases_1 diseases_2 diseases_3
0               fried tomatoes        NaN  hypertension        NaN        NaN
1                    green tea       acne           NaN        NaN        NaN
2                   meditation        NaN           NaN     cancer      lupus
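
Since the headers are numbered across the whole frame, this pivot yields one column per row of the input (four here) rather than disease_1 and disease_2. One possible follow-up, sketched under the assumption that the left-to-right column order is the order you want, is to pack each row's non-missing values to the left. The names wide, packed and compact are made up for this example.

```python
import pandas as pd

data_current = pd.DataFrame({
    'medicine': ['green tea', 'fried tomatoes', 'meditation', 'meditation'],
    'disease': ['acne', 'hypertension', 'cancer', 'lupus'],
})

# Rebuild the wide frame from this answer: one header per input row, numbered globally
dc = data_current.copy()
dc['disease_header'] = ['diseases_%d' % i for i in range(len(dc))]
wide = dc.pivot(index='medicine', columns='disease_header', values='disease')

# Keep only the non-missing entries of each row, in column order
packed = [row.dropna().tolist() for _, row in wide.iterrows()]

# Rows with fewer entries are padded on the right with missing values
compact = pd.DataFrame(packed, index=wide.index)
compact.columns = ['disease_%d' % (i + 1) for i in range(compact.shape[1])]

# Default integer index again, with 'medicine' as an ordinary column
compact = compact.reset_index()
print(compact)
```

One caveat: pivot sorts the column labels as strings, so with ten or more headers diseases_10 would sort before diseases_2 and the packed order would no longer follow the input order.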
