Pandas - Split column values into new columns

Question

Asked by oliversm

I have a large dataframe and I am storing a lot of redundant values that are making it hard to handle my data. I have a dataframe of the form:

import pandas as pd

df = pd.DataFrame([["a","g","n1","y1"], ["a","g","n2","y2"], ["b","h","n1","y3"], ["b","h","n2","y4"]], columns=["meta1", "meta2", "name", "data"])

>>> df

  meta1 meta2 name data
    a     g   n1   y1
    a     g   n2   y2
    b     h   n1   y3
    b     h   n2   y4

where the names of the new columns I want are in name and the corresponding values are in data.

I would like to produce a dataframe of the form:

df = pd.DataFrame([["a","g","y1","y2"], ["b","h","y3","y4"]], columns=["meta1", "meta2", "n1", "n2"])

>>> df

meta1 meta2  n1  n2
  a     g  y1  y2
  b     h  y3  y4

The meta columns stand in for 15+ other columns that contain most of the data, and I don't think they are particularly well suited for indexing. The point is that a lot of repeated/redundant data is currently stored in the meta columns, and I would like to produce the more compact dataframe shown above.

I have found some similar questions but can't pinpoint which operation I need: pivot, re-index, stack or unstack, etc.?

PS - the original index values are unimportant for my purposes.

Any help would be much appreciated.

A question I think is related:

I think the following Q is related to what I am trying to do, but I can't see how to apply it, as I don't want to produce more indexes.

Python Pandas- how to unstack a pivot table with two values with each value becoming a new column?

Answer 1

Answered by piRSquared

If you put the names of your meta columns into a list, you can do this:

metas = ['meta1', 'meta2']

new_df = df.set_index(['name'] + metas).unstack('name')
print(new_df)

            data    
name          n1  n2
meta1 meta2         
a     g       y1  y2
b     h       y3  y4

Which gets you most of the way there. Additional tailoring can get you the rest of the way.

print(new_df['data'].rename_axis(None, axis=1).reset_index())

  meta1 meta2  n1  n2
0     a     g  y1  y2
1     b     h  y3  y4
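With 15+ meta columns, typing the list by hand is error-prone. A minimal sketch of the same set_index/unstack approach, assuming every column other than name and data is a meta column; the helper name widen is hypothetical. Selecting the data column before unstacking avoids the extra "data" level in the column header, so no .data step is needed afterwards.

```python
import pandas as pd


def widen(df, name_col="name", value_col="data"):
    """Hypothetical helper: turn name/value pairs into columns keyed on all other columns."""
    # Assumption: every column except the name/value pair is a meta column
    metas = [c for c in df.columns if c not in (name_col, value_col)]
    # Index on metas + name, keep only the values, then move "name" into the columns
    wide = df.set_index(metas + [name_col])[value_col].unstack(name_col)
    # Drop the "name" label on the column axis and turn the metas back into columns
    return wide.rename_axis(None, axis=1).reset_index()


df = pd.DataFrame([["a", "g", "n1", "y1"], ["a", "g", "n2", "y2"],
                   ["b", "h", "n1", "y3"], ["b", "h", "n2", "y4"]],
                  columns=["meta1", "meta2", "name", "data"])
out = widen(df)
print(out)
```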

Answer 2

Answered by jezrael

You can use pivot_table with reset_index and rename_axis (new in pandas 0.18.0):

print (df.pivot_table(index=['meta1','meta2'], 
                      columns='name', 
                      values='data', 
                      aggfunc='first')
         .reset_index()
         .rename_axis(None, axis=1))

  meta1 meta2  n1  n2
0     a     g  y1  y2
1     b     h  y3  y4
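In this example each (meta1, meta2, name) combination appears only once, so no real aggregation takes place. In that case DataFrame.pivot performs the same reshape without an aggfunc. A sketch assuming pandas 1.1 or later, where pivot accepts a list for index; unlike pivot_table, pivot raises a ValueError instead of aggregating if a combination repeats.

```python
import pandas as pd

df = pd.DataFrame([["a", "g", "n1", "y1"], ["a", "g", "n2", "y2"],
                   ["b", "h", "n1", "y3"], ["b", "h", "n2", "y4"]],
                  columns=["meta1", "meta2", "name", "data"])

# pivot only reshapes; it never aggregates (assumes pandas >= 1.1 for a list index)
wide = (df.pivot(index=["meta1", "meta2"], columns="name", values="data")
          .reset_index()
          .rename_axis(None, axis=1))
print(wide)
```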

But it is better to use aggfunc=', '.join:

print (df.pivot_table(index=['meta1','meta2'], 
                      columns='name', 
                      values='data', 
                      aggfunc=', '.join)
         .reset_index()
         .rename_axis(None, axis=1))

  meta1 meta2  n1  n2
0     a     g  y1  y2
1     b     h  y3  y4
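To see what aggfunc does here, the pivot_table call can be read roughly as a groupby on the index and column keys, an aggregation per group, and an unstack of the column key. A sketch of that equivalent spelled out by hand, not a copy of pandas' internal code:

```python
import pandas as pd

df = pd.DataFrame([["a", "g", "n1", "y1"], ["a", "g", "n2", "y2"],
                   ["b", "h", "n1", "y3"], ["b", "h", "n2", "y4"]],
                  columns=["meta1", "meta2", "name", "data"])

# One string per (meta1, meta2, name) group; repeated entries would be joined
joined = df.groupby(["meta1", "meta2", "name"])["data"].agg(", ".join)
# Move the "name" level into the columns, then tidy up as in the pivot_table version
wide = joined.unstack("name").reset_index().rename_axis(None, axis=1)
print(wide)
```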

Why join is generally better than first:

If you use first, you lose every value that is not the first one in its index group, whereas join concatenates them:

import pandas as pd

df = pd.DataFrame([["a","g","n1","y1"], 
                   ["a","g","n2","y2"], 
                   ["a","g","n1","y3"], 
                   ["b","h","n2","y4"]], columns=["meta1", "meta2", "name", "data"])

print (df)
  meta1 meta2 name data
0     a     g   n1   y1
1     a     g   n2   y2
2     a     g   n1   y3
3     b     h   n2   y4

print (df.pivot_table(index=['meta1','meta2'], 
                      columns='name', 
                      values='data', 
                      aggfunc='first')
         .reset_index()
         .rename_axis(None, axis=1))

  meta1 meta2    n1  n2
0     a     g    y1  y2
1     b     h  None  y4

print (df.pivot_table(index=['meta1','meta2'], 
                      columns='name', 
                      values='data', 
                      aggfunc=', '.join)
         .reset_index()
         .rename_axis(None, axis=1))

  meta1 meta2      n1  n2
0     a     g  y1, y3  y2
1     b     h    None  y4
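Before choosing between first and join, it can help to check whether any key actually repeats. A sketch using DataFrame.duplicated on the example data above, assuming the same two meta columns; it also shows that the set_index/unstack approach from the other answer refuses to reshape (ValueError) when keys repeat, rather than silently dropping values.

```python
import pandas as pd

df = pd.DataFrame([["a", "g", "n1", "y1"], ["a", "g", "n2", "y2"],
                   ["a", "g", "n1", "y3"], ["b", "h", "n2", "y4"]],
                  columns=["meta1", "meta2", "name", "data"])

keys = ["meta1", "meta2", "name"]
# keep=False flags every row of a repeated key, not only the later copies
dupes = df[df.duplicated(subset=keys, keep=False)]
print(dupes)

# set_index + unstack cannot place two values in one cell, so it raises
try:
    df.set_index(keys)["data"].unstack("name")
except ValueError as err:
    print("unstack failed:", err)
```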
