Pandas - unstack column values into new columns
Original question: http://stackoverflow.com/questions/37840043/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by oliversm
I have a large dataframe and I am storing a lot of redundant values that are making it hard to handle my data. I have a dataframe of the form:
import pandas as pd
df = pd.DataFrame([["a","g","n1","y1"], ["a","g","n2","y2"], ["b","h","n1","y3"], ["b","h","n2","y4"]], columns=["meta1", "meta2", "name", "data"])
>>> df
meta1 meta2 name data
a g n1 y1
a g n2 y2
b h n1 y3
b h n2 y4
where the names of the new columns I would like are in name and the respective data are in data.
I would like to produce a dataframe of the form:
df = pd.DataFrame([["a","g","y1","y2"], ["b","h","y3","y4"]], columns=["meta1", "meta2", "n1", "n2"])
>>> df
meta1 meta2 n1 n2
a g y1 y2
b h y3 y4
The columns called meta stand for around 15+ other columns that contain most of the data, and I don't think they are particularly well suited for indexing. The idea is that I have a lot of repeated/redundant data stored in the meta columns at the moment, and I would like to produce the more compact dataframe presented above.
I have found some similar Qs but can't pinpoint what sort of operations I need to do: pivot, re-index, stack or unstack, etc.?
PS - the original index values are unimportant for my purposes.
Any help would be much appreciated.
Question I think is related:
I think the following Q is related to what I am trying to do, but I can't see how to apply it, as I don't want to produce more indexes.
Answer by piRSquared
If you group your meta columns into a list then you can do this:
metas = ['meta1', 'meta2']
new_df = df.set_index(['name'] + metas).unstack('name')
print(new_df)
data
name n1 n2
meta1 meta2
a g y1 y2
b h y3 y4
Which gets you most of the way there. Additional tailoring can get you the rest of the way.
print(new_df['data'].rename_axis(None, axis=1).reset_index())
meta1 meta2 n1 n2
0 a g y1 y2
1 b h y3 y4
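With 15+ meta columns, typing the list by hand gets tedious. Here is a minimal sketch, assuming every column other than name and data is a meta column. That assumption and the names value_cols and wide are illustrative, not part of the original answer. It builds the list from df.columns and then applies the same set_index/unstack steps:

```python
import pandas as pd

df = pd.DataFrame(
    [["a", "g", "n1", "y1"],
     ["a", "g", "n2", "y2"],
     ["b", "h", "n1", "y3"],
     ["b", "h", "n2", "y4"]],
    columns=["meta1", "meta2", "name", "data"],
)

# Assumption: everything except 'name' and 'data' is a meta column.
value_cols = ["name", "data"]
metas = [c for c in df.columns if c not in value_cols]

wide = (
    df.set_index(metas + ["name"])["data"]  # Series indexed by metas + name
      .unstack("name")                      # move the 'name' level into columns
      .rename_axis(None, axis=1)            # drop the 'name' label on the columns
      .reset_index()                        # turn the meta index back into columns
)
print(wide)
```

Note that unstack raises a ValueError if any combination of meta values and name occurs more than once, so this route only applies when those combinations are unique.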
Answer by jezrael
You can use pivot_table with reset_index and rename_axis (new in pandas 0.18.0):
print(df.pivot_table(index=['meta1', 'meta2'],
                     columns='name',
                     values='data',
                     aggfunc='first')
        .reset_index()
        .rename_axis(None, axis=1))
meta1 meta2 n1 n2
0 a g y1 y2
1 b h y3 y4
But it is better to use join as the aggfunc:
print(df.pivot_table(index=['meta1', 'meta2'],
                     columns='name',
                     values='data',
                     aggfunc=', '.join)
        .reset_index()
        .rename_axis(None, axis=1))
meta1 meta2 n1 n2
0 a g y1 y2
1 b h y3 y4
Explanation of why join is generally better than first:
If you use first, you can lose every value that is not the first in its group (as defined by index), whereas join concatenates them:
import pandas as pd
df = pd.DataFrame([["a","g","n1","y1"],
["a","g","n2","y2"],
["a","g","n1","y3"],
["b","h","n2","y4"]], columns=["meta1", "meta2", "name", "data"])
print (df)
meta1 meta2 name data
0 a g n1 y1
1 a g n2 y2
2 a g n1 y3
3 b h n2 y4
print(df.pivot_table(index=['meta1', 'meta2'],
                     columns='name',
                     values='data',
                     aggfunc='first')
        .reset_index()
        .rename_axis(None, axis=1))
meta1 meta2 n1 n2
0 a g y1 y2
1 b h None y4
print(df.pivot_table(index=['meta1', 'meta2'],
                     columns='name',
                     values='data',
                     aggfunc=', '.join)
        .reset_index()
        .rename_axis(None, axis=1))
meta1 meta2 n1 n2
0 a g y1, y3 y2
1 b h None y4
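Which aggfunc is appropriate depends on whether any combination of meta columns and name repeats. A minimal sketch, reusing the four-row frame with the duplicated n1 entry from above, checks for repeats with DataFrame.duplicated and then gets the joined result through groupby and unstack rather than pivot_table. The names metas, has_repeats and wide are illustrative and not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame(
    [["a", "g", "n1", "y1"],
     ["a", "g", "n2", "y2"],
     ["a", "g", "n1", "y3"],
     ["b", "h", "n2", "y4"]],
    columns=["meta1", "meta2", "name", "data"],
)
metas = ["meta1", "meta2"]

# True if some (meta1, meta2, name) combination appears more than once.
has_repeats = df.duplicated(subset=metas + ["name"]).any()
print(has_repeats)

# Group on the meta columns plus 'name', join repeated values, then
# move 'name' into the columns. Combinations that never occur become NaN.
wide = (
    df.groupby(metas + ["name"])["data"]
      .agg(", ".join)
      .unstack("name")
      .rename_axis(None, axis=1)
      .reset_index()
)
print(wide)
```

If has_repeats is False, first and join give the same result, so the choice only matters when repeats are present.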