Python pandas: convert some columns into rows
Original question: http://stackoverflow.com/questions/28654047/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
pandas convert some columns into rows
Asked by Wizuriel
So my dataset has some information by location for n dates. The problem is each date is actually a different column header. For example the CSV looks like
location    name    Jan-2010    Feb-2010    March-2010
A           "test"  12          20          30
B           "foo"   18          20          25
What I would like is for it to look like
location    name    Date        Value
A           "test"  Jan-2010    12
A           "test"  Feb-2010    20
A           "test"  March-2010  30
B           "foo"   Jan-2010    18
B           "foo"   Feb-2010    20
B           "foo"   March-2010  25
The problem is that I don't know how many date columns there are (though I know they always start after name).
Accepted answer by DSM
UPDATE
From pandas v0.20, melt is also a DataFrame method, so you can now use:
df.melt(id_vars=["location", "name"],
        var_name="Date",
        value_name="Value")

   location    name        Date  Value
0         A  "test"    Jan-2010     12
1         B   "foo"    Jan-2010     18
2         A  "test"    Feb-2010     20
3         B   "foo"    Feb-2010     20
4         A  "test"  March-2010     30
5         B   "foo"  March-2010     25
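Since the question says the number of date columns is not known in advance, here is a minimal, self-contained sketch of the same melt call that picks the identifier columns by position instead of naming any date header. The sample frame mirrors the question's CSV; the helper name melt_after_id_columns and its n_id parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd


def melt_after_id_columns(df, n_id=2):
    """Melt every column after the first n_id columns into Date/Value rows.

    Hypothetical helper: assumes the identifier columns come first,
    as in the question (location, name, then one column per date).
    """
    id_vars = list(df.columns[:n_id])
    # melt treats every column not listed in id_vars as a value column,
    # so the number of date columns does not need to be known up front.
    return df.melt(id_vars=id_vars, var_name="Date", value_name="Value")


# Sample data reconstructed from the question.
df = pd.DataFrame({
    "location": ["A", "B"],
    "name": ["test", "foo"],
    "Jan-2010": [12, 18],
    "Feb-2010": [20, 20],
    "March-2010": [30, 25],
})

long_df = melt_after_id_columns(df)
print(long_df)
```

The rows come out grouped by date column (all Jan-2010 rows first, then Feb-2010, and so on), so a sort is still needed if you want them grouped by location.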
OLD(ER) VERSIONS: <0.20
You can use pd.melt to get most of the way there, and then sort:
>>> df
   location  name  Jan-2010  Feb-2010  March-2010
0         A  test        12        20          30
1         B   foo        18        20          25
>>> df2 = pd.melt(df, id_vars=["location", "name"],
                  var_name="Date", value_name="Value")
>>> df2
   location  name        Date  Value
0         A  test    Jan-2010     12
1         B   foo    Jan-2010     18
2         A  test    Feb-2010     20
3         B   foo    Feb-2010     20
4         A  test  March-2010     30
5         B   foo  March-2010     25
>>> df2 = df2.sort_values(["location", "name"])
>>> df2
   location  name        Date  Value
0         A  test    Jan-2010     12
2         A  test    Feb-2010     20
4         A  test  March-2010     30
1         B   foo    Jan-2010     18
3         B   foo    Feb-2010     20
5         B   foo  March-2010     25
(Might want to throw in a .reset_index(drop=True), just to keep the output clean.)
Note: pd.DataFrame.sort has been deprecated (and removed as of pandas 0.20) in favour of pd.DataFrame.sort_values, which is available since pandas 0.17.
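Putting the melt, the sort and the optional index reset together, a minimal sketch for a current pandas version might look like the following. The sample frame is reconstructed from the question; nothing else is assumed beyond pd.melt, sort_values and reset_index.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "location": ["A", "B"],
    "name": ["test", "foo"],
    "Jan-2010": [12, 18],
    "Feb-2010": [20, 20],
    "March-2010": [30, 25],
})

df2 = (
    pd.melt(df, id_vars=["location", "name"], var_name="Date", value_name="Value")
    # sort_values is the replacement for the removed DataFrame.sort.
    .sort_values(["location", "name"])
    # drop=True discards the shuffled old index instead of keeping it as a column.
    .reset_index(drop=True)
)
print(df2)
```

Note that this only groups the rows by location and name. The Date column holds plain strings, so it is not sorted chronologically; the dates simply keep the order of the original column headers within each group.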
Answer by Prometheus
I guess I found a simpler solution
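# Drop the other identifier column first so it is not melted into the Date/Value rows.
temp1 = pd.melt(df1.drop(columns="name"), id_vars=["location"], var_name='Date', value_name='Value')
temp2 = pd.melt(df1.drop(columns="location"), id_vars=["name"], var_name='Date', value_name='Value')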
Then attach temp2's name column to temp1:
temp1['new_column'] = temp2['name']
You now have what you asked for.
Answer by jpp
pd.wide_to_long
You can add a prefix to your date columns and then feed the frame directly to pd.wide_to_long. I won't pretend this is efficient, but it may in certain situations be more convenient than pd.melt, e.g. when your columns already have an appropriate prefix.
import numpy as np

df.columns = np.hstack((df.columns[:2], df.columns[2:].map(lambda x: f'Value{x}')))
# suffix=r'.+' is needed because the default suffix pattern only matches digits
res = pd.wide_to_long(df, stubnames=['Value'], i='name', j='Date', suffix=r'.+').reset_index()\
        .sort_values(['location', 'name'])
print(res)

   name        Date  location  Value
0  test    Jan-2010         A     12
2  test    Feb-2010         A     20
4  test  March-2010         A     30
1   foo    Jan-2010         B     18
3   foo    Feb-2010         B     20
5   foo  March-2010         B     25
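For anyone who wants to try the wide_to_long route end to end, here is a self-contained sketch under a few assumptions: the sample frame is reconstructed from the question, the prefix is added with rename rather than by overwriting df.columns (a stylistic swap, not the answer's exact code), and suffix=r".+" is passed so that non-numeric suffixes such as Jan-2010 are matched. It also relies on name being unique per row, since it is used as the i identifier.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "location": ["A", "B"],
    "name": ["test", "foo"],
    "Jan-2010": [12, 18],
    "Feb-2010": [20, 20],
    "March-2010": [30, 25],
})

# Prefix every column after the first two with the stub name "Value".
prefixed = df.rename(columns={col: f"Value{col}" for col in df.columns[2:]})

res = (
    # suffix=r".+" accepts any non-empty suffix; the default only matches digits.
    pd.wide_to_long(prefixed, stubnames=["Value"], i="name", j="Date", suffix=r".+")
    .reset_index()
    .sort_values(["location", "name"])
    .reset_index(drop=True)
)
print(res)
```

Using rename leaves the original df untouched, which makes it easier to compare the result against a plain melt of the same frame.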
Answer by jezrael
Use set_index with stack to get a MultiIndex Series, then add reset_index with rename to get a DataFrame:
df1 = (df.set_index(["location", "name"])
         .stack()
         .reset_index(name='Value')
         .rename(columns={'level_2':'Date'}))
print (df1)

   location  name        Date  Value
0         A  test    Jan-2010     12
1         A  test    Feb-2010     20
2         A  test  March-2010     30
3         B   foo    Jan-2010     18
4         B   foo    Feb-2010     20
5         B   foo  March-2010     25
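A small variation on the stack approach, offered as a sketch rather than as the answer's own code: naming the column axis before stacking means the new index level is already called Date, so the result does not depend on the auto-generated level_2 name. It assumes a pandas version where rename_axis accepts the columns keyword, and the sample frame is reconstructed from the question.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "location": ["A", "B"],
    "name": ["test", "foo"],
    "Jan-2010": [12, 18],
    "Feb-2010": [20, 20],
    "March-2010": [30, 25],
})

df1 = (
    df.set_index(["location", "name"])
    # Name the column axis so the stacked index level is called "Date"
    # instead of the auto-generated "level_2".
    .rename_axis(columns="Date")
    .stack()
    .reset_index(name="Value")
)
print(df1)
```

One difference worth checking on your pandas version: stack has historically dropped missing values by default, while melt keeps them, so the two approaches can return different row counts on data with gaps.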