Original question: http://stackoverflow.com/questions/42928911/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Reshape a pandas DataFrame
Question by Moritz
Suppose a DataFrame like this one:
```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], columns=['A', 'B', 'A1', 'B1'])
```
I would like to have a DataFrame that looks like this:
What does not work:
```python
new_rows = int(df.shape[1] / 2) * df.shape[0]
new_cols = 2
df.values.reshape(new_rows, new_cols, order='F')  # runs, but pairs A with A1 instead of A with B
```
Of course I could loop over the data and build a new list of lists, but there must be a better way. Any ideas?
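A minimal NumPy sketch of why the `order='F'` attempt goes wrong, and one way a plain `reshape` can still work. It assumes the columns always come in consecutive pairs (`A, B, A1, B1, ...`); the names `wrong`, `right` and `n_groups` are only for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
                  columns=['A', 'B', 'A1', 'B1'])
arr = df.to_numpy()  # shape (3, 4)

# order='F' reads the array column by column (1, 5, 9, 2, 6, 10, ...) and
# fills the (6, 2) result column by column too, so each output row ends up
# pairing an A value with the matching A1 value.
wrong = arr.reshape(6, 2, order='F')
# wrong -> [[1, 3], [5, 7], [9, 11], [2, 4], [6, 8], [10, 12]]

# Split each row into its column pairs -> shape (3, 2, 2),
# move the pair axis to the front     -> shape (2, 3, 2),
# then flatten the first two axes     -> shape (6, 2).
n_groups = arr.shape[1] // 2
right = arr.reshape(arr.shape[0], n_groups, 2).transpose(1, 0, 2).reshape(-1, 2)
result = pd.DataFrame(right, columns=['A', 'B'])
print(result)
```

The `transpose` step is what the single `order='F'` call cannot express: the data needs C order within each pair but "all rows of pair 1, then all rows of pair 2" across pairs.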
Accepted answer by Ted Petrou
The `pd.wide_to_long` function is built almost exactly for this situation, where you have many of the same variable prefixes that end in a different numeric suffix. The only difference here is that your first set of variables has no suffix, so you need to rename your columns first.
The only issue with `pd.wide_to_long` is that it requires an identification variable, `i`, unlike `melt`. `reset_index` is used to create this uniquely identifying column, which is dropped later. I think this might be changed in a future version.
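```python
import pandas as pd

# give the unsuffixed columns suffix 1, bump the others to 2, and expose the row index as a column
df1 = df.rename(columns={'A': 'A1', 'B': 'B1', 'A1': 'A2', 'B1': 'B2'}).reset_index()
pd.wide_to_long(df1, stubnames=['A', 'B'], i='index', j='id')\
    .reset_index()[['A', 'B', 'id']]
```

```
    A   B  id
0   1   2   1
1   5   6   1
2   9  10   1
3   3   4   2
4   7   8   2
5  11  12   2
```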
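If there are many column groups, the rename dictionary can be built by a function instead of by hand. A minimal sketch, assuming every column name is letters followed by an optional number; `bump_suffix` is a hypothetical helper, not part of pandas. The explicit `sort_values` is there because the row order that `wide_to_long` returns is not something this sketch relies on.

```python
import re

import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
                  columns=['A', 'B', 'A1', 'B1'])


def bump_suffix(col):
    # hypothetical helper: 'A' -> 'A1', 'A1' -> 'A2', 'B12' -> 'B13'
    m = re.fullmatch(r'([A-Za-z]+)(\d*)', col)
    stub, num = m.group(1), m.group(2)
    return f'{stub}{int(num) + 1 if num else 1}'


# rename(columns=...) accepts a function that is applied to every column label
df1 = df.rename(columns=bump_suffix).reset_index()
long = pd.wide_to_long(df1, stubnames=['A', 'B'], i='index', j='id')

result = (long.reset_index()
              .sort_values(['id', 'index'])      # group 1 rows first, then group 2
              .reset_index(drop=True)[['A', 'B', 'id']])
print(result)
```

Renaming before calling `wide_to_long` also matters for another reason: a stub name such as `A` must not itself be a column name, otherwise `wide_to_long` raises a `ValueError`.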
Answer by jezrael
You can use `lreshape`, and `numpy.repeat` for the `id` column:
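```python
import numpy as np
import pandas as pd

a = [col for col in df.columns if 'A' in col]  # ['A', 'A1']
b = [col for col in df.columns if 'B' in col]  # ['B', 'B1']
df1 = pd.lreshape(df, {'A': a, 'B': b})
# id: 1 for the first column group, 2 for the second, each repeated once per row
df1['id'] = np.repeat(np.arange(len(df.columns) // 2), len(df.index)) + 1
print(df1)
```

```
    A   B  id
0   1   2   1
1   5   6   1
2   9  10   1
3   3   4   2
4   7   8   2
5  11  12   2
```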
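The `id` column works because `lreshape` stacks all values of the first column in each list (`A`, then `B`) before those of the next one (`A1`, then `B1`), so the group number only has to be repeated once per original row. The `np.repeat` step on its own, with the sizes of the example hard-coded for illustration:

```python
import numpy as np

n_groups = 2  # len(df.columns) // 2 for the 4-column example
n_rows = 3    # len(df.index)

# np.arange(2) -> [0, 1]; np.repeat repeats each element n_rows times -> [0, 0, 0, 1, 1, 1]
ids = np.repeat(np.arange(n_groups), n_rows) + 1
print(ids)  # [1 1 1 2 2 2]
```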
EDIT:
`lreshape` is currently undocumented, and it may be removed (possibly along with `pd.wide_to_long`).
A possible solution is merging all three functions into one, perhaps `melt`, but that is not implemented yet; maybe it will be in a future version of pandas. If so, I will update my answer.
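For comparison, the same reshape can be sketched with only `melt` and `pivot`, which does not depend on `lreshape` or `wide_to_long`. This is a swapped-in technique, not the answer's method. It assumes column names are letters plus an optional numeric suffix, and pandas 1.1 or newer (for a list passed as `pivot(index=...)`); the intermediate names `long`, `stub` and `suffix` are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
                  columns=['A', 'B', 'A1', 'B1'])

# one row per (original row, original column) pair
long = df.reset_index().melt(id_vars='index', var_name='col', value_name='value')

# split each column name into its letter stub and numeric suffix;
# an unsuffixed name is group 1, a name ending in n is group n + 1
long['stub'] = long['col'].str.extract(r'^([A-Za-z]+)', expand=False)
suffix = long['col'].str.extract(r'(\d+)$', expand=False)
long['id'] = suffix.fillna('0').astype(int) + 1

# one output row per (id, original row), one output column per stub
result = (long.pivot(index=['id', 'index'], columns='stub', values='value')
              .sort_index()
              .reset_index()[['A', 'B', 'id']])
print(result)
```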
Answer by mprat
I solved this in 3 steps:
- Make a new dataframe `df2` holding only the data you want to add to the initial dataframe `df`.
- Delete from `df` the data that will be added below (the data that was used to make `df2`).
- Append `df2` to `df`.
Like so:
```python
# step 1: create new dataframe
df2 = df[['A1', 'B1']]
df2.columns = ['A', 'B']
# step 2: delete that data from original
df = df.drop(columns=['A1', 'B1'])
# step 3: append (DataFrame.append was removed in pandas 2.0; use pd.concat there)
df = df.append(df2, ignore_index=True)
```
Note that when you call `df.append()` you need to specify `ignore_index=True` so the appended rows get a new sequential index instead of keeping their old index labels.
Your end result should be your original dataframe with the data rearranged as you wanted:

```
In [16]: df
Out[16]:
    A   B
0   1   2
1   5   6
2   9  10
3   3   4
4   7   8
5  11  12
```
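`DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so step 3 fails on current versions. A minimal sketch of the same three steps with `pd.concat`, which also accepts `ignore_index=True`; it assumes the example `df` from the question.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
                  columns=['A', 'B', 'A1', 'B1'])

# step 1: take the second column group and give it the first group's names
df2 = df[['A1', 'B1']].rename(columns={'A1': 'A', 'B1': 'B'})
# step 2: keep only the first column group in the original
df = df.drop(columns=['A1', 'B1'])
# step 3: stack the two frames; ignore_index=True renumbers the rows 0..5
df = pd.concat([df, df2], ignore_index=True)
print(df)
```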
Answer by Matthew
Use `pd.concat()` like so:
```python
import pandas as pd

# Split into separate tables
df_1 = df[['A', 'B']]
df_2 = df[['A1', 'B1']]
df_2.columns = ['A', 'B']  # Make column names line up
# Add the ID column
df_1 = df_1.assign(id=1)
df_2 = df_2.assign(id=2)
# Concatenate (keeps the row labels 0, 1, 2, 0, 1, 2; pass ignore_index=True for 0..5)
pd.concat([df_1, df_2])
```
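The same idea extends to any number of column groups by building the pieces in a loop. A minimal sketch, assuming the groups are listed by hand; `groups` and `target` are hypothetical names, and each group's position in the list (starting at 1) becomes its `id`.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
                  columns=['A', 'B', 'A1', 'B1'])

groups = [['A', 'B'], ['A1', 'B1']]  # hypothetical: one list of columns per group
target = ['A', 'B']                  # names every group is renamed to

parts = [
    df[cols].rename(columns=dict(zip(cols, target))).assign(id=k)
    for k, cols in enumerate(groups, start=1)
]
result = pd.concat(parts, ignore_index=True)
print(result)
```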