Python: reshape a pandas dataframe

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/42928911/

Time: 2020-08-19 22:16:41  Source: igfitidea

reshape a pandas dataframe

python pandas dataframe reshape lreshape

Asked by Moritz

Suppose a dataframe like this one:

df = pd.DataFrame([[1,2,3,4],[5,6,7,8],[9,10,11,12]], columns = ['A', 'B', 'A1', 'B1'])

[image: the dataframe above]

I would like to have a dataframe which looks like:

[image: the desired dataframe]

What does not work:

new_rows = int(df.shape[1]/2) * df.shape[0]
new_cols = 2
df.values.reshape(new_rows, new_cols, order='F')

Of course I could loop over the data and build a new list of lists, but there must be a better way. Any ideas?

Accepted answer by Ted Petrou

The pd.wide_to_long function is built almost exactly for this situation, where you have many of the same variable prefixes that end in a different digit suffix. The only difference here is that your first set of variables doesn't have a suffix, so you will need to rename your columns first.

The only issue with pd.wide_to_long is that, unlike melt, it must have an identification variable, i. reset_index is used to create this uniquely identifying column, which is dropped later. I think this might get corrected in the future.

df1 = df.rename(columns={'A':'A1', 'B':'B1', 'A1':'A2', 'B1':'B2'}).reset_index()
pd.wide_to_long(df1, stubnames=['A', 'B'], i='index', j='id')\
  .reset_index()[['A', 'B', 'id']]

    A   B id
0   1   2  1
1   5   6  1
2   9  10  1
3   3   4  2
4   7   8  2
5  11  12  2
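
A self-contained sketch of the approach above, assuming the question's df. The names renamed, long_df and result are illustrative only, and the explicit sort is an added precaution so that the row order does not depend on how wide_to_long arranges its output.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
                  columns=['A', 'B', 'A1', 'B1'])

# Give every column a numeric suffix so that each stub ('A', 'B')
# has the suffixes 1 and 2.
renamed = df.rename(columns={'A': 'A1', 'B': 'B1', 'A1': 'A2', 'B1': 'B2'})

# wide_to_long needs an identifier column; reset_index() supplies one
# named 'index'.
long_df = pd.wide_to_long(renamed.reset_index(),
                          stubnames=['A', 'B'], i='index', j='id')

# Bring 'index' and 'id' back as columns, order by suffix and then by
# original row, and keep only the columns of interest.
result = (long_df.reset_index()
                 .sort_values(['id', 'index'])
                 .reset_index(drop=True)[['A', 'B', 'id']])
print(result)
```

The suffix parsed from each column name ends up in the id column, so no separate step is needed to label which column pair a row came from.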

Answer by jezrael

You can use lreshape, and numpy.repeat for the id column:

a = [col for col in df.columns if 'A' in col]
b = [col for col in df.columns if 'B' in col]
df1 = pd.lreshape(df, {'A' : a, 'B' : b})

df1['id'] = np.repeat(np.arange(len(df.columns) // 2), len(df.index)) + 1
print(df1)
    A   B  id
0   1   2   1
1   5   6   1
2   9  10   1
3   3   4   2
4   7   8   2
5  11  12   2
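
To see how the id column is built, here is a small sketch of the np.repeat step on its own, assuming the question's df. The names n_groups, n_rows and ids are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
                  columns=['A', 'B', 'A1', 'B1'])

n_groups = len(df.columns) // 2   # two column pairs: (A, B) and (A1, B1)
n_rows = len(df.index)            # three original rows

# np.arange(n_groups) is [0, 1]; repeating each element n_rows times
# gives [0, 0, 0, 1, 1, 1], and adding 1 makes the ids start at 1.
ids = np.repeat(np.arange(n_groups), n_rows) + 1
print(ids)  # [1 1 1 2 2 2]
```

This labelling only lines up if the reshaped frame lists all rows of the first column pair, then all rows of the second, and if no rows were dropped along the way.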

EDIT:

lreshape is currently undocumented, and it is possible that it will be removed (along with pd.wide_to_long).

A possible solution is merging all 3 functions into one, maybe melt, but that is not implemented yet. Maybe it will be in some new version of pandas; then my answer will be updated.

Answer by mprat

I solved this in 3 steps:

  1. Make a new dataframe df2 holding only the data you want to be added to the initial dataframe df.
  2. Delete from df the data that will be added below (and that was used to make df2).
  3. Append df2 to df.

Like so:

# step 1: create new dataframe
df2 = df[['A1', 'B1']]
df2.columns = ['A', 'B']

# step 2: delete that data from original
df = df.drop(["A1", "B1"], axis=1)

# step 3: append
df = df.append(df2, ignore_index=True)

Note that when you call df.append() you need to specify ignore_index=True so that the appended rows get new index labels rather than keeping their old ones.

Your end result should be your original dataframe with the data rearranged like you wanted:

In [16]: df
Out[16]:
    A   B
0   1   2
1   5   6
2   9  10
3   3   4
4   7   8
5  11  12
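
DataFrame.append was removed in pandas 2.0, so on newer versions the same three steps can be sketched with pd.concat instead. This is a minimal sketch assuming the question's df, not a change to the approach itself.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
                  columns=['A', 'B', 'A1', 'B1'])

# step 1: take the columns to move and rename them to match their targets
df2 = df[['A1', 'B1']].rename(columns={'A1': 'A', 'B1': 'B'})

# step 2: drop those columns from the original
df = df.drop(columns=['A1', 'B1'])

# step 3: stack the two frames; ignore_index=True renumbers the rows 0..5
df = pd.concat([df, df2], ignore_index=True)
print(df)
```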

Answer by Matthew

Use pd.concat() like so:

# Split into separate tables
df_1 = df[['A', 'B']]
df_2 = df[['A1', 'B1']]
df_2.columns = ['A', 'B'] # Make column names line up

# Add the ID column
df_1 = df_1.assign(id=1)
df_2 = df_2.assign(id=2)

# Concatenate
pd.concat([df_1, df_2])
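
Note that pd.concat keeps each piece's original row labels, so the result above is indexed 0, 1, 2, 0, 1, 2. A minimal sketch, assuming the question's df and assuming a fresh 0..5 index is wanted (the name result is illustrative only):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
                  columns=['A', 'B', 'A1', 'B1'])

# Split into separate tables and tag each with its id
df_1 = df[['A', 'B']].assign(id=1)
df_2 = (df[['A1', 'B1']]
        .rename(columns={'A1': 'A', 'B1': 'B'})  # make column names line up
        .assign(id=2))

# ignore_index=True discards the repeated 0, 1, 2 labels
result = pd.concat([df_1, df_2], ignore_index=True)
print(result)
```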