Python: reshaping a pandas DataFrame

Question

Asked by Moritz

Suppose a dataframe like this one:

import pandas as pd
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], columns=['A', 'B', 'A1', 'B1'])

I would like to have a dataframe which looks like:

What does not work:

new_rows = int(df.shape[1]/2) * df.shape[0]
new_cols = 2
df.values.reshape(new_rows, new_cols, order='F')

Of course I could loop over the data and build a new list of lists, but there must be a better way. Any ideas?
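
For reference, a short sketch of why the order='F' attempt in the question pairs the wrong columns, together with a NumPy-only reshape that does stack the two column groups. The variable names (attempt, stacked, reshaped) are illustrative, and the sketch assumes each group has exactly two columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
                  columns=['A', 'B', 'A1', 'B1'])

# order='F' reads and fills column by column, so values from A land next to
# values from A1 (and B next to B1) instead of A next to B.
attempt = df.values.reshape(6, 2, order='F')

# Split every row into (group, column) pairs, move the group axis to the
# front, then flatten back to two columns.
n_rows, n_cols = df.shape
stacked = df.values.reshape(n_rows, n_cols // 2, 2).transpose(1, 0, 2).reshape(-1, 2)
reshaped = pd.DataFrame(stacked, columns=['A', 'B'])
```

This only rearranges values; it does not produce an id column identifying the group each row came from.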

Answer 1

Accepted answer by Ted Petrou

The pd.wide_to_long function is built almost exactly for this situation, where you have many of the same variable prefixes that end in a different digit suffix. The only difference here is that your first set of variables doesn't have a suffix, so you will need to rename your columns first.

The only issue with pd.wide_to_long is that, unlike melt, it must have an identification variable, i. reset_index is used to create this uniquely identifying column, which is dropped later. I think this might get corrected in the future.

df1 = df.rename(columns={'A':'A1', 'B':'B1', 'A1':'A2', 'B1':'B2'}).reset_index()
pd.wide_to_long(df1, stubnames=['A', 'B'], i='index', j='id')\
  .reset_index()[['A', 'B', 'id']]

    A   B id
0   1   2  1
1   5   6  1
2   9  10  1
3   3   4  2
4   7   8  2
5  11  12  2
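
With more column groups, the rename mapping can be derived from the column names instead of being typed out. A minimal sketch, assuming every column name is a non-digit prefix optionally followed by digits; shift_suffix is a hypothetical helper (not part of pandas), and the explicit sort is only there so the row order does not depend on the pandas version:

```python
import re

import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
                  columns=['A', 'B', 'A1', 'B1'])


def shift_suffix(col):
    """Hypothetical helper: 'A' -> 'A1', 'A1' -> 'A2', 'B7' -> 'B8'."""
    stub, num = re.fullmatch(r'(\D+)(\d*)', col).groups()
    return f"{stub}{int(num) + 1 if num else 1}"


# Every column now ends in a digit; reset_index supplies the id column 'index'.
df1 = df.rename(columns=shift_suffix).reset_index()

long_df = pd.wide_to_long(df1, stubnames=['A', 'B'], i='index', j='id').reset_index()
result = (long_df.sort_values(['id', 'index'])[['A', 'B', 'id']]
                 .reset_index(drop=True))
```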

Answer 2

Answer by jezrael

You can use lreshape; for the id column, use numpy.repeat:

import numpy as np

a = [col for col in df.columns if 'A' in col]
b = [col for col in df.columns if 'B' in col]
df1 = pd.lreshape(df, {'A': a, 'B': b})

# One id per column group, repeated once for every original row
df1['id'] = np.repeat(np.arange(len(df.columns) // 2), len(df.index)) + 1
print(df1)
    A   B  id
0   1   2   1
1   5   6   1
2   9  10   1
3   3   4   2
4   7   8   2
5  11  12   2
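
To see what the id expression produces on its own, here is a small sketch with the sizes written out by hand (n_groups and n_rows are illustrative names). np.repeat gives block-wise ids, which matches the block-wise stacking shown in the output above; np.tile is included only as a contrast:

```python
import numpy as np

n_groups, n_rows = 2, 3  # two column groups (A/B and A1/B1), three rows each

# Each group number is repeated n_rows times in a row: 1 1 1 2 2 2
ids = np.repeat(np.arange(n_groups), n_rows) + 1

# tile cycles through the group numbers instead: 1 2 1 2 1 2
tiled = np.tile(np.arange(n_groups), n_rows) + 1
```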

EDIT:

lreshape is currently undocumented, and it is possible that it will be removed (along with pd.wide_to_long).

A possible solution is merging all 3 functions into one, maybe melt, but that is not implemented yet. Maybe in some new version of pandas; then my answer will be updated.
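
In the meantime, a comparable result can be sketched with documented functions only (melt and pivot). This is an illustration under assumptions: pandas 1.1 or later (for a list-valued pivot index), and a column without a digit suffix is treated as group 1. split_name is a hypothetical helper:

```python
import re

import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
                  columns=['A', 'B', 'A1', 'B1'])


def split_name(col):
    """Hypothetical helper: 'A' -> ('A', 1), 'A1' -> ('A', 2)."""
    stub, num = re.fullmatch(r'(\D+)(\d*)', col).groups()
    return stub, (int(num) + 1 if num else 1)


# Long form: one row per (original row, original column).
melted = df.reset_index().melt(id_vars='index')
melted['stub'] = melted['variable'].map(lambda c: split_name(c)[0])
melted['id'] = melted['variable'].map(lambda c: split_name(c)[1])

# Spread the stubs back out into columns, one row per (id, original row).
result = (melted.pivot(index=['id', 'index'], columns='stub', values='value')
                .reset_index()[['A', 'B', 'id']])
```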

Answer 3

Answer by mprat

I solved this in 3 steps:

1. Make a new dataframe df2 holding only the data you want to be added to the initial dataframe df.
2. Delete the data from df that will be added below (and that was used to make df2).
3. Append df2 to df.

Like so:

# step 1: create new dataframe
df2 = df[['A1', 'B1']]
df2.columns = ['A', 'B']

# step 2: delete that data from original
df = df.drop(columns=["A1", "B1"])

# step 3: append (DataFrame.append was removed in pandas 2.0, so use pd.concat)
df = pd.concat([df, df2], ignore_index=True)

Note that you need to specify ignore_index=True so the appended rows get a fresh index rather than keeping their old one.
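
The effect of ignore_index is easiest to see side by side. A small sketch using pd.concat, which accepts the same flag; the frame names top and bottom are illustrative:

```python
import pandas as pd

top = pd.DataFrame({'A': [1, 5, 9], 'B': [2, 6, 10]})
bottom = pd.DataFrame({'A': [3, 7, 11], 'B': [4, 8, 12]})

# Without ignore_index the row labels of both frames are kept: 0, 1, 2, 0, 1, 2
kept = pd.concat([top, bottom])

# With ignore_index=True the result is relabelled: 0, 1, 2, 3, 4, 5
fresh = pd.concat([top, bottom], ignore_index=True)
```

Duplicate labels are legal in pandas, but a label lookup such as kept.loc[0] then returns two rows, which is rarely what is wanted here.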

Your end result should be your original dataframe with the data rearranged like you wanted:

In [16]: df
Out[16]:
    A   B
0   1   2
1   5   6
2   9  10
3   3   4
4   7   8
5  11  12

Answer 4

Answer by Matthew

Use pd.concat() like so:

# Split into separate tables
df_1 = df[['A', 'B']]
df_2 = df[['A1', 'B1']]
df_2.columns = ['A', 'B'] # Make column names line up

# Add the ID column
df_1 = df_1.assign(id=1)
df_2 = df_2.assign(id=2)

# Concatenate
pd.concat([df_1, df_2])
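
The same idea can be written as a loop when there are more than two column groups. A minimal sketch, assuming the groups are known up front; the groups mapping and the names parts and result are illustrative, and ignore_index=True is added so the rows are numbered 0 to 5 rather than 0, 1, 2 twice:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
                  columns=['A', 'B', 'A1', 'B1'])

# Assumption: each key is the id to assign, each value the columns of that group.
groups = {1: ['A', 'B'], 2: ['A1', 'B1']}

parts = [
    # Select one group, line its column names up with A/B, and tag it with its id.
    df[cols].rename(columns=dict(zip(cols, ['A', 'B']))).assign(id=group_id)
    for group_id, cols in groups.items()
]
result = pd.concat(parts, ignore_index=True)
```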
