Is there an easy way to group columns in a Pandas DataFrame?

Question

Asked by lmjohns3

I am trying to use Pandas to represent motion-capture data, which has T measurements of the (x, y, z) locations of each of N markers. For example, with T=3 and N=4, the raw CSV data looks like:

T,Ax,Ay,Az,Bx,By,Bz,Cx,Cy,Cz,Dx,Dy,Dz
0,1,2,1,3,2,1,4,2,1,5,2,1
1,8,2,3,3,2,9,9,1,3,4,9,1
2,4,5,7,7,7,1,8,3,6,9,2,3

This is really simple to load into a DataFrame, and I've learned a few tricks that are easy (converting marker data to z-scores, or computing velocities, for example).
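As a rough illustration of the loading step and the two tricks mentioned above, here is a minimal sketch. The names `flat`, `zscores`, and `velocities` are made up for this example, and the velocity assumes one time unit between consecutive frames.

```python
from io import StringIO
import pandas as pd

csv_text = (
    "T,Ax,Ay,Az,Bx,By,Bz,Cx,Cy,Cz,Dx,Dy,Dz\n"
    "0,1,2,1,3,2,1,4,2,1,5,2,1\n"
    "1,8,2,3,3,2,9,9,1,3,4,9,1\n"
    "2,4,5,7,7,7,1,8,3,6,9,2,3\n"
)
# Read the flat file, using the time column T as the row index
flat = pd.read_csv(StringIO(csv_text), index_col="T")

# Per-column z-scores: subtract each column's mean, divide by its sample std
zscores = (flat - flat.mean()) / flat.std()

# Finite-difference velocities between consecutive frames (first row is NaN);
# assumes a unit time step between rows
velocities = flat.diff()
```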

One thing I'd like to do, though, is convert the "flat" data shown above into a format that has a hierarchical index on the column (marker), so that there would be N columns at level 0 (one for each marker), and each one of those would have 3 columns at level 1 (one each for x, y, and z).

  A     B     C     D
  x y z x y z x y z x y z
0 1 2 1 3 2 1 4 2 1 5 2 1
1 8 2 3 3 2 9 9 1 3 4 9 1
2 4 5 7 7 7 1 8 3 6 9 2 3

I know how to do this by loading up the flat file and then manipulating the Series objects directly, perhaps by using append or just creating a new DataFrame using a manually-created MultiIndex.
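The question does not show the manual approach, so the following is only a guess at what it might look like: build the MultiIndex with `MultiIndex.from_product` and wrap the raw values in a new DataFrame. The names `markers`, `axes`, and `nested`, and the level names "marker" and "axis", are hypothetical.

```python
from io import StringIO
import pandas as pd

csv_text = (
    "T,Ax,Ay,Az,Bx,By,Bz,Cx,Cy,Cz,Dx,Dy,Dz\n"
    "0,1,2,1,3,2,1,4,2,1,5,2,1\n"
    "1,8,2,3,3,2,9,9,1,3,4,9,1\n"
    "2,4,5,7,7,7,1,8,3,6,9,2,3\n"
)
flat = pd.read_csv(StringIO(csv_text), index_col="T")

# Spell out both levels by hand (this is the tedious part)
markers = ["A", "B", "C", "D"]
axes = ["x", "y", "z"]
columns = pd.MultiIndex.from_product([markers, axes], names=["marker", "axis"])

# Relies on the flat columns already being ordered Ax, Ay, Az, Bx, ...
nested = pd.DataFrame(flat.to_numpy(), index=flat.index, columns=columns)
```

This works, but the marker list has to be maintained by hand and the result silently depends on the column order of the file.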

As a Pandas learner, it feels like there must be a way to do this with less effort, but it's hard to discover. Is there an easier way?

Answer 1

Answered by Ami Tavory

You basically just need to manipulate the column names, in your case.

Starting with your original DataFrame (and a tiny index manipulation):

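from io import StringIO  # Python 3 location of StringIO
import pandas as pd

# Flat CSV text: one row per time step T, three columns per marker
csv_text = (
    'T,Ax,Ay,Az,Bx,By,Bz,Cx,Cy,Cz,Dx,Dy,Dz\n'
    '0,1,2,1,3,2,1,4,2,1,5,2,1\n'
    '1,8,2,3,3,2,9,9,1,3,4,9,1\n'
    '2,4,5,7,7,7,1,8,3,6,9,2,3'
)
a = pd.read_csv(StringIO(csv_text))
# Use the time column as the row index
a.set_index('T', inplace=True)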

So that:

>> a
   Ax  Ay  Az  Bx  By  Bz  Cx  Cy  Cz  Dx  Dy  Dz
T                                               
0   1   2   1   3   2   1   4   2   1   5   2   1
1   8   2   3   3   2   9   9   1   3   4   9   1
2   4   5   7   7   7   1   8   3   6   9   2   3

Then simply create a list of tuples for your columns, and use MultiIndex.from_tuples:

a.columns = pd.MultiIndex.from_tuples([(c[0], c[1]) for c in a.columns])

>> a
    A           B           C           D
    x   y   z   x   y   z   x   y   z   x   y   z
T                                               
0   1   2   1   3   2   1   4   2   1   5   2   1
1   8   2   3   3   2   9   9   1   3   4   9   1
2   4   5   7   7   7   1   8   3   6   9   2   3
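Once the columns are hierarchical, markers and coordinates can be selected or grouped by level. The sketch below is a variation on the answer, not part of it: it splits each name on its last character, on the assumption that marker names may be longer than one letter. The names `marker_a`, `all_x`, and `marker_means` are made up for illustration.

```python
from io import StringIO
import pandas as pd

csv_text = (
    "T,Ax,Ay,Az,Bx,By,Bz,Cx,Cy,Cz,Dx,Dy,Dz\n"
    "0,1,2,1,3,2,1,4,2,1,5,2,1\n"
    "1,8,2,3,3,2,9,9,1,3,4,9,1\n"
    "2,4,5,7,7,7,1,8,3,6,9,2,3\n"
)
a = pd.read_csv(StringIO(csv_text), index_col="T")

# Everything but the last character is the marker, the last one is the axis
a.columns = pd.MultiIndex.from_tuples([(c[:-1], c[-1]) for c in a.columns])

# All three coordinates of one marker
marker_a = a["A"]

# One coordinate across every marker (cross-section on column level 1)
all_x = a.xs("x", axis=1, level=1)

# Mean of x, y, z per marker; grouping the transposed frame by its outer
# level avoids groupby(axis=1), which recent pandas versions deprecate
marker_means = a.T.groupby(level=0).mean().T
```

Here `marker_a` has columns x, y, z, while `all_x` and `marker_means` each have one column per marker.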
