Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/30791839/

Time: 2020-09-13 23:27:56  Source: igfitidea

Is there an easy way to group columns in a Pandas DataFrame?

Tags: pandas, dataframe, indices, columnname

Asked by lmjohns3

I am trying to use Pandas to represent motion-capture data, which has T measurements of the (x, y, z) locations of each of N markers. For example, with T=3 and N=4, the raw CSV data looks like:

T,Ax,Ay,Az,Bx,By,Bz,Cx,Cy,Cz,Dx,Dy,Dz
0,1,2,1,3,2,1,4,2,1,5,2,1
1,8,2,3,3,2,9,9,1,3,4,9,1
2,4,5,7,7,7,1,8,3,6,9,2,3

This is really simple to load into a DataFrame, and I've learned a few tricks that are easy (converting marker data to z-scores, or computing velocities, for example).
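
As a minimal sketch of the two tricks mentioned above, the snippet below computes per-column z-scores and frame-to-frame velocities on the sample data. It assumes a unit time step between consecutive values of T, and the names `csv_text`, `df`, `zscores`, and `velocity` are illustrative rather than taken from the question.

```python
import io

import pandas as pd

# Sample data from the question, inlined as a string
csv_text = (
    'T,Ax,Ay,Az,Bx,By,Bz,Cx,Cy,Cz,Dx,Dy,Dz\n'
    '0,1,2,1,3,2,1,4,2,1,5,2,1\n'
    '1,8,2,3,3,2,9,9,1,3,4,9,1\n'
    '2,4,5,7,7,7,1,8,3,6,9,2,3'
)
df = pd.read_csv(io.StringIO(csv_text), index_col='T')

# z-score each column: subtract the column mean and divide by the
# column standard deviation (pandas uses the sample std, ddof=1, by default)
zscores = (df - df.mean()) / df.std()

# Velocity as the difference between consecutive frames
# (assumes a unit time step); the first row has no predecessor, so it is NaN
velocity = df.diff()
```

Both operations work column by column, so they apply unchanged whether the columns are flat or hierarchical.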

One thing I'd like to do, though, is convert the "flat" data shown above into a format that has a hierarchical index on the column (marker), so that there would be N columns at level 0 (one for each marker), and each one of those would have 3 columns at level 1 (one each for x, y, and z).

  A     B     C     D
  x y z x y z x y z x y z
0 1 2 1 3 2 1 4 2 1 5 2 1
1 8 2 3 3 2 9 9 1 3 4 9 1
2 4 5 7 7 7 1 8 3 6 9 2 3

I know how to do this by loading up the flat file and then manipulating the Series objects directly, perhaps by using append, or by just creating a new DataFrame using a manually-created MultiIndex.
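
For reference, the manual MultiIndex route described above might look roughly like the following sketch. It is one possible reading of that approach, not necessarily what the asker wrote; it assumes the flat columns are already ordered Ax, Ay, Az, Bx, and so on, and the names `flat`, `columns`, and `nested` are illustrative.

```python
import io

import pandas as pd

# Sample data from the question, inlined as a string
csv_text = (
    'T,Ax,Ay,Az,Bx,By,Bz,Cx,Cy,Cz,Dx,Dy,Dz\n'
    '0,1,2,1,3,2,1,4,2,1,5,2,1\n'
    '1,8,2,3,3,2,9,9,1,3,4,9,1\n'
    '2,4,5,7,7,7,1,8,3,6,9,2,3'
)
flat = pd.read_csv(io.StringIO(csv_text), index_col='T')

# Build the two-level column index by hand: every marker paired with every axis.
# This only lines up if the flat columns are in marker-major order.
columns = pd.MultiIndex.from_product([list('ABCD'), list('xyz')])

# Wrap the same values in a new DataFrame that uses the hierarchical columns
nested = pd.DataFrame(flat.to_numpy(), index=flat.index, columns=columns)
```

The drawback is that the marker and axis labels are typed out separately from the data, so a reordered or missing column in the file would silently mislabel values.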

As a Pandas learner, it feels like there must be a way to do this with less effort, but it's hard to discover. Is there an easier way?

Answered by Ami Tavory

You basically just need to manipulate the column names, in your case.

Starting with your original DataFrame (and a tiny index manipulation):

import io
import pandas as pd

# The flat CSV from the question, inlined as a string
csv_text = (
    'T,Ax,Ay,Az,Bx,By,Bz,Cx,Cy,Cz,Dx,Dy,Dz\n'
    '0,1,2,1,3,2,1,4,2,1,5,2,1\n'
    '1,8,2,3,3,2,9,9,1,3,4,9,1\n'
    '2,4,5,7,7,7,1,8,3,6,9,2,3'
)
a = pd.read_csv(io.StringIO(csv_text))
# Use the time step T as the row index
a.set_index('T', inplace=True)

This gives:

>>> a
   Ax  Ay  Az  Bx  By  Bz  Cx  Cy  Cz  Dx  Dy  Dz
T
0   1   2   1   3   2   1   4   2   1   5   2   1
1   8   2   3   3   2   9   9   1   3   4   9   1
2   4   5   7   7   7   1   8   3   6   9   2   3

Then simply create a list of tuples for your columns, and use MultiIndex.from_tuples:

a.columns = pd.MultiIndex.from_tuples([(c[0], c[1]) for c in a.columns])

>>> a
    A           B           C           D
    x   y   z   x   y   z   x   y   z   x   y   z
T                                               
0   1   2   1   3   2   1   4   2   1   5   2   1
1   8   2   3   3   2   9   9   1   3   4   9   1
2   4   5   7   7   7   1   8   3   6   9   2   3
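
Once the columns are hierarchical, whole markers or whole axes can be selected in one step. The sketch below is a small variation on the answer, not part of it: it splits each name as `(c[:-1], c[-1])` on the assumption that the last character is always the axis, so multi-character marker names would also work, and it adds the level names `marker` and `axis` purely for illustration.

```python
import io

import pandas as pd

# Sample data from the question, inlined as a string
csv_text = (
    'T,Ax,Ay,Az,Bx,By,Bz,Cx,Cy,Cz,Dx,Dy,Dz\n'
    '0,1,2,1,3,2,1,4,2,1,5,2,1\n'
    '1,8,2,3,3,2,9,9,1,3,4,9,1\n'
    '2,4,5,7,7,7,1,8,3,6,9,2,3'
)
a = pd.read_csv(io.StringIO(csv_text), index_col='T')

# Split each flat name into (marker, axis); the last character is assumed
# to be the axis. The level names are hypothetical labels.
a.columns = pd.MultiIndex.from_tuples(
    [(c[:-1], c[-1]) for c in a.columns], names=['marker', 'axis']
)

# All three coordinates of marker A: a DataFrame with columns x, y, z
marker_a = a['A']

# The x coordinate of every marker: a DataFrame with columns A, B, C, D
all_x = a.xs('x', axis=1, level='axis')

# A single coordinate of a single marker: a Series
b_z = a[('B', 'z')]
```

Naming the levels is optional, but it lets `xs` refer to a level by name instead of by position.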