Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/11400181/

Time: 2020-09-13 15:46:07  Source: igfitidea

pandas pivoting a dataframe, duplicate rows

Tags: python, pivot, pivot-table, pandas

Asked by tomas

I'm having a little trouble with pivoting in pandas. The dataframe I'm working on (dates, location, data) looks like:

dates    location    data
date1       A         X
date2       A         Y
date3       A         Z
date1       B         XX
date2       B         YY

Basically, I'm trying to pivot on location to end up with a dataframe like:

dates   A    B    C
date1   X    XX   etc...
date2   Y    YY
date3   Z    ZZ 

Unfortunately when I pivot, the index, which is equivalent to the original dates column, does not change and I get:

dates  A   B   C
date1  X   NA  etc...
date2  Y   NA
date3  Z   NA
date1  NA  XX
date2  NA  YY

Does anyone know how I can fix this issue to get the dataframe format I'm looking for?

I'm currently calling pivot like this:

df.pivot(index="dates", columns="location")

because I have a number of data columns I want to pivot (I don't want to list each one as an argument). I believe that by default pivot pivots the rest of the columns in the dataframe. Thanks.

Answered by Chang She

If you have multiple data columns, calling pivot without the values argument should give you a pivoted frame with a MultiIndex as the columns:

In [3]: df
Out[3]: 
  columns     data1     data2 index
0       a -0.602398 -0.982524     x
1       a  0.880927  0.818551     y
2       b -0.238849  0.766986     z
3       b -1.304346  0.955031     x
4       c -0.094820  0.746046     y
5       c -0.835785  1.123243     z

In [4]: df.pivot(index='index', columns='columns')
Out[4]: 
            data1                         data2                    
columns         a         b         c         a         b         c
index                                                              
x       -0.602398 -1.304346       NaN -0.982524  0.955031       NaN
y        0.880927       NaN -0.094820  0.818551       NaN  0.746046
z             NaN -0.238849 -0.835785       NaN  0.766986  1.123243
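For anyone who wants to reproduce this, here is a minimal, self-contained sketch of the same idea. The column names mirror the frame above, but the numeric values are made up for illustration, and the keyword form of pivot is used because recent pandas versions no longer accept these arguments positionally.

```python
import pandas as pd

# Illustrative values only; the layout mirrors the frame shown above.
df = pd.DataFrame({
    "index": ["x", "y", "z", "x", "y", "z"],
    "columns": ["a", "a", "b", "b", "c", "c"],
    "data1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "data2": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# No `values` argument: every remaining column is pivoted, and the result
# has two column levels (data column name, then the pivoted key).
wide = df.pivot(index="index", columns="columns")
print(wide)

# Selecting the top level gives an ordinary pivoted frame for one data column.
print(wide["data1"])
```

Combinations that do not occur in the input, such as ("x", "c"), come out as NaN.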

Answered by Chang She

How are you calling DataFrame.pivot and what datatype is your dates column?

Suppose I have a DataFrame similar to yours, where the dates column contains datetime objects:

In [52]: df
Out[52]: 
       data                dates loc
0  0.870900  2000-01-01 00:00:00   A
1  0.344999  2000-01-02 00:00:00   A
2  0.001729  2000-01-03 00:00:00   A
3  1.565684  2000-01-01 00:00:00   B
4 -0.851542  2000-01-02 00:00:00   B


In [53]: df.pivot(index='dates', columns='loc', values='data')
Out[53]: 
loc                A         B
dates                         
2000-01-01  0.870900  1.565684
2000-01-02  0.344999 -0.851542
2000-01-03  0.001729       NaN
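The session above can be rebuilt as a standalone script roughly like this. It is only a sketch: it assumes the dates are parsed with pd.to_datetime, which the original session does not show.

```python
import pandas as pd

# Same values as the frame printed above; using pd.to_datetime is an assumption.
df = pd.DataFrame({
    "data": [0.870900, 0.344999, 0.001729, 1.565684, -0.851542],
    "dates": pd.to_datetime(
        ["2000-01-01", "2000-01-02", "2000-01-03", "2000-01-01", "2000-01-02"]
    ),
    "loc": ["A", "A", "A", "B", "B"],
})

# Equal timestamps collapse into one index label, so A and B share rows.
wide = df.pivot(index="dates", columns="loc", values="data")
print(wide)
```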

Answered by tomas

Just answered my own question. I was using an old Sybase module to import data and I think it used an old DateTimeType object from mxDatetime. In that module, a datetime of Jan 01 2011 would not necessarily equal another datetime of Jan 01 2011 (i.e. each datetime was unique). Hence the pivot treated each value in the dates column as a unique entry in the index.
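To illustrate the effect, here is a rough sketch. LegacyDate is a hypothetical stand-in class, not the real mxDatetime type: it simply compares by identity, which is my guess at the behaviour described above. Converting the column to pandas Timestamps is one possible fix, not necessarily what I did at the time.

```python
import pandas as pd


class LegacyDate:
    """Hypothetical stand-in for a legacy date type (not the real mx class).

    It defines no __eq__ or __hash__, so Python falls back to identity:
    two instances for the same calendar day are never equal.
    """

    def __init__(self, year, month, day):
        self.year, self.month, self.day = year, month, day


raw = pd.DataFrame({
    "dates": [LegacyDate(2011, 1, 1), LegacyDate(2011, 1, 2), LegacyDate(2011, 1, 3),
              LegacyDate(2011, 1, 1), LegacyDate(2011, 1, 2)],
    "location": ["A", "A", "A", "B", "B"],
    "data": ["X", "Y", "Z", "XX", "YY"],
})

# Every object counts as its own value, so a pivot would see five index labels.
n_before = raw["dates"].nunique()

# Convert to pandas Timestamps, which compare by value.
fixed = raw.assign(
    dates=raw["dates"].map(
        lambda d: pd.Timestamp(year=d.year, month=d.month, day=d.day)
    )
)
n_after = fixed["dates"].nunique()

wide = fixed.pivot(index="dates", columns="location", values="data")
print(n_before, n_after)
print(wide)
```

With the identity-compared objects the dates column has five distinct values; after conversion it has three, and the pivot lines A and B up on the same rows.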

Thanks for the help.
