Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must also follow CC BY-SA, give the original URL and authors, and attribute it to the original authors (not me): StackOverflow, original question: http://stackoverflow.com/questions/28368598/

Date: 2020-09-13 22:55:20  Source: igfitidea

DataFrame of DataFrames with pandas

python pandas

Asked by Christophe

I have the following DataFrame holding one day's statistics for two measures, A and B:

                  A             B
count  17266.000000  17266.000000
std        0.179003      0.178781
75%      101.102251    101.053214
min      100.700993    100.651956
mean     101.016747    100.964003
max      101.540214    101.491178
50%      100.988465    100.938694
25%      100.885251    100.830048

Below is a piece of code that creates it:

import pandas as pd

# describe() output for one day, as {measure: {statistic: value}}
day1 = {
    'A': {
        'count': 17266.0,
        'std': 0.17900265293286116,
        'min': 100.70099294189714,
        'max': 101.54021448871775,
        '50%': 100.98846526697825,
        '25%': 100.88525124427971,
        '75%': 101.10225131847992,
        'mean': 101.01674677794136,
    },
    'B': {
        'count': 17266.0,
        'std': 0.17878125983374854,
        'min': 100.65195609992342,
        'max': 101.49117764674403,
        '50%': 100.93869409089723,
        '25%': 100.83004837814667,
        '75%': 101.05321447650618,
        'mean': 100.96400305527138,
    },
}
# orient='index' makes A and B the rows; .T transposes so the statistics
# become the rows and A, B the columns, matching the describe() layout.
df = pd.DataFrame.from_dict(day1, orient='index').T

The data come straight out of describe(). I have several such describe() outputs (one per day), and I would like to gather them all into a single DataFrame indexed by date.
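For reference, a minimal sketch of where such a per-day table comes from, assuming each day's raw data is a DataFrame with numeric columns A and B (the three values per column below are made up): describe() puts the statistics in the row index and the measures in the columns.

```python
import pandas as pd

# Hypothetical raw measurements for a single day (values are made up).
raw_day = pd.DataFrame({"A": [100.7, 101.0, 101.5], "B": [100.6, 100.9, 101.4]})

# describe() returns one row per statistic and one column per numeric measure.
stats = raw_day.describe()
print(stats)
```

In current pandas the rows come out in the order count, mean, std, min, 25%, 50%, 75%, max.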

The most obvious way to get there would be to stack all the daily raw data into one DataFrame, group it by day and compute the statistics on each group. However, I would like an alternative method, because with the amount of data I process this approach runs into a MemoryError.
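For comparison, the "stack everything, then group by day" approach might look like the sketch below. The column names (`date`, `A`, `B`) and the random data are assumptions. Its weakness is that `raw` holds every day's rows at the same time, which is presumably where the MemoryError comes from.

```python
import numpy as np
import pandas as pd

# Hypothetical stacked raw data: every day's measurements in one DataFrame.
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "date": ["2014-12-24"] * 5 + ["2014-12-25"] * 5,
    "A": rng.normal(100.0, 0.2, 10),
    "B": rng.normal(100.0, 0.2, 10),
})

# Describe each day's group; apply() prepends the group key, so the result
# has a (date, statistic) row index and columns A, B.
result = raw.groupby("date")[["A", "B"]].apply(lambda g: g.describe())
print(result)
```

`raw.groupby("date")[["A", "B"]].describe()` also works, but it puts the statistics in the columns (one row per day) rather than in the rows.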

The final outcome should look like this:

                             A             B
2014-12-24 count  15895.000000  15895.000000
           mean      99.943618     99.968860
           std        0.012468      0.011932
           min       99.877695     99.928778
           25%       99.934890     99.960445
           50%       99.943453     99.968847
           75%       99.952340     99.977571
           max       99.982930    100.002507
2014-12-25 count  16278.000000  16278.000000
           mean      99.937056     99.962203
           std        0.012395      0.012661
           min       99.884501     99.910567
           25%       99.928078     99.953758
           50%       99.936754     99.962411
           75%       99.945914     99.971473
           max       99.981512    100.003770

Answer by joris

If you are able to make a dict of {date: describe_df_for_that_day}, then you can use pd.concat(dict): the dict keys become the outer level of the resulting row index.
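One way to build that dict without ever holding all the raw data at once is to describe each day as soon as it is loaded and keep only the small summary. This is a sketch: `load_day` is a hypothetical stand-in for however each day's raw data is actually read, and it generates random values here so the example runs on its own.

```python
import numpy as np
import pandas as pd

def load_day(date):
    """Hypothetical loader standing in for reading one day's raw data."""
    # Seed from the date so each day gets different but reproducible values.
    rng = np.random.default_rng(int(date.replace("-", "")))
    return pd.DataFrame({
        "A": rng.normal(100.0, 0.2, 1000),
        "B": rng.normal(100.0, 0.2, 1000),
    })

summaries = {}
for date in ["2014-12-24", "2014-12-25"]:
    raw_day = load_day(date)               # only this day's raw rows in memory
    summaries[date] = raw_day.describe()   # keep just the 8-row summary
    del raw_day                            # release it before loading the next day

# Dict keys become the outer index level: (date, statistic) x (A, B).
result = pd.concat(summaries)
print(result)
```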

Starting with your df:

In [14]: d = {'2014-12-24': df, '2014-12-25': df}

In [15]: pd.concat(d)
Out[15]:
                             A             B
2014-12-24 count  17266.000000  17266.000000
           std        0.179003      0.178781
           75%      101.102251    101.053214
           min      100.700993    100.651956
           mean     101.016747    100.964003
           max      101.540214    101.491178
           50%      100.988465    100.938694
           25%      100.885251    100.830048
2014-12-25 count  17266.000000  17266.000000
           std        0.179003      0.178781
           75%      101.102251    101.053214
           min      100.700993    100.651956
           mean     101.016747    100.964003
           max      101.540214    101.491178
           50%      100.988465    100.938694
           25%      100.885251    100.830048

You can of course make the keys real dates instead of strings.
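A sketch of the real-dates variant, using a tiny stand-in `df` (an assumption, not the describe() output above): the keys are `pd.Timestamp` objects, and the `names` argument of `pd.concat` labels the two index levels, so a single day can be pulled back out with `xs()`.

```python
import pandas as pd

# Stand-in for one day's describe() output (two rows only, for brevity).
df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]}, index=["count", "mean"])

# Timestamp keys instead of strings; names labels the outer and inner levels.
d = {pd.Timestamp("2014-12-24"): df, pd.Timestamp("2014-12-25"): df}
result = pd.concat(d, names=["date", "stat"])

# Select one day's statistics by its date level.
day = result.xs(pd.Timestamp("2014-12-24"), level="date")
print(day)
```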
