pandas: DataFrame of DataFrames with pandas

Question

Asked by Christophe

I have the following DataFrame gathering daily stats on two measures, A and B:

                  A             B
count  17266.000000  17266.000000
std        0.179003      0.178781
75%      101.102251    101.053214
min      100.700993    100.651956
mean     101.016747    100.964003
max      101.540214    101.491178
50%      100.988465    100.938694
25%      100.885251    100.830048

Below is a piece of code that creates it:

import pandas

# One day's describe() output, keyed by measure and then by statistic.
day1 = {
    'A': {
        'count': 17266.0,
        'std': 0.17900265293286116,
        'min': 100.70099294189714,
        'max': 101.54021448871775,
        '50%': 100.98846526697825,
        '25%': 100.88525124427971,
        '75%': 101.10225131847992,
        'mean': 101.01674677794136,
    },
    'B': {
        'count': 17266.0,
        'std': 0.17878125983374854,
        'min': 100.65195609992342,
        'max': 101.49117764674403,
        '50%': 100.93869409089723,
        '25%': 100.83004837814667,
        '75%': 101.05321447650618,
        'mean': 100.96400305527138,
    },
}
# Transpose so that the measures become columns and the statistics become rows.
df = pandas.DataFrame.from_dict(day1, orient='index').T

The data come straight out of a describe() call. I have several such summaries (one for each day) and I would like to gather them all into a single DataFrame that has the date as an index.

The most obvious way to obtain that would be to stack all the daily results into one DataFrame, then group it by day and run the stats on the result. However, I would like an alternative method because I run into a MemoryError with the amount of data I process.

The final outcome should look like this:

                             A             B
2014-12-24 count  15895.000000  15895.000000
           mean      99.943618     99.968860
           std        0.012468      0.011932
           min       99.877695     99.928778
           25%       99.934890     99.960445
           50%       99.943453     99.968847
           75%       99.952340     99.977571
           max       99.982930    100.002507
2014-12-25 count  16278.000000  16278.000000
           mean      99.937056     99.962203
           std        0.012395      0.012661
           min       99.884501     99.910567
           25%       99.928078     99.953758
           50%       99.936754     99.962411
           75%       99.945914     99.971473
           max       99.981512    100.003770

Answer 1

Answered by joris

If you are able to make a dict of {date: describe_df_for_that_day}, then you can use pd.concat(dict).
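
One way such a dict might be built is one day at a time, so that only a single day's raw data is held in memory at once. The sketch below is only an illustration: `load_day` and the random data it returns are hypothetical stand-ins for however the daily data is actually read.

```python
import numpy as np
import pandas as pd

def load_day(date, n_rows=1000):
    # Hypothetical loader: stands in for reading one day's raw data.
    # Seeded from the date so the example is reproducible.
    rng = np.random.default_rng(pd.Timestamp(date).toordinal())
    return pd.DataFrame({
        'A': rng.normal(100, 0.1, n_rows),
        'B': rng.normal(100, 0.1, n_rows),
    })

dates = ['2014-12-24', '2014-12-25']

# Keep only the small describe() summary for each day; the raw frame
# is not referenced after its iteration and can be garbage collected.
summaries = {}
for date in dates:
    summaries[date] = load_day(date).describe()

# The dict keys become the outer level of the resulting MultiIndex.
result = pd.concat(summaries)
print(result)
```

Each summary has only 8 rows, so the concatenated result stays small no matter how large the daily data is.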

Starting with your df:

In [14]: d = {'2014-12-24': df, '2014-12-25': df}

In [15]: pd.concat(d)
Out[15]:
                             A             B
2014-12-24 count  17266.000000  17266.000000
           std        0.179003      0.178781
           75%      101.102251    101.053214
           min      100.700993    100.651956
           mean     101.016747    100.964003
           max      101.540214    101.491178
           50%      100.988465    100.938694
           25%      100.885251    100.830048
2014-12-25 count  17266.000000  17266.000000
           std        0.179003      0.178781
           75%      101.102251    101.053214
           min      100.700993    100.651956
           mean     101.016747    100.964003
           max      101.540214    101.491178
           50%      100.988465    100.938694
           25%      100.885251    100.830048

You can of course make the keys real dates instead of strings.
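
As a rough sketch of that variant, the keys can be `pd.Timestamp` objects, and the `names` argument of `pd.concat` can label the two index levels. The level names `date` and `stat` and the shortened two-row `df` are assumptions made for illustration only.

```python
import pandas as pd

# Shortened stand-in for one day's describe() output
# (two statistics only, values taken from the question).
df = pd.DataFrame({
    'A': {'count': 17266.0, 'mean': 101.016747},
    'B': {'count': 17266.0, 'mean': 100.964003},
})

# Real dates as keys instead of strings.
d = {pd.Timestamp('2014-12-24'): df, pd.Timestamp('2014-12-25'): df}

# 'date' and 'stat' are illustrative names for the two index levels.
result = pd.concat(d, names=['date', 'stat'])

# Select a single day's summary by its date.
print(result.loc[pd.Timestamp('2014-12-25')])
```

With date keys, the outer index level holds timestamps rather than strings, which makes it easier to select or sort by date later.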
