Original question: http://stackoverflow.com/questions/28368598/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
DataFrame of DataFrames with pandas
Asked by Christophe
I have the following DataFrame gathering daily stats on two measures, A and B:
                  A             B
count  17266.000000  17266.000000
std        0.179003      0.178781
75%      101.102251    101.053214
min      100.700993    100.651956
mean     101.016747    100.964003
max      101.540214    101.491178
50%      100.988465    100.938694
25%      100.885251    100.830048
Below is a piece of code that creates it:
import pandas

# Output of describe() for one day, keyed by measure
day1 = {
    'A': {
        'count': 17266.0,
        'std': 0.17900265293286116,
        'min': 100.70099294189714,
        'max': 101.54021448871775,
        '50%': 100.98846526697825,
        '25%': 100.88525124427971,
        '75%': 101.10225131847992,
        'mean': 101.01674677794136
    },
    'B': {
        'count': 17266.0,
        'std': 0.17878125983374854,
        'min': 100.65195609992342,
        'max': 101.49117764674403,
        '50%': 100.93869409089723,
        '25%': 100.83004837814667,
        '75%': 101.05321447650618,
        'mean': 100.96400305527138
    }
}
# Transpose so the stats are rows and the measures are columns
df = pandas.DataFrame.from_dict(day1, orient='index').T
The data come straight from a describe() call. I have several such describe() outputs (one for each day) and I would like to gather them all into a single DataFrame that has the date as an index.
The most obvious way to obtain that would be to stack all the daily raw data into one DataFrame, then group it by day and run the stats on the result. However, I would like an alternative method because I run into a MemoryError with the amount of data I process.
The final outcome should look like this:
                             A             B
2014-12-24 count  15895.000000  15895.000000
           mean      99.943618     99.968860
           std        0.012468      0.011932
           min       99.877695     99.928778
           25%       99.934890     99.960445
           50%       99.943453     99.968847
           75%       99.952340     99.977571
           max       99.982930    100.002507
2014-12-25 count  16278.000000  16278.000000
           mean      99.937056     99.962203
           std        0.012395      0.012661
           min       99.884501     99.910567
           25%       99.928078     99.953758
           50%       99.936754     99.962411
           75%       99.945914     99.971473
           max       99.981512    100.003770
Answered by joris
If you are able to make a dict of {date: describe_df_for_that_day}, then you can use pd.concat(dict).
Starting with your df:
In [14]: d = {'2014-12-24': df, '2014-12-25': df}
In [15]: pd.concat(d)
Out[15]:
                             A             B
2014-12-24 count  17266.000000  17266.000000
           std        0.179003      0.178781
           75%      101.102251    101.053214
           min      100.700993    100.651956
           mean     101.016747    100.964003
           max      101.540214    101.491178
           50%      100.988465    100.938694
           25%      100.885251    100.830048
2014-12-25 count  17266.000000  17266.000000
           std        0.179003      0.178781
           75%      101.102251    101.053214
           min      100.700993    100.651956
           mean     101.016747    100.964003
           max      101.540214    101.491178
           50%      100.988465    100.938694
           25%      100.885251    100.830048
You can of course make the keys real dates instead of strings.
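As a rough sketch of how this could look with real dates as keys: the loop below describes one day at a time and keeps only the small describe() result, so the raw data for all days never has to sit in memory together. The random data, the days list, and the level names "date" and "stat" are assumptions made up for illustration; in practice each day's raw frame would come from wherever you load it.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real per-day raw data (assumption for this sketch)
rng = np.random.default_rng(0)
days = ["2014-12-24", "2014-12-25"]

daily_stats = {}
for day in days:
    # In a real pipeline this frame would be loaded for a single day only
    raw = pd.DataFrame(rng.normal(100, 0.01, size=(1000, 2)), columns=["A", "B"])
    # Keep just the 8-row summary, keyed by a real Timestamp rather than a string
    daily_stats[pd.Timestamp(day)] = raw.describe()

# The dict keys become the outer index level; names labels both levels
result = pd.concat(daily_stats, names=["date", "stat"])

# All stats for one day, and one stat across all days
one_day = result.loc[pd.Timestamp("2014-12-25")]
means = result.xs("mean", level="stat")
```

With Timestamp keys, the outer level of the index holds real dates, so date-based selection works on it, and naming the levels makes lookups such as xs("mean", level="stat") easier to read.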

