Set two columns as the index in a pandas DataFrame for time series analysis

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/35331154/

Time: 2020-09-14 00:40:46  Source: igfitidea

python pandas indexing time-series

Asked by yoshiserry

With weather or stock market data, temperatures and stock prices are recorded for multiple stations or stock tickers on any given date.

What, then, is the most effective way to set an index that contains two fields?

For weather: the weather_station and then Date

For Stock Data: the stock_code and then Date

Setting the index in this way would allow filtering such as:

  • stock_df["code"]["start_date":"end_date"]
  • weather_df["station"]["start_date":"end_date"]

Answered by fabiosat

As mentioned by Anton, you need to use a MultiIndex, as follows:

stock_df.index = pd.MultiIndex.from_arrays(stock_df[['code', 'date']].values.T, names=['idx1', 'idx2'])

weather_df.index = pd.MultiIndex.from_arrays(weather_df[['station', 'date']].values.T, names=['idx1', 'idx2'])
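
For illustration, here is a minimal sketch of the same idea on made-up data. The sample values are hypothetical; the column names code and date and the level names idx1 and idx2 follow the question and the lines above. It passes the two columns to pd.MultiIndex.from_arrays directly, which avoids the round trip through a transposed NumPy array.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
stock_df = pd.DataFrame({
    "code": ["AAPL", "AAPL", "F", "F"],
    "date": pd.to_datetime(
        ["2016-01-01", "2016-01-02", "2016-01-01", "2016-01-02"]
    ),
    "price": [100.0, 101.0, 50.0, 47.5],
})

# Build a two-level index from the two columns, one array per level.
stock_df.index = pd.MultiIndex.from_arrays(
    [stock_df["code"], stock_df["date"]], names=["idx1", "idx2"]
)

# Selecting on the first level returns the rows for that code.
aapl = stock_df.loc["AAPL"]
print(aapl)
```

Note that assigning to .index this way leaves the code and date columns in the frame as well, so the values are stored twice unless you drop the columns afterwards.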

Answered by Alexander

That functionality already exists: pass both column names to set_index. Please refer to the documentation for more examples.

import pandas as pd

# Build the frame, then move both columns into a two-level index.
stock_df = pd.DataFrame({'symbol': ['AAPL', 'AAPL', 'F', 'F', 'F'],
                         'date': ['2016-1-1', '2016-1-2', '2016-1-1', '2016-1-2', '2016-1-3'],
                         'price': [100., 101, 50, 47.5, 49]}).set_index(['symbol', 'date'])

>>> stock_df
                 price
symbol date           
AAPL   2016-1-1  100.0
       2016-1-2  101.0
F      2016-1-1   50.0
       2016-1-2   47.5
       2016-1-3   49.0

>>> stock_df.loc['AAPL']
          price
date           
2016-1-1    100
2016-1-2    101

>>> stock_df.loc['AAPL', '2016-1-2']
price    101
Name: (AAPL, 2016-1-2), dtype: float64
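
The question also asks about filtering by a date range. The following is a rough sketch of one way to do that, not part of the original answer: it assumes the dates are parsed with pd.to_datetime (above they are plain strings) and that the index is sorted, and the variable names ford and by_date are made up for the example.

```python
import pandas as pd

# Same data as above, but with dates parsed into real timestamps
# (an assumption; the answer above keeps them as strings).
stock_df = pd.DataFrame({
    "symbol": ["AAPL", "AAPL", "F", "F", "F"],
    "date": pd.to_datetime(
        ["2016-01-01", "2016-01-02", "2016-01-01", "2016-01-02", "2016-01-03"]
    ),
    "price": [100.0, 101.0, 50.0, 47.5, 49.0],
}).set_index(["symbol", "date"]).sort_index()  # sort before slicing

# One symbol, then a date range; .loc slices include both endpoints.
ford = stock_df.loc["F"].loc["2016-01-02":"2016-01-03"]

# A date range across all symbols, using pd.IndexSlice on the second level.
idx = pd.IndexSlice
start, end = pd.Timestamp("2016-01-02"), pd.Timestamp("2016-01-03")
by_date = stock_df.loc[idx[:, start:end], :]

print(ford)
print(by_date)
```

Sorting the index first matters here, since label slicing on a MultiIndex generally expects a sorted index.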