Set two columns as the index in a pandas DataFrame for time series analysis
Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/35331154/
Asked by yoshiserry
With weather or stock market data, temperatures and prices are recorded for multiple stations or stock tickers on any given date.
What, then, is the most effective way to set an index that contains two fields?
For weather: the weather_station and then Date
For Stock Data: the stock_code and then Date
Setting the index in this way would allow filtering such as:
stock_df["code"]["start_date":"end_date"]
weather_df["station"]["start_date":"end_date"]
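The filtering pattern in the question can be approximated with a two-level index. The following is a minimal sketch rather than a definitive answer: the sample rows are made up for illustration, and it assumes the date column is converted to real datetimes and the index is sorted so that label slicing behaves predictably.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
stock_df = pd.DataFrame({
    "code": ["AAPL", "AAPL", "AAPL", "F", "F"],
    "date": pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03",
                            "2016-01-01", "2016-01-02"]),
    "price": [100.0, 101.0, 102.0, 50.0, 47.5],
})

# Two-level index: code first, then date. Sorting makes label slicing reliable.
stock_df = stock_df.set_index(["code", "date"]).sort_index()

# Select one code, then slice its dates (both endpoints are included).
aapl_window = stock_df.loc["AAPL"].loc["2016-01-01":"2016-01-02"]

# The same rows selected in a single .loc call on the MultiIndex.
start, end = pd.Timestamp("2016-01-01"), pd.Timestamp("2016-01-02")
same_window = stock_df.loc[("AAPL", slice(start, end)), :]

print(aapl_window)
print(same_window)
```

The chained form drops the code level from the result, while the single .loc call keeps both index levels.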
Answered by fabiosat
As mentioned by Anton, you need to use a MultiIndex as follows:
stock_df.index = pd.MultiIndex.from_arrays(stock_df[['code', 'date']].values.T, names=['idx1', 'idx2'])
weather_df.index = pd.MultiIndex.from_arrays(weather_df[['station', 'date']].values.T, names=['idx1', 'idx2'])
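Assigning the index this way leaves the code and date columns in the frame and names the levels idx1 and idx2. As a rough comparison, the sketch below (with made-up sample rows) builds the same index values with set_index(..., drop=False), where the level names default to the column names instead.

```python
import pandas as pd

# Hypothetical sample data using the column names from the answer above.
stock_df = pd.DataFrame({
    "code": ["AAPL", "AAPL", "F"],
    "date": ["2016-01-01", "2016-01-02", "2016-01-01"],
    "price": [100.0, 101.0, 50.0],
})

# Approach from the answer: build the index from the transposed column values.
by_arrays = stock_df.copy()
by_arrays.index = pd.MultiIndex.from_arrays(
    by_arrays[["code", "date"]].values.T, names=["idx1", "idx2"]
)

# set_index with drop=False also keeps the source columns in the frame;
# the level names then default to the column names.
by_set_index = stock_df.set_index(["code", "date"], drop=False)

# Compare the index entries and the level names of the two results.
print(list(by_arrays.index) == list(by_set_index.index))
print(list(by_arrays.index.names), list(by_set_index.index.names))
```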
Answered by Alexander
That functionality currently exists. Please refer to the documentation for more examples.
>>> import pandas as pd
>>> stock_df = pd.DataFrame({'symbol': ['AAPL', 'AAPL', 'F', 'F', 'F'],
...                          'date': ['2016-1-1', '2016-1-2', '2016-1-1', '2016-1-2', '2016-1-3'],
...                          'price': [100., 101, 50, 47.5, 49]}).set_index(['symbol', 'date'])
>>> stock_df
                 price
symbol date
AAPL   2016-1-1  100.0
       2016-1-2  101.0
F      2016-1-1   50.0
       2016-1-2   47.5
       2016-1-3   49.0
>>> stock_df.loc['AAPL']
          price
date
2016-1-1    100
2016-1-2    101
>>> stock_df.loc['AAPL', '2016-1-2']
price    101
Name: (AAPL, 2016-1-2), dtype: float64
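The examples above select by symbol first. For time series work it can also help to select by date across all symbols. The sketch below is one possible way to do that, not part of the original answer: it assumes the dates are parsed with pd.to_datetime (written here in zero-padded ISO form) and that the index is sorted.

```python
import pandas as pd

# Same data as above, but with real datetimes and a sorted index (an assumption).
stock_df = pd.DataFrame({
    "symbol": ["AAPL", "AAPL", "F", "F", "F"],
    "date": pd.to_datetime(["2016-01-01", "2016-01-02",
                            "2016-01-01", "2016-01-02", "2016-01-03"]),
    "price": [100.0, 101.0, 50.0, 47.5, 49.0],
}).set_index(["symbol", "date"]).sort_index()

# All symbols on one date: take a cross-section on the "date" level.
on_jan_2 = stock_df.xs(pd.Timestamp("2016-01-02"), level="date")

# A date range across every symbol, using pd.IndexSlice for readability.
idx = pd.IndexSlice
start, end = pd.Timestamp("2016-01-01"), pd.Timestamp("2016-01-02")
first_two_days = stock_df.loc[idx[:, start:end], :]

print(on_jan_2)
print(first_two_days)
```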