如何获取 Pandas DataFrame 最近 5 个月的数据?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/43678778/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-14 03:30:22  来源:igfitidea点击:

How to get last 5 months of data of a Pandas DataFrame?

pythonpandasdataframe

提问by Leonel

I have a Pandas dataframe that was generated by fetching data from a KDB database, using QPython.

我有一个 Pandas 数据框,它是通过使用QPython从 KDB 数据库中获取数据而生成的。

First, the Date column is returned as a strange dtype: dtype('<M8[ns]')

首先,Date 列作为一个奇怪的 dtype 返回: dtype('<M8[ns]')

df = conn.sync("select Date, Open, High, Low, Close from stocktable", pandas=True)
df["Date"].dtype
# dtype('<M8[ns]')

However, when I inspect the contents of the column, the bottom row displays the dtype as datetime.

但是,当我检查列的内容时,底行将 dtype 显示为日期时间。

0      2017-04-17
1      2017-04-13
2      2017-04-12
3      2017-04-11
4      2017-04-10
5      2017-04-07
6      2017-04-06
7      2017-04-05
8      2017-04-04
9      2017-04-03
10     2017-03-31
11     2017-03-30
          ...    

3180   2004-08-27
3181   2004-08-26
3182   2004-08-25
3183   2004-08-24
3184   2004-08-23
3185   2004-08-20
3186   2004-08-19
Name: Date, dtype: datetime64[ns]

Also, method last()is not working correctly. I ask for the last 5 months worth of data, but all of the data is returned.

此外,方法last()无法正常工作。我要求提供过去 5 个月的数据,但返回了所有数据。

# Expected to only return last 5 months of data, but returns it all.
df.set_index("Date").last("5M")

How do I get the last rows of this DataFrame?

如何获取此 DataFrame 的最后一行?

采纳答案by Leonel

Solved it. The issue was that the data returned by KDB was sorted in DESC order, which confused method last().

解决了。问题是KDB返回的数据是按DESC顺序排序的,混淆了方法last()

The solution is to either add a sort clause to the query (in the Q language, it's with a backtick followed by the keyword xasc)

解决方案是在查询中添加一个排序子句(在 Q 语言中,它带有一个反引号,后跟关键字 xasc

df = conn.sync("`Date xasc select Date, Open, High, Low, Close from stocktable", pandas=True) \
     .last("5M")

Or else, to sort the data in the Pandas dataframe itself.

或者,对 Pandas 数据帧本身中的数据进行排序。

df_sorted = stocktable.dataframe() \
    .sort_values(by="Date",ascending=True) \
    .set_index("Date")
    .last("5M")

回答by MaxU

It works properly for me.

它适合我。

Demo:

演示:

In [71]: from pandas_datareader import data as web

In [72]: df = web.DataReader('AAPL', 'yahoo', '2010-04-01')

In [73]: df
Out[73]:
                  Open        High         Low       Close     Volume   Adj Close
Date
2010-04-01  237.410000  238.730003  232.750000  235.969994  150786300   30.572166
2010-04-05  234.980011  238.509998  234.769993  238.489998  171126900   30.898657
2010-04-06  238.200005  240.239998  237.000004  239.540009  111754300   31.034696
2010-04-07  239.549995  241.920010  238.659988  240.600006  157125500   31.172029
2010-04-08  240.440002  241.540001  238.040001  239.950005  143247300   31.087815
2010-04-09  241.430012  241.889996  240.460003  241.789993   83545700   31.326203
2010-04-12  242.199989  243.069996  241.809994  242.290005   83256600   31.390984
2010-04-13  241.860008  242.800003  241.110004  242.430008   76552700   31.409123
2010-04-14  245.280006  245.810005  244.069992  245.690002  101019100   31.831486
2010-04-15  245.779991  249.029999  245.509998  248.920010   94196200   32.249965
...                ...         ...         ...         ...        ...         ...
2017-04-13  141.910004  142.380005  141.050003  141.050003   17652900  141.050003
2017-04-17  141.479996  141.880005  140.869995  141.830002   16424000  141.830002
2017-04-18  141.410004  142.039993  141.110001  141.199997   14660800  141.199997
2017-04-19  141.880005  142.000000  140.449997  140.679993   17271300  140.679993
2017-04-20  141.220001  142.919998  141.160004  142.440002   23251100  142.440002
2017-04-21  142.440002  142.679993  141.850006  142.270004   17245200  142.270004
2017-04-24  143.500000  143.949997  143.179993  143.639999   17099200  143.639999
2017-04-25  143.910004  144.899994  143.869995  144.529999   18290300  144.529999
2017-04-26  144.470001  144.600006  143.380005  143.679993   19769400  143.679993
2017-04-27  143.919998  144.160004  143.309998  143.789993   14106100  143.789993

[1781 rows x 6 columns]

In [74]: df.last('5M')
Out[74]:
                  Open        High         Low       Close    Volume   Adj Close
Date
2016-12-01  110.370003  110.940002  109.029999  109.489998  37086900  109.017344
2016-12-02  109.169998  110.089996  108.849998  109.900002  26528000  109.425578
2016-12-05  110.000000  110.029999  108.250000  109.110001  34324500  108.638987
2016-12-06  109.500000  110.360001  109.190002  109.949997  26195500  109.475358
2016-12-07  109.260002  111.190002  109.160004  111.029999  29998700  110.550697
2016-12-08  110.860001  112.430000  110.599998  112.120003  27068300  111.635996
2016-12-09  112.309998  114.699997  112.309998  113.949997  34402600  113.458090
2016-12-12  113.290001  115.000000  112.489998  113.300003  26374400  112.810902
2016-12-13  113.839996  115.919998  113.750000  115.190002  43733800  114.692743
2016-12-14  115.040001  116.199997  114.980003  115.190002  34031800  114.692743
...                ...         ...         ...         ...       ...         ...
2017-04-13  141.910004  142.380005  141.050003  141.050003  17652900  141.050003
2017-04-17  141.479996  141.880005  140.869995  141.830002  16424000  141.830002
2017-04-18  141.410004  142.039993  141.110001  141.199997  14660800  141.199997
2017-04-19  141.880005  142.000000  140.449997  140.679993  17271300  140.679993
2017-04-20  141.220001  142.919998  141.160004  142.440002  23251100  142.440002
2017-04-21  142.440002  142.679993  141.850006  142.270004  17245200  142.270004
2017-04-24  143.500000  143.949997  143.179993  143.639999  17099200  143.639999
2017-04-25  143.910004  144.899994  143.869995  144.529999  18290300  144.529999
2017-04-26  144.470001  144.600006  143.380005  143.679993  19769400  143.679993
2017-04-27  143.919998  144.160004  143.309998  143.789993  14106100  143.789993

[101 rows x 6 columns]

回答by jezrael

For me it works nice:

对我来说它很好用:

rng = pd.date_range('2017-04-03', periods=10, freq='20D')
df = pd.DataFrame({'Date': rng, 'a': range(10)})  
print (df)
        Date  a
0 2017-04-03  0
1 2017-04-23  1
2 2017-05-13  2
3 2017-06-02  3
4 2017-06-22  4
5 2017-07-12  5
6 2017-08-01  6
7 2017-08-21  7
8 2017-09-10  8
9 2017-09-30  9

df = df.set_index('Date').last('5M')
print (df)
            a
Date         
2017-05-13  2
2017-06-02  3
2017-06-22  4
2017-07-12  5
2017-08-01  6
2017-08-21  7
2017-09-10  8
2017-09-30  9


It works nice with duplicates also, only is necessary sort DateTimecolumn:

它也适用于重复项,只有必要的排序DateTime列:

rng = pd.date_range('2017-04-03', periods=10, freq='20D')
df = pd.DataFrame({'Date': rng, 'a': range(10)})  
df = pd.concat([df,df], ignore_index=True).sort_values('Date')
print (df)
         Date  a
0  2017-04-03  0
10 2017-04-03  0
1  2017-04-23  1
11 2017-04-23  1
2  2017-05-13  2
12 2017-05-13  2
3  2017-06-02  3
13 2017-06-02  3
4  2017-06-22  4
14 2017-06-22  4
5  2017-07-12  5
15 2017-07-12  5
6  2017-08-01  6
16 2017-08-01  6
17 2017-08-21  7
7  2017-08-21  7
18 2017-09-10  8
8  2017-09-10  8
9  2017-09-30  9
19 2017-09-30  9

df = df.set_index('Date').last('5M')
print (df)
            a
Date         
2017-05-13  2
2017-05-13  2
2017-06-02  3
2017-06-02  3
2017-06-22  4
2017-06-22  4
2017-07-12  5
2017-07-12  5
2017-08-01  6
2017-08-01  6
2017-08-21  7
2017-08-21  7
2017-09-10  8
2017-09-10  8
2017-09-30  9
2017-09-30  9
rng = pd.date_range('2017-04-03', periods=10, freq='20D')
df = pd.DataFrame({'Date': rng, 'a': range(10)})  
df = pd.concat([df,df], ignore_index=True)
print (df)
         Date  a
0  2017-04-03  0
1  2017-04-23  1
2  2017-05-13  2
3  2017-06-02  3
4  2017-06-22  4
5  2017-07-12  5
6  2017-08-01  6
7  2017-08-21  7
8  2017-09-10  8
9  2017-09-30  9
10 2017-04-03  0
11 2017-04-23  1
12 2017-05-13  2
13 2017-06-02  3
14 2017-06-22  4
15 2017-07-12  5
16 2017-08-01  6
17 2017-08-21  7
18 2017-09-10  8
19 2017-09-30  9

df = df.set_index('Date').last('5M')
print (df)
            a
Date         
2017-05-13  2
2017-06-02  3
2017-06-22  4
2017-07-12  5
2017-08-01  6
2017-08-21  7
2017-09-10  8
2017-09-30  9