如何获取 Pandas DataFrame 最近 5 个月的数据？

Question

提问by Leonel

I have a Pandas dataframe that was generated by fetching data from a KDB database, using QPython.

我有一个 Pandas 数据框，它是通过使用QPython从 KDB 数据库中获取数据而生成的。

First, the Date column is returned as a strange dtype: dtype('<M8[ns]')

首先，Date 列作为一个奇怪的 dtype 返回： dtype('<M8[ns]')

df = conn.sync("select Date, Open, High, Low, Close from stocktable", pandas=True)
df["Date"].dtype
# dtype('<M8[ns]')

However, when I inspect the contents of the column, the bottom row displays the dtype as datetime.

但是，当我检查列的内容时，底行将 dtype 显示为日期时间。

0      2017-04-17
1      2017-04-13
2      2017-04-12
3      2017-04-11
4      2017-04-10
5      2017-04-07
6      2017-04-06
7      2017-04-05
8      2017-04-04
9      2017-04-03
10     2017-03-31
11     2017-03-30
          ...    

3180   2004-08-27
3181   2004-08-26
3182   2004-08-25
3183   2004-08-24
3184   2004-08-23
3185   2004-08-20
3186   2004-08-19
Name: Date, dtype: datetime64[ns]

Also, method last()is not working correctly. I ask for the last 5 months worth of data, but all of the data is returned.

此外，方法last()无法正常工作。我要求提供过去 5 个月的数据，但返回了所有数据。

# Expected to only return last 5 months of data, but returns it all.
df.set_index("Date").last("5M")

How do I get the last rows of this DataFrame?

如何获取此 DataFrame 的最后一行？

Answer 1

采纳答案by Leonel

Solved it. The issue was that the data returned by KDB was sorted in DESC order, which confused method last().

解决了。问题是KDB返回的数据是按DESC顺序排序的，混淆了方法last()。

The solution is to either add a sort clause to the query (in the Q language, it's with a backtick followed by the keyword xasc)

解决方案是在查询中添加一个排序子句（在 Q 语言中，它带有一个反引号，后跟关键字 xasc）

df = conn.sync("`Date xasc select Date, Open, High, Low, Close from stocktable", pandas=True) \
     .last("5M")

Or else, to sort the data in the Pandas dataframe itself.

或者，对 Pandas 数据帧本身中的数据进行排序。

df_sorted = stocktable.dataframe() \
    .sort_values(by="Date",ascending=True) \
    .set_index("Date")
    .last("5M")

Answer 2

回答by MaxU

It works properly for me.

它适合我。

Demo:

演示：

In [71]: from pandas_datareader import data as web

In [72]: df = web.DataReader('AAPL', 'yahoo', '2010-04-01')

In [73]: df
Out[73]:
                  Open        High         Low       Close     Volume   Adj Close
Date
2010-04-01  237.410000  238.730003  232.750000  235.969994  150786300   30.572166
2010-04-05  234.980011  238.509998  234.769993  238.489998  171126900   30.898657
2010-04-06  238.200005  240.239998  237.000004  239.540009  111754300   31.034696
2010-04-07  239.549995  241.920010  238.659988  240.600006  157125500   31.172029
2010-04-08  240.440002  241.540001  238.040001  239.950005  143247300   31.087815
2010-04-09  241.430012  241.889996  240.460003  241.789993   83545700   31.326203
2010-04-12  242.199989  243.069996  241.809994  242.290005   83256600   31.390984
2010-04-13  241.860008  242.800003  241.110004  242.430008   76552700   31.409123
2010-04-14  245.280006  245.810005  244.069992  245.690002  101019100   31.831486
2010-04-15  245.779991  249.029999  245.509998  248.920010   94196200   32.249965
...                ...         ...         ...         ...        ...         ...
2017-04-13  141.910004  142.380005  141.050003  141.050003   17652900  141.050003
2017-04-17  141.479996  141.880005  140.869995  141.830002   16424000  141.830002
2017-04-18  141.410004  142.039993  141.110001  141.199997   14660800  141.199997
2017-04-19  141.880005  142.000000  140.449997  140.679993   17271300  140.679993
2017-04-20  141.220001  142.919998  141.160004  142.440002   23251100  142.440002
2017-04-21  142.440002  142.679993  141.850006  142.270004   17245200  142.270004
2017-04-24  143.500000  143.949997  143.179993  143.639999   17099200  143.639999
2017-04-25  143.910004  144.899994  143.869995  144.529999   18290300  144.529999
2017-04-26  144.470001  144.600006  143.380005  143.679993   19769400  143.679993
2017-04-27  143.919998  144.160004  143.309998  143.789993   14106100  143.789993

[1781 rows x 6 columns]

In [74]: df.last('5M')
Out[74]:
                  Open        High         Low       Close    Volume   Adj Close
Date
2016-12-01  110.370003  110.940002  109.029999  109.489998  37086900  109.017344
2016-12-02  109.169998  110.089996  108.849998  109.900002  26528000  109.425578
2016-12-05  110.000000  110.029999  108.250000  109.110001  34324500  108.638987
2016-12-06  109.500000  110.360001  109.190002  109.949997  26195500  109.475358
2016-12-07  109.260002  111.190002  109.160004  111.029999  29998700  110.550697
2016-12-08  110.860001  112.430000  110.599998  112.120003  27068300  111.635996
2016-12-09  112.309998  114.699997  112.309998  113.949997  34402600  113.458090
2016-12-12  113.290001  115.000000  112.489998  113.300003  26374400  112.810902
2016-12-13  113.839996  115.919998  113.750000  115.190002  43733800  114.692743
2016-12-14  115.040001  116.199997  114.980003  115.190002  34031800  114.692743
...                ...         ...         ...         ...       ...         ...
2017-04-13  141.910004  142.380005  141.050003  141.050003  17652900  141.050003
2017-04-17  141.479996  141.880005  140.869995  141.830002  16424000  141.830002
2017-04-18  141.410004  142.039993  141.110001  141.199997  14660800  141.199997
2017-04-19  141.880005  142.000000  140.449997  140.679993  17271300  140.679993
2017-04-20  141.220001  142.919998  141.160004  142.440002  23251100  142.440002
2017-04-21  142.440002  142.679993  141.850006  142.270004  17245200  142.270004
2017-04-24  143.500000  143.949997  143.179993  143.639999  17099200  143.639999
2017-04-25  143.910004  144.899994  143.869995  144.529999  18290300  144.529999
2017-04-26  144.470001  144.600006  143.380005  143.679993  19769400  143.679993
2017-04-27  143.919998  144.160004  143.309998  143.789993  14106100  143.789993

[101 rows x 6 columns]

Answer 3

回答by jezrael

For me it works nice:

对我来说它很好用：

rng = pd.date_range('2017-04-03', periods=10, freq='20D')
df = pd.DataFrame({'Date': rng, 'a': range(10)})  
print (df)
        Date  a
0 2017-04-03  0
1 2017-04-23  1
2 2017-05-13  2
3 2017-06-02  3
4 2017-06-22  4
5 2017-07-12  5
6 2017-08-01  6
7 2017-08-21  7
8 2017-09-10  8
9 2017-09-30  9

df = df.set_index('Date').last('5M')
print (df)
            a
Date         
2017-05-13  2
2017-06-02  3
2017-06-22  4
2017-07-12  5
2017-08-01  6
2017-08-21  7
2017-09-10  8
2017-09-30  9

It works nice with duplicates also, only is necessary sort DateTimecolumn:

它也适用于重复项，只有必要的排序DateTime列：

rng = pd.date_range('2017-04-03', periods=10, freq='20D')
df = pd.DataFrame({'Date': rng, 'a': range(10)})  
df = pd.concat([df,df], ignore_index=True).sort_values('Date')
print (df)
         Date  a
0  2017-04-03  0
10 2017-04-03  0
1  2017-04-23  1
11 2017-04-23  1
2  2017-05-13  2
12 2017-05-13  2
3  2017-06-02  3
13 2017-06-02  3
4  2017-06-22  4
14 2017-06-22  4
5  2017-07-12  5
15 2017-07-12  5
6  2017-08-01  6
16 2017-08-01  6
17 2017-08-21  7
7  2017-08-21  7
18 2017-09-10  8
8  2017-09-10  8
9  2017-09-30  9
19 2017-09-30  9

df = df.set_index('Date').last('5M')
print (df)
            a
Date         
2017-05-13  2
2017-05-13  2
2017-06-02  3
2017-06-02  3
2017-06-22  4
2017-06-22  4
2017-07-12  5
2017-07-12  5
2017-08-01  6
2017-08-01  6
2017-08-21  7
2017-08-21  7
2017-09-10  8
2017-09-10  8
2017-09-30  9
2017-09-30  9

rng = pd.date_range('2017-04-03', periods=10, freq='20D')
df = pd.DataFrame({'Date': rng, 'a': range(10)})  
df = pd.concat([df,df], ignore_index=True)
print (df)
         Date  a
0  2017-04-03  0
1  2017-04-23  1
2  2017-05-13  2
3  2017-06-02  3
4  2017-06-22  4
5  2017-07-12  5
6  2017-08-01  6
7  2017-08-21  7
8  2017-09-10  8
9  2017-09-30  9
10 2017-04-03  0
11 2017-04-23  1
12 2017-05-13  2
13 2017-06-02  3
14 2017-06-22  4
15 2017-07-12  5
16 2017-08-01  6
17 2017-08-21  7
18 2017-09-10  8
19 2017-09-30  9

df = df.set_index('Date').last('5M')
print (df)
            a
Date         
2017-05-13  2
2017-06-02  3
2017-06-22  4
2017-07-12  5
2017-08-01  6
2017-08-21  7
2017-09-10  8
2017-09-30  9

如何获取 Pandas DataFrame 最近 5 个月的数据？

提问by Leonel

采纳答案by Leonel

回答by MaxU

回答by jezrael

相关推荐

最近更新

标签

如何获取 Pandas DataFrame 最近 5 个月的数据？

提问by Leonel

采纳答案by Leonel

回答by MaxU

回答by jezrael

相关推荐

Pandas 月度滚动操作

Pandas 删除列包含 * 的行

pandas 如何创建 python 空数据帧，其中 df.empty 结果为 True

pandas 熊猫缺少 x 刻度标签

相关推荐

最近更新

标签