pandas: Identifying price swings/trends in a pandas DataFrame with stock quotes

Notice: this page is a side-by-side Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/23614259/

Time: 2020-09-13 22:02:31  Source: igfitidea

Identifying price swings/trends in pandas dataframe with stock quotes

python pandas

Asked by Ovidiu Susan

I have a pandas Dataframe with DatetimeIndex and ohlcv stock quotes columns. I would like to extract price swings/trends that meet a certain threshold: up swings/trends/moves bigger than 0.3$ and down swings/trends/moves that go beyond -0.3$.

df[:10]
                           close   high   low    open    volume
2014-05-09 09:30:00-04:00 187.5600 187.73 187.54 187.700 1922600
2014-05-09 09:31:00-04:00 187.4900 187.56 187.42 187.550 534400
2014-05-09 09:32:00-04:00 187.4200 187.51 187.35 187.490 224800
2014-05-09 09:33:00-04:00 187.5500 187.58 187.39 187.400 303700
2014-05-09 09:34:00-04:00 187.6700 187.67 187.53 187.560 438200
2014-05-09 09:35:00-04:00 187.6000 187.71 187.56 187.680 296400
2014-05-09 09:36:00-04:00 187.4100 187.67 187.38 187.600 329900
2014-05-09 09:37:00-04:00 187.3100 187.44 187.28 187.400 404000
2014-05-09 09:38:00-04:00 187.2600 187.37 187.26 187.300 912800
2014-05-09 09:39:00-04:00 187.2200 187.28 187.12 187.250 607700

After studying pandas documentation it looked like the Dataframe.apply() would be the approach, but I got stuck in building the function(s). As my coding abilities are limited overall I need a little help please.

global row_nr
row_nr = 1
def extract_swings()
    if row_nr == 1 : pivot = row.open ; row_nr += 1
    else : if (row.high-pivot) >= 0.3 : ????
    ... ????

df['swings'] = df.apply(extract_swings, axis=1)

The result should be this:

df['swings'][:10]
2014-05-09 09:30:00-04:00 NaN
2014-05-09 09:31:00-04:00 NaN
2014-05-09 09:32:00-04:00 -0.35
2014-05-09 09:33:00-04:00 NaN
2014-05-09 09:34:00-04:00 NaN
2014-05-09 09:35:00-04:00 0.36
2014-05-09 09:36:00-04:00 NaN
2014-05-09 09:37:00-04:00 NaN
2014-05-09 09:38:00-04:00 NaN
2014-05-09 09:39:00-04:00 -0.59

UPDATE: To avoid any confusion here is how the requested function should go through the dataframe:

                           close    high   low    open    volume 
2014-05-09 09:30:00-04:00 187.5600 187.73 187.54 187.700 1922600
# this is the first line, first minute and we will take row.open 187.70 as \
# the starting point or first pivot
2014-05-09 09:31:00-04:00 187.4900 187.56 187.42 187.550 534400
# next minute we check if either (row.high - pivot) >= 0.3 or \
# (row.low-pivot) <= -0.3. Neither is true so nothing to do here.
2014-05-09 09:32:00-04:00 187.4200 187.51 187.35 187.490 224800
# next minute same check ... we see that row.low-pivot = -0.35. \
# We consider 187.35 a second pivot and the diff value -0.35 a first trend down
2014-05-09 09:33:00-04:00 187.5500 187.58 187.39 187.400 303700
# next minute we check if the identified trend/swing down goes further \
# down by having a row.low lower than previous row.low. If we would \
# have found here a new lower row.low that would be the second pivot \
# and we would forget about 187.35 as being a pivot ... and so on. \
# We don't see that on this row, instead we see prices are higher than \
# previous row, so we start checking the diff for a potential up trend \
# starting from second pivot 187.35. As long as we do not encounter a \
# higher high with over 0.3 above last pivot we are still within the identified down trend. 
2014-05-09 09:34:00-04:00 187.6700 187.67 187.53 187.560 438200
# we don't see a lower low to reconsider the second pivot neither \
# a (row.high- second_pivot) >= 0.3
2014-05-09 09:35:00-04:00 187.6000 187.71 187.56 187.680 296400
# here we see (row.high- second_pivot) = 0.36. We consider 187.71 as \
# a third_pivot and the diff value 0.36 as an up trend (from second pivot to here)
2014-05-09 09:36:00-04:00 187.4100 187.67 187.38 187.600 329900
# next minute we check if the identified trend/swing up goes further up \
# by having a row.high higher than third pivot. If we would have found here \
# a new higher row.high that would be the third pivot and we would forget \
# about 187.71 as being a pivot ... and so on. We don't see that on this row,\
# instead we see prices are lower than previous row, so we start \
# checking the diff for a potential down trend starting from third \
# pivot 187.71. As long as we do not encounter a lower low with \
# over 0.3 below last pivot we are still within the identified up trend.
2014-05-09 09:37:00-04:00 187.3100 187.44 187.28 187.400 404000
# we find here a (row.low - third_pivot) = -0.43 so we have identified \
# a new down trend starting from third pivot and now we have a potential\
# fourth pivot 187.28 
2014-05-09 09:38:00-04:00 187.2600 187.37 187.26 187.300 912800
# we find here a lower low so we don't consider 187.28 the fourth \
# pivot anymore but this lower low 187.26
2014-05-09 09:39:00-04:00 187.2200 187.28 187.12 187.250 607700
# we find here a lower low so we don't consider 187.26 the fourth pivot anymore \
# but this lower low 187.12. Being this the lowest low we consider this one \
# to be the fourth pivot and the diff 187.12-187.71=-0.59 as a downtrend with that value 

Answered by Pawel Kozela

It's a bit tricky, since you cannot mark a point as a pivot until you find the next potential pivot (i.e. if you are in an upward trend, you can't say it's over until you find a low that is sufficiently low).

This code does the trick. I've put your data in a tmpData.txt file for convenience, and it produces the desired result. Please check:

import numpy as np
import pandas as pd


def get_pivots():
    # DataFrame.from_csv, irow and .ix no longer exist in current pandas;
    # read_csv with these arguments reproduces the old from_csv defaults.
    data = pd.read_csv('tmpData.txt', index_col=0, parse_dates=True)
    data['swings'] = np.nan
    swings = data.columns.get_loc('swings')  # column position for .iloc

    pivot = data['open'].iloc[0]
    last_pivot_id = 0
    up_down = 0

    diff = .3

    for i in range(0, len(data)):
        row = data.iloc[i]

        # We don't have a trend yet
        if up_down == 0:
            if row.low < pivot - diff:
                data.iloc[i, swings] = row.low - pivot
                pivot, last_pivot_id = row.low, i
                up_down = -1
            elif row.high > pivot + diff:
                data.iloc[i, swings] = row.high - pivot
                pivot, last_pivot_id = row.high, i
                up_down = 1

        # Current trend is up
        elif up_down == 1:
            # If got higher than last pivot, update the swing
            if row.high > pivot:
                # Remove the last pivot, as it wasn't a real one
                # (pivot holds the high of the last pivot row)
                data.iloc[i, swings] = data.iloc[last_pivot_id, swings] + (row.high - pivot)
                data.iloc[last_pivot_id, swings] = np.nan
                pivot, last_pivot_id = row.high, i
            elif row.low < pivot - diff:
                data.iloc[i, swings] = row.low - pivot
                pivot, last_pivot_id = row.low, i
                # Change the trend indicator
                up_down = -1

        # Current trend is down
        elif up_down == -1:
            # If got lower than last pivot, update the swing
            if row.low < pivot:
                # Remove the last pivot, as it wasn't a real one
                # (pivot holds the low of the last pivot row)
                data.iloc[i, swings] = data.iloc[last_pivot_id, swings] + (row.low - pivot)
                data.iloc[last_pivot_id, swings] = np.nan
                pivot, last_pivot_id = row.low, i
            # A reversal needs a high above the pivot low by more than diff
            elif row.high > pivot + diff:
                data.iloc[i, swings] = row.high - pivot
                pivot, last_pivot_id = row.high, i
                # Change the trend indicator
                up_down = 1

    print(data)
    return data

Output:

date                  close  high    low     open    volume    swings                                            
2014-05-09 13:30:00  187.56  187.73  187.54  187.70  1922600     NaN
2014-05-09 13:31:00  187.49  187.56  187.42  187.55   534400     NaN
2014-05-09 13:32:00  187.42  187.51  187.35  187.49   224800   -0.35
2014-05-09 13:33:00  187.55  187.58  187.39  187.40   303700     NaN
2014-05-09 13:34:00  187.67  187.67  187.53  187.56   438200     NaN
2014-05-09 13:35:00  187.60  187.71  187.56  187.68   296400    0.36
2014-05-09 13:36:00  187.41  187.67  187.38  187.60   329900     NaN
2014-05-09 13:37:00  187.31  187.44  187.28  187.40   404000     NaN
2014-05-09 13:38:00  187.26  187.37  187.26  187.30   912800     NaN
2014-05-09 13:39:00  187.22  187.28  187.12  187.25   607700   -0.59
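
For readers without a tmpData.txt file, here is a minimal self-contained sketch of the same pivot logic. It is not the answerer's code: the function name swing_pivots and the threshold parameter are hypothetical, the ten sample rows are typed in from the question, and the timestamps are left timezone-naive for simplicity.

```python
import numpy as np
import pandas as pd


def swing_pivots(df, threshold=0.3):
    """Return a Series of swing sizes, NaN where no pivot is confirmed."""
    highs = df["high"].to_numpy()
    lows = df["low"].to_numpy()
    swings = np.full(len(df), np.nan)

    pivot = df["open"].iloc[0]  # starting reference price
    last = None                 # row position of the most recent pivot
    trend = 0                   # 0: no trend yet, 1: up, -1: down

    for i in range(len(df)):
        hi, lo = highs[i], lows[i]
        if trend == 1 and hi > pivot:
            # The up move extends: move the pivot to the new high.
            swings[i] = swings[last] + (hi - pivot)
            swings[last] = np.nan
            pivot, last = hi, i
        elif trend == -1 and lo < pivot:
            # The down move extends: move the pivot to the new low.
            swings[i] = swings[last] + (lo - pivot)
            swings[last] = np.nan
            pivot, last = lo, i
        elif trend >= 0 and lo < pivot - threshold:
            # First down swing, or reversal of an up trend.
            swings[i] = lo - pivot
            pivot, last, trend = lo, i, -1
        elif trend <= 0 and hi > pivot + threshold:
            # First up swing, or reversal of a down trend.
            swings[i] = hi - pivot
            pivot, last, trend = hi, i, 1

    return pd.Series(swings, index=df.index, name="swings")


# The ten sample rows from the question (volume omitted).
idx = pd.date_range("2014-05-09 09:30", periods=10, freq="min")
df = pd.DataFrame({
    "open":  [187.70, 187.55, 187.49, 187.40, 187.56, 187.68, 187.60, 187.40, 187.30, 187.25],
    "high":  [187.73, 187.56, 187.51, 187.58, 187.67, 187.71, 187.67, 187.44, 187.37, 187.28],
    "low":   [187.54, 187.42, 187.35, 187.39, 187.53, 187.56, 187.38, 187.28, 187.26, 187.12],
    "close": [187.56, 187.49, 187.42, 187.55, 187.67, 187.60, 187.41, 187.31, 187.26, 187.22],
}, index=idx)

print(swing_pivots(df).round(2).dropna())
```

On these rows the sketch should leave three confirmed pivots, at 09:32, 09:35 and 09:39. Working on NumPy arrays instead of writing into the DataFrame on every row is only a design choice to keep the loop short.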

Answered by tw0000

I updated the answer from @Pawel-Kozela to be compatible with the latest version of pandas, and added an easy way to pass column names.

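import numpy as np


def get_pivots(df, cols=('O', 'H', 'L', 'C')):

    df['swings'] = np.nan
    # Seed the first and last rows with their opening price
    df.loc[df.index[0], 'swings'] = df.loc[df.index[0], cols[0]]
    df.loc[df.index[-1], 'swings'] = df.loc[df.index[-1], cols[0]]

    pivot = df.loc[df.index[0], cols[0]]
    last_pivot_id = df.index[0]
    up_down = 0

    diff = .3

    for i, row in df.iterrows():

        # We don't have a trend yet
        if up_down == 0:
            if row[cols[2]] < pivot - diff:
                df.loc[i, 'swings'] = row[cols[2]] - pivot
                pivot, last_pivot_id = row[cols[2]], i
                up_down = -1
            elif row[cols[1]] > pivot + diff:
                df.loc[i, 'swings'] = row[cols[1]] - pivot
                pivot, last_pivot_id = row[cols[1]], i
                up_down = 1

        # Current trend is up
        elif up_down == 1:
            # If got higher than last pivot, update the swing
            if row[cols[1]] > pivot:
                # Remove the last pivot, as it wasn't a real one
                df.loc[i, 'swings'] = df.loc[last_pivot_id, 'swings'] + (row[cols[1]] - pivot)
                df.loc[last_pivot_id, 'swings'] = np.nan
                pivot, last_pivot_id = row[cols[1]], i
            elif row[cols[2]] < pivot - diff:
                df.loc[i, 'swings'] = row[cols[2]] - pivot
                pivot, last_pivot_id = row[cols[2]], i
                # Change the trend indicator
                up_down = -1

        # Current trend is down (mirror of the up branch, as in
        # Pawel Kozela's answer)
        elif up_down == -1:
            # If got lower than last pivot, update the swing
            if row[cols[2]] < pivot:
                df.loc[i, 'swings'] = df.loc[last_pivot_id, 'swings'] + (row[cols[2]] - pivot)
                df.loc[last_pivot_id, 'swings'] = np.nan
                pivot, last_pivot_id = row[cols[2]], i
            elif row[cols[1]] > pivot + diff:
                df.loc[i, 'swings'] = row[cols[1]] - pivot
                pivot, last_pivot_id = row[cols[1]], i
                # Change the trend indicator
                up_down = 1

    return df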

Answered by Wessel dR

I updated tw0000's code, as it had a small bug: some lines used a hard-coded 'O' instead of cols[0].

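import numpy as np


def get_pivots(df, cols=('O', 'H', 'L', 'C')):

    df['swings'] = np.nan
    # Seed the first and last rows with their opening price
    df.loc[df.index[0], 'swings'] = df.loc[df.index[0], cols[0]]
    df.loc[df.index[-1], 'swings'] = df.loc[df.index[-1], cols[0]]

    pivot = df.loc[df.index[0], cols[0]]
    last_pivot_id = df.index[0]
    up_down = 0

    diff = .3

    for i, row in df.iterrows():

        # We don't have a trend yet
        if up_down == 0:
            if row[cols[2]] < pivot - diff:
                df.loc[i, 'swings'] = row[cols[2]] - pivot
                pivot, last_pivot_id = row[cols[2]], i
                up_down = -1
            elif row[cols[1]] > pivot + diff:
                df.loc[i, 'swings'] = row[cols[1]] - pivot
                pivot, last_pivot_id = row[cols[1]], i
                up_down = 1

        # Current trend is up
        elif up_down == 1:
            # If got higher than last pivot, update the swing
            if row[cols[1]] > pivot:
                # Remove the last pivot, as it wasn't a real one
                df.loc[i, 'swings'] = df.loc[last_pivot_id, 'swings'] + (row[cols[1]] - pivot)
                df.loc[last_pivot_id, 'swings'] = np.nan
                pivot, last_pivot_id = row[cols[1]], i
            elif row[cols[2]] < pivot - diff:
                df.loc[i, 'swings'] = row[cols[2]] - pivot
                pivot, last_pivot_id = row[cols[2]], i
                # Change the trend indicator
                up_down = -1

        # Current trend is down (mirror of the up branch, as in
        # Pawel Kozela's answer)
        elif up_down == -1:
            # If got lower than last pivot, update the swing
            if row[cols[2]] < pivot:
                df.loc[i, 'swings'] = df.loc[last_pivot_id, 'swings'] + (row[cols[2]] - pivot)
                df.loc[last_pivot_id, 'swings'] = np.nan
                pivot, last_pivot_id = row[cols[2]], i
            elif row[cols[1]] > pivot + diff:
                df.loc[i, 'swings'] = row[cols[1]] - pivot
                pivot, last_pivot_id = row[cols[1]], i
                # Change the trend indicator
                up_down = 1

    return df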

Answered by acushner

How about this, assuming you only care about highs for the moment:

startPx = df.open.iloc[0]
level = ((df.high - startPx) / .3).astype(int)
df['swings'] = level - level.shift(1)

Now, to find out what the differences are, you would just do something like:

changes = df[df.swings != 0]
diffs = changes.high - changes.open.shift(1)
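
To make the bucketing idea above concrete, here is a small sketch that applies it to the ten highs from the question. The names start_px, step, level_floor and changed are mine, and the np.floor variant is an alternative I am adding for comparison, not part of the answer.

```python
import numpy as np
import pandas as pd

# Highs of the ten sample rows from the question.
high = pd.Series([187.73, 187.56, 187.51, 187.58, 187.67,
                  187.71, 187.67, 187.44, 187.37, 187.28])
start_px = 187.70  # first open in the sample
step = 0.3         # threshold from the question

# astype(int) truncates toward zero, so every high strictly within
# one step of start_px (above or below) lands in level 0.
level = ((high - start_px) / step).astype(int)

# Alternative (assumption, not in the answer): floor gives equally
# wide buckets on both sides of start_px.
level_floor = np.floor((high - start_px) / step).astype(int)

# Rows where the truncated level differs from the previous row.
changed = level.diff().fillna(0) != 0

print(pd.DataFrame({"high": high, "level": level,
                    "level_floor": level_floor, "changed": changed}))
```

Note that this marks when the high crosses a fixed 0.3-wide band measured from the first open. It does not track pivots, so it will not reproduce the swings column requested in the question.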

Answered by acushner

So I haven't tested this, but something like this should get you what you want. What happens if both low < pivot - diff and high > pivot + diff occur in the same minute?

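import numpy as np


def f(df):
    pivot = df.open.iloc[0]
    diff = .3

    def proc(ser):
        # pivot lives in the enclosing function; without nonlocal the
        # assignments below would raise UnboundLocalError
        nonlocal pivot
        res = np.nan
        if ser.low < pivot - diff:
            res, pivot = ser.low - pivot, ser.low
        elif ser.high > pivot + diff:
            res, pivot = ser.high - pivot, ser.high
        return res

    df['swings'] = df.apply(proc, axis=1)
    return df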
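
On the question of a single bar crossing both thresholds: that can only happen when the bar's high-low range is wider than 2 * diff, whatever the pivot is. A minimal sketch for flagging such bars up front follows. The helper name wide_bars is hypothetical, and the second row of the tiny frame is invented purely to trigger the flag.

```python
import pandas as pd


def wide_bars(df, diff=0.3):
    """Flag bars whose high-low range exceeds 2 * diff (hypothetical helper).

    This is a necessary condition, not a sufficient one: a flagged bar
    crosses both thresholds only if the pivot sits far enough inside it.
    """
    return (df["high"] - df["low"]) > 2 * diff


# First row is the 09:30 bar from the question; the second is made up.
bars = pd.DataFrame({"high": [187.73, 188.10],
                     "low":  [187.54, 187.30]})
print(wide_bars(bars))
```

None of the ten sample bars in the question is that wide, so the ambiguity does not arise there. On more volatile data you would have to decide which side to check first.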