Apply a function to a Pandas DataFrame row using values in other rows

Question

Asked by lukewitmer

I have a dataframe row to perform calculations with, and I need to use values in the following (and potentially preceding) rows to do these calculations (essentially a perfect forecast based on the real data set). I get each row from an earlier df.apply call, so I could pass the whole df along to the downstream objects, but that seems less than ideal given the complexity of the objects in my analysis.

I found one closely related question and answer [1], but my problem is fundamentally different in that I do not need the whole df for my calcs, only the following x rows (which might matter for large dfs).

So, for example:

import pandas as pd

df = pd.DataFrame([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
                  columns=['PRICE'])
horizon = 3

I need to access values in the following 3 (horizon) rows in my row-wise df.apply call. How can I get a naive forecast of the next 3 data points dynamically in my row-wise apply calcs? E.g. for the first row, where the PRICE is 100, I need to use [200, 300, 400] as a forecast in my calcs.

[1] apply a function to a pandas Dataframe whose returned value is based on other rows

Answer 1

Answered by lukewitmer

By getting the row's index inside the df.apply call using row.name [1], you can generate the 'forecast' data relative to the row you are currently on. This is effectively a preprocessing step that puts the 'forecast' onto the relevant row, or it could be done as part of the initial df.apply call if the df is available downstream.

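import pandas as pd

df = pd.DataFrame([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000], columns=['PRICE'])
horizon = 3

# row.name is the row's index label; with the default integer index it equals the row's position.
# .tolist() stores plain values so each cell holds a list such as [200, 300, 400].
df['FORECAST'] = df.apply(lambda row: df['PRICE'].iloc[row.name + 1:row.name + horizon + 1].tolist(), axis=1)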

Results in this:

   PRICE          FORECAST
0    100   [200, 300, 400]
1    200   [300, 400, 500]
2    300   [400, 500, 600]
3    400   [500, 600, 700]
4    500   [600, 700, 800]
5    600   [700, 800, 900]
6    700  [800, 900, 1000]
7    800       [900, 1000]
8    900            [1000]
9   1000                []

This column can then be used in your row-wise df.apply calcs.
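As a rough illustration of that last step, here is a minimal sketch of a downstream row-wise calculation that reads the precomputed FORECAST list. The function name expected_change and the calculation itself (mean of the forecast minus the current price) are hypothetical and stand in for whatever your analysis actually does.

```python
import pandas as pd

df = pd.DataFrame({'PRICE': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]})
horizon = 3

# Precompute the look-ahead values for each row as a plain list.
prices = df['PRICE']
df['FORECAST'] = [prices.iloc[i + 1:i + 1 + horizon].tolist() for i in range(len(df))]


def expected_change(row):
    """Hypothetical calc: mean of the forecast minus the current price."""
    forecast = row['FORECAST']
    if not forecast:  # the last row has no following rows
        return float('nan')
    return sum(forecast) / len(forecast) - row['PRICE']


# The row-wise function only needs the row itself, not the whole df.
df['EXPECTED_CHANGE'] = df.apply(expected_change, axis=1)
print(df)
```

Because the forecast travels with the row, the downstream objects never need a reference to the full dataframe.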

EDIT: If each forecast is stored as a pandas Series slice rather than a plain list, it keeps the original row labels; reset_index(drop=True) strips them:

df['FORECAST'] = df.apply(lambda x: [df['PRICE'][x.name+1:x.name+horizon+1].reset_index(drop=True)], axis=1)

[1] getting the index of a row in a pandas apply function

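Note that row.name is the row's index label, so arithmetic such as row.name + 1 only works as a position when the dataframe has the default integer index. The sketch below shows one way to handle other indexes, assuming the index is unique: look up the position with df.index.get_loc. The date index and the helper name forecast are made up for illustration.

```python
import pandas as pd

# Hypothetical data with a date index instead of the default 0..n-1 index.
df = pd.DataFrame({'PRICE': [100, 200, 300, 400, 500]},
                  index=pd.date_range('2020-01-01', periods=5, freq='D'))
horizon = 2


def forecast(row):
    # Translate the row's label into an integer position (requires a unique index).
    pos = df.index.get_loc(row.name)
    return df['PRICE'].iloc[pos + 1:pos + 1 + horizon].tolist()


df['FORECAST'] = df.apply(forecast, axis=1)
print(df)
```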

Answer 2

Answered by piRSquared

You may find this useful as well.

keys = range(horizon + 1)
pd.concat([df.shift(-i) for i in keys], axis=1, keys=keys)

      0       1       2       3
  PRICE   PRICE   PRICE   PRICE
0   100   200.0   300.0   400.0
1   200   300.0   400.0   500.0
2   300   400.0   500.0   600.0
3   400   500.0   600.0   700.0
4   500   600.0   700.0   800.0
5   600   700.0   800.0   900.0
6   700   800.0   900.0  1000.0
7   800   900.0  1000.0     NaN
8   900  1000.0     NaN     NaN
9  1000     NaN     NaN     NaN

If you assign the concat result to df_c:

keys = range(horizon + 1)
df_c = pd.concat([df.shift(-i) for i in keys], axis=1, keys=keys)

df_c.apply(lambda x: pd.Series([x[0].values, x[1:].values]), axis=1)

          0                       1
0   [100.0]   [200.0, 300.0, 400.0]
1   [200.0]   [300.0, 400.0, 500.0]
2   [300.0]   [400.0, 500.0, 600.0]
3   [400.0]   [500.0, 600.0, 700.0]
4   [500.0]   [600.0, 700.0, 800.0]
5   [600.0]   [700.0, 800.0, 900.0]
6   [700.0]  [800.0, 900.0, 1000.0]
7   [800.0]    [900.0, 1000.0, nan]
8   [900.0]      [1000.0, nan, nan]
9  [1000.0]         [nan, nan, nan]
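If the downstream calculation is a simple aggregate, the shifted columns can also be used directly without a row-wise apply. The following is a sketch under that assumption; the column names PRICE_T+1 and so on and the FORECAST_MEAN column are invented for this example.

```python
import pandas as pd

df = pd.DataFrame({'PRICE': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]})
horizon = 3

# One column per step ahead; shift(-i) pulls the value from i rows below.
ahead = pd.concat(
    {f'PRICE_T+{i}': df['PRICE'].shift(-i) for i in range(1, horizon + 1)},
    axis=1,
)

# Row-wise mean of the available look-ahead values (NaN entries are skipped).
df['FORECAST_MEAN'] = ahead.mean(axis=1)
print(df)
```

Rows near the end have fewer look-ahead values, so their mean covers only what is available, and the final row is NaN.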
