Disclaimer: this page is a mirror of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must keep the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/2524608/


Machine Learning Algorithm for Predicting Order of Events?

python, compression, machine-learning, neural-network, evolutionary-algorithm

Asked by user213060

Simple machine learning question. Probably numerous ways to solve this:

There is an infinite stream of 4 possible events:

'event_1', 'event_2', 'event_3', 'event_4'

The events do not come in in a completely random order. We will assume that there are some complex patterns to the order in which most events come in, and that the rest of the events are just random. We do not know the patterns ahead of time, though.

After each event is received, I want to predict what the next event will be based on the order that events have come in in the past. So my question is: What machine learning algorithm should I use for this predictor?

The predictor will then be told what the next event actually was:

Predictor = new_predictor()

prev_event = False
while True:
    event = get_event()
    if prev_event is not False:
        Predictor.last_event_was(prev_event)
    predicted_event = Predictor.predict_next_event(event)
    prev_event = event  # remember this event for the next iteration

The question arises of how long a history the predictor should maintain, since maintaining infinite history will not be possible. I'll leave this up to you to answer. The answer can't be infinite, though, for practicality's sake.

So I believe that the predictions will have to be done with some kind of rolling history. Adding a new event and expiring an old event should therefore be rather efficient and should not, for example, require rebuilding the entire predictor model.


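One way to get the efficient add/expire behaviour described above is a fixed-size rolling window; Python's collections.deque gives O(1) appends with automatic expiry (a minimal sketch — HISTORY_LEN and record_event are my own illustrative names, not part of the question):

```python
from collections import deque

HISTORY_LEN = 100  # arbitrary cap; tune against your accuracy/memory trade-off

history = deque(maxlen=HISTORY_LEN)  # oldest entries are dropped automatically

def record_event(event):
    """Append a new event; when full, the deque expires the oldest in O(1)."""
    history.append(event)

# Feed 150 events from a 4-event cycle; only the last 100 are retained.
for i in range(150):
    record_event('event_%d' % (i % 4 + 1))

print(len(history))  # 100: the window never grows past HISTORY_LEN
print(history[0])    # oldest retained event, from i=50: 'event_3'
```

Any counts or statistics derived from the window would still need to be decremented for each expired event, but the window itself never needs rebuilding.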
Specific code, instead of research papers, would add immense value to your responses. Python or C libraries are nice, but anything will do.

Update: what if more than one event can happen simultaneously in each round? Does that change the solution?

Answered by bayer

This is essentially a sequence prediction problem, so you want recurrent neural networks or hidden Markov models.

If you only have a fixed time to look back, time-window approaches might suffice. You take the sequence data and split it into overlapping windows of length n (e.g. you split a sequence ABCDEFG into ABC, BCD, CDE, DEF, EFG). Then you train a function approximator (e.g. a neural network or linear regression) to map the first n-1 parts of each window onto the nth part.


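The split-and-train recipe above can be sketched as follows (a sketch, assuming the window length n = 3 from the example; the helper names are mine):

```python
def sliding_windows(seq, n):
    """Split a sequence into overlapping windows of length n."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

def to_training_pairs(seq, n):
    """Each window's first n-1 items are the input, the nth is the target."""
    return [(w[:-1], w[-1]) for w in sliding_windows(seq, n)]

print(sliding_windows('ABCDEFG', 3))       # ['ABC', 'BCD', 'CDE', 'DEF', 'EFG']
print(to_training_pairs('ABCDEFG', 3)[0])  # ('AB', 'C')
```

The (input, target) pairs are what you would feed to the function approximator; with one-hot encoded events the same idea carries over unchanged.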
Your predictor will not be able to look back in time longer than the size of your window. RNNs and HMMs can do so in theory, but are hard to tune or sometimes just don't work.

(State of the art RNN implementations can be found in PyBrain http://pybrain.org)

Update: Here is the PyBrain code for your problem. (I haven't tested it; there might be some typos and such, but the overall structure should work.)

from pybrain.datasets import SequentialDataSet
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure import LSTMLayer, SigmoidLayer

INPUTS = 4
HIDDEN = 10
OUTPUTS = 4

net = buildNetwork(INPUTS, HIDDEN, OUTPUTS, hiddenclass=LSTMLayer,
                   outclass=SigmoidLayer, recurrent=True)

ds = SequentialDataSet(INPUTS, OUTPUTS)

# your_sequences is a list of lists of tuples which each are a bitmask
# indicating the event (so 1.0 at position i if event i happens, 0.0 otherwise)

for sequence in your_sequences:
    ds.newSequence()  # one dataset sequence per input sequence
    for (inpt, target) in zip(sequence, sequence[1:]):
        ds.appendLinked(inpt, target)

net.randomize()

trainer = BackpropTrainer(net, ds, learningrate=0.05, momentum=0.99)
for _ in range(1000):
    print(trainer.train())

This will train the recurrent network for 1000 epochs and print out the error after every epoch. Afterwards you can check for correct predictions like this:

net.reset()
for i in sequence:
    next_item = net.activate(i) > 0.5
    print(next_item)

This will print an array of booleans for every event.

Answered by mjv

Rather than keeping a full history, one can keep aggregated information about the past (along with a relatively short sliding history, to be used as input to the predictor logic).

A tentative implementation could go like this:
In a nutshell: manage a set of Markov chains of increasing order, and grade and average their predictions.

  • Keep a table of individual event counts. Its purpose is to calculate the probability of any of the 4 different events, without regard to any sequence.
  • Keep a table of bigram counts, i.e. a cumulative count of the event pairs observed [so far].
    The table starts empty; upon observing the second event, we can store the first bigram with a count of 1. Upon the third event, the bigram made of the 2nd and 3rd events is "added" to the table: either incrementing the count of an existing bigram or adding it, with an initial count of 1, as a new (never-seen-so-far) bigram. Etc.
    In parallel, keep a total count of bigrams in the table.
    This table and the total tally allow calculating the probability of a given event based on the one preceding event.
  • In a similar fashion, keep a table of trigram counts and a running tally of total trigrams seen (note that this will be equal to the number of bigrams minus one, since the first trigram is added one event after the first bigram, and after that one of each is added with each new event). This trigram table allows calculating the probability of a given event based on the two preceding events.
  • Likewise, keep tables for N-grams up to, say, 10-grams (the algorithm will tell whether we need to increase or decrease this).
  • Keep a sliding window into the last 10 events.
  • The above tables provide the basis for prediction; the general idea is to:
    • use a formula which expresses the probability of the next event as a weighted average of the individual probabilities based on the different N-grams.
    • reward the better individual N-gram lengths by increasing the corresponding weight in the formula, and punish the worse lengths in the reverse fashion. (Beware: the marginal probability of individual events needs to be taken into account, lest we favor N-grams which happen to predict the most frequent events regardless of the relatively poor predictive value associated with them.)
    • once the system has "seen" enough events, look at the current values of the weights associated with the long N-grams, and if these are relatively high, consider adding tables to keep aggregate info about bigger N-grams. (This unfortunately hurts the algorithm in terms of both space and time.)

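The counting scheme above can be sketched roughly like this (illustrative only: the class name is mine, and the weights simply grow with the N-gram order, standing in for the adaptive grading and averaging described above):

```python
from collections import defaultdict, deque

class NGramPredictor(object):
    """Keep N-gram count tables up to max_order and blend their predictions."""

    def __init__(self, max_order=3):
        self.max_order = max_order
        # counts[n] maps an (n-1)-event context tuple to per-event counts
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in range(max_order + 1)]
        self.recent = deque(maxlen=max_order)  # short sliding history

    def observe(self, event):
        for n in range(1, self.max_order + 1):
            if len(self.recent) >= n - 1:
                ctx = tuple(list(self.recent)[-(n - 1):]) if n > 1 else ()
                self.counts[n][ctx][event] += 1
        self.recent.append(event)

    def predict(self):
        # Weighted average of the per-order probabilities; the weight n is
        # a placeholder for the reward/punish grading described above.
        scores = defaultdict(float)
        for n in range(1, self.max_order + 1):
            ctx = tuple(list(self.recent)[-(n - 1):]) if n > 1 else ()
            dist = self.counts[n].get(ctx)
            if not dist:
                continue
            total = float(sum(dist.values()))
            for ev, c in dist.items():
                scores[ev] += n * (c / total)
        return max(scores, key=scores.get) if scores else None

p = NGramPredictor(max_order=3)
for ev in 'ABCABCABCAB':
    p.observe(ev)
print(p.predict())  # 'C': the trigram table has only ever seen (A, B) -> C
```

Adding a new event only increments a handful of counters, so the model never needs rebuilding; a true rolling history would additionally decrement the counters for expired events.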
There can be several variations on the general logic described above, in particular in the choice of the metric used to "grade" the prediction quality of the individual N-gram lengths.
Other considerations concern detecting and adapting to possible shifts in the event distribution (the above assumes a generally ergodic event source). One possible approach is to use two sets of tables (combining the probabilities accordingly) and periodically dropping the contents of all tables of one of the sets. Choosing the right period for these resets is a tricky business, essentially balancing the need for statistically significant volumes of history against the need for a period short enough not to miss the shorter modulations...

Answered by Nathan

Processors use a few really lightweight tricks to predict whether a branch statement will branch or not. This helps them with efficient pipelining. They may not be as general as Markov models, for instance, but they are interesting because of their simplicity. Here is the Wikipedia article on branch prediction; see the saturating counter and the two-level adaptive predictor.


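The two-bit saturating counter from that article is simple enough to sketch directly (one counter per branch in hardware; the class here is my own illustration):

```python
class TwoBitCounter(object):
    """Two-bit saturating counter: states 0-1 predict False, 2-3 predict True."""

    def __init__(self):
        self.state = 2  # start in 'weakly taken'

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Saturate at the ends so a single anomaly can't flip a strong state.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

c = TwoBitCounter()
for outcome in [True, True, False, True]:
    c.update(outcome)
print(c.predict())  # True: one 'not taken' did not flip the saturated counter
```

By analogy, for the event-stream problem you could keep one such counter per (recent-history, event) pair, which is essentially what the two-level adaptive predictor does with its table of counters indexed by a history register.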
Answered by gingerbreadboy

The question arises of how long a history the predictor should maintain

The only answer is "it depends".

It depends on how accurate this needs to be. I don't believe this strategy could ever be 100% accurate, even with an infinite history. Try out a history of 10 and you'll get x% accuracy, then try 100 and you'll get y% accuracy, and so on...

Eventually you should find either that the system is as accurate as you desire, or that the rise in accuracy is not worth the increase in history length (and the increased memory usage, processing time, etc...). At that point either the job is done, or you need to find a new strategy.

For what it's worth, I think looking into a simple "soft" neural net might be a better plan.

Answered by rlb.usa

We just studied branch predictors in computer architecture (because the processor would take too long to actually evaluate a condition if(EXPRESSION), it tries to 'guess' and save some time that way). I am sure more research has been done in this area, but that's all I can think of at the moment.

I haven't seen a unique setup like yours, so I think you might need to do some preliminary experimentation on your own. Try running your solution for X seconds with a history of N slots: what is the correctness ratio? Then compare that with the same fixed X and varying N history slots to try to find the best memory-history ratio (graphing them out).


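That experiment might look something like this (a sketch: the toy stream generator and the naive most-frequent-event predictor are stand-ins for your real event source and predictor):

```python
import random
from collections import Counter, deque

def make_event_stream(length, rng):
    """Toy stream: mostly a repeating pattern, with some random noise mixed in."""
    pattern = ['event_1', 'event_2', 'event_3', 'event_4']
    return [pattern[i % 4] if rng.random() < 0.8 else rng.choice(pattern)
            for i in range(length)]

def accuracy_for_history(stream, n):
    """Predict the most frequent event in the last n events; return the hit rate."""
    history, hits = deque(maxlen=n), 0
    for event in stream:
        if history:
            guess = Counter(history).most_common(1)[0][0]
            hits += (guess == event)
        history.append(event)
    return hits / float(len(stream))

rng = random.Random(0)
stream = make_event_stream(2000, rng)
for n in (10, 100, 1000):
    print(n, accuracy_for_history(stream, n))  # graph these to pick a knee point
```

Swap in your actual predictor and event source and the loop over n gives exactly the accuracy-versus-history curve described above.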
If more than one event can happen simultaneously... that's a little mind-bending; there have to be some constraints there: what if an infinite number of events happen at a time? Uh-oh, that's computationally impossible for you. I'd try the same approach as with just one event at a time, except that the predictor is enabled to predict multiple events at a time.
