Python: Should I use "random.seed" or "numpy.random.seed" to control random number generation in "scikit-learn"?

Question

Asked by shadowtalker

I'm using scikit-learn and numpy and I want to set the global seed so that my work is reproducible.

Should I use numpy.random.seed or random.seed?

Edit: From the link in the comments, I understand that they are different, and that the numpy version is not thread-safe. I want to know specifically which one to use to create IPython notebooks for data analysis. Some of the algorithms in scikit-learn involve generating random numbers, and I want to be sure that the notebook shows the same results on every run.

Answer 1

Accepted answer by ali_m

Should I use np.random.seed or random.seed?

That depends on whether your code uses numpy's random number generator or the one in the standard-library random module.

The random number generators in numpy.random and random have totally separate internal states, so numpy.random.seed() will not affect the random sequences produced by random.random(), and likewise random.seed() will not affect numpy.random.randn(), etc. If you are using both random and numpy.random in your code, then you will need to set the seeds for both separately.
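A minimal sketch of the point above, assuming only the standard-library `random` module and NumPy's legacy global generator: reseeding NumPy repeats NumPy's draw but does nothing for `random`, so code that uses both needs both seeds. The helper name `seed_everything` is hypothetical, not part of either library.

```python
import random

import numpy as np

SEED = 42  # arbitrary example value


def draw_both():
    """Draw one number from each generator: (stdlib random, NumPy global)."""
    return random.random(), np.random.rand()


# Reseeding only NumPy repeats NumPy's draw; `random` keeps advancing
# its own, separate state, so its draws are not repeated.
np.random.seed(SEED)
first = draw_both()
np.random.seed(SEED)
second = draw_both()
print(first[1] == second[1])  # True


def seed_everything(seed):
    """Hypothetical helper: seed both global generators."""
    random.seed(seed)
    np.random.seed(seed)


seed_everything(SEED)
a = draw_both()
seed_everything(SEED)
b = draw_both()
print(a == b)  # True
```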

Update

Your question seems to be specifically about scikit-learn's random number generators. As far as I can tell, scikit-learn uses numpy.random throughout, so you should use np.random.seed() rather than random.seed().
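To see why `np.random.seed()` reaches scikit-learn, here is a hypothetical re-implementation (`check_random_state_sketch`) of the pattern that, as far as I know, scikit-learn's own `sklearn.utils.check_random_state` helper follows: `None` maps to NumPy's global `RandomState`, an integer becomes a fresh `RandomState`, and an existing instance is passed through. `np.random.mtrand._rand` is a private NumPy attribute, used here only because it is the object `np.random.seed()` reseeds; `shuffled_indices` is a made-up stand-in for an estimator.

```python
import numbers

import numpy as np


def check_random_state_sketch(seed):
    """Hypothetical sketch: turn a `random_state` argument into a RandomState."""
    if seed is None or seed is np.random:
        # The global generator, i.e. the one np.random.seed() sets.
        return np.random.mtrand._rand
    if isinstance(seed, numbers.Integral):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        return seed
    raise ValueError(f"{seed!r} cannot be used to seed a RandomState instance")


def shuffled_indices(n, random_state=None):
    """Toy stand-in for an estimator that consumes randomness."""
    rng = check_random_state_sketch(random_state)
    return rng.permutation(n)


# random_state=None falls back to the global generator, so np.random.seed controls it.
np.random.seed(0)
run1 = shuffled_indices(10)
np.random.seed(0)
run2 = shuffled_indices(10)
print(np.array_equal(run1, run2))  # True

# An explicit integer seed is reproducible regardless of the global state.
print(np.array_equal(shuffled_indices(10, random_state=7),
                     shuffled_indices(10, random_state=7)))  # True
```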

One important caveat is that np.random is not thread-safe. Relatedly, if you set a global seed, then launch several subprocesses and generate random numbers within them using np.random, each subprocess will inherit the RNG state from its parent (as happens when the subprocesses are created by forking), meaning that you will get identical random variates in each subprocess. The usual way around this problem is to pass a different seed (or numpy.random.RandomState instance) to each subprocess, so that each one has a separate local RNG state.
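A minimal sketch of that workaround, assuming threads via `concurrent.futures` to keep it portable (the same idea applies to processes): when every worker starts from the same RNG state, as a forked child does, all workers draw identical numbers; giving each worker its own `RandomState` built from a different seed fixes that and avoids sharing the global generator across threads. The `worker` function and the seed values are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

N_WORKERS = 4


def worker(seed, n=3):
    """Each worker owns a local RandomState, so no RNG state is shared."""
    rng = np.random.RandomState(seed)
    return rng.randn(n)


# Problem: every worker starts from the same state (simulating children
# that inherited their parent's seeded RNG), so all draws are identical.
with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
    same = list(pool.map(worker, [0] * N_WORKERS))

# Fix: a different seed per worker gives each one its own random stream.
with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
    distinct = list(pool.map(worker, range(N_WORKERS)))

print(all(np.array_equal(same[0], s) for s in same))  # True
```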

Since some parts of scikit-learn can run in parallel using joblib, you will see that some classes and functions have an option to pass either a seed or an np.random.RandomState instance (e.g. the random_state= parameter to sklearn.decomposition.MiniBatchSparsePCA). I tend to use a single global seed for a script, then generate new random seeds based on the global seed for any parallel functions.
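The single-global-seed habit might look like the sketch below. `parallel_step` and `run_pipeline` are hypothetical; `parallel_step` stands in for a scikit-learn function or estimator that accepts `random_state=` (in real code you would pass the derived seed to, e.g., `MiniBatchSparsePCA(random_state=...)`). Drawing the per-call seeds from the globally seeded generator means one number controls every downstream seed, and each rerun reproduces the same results.

```python
import numpy as np

GLOBAL_SEED = 12345  # the one number to change for a different, still reproducible run


def parallel_step(data, random_state):
    """Hypothetical stand-in for a call that accepts `random_state=`."""
    rng = np.random.RandomState(random_state)
    return data[rng.permutation(len(data))]


def run_pipeline(seed=GLOBAL_SEED):
    np.random.seed(seed)  # also covers code that relies on the global generator
    data = np.arange(8)
    # Derive a fresh integer seed for each (potentially parallel) call.
    step_seeds = np.random.randint(0, 2**31 - 1, size=3)
    return [parallel_step(data, int(s)) for s in step_seeds]


first = run_pipeline()
second = run_pipeline()
print(all(np.array_equal(a, b) for a, b in zip(first, second)))  # True
```

Passing plain integers rather than one shared `RandomState` object keeps each call's randomness independent of the order in which parallel jobs happen to run.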
