Original source: http://stackoverflow.com/questions/31057197/
This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Should I use `random.seed` or `numpy.random.seed` to control random number generation in `scikit-learn`?
Asked by shadowtalker
I'm using scikit-learn and numpy and I want to set the global seed so that my work is reproducible.
Should I use `numpy.random.seed` or `random.seed`?
Edit: From the link in the comments, I understand that they are different, and that the numpy version is not thread-safe. I want to know specifically which one to use when creating IPython notebooks for data analysis. Some of the algorithms in scikit-learn generate random numbers, and I want to be sure that the notebook shows the same results on every run.
Accepted answer by ali_m
Should I use np.random.seed or random.seed?
That depends on whether your code uses numpy's random number generator or the one in `random`.
The random number generators in `numpy.random` and `random` have totally separate internal states, so `numpy.random.seed()` will not affect the random sequences produced by `random.random()`, and likewise `random.seed()` will not affect `numpy.random.randn()` etc. If you are using both `random` and `numpy.random` in your code then you will need to separately set the seeds for both.
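A quick way to convince yourself of this is to vary one seed while holding the other fixed. The sketch below is only an illustration; `draw_both` is a made-up helper, not part of either library.

```python
import random
import numpy as np

def draw_both(py_seed, np_seed):
    """Seed each generator separately and return one draw from each."""
    random.seed(py_seed)
    np.random.seed(np_seed)
    return random.random(), np.random.rand()

# Same NumPy seed, different `random` seed: only the `random` draw changes.
py_a, np_a = draw_both(py_seed=1, np_seed=0)
py_b, np_b = draw_both(py_seed=2, np_seed=0)
print(py_a != py_b, np_a == np_b)

# Same `random` seed, different NumPy seed: only the NumPy draw changes.
py_c, np_c = draw_both(py_seed=1, np_seed=5)
print(py_a == py_c, np_a != np_c)
```

Each draw depends only on the seed of its own module, which is why both seeds have to be set when both modules are in use.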
Update
Your question seems to be specifically about scikit-learn's random number generators. As far as I can tell, scikit-learn uses `numpy.random` throughout, so you should use `np.random.seed()` rather than `random.seed()`.
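As a rough check, assuming scikit-learn is installed, you can seed NumPy's global generator and call a scikit-learn utility without passing `random_state`. `shuffled_indices` is a hypothetical helper written for this illustration, and `sklearn.utils.shuffle` is just one convenient function to test with.

```python
import numpy as np
from sklearn.utils import shuffle

def shuffled_indices(seed):
    """Seed NumPy's global generator, then let scikit-learn draw from it."""
    np.random.seed(seed)
    # No random_state is passed, so shuffle falls back to NumPy's global state.
    return shuffle(np.arange(10))

first = shuffled_indices(0)
second = shuffled_indices(0)
print(np.array_equal(first, second))
```

If the two permutations match, the global NumPy seed is what controlled the shuffle.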
One important caveat is that `np.random` is not threadsafe: if you set a global seed, then launch several subprocesses and generate random numbers within them using `np.random`, each subprocess will inherit the RNG state from its parent, meaning that you will get identical random variates in each subprocess. The usual way around this problem is to pass a different seed (or `numpy.random.RandomState` instance) to each subprocess, such that each one has a separate local RNG state.
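A minimal sketch of the "separate local RNG state" idea is shown here. The `worker` function and the seed values are hypothetical, and the list comprehension merely stands in for dispatching the calls to real subprocesses (for example with `multiprocessing.Pool.map`).

```python
import numpy as np

def worker(seed, n=3):
    """Draw n numbers from a generator that is local to this worker."""
    # Local state: independent of np.random's global generator.
    rng = np.random.RandomState(seed)
    return rng.rand(n)

np.random.seed(0)               # global seed; the workers do not rely on it
worker_seeds = [101, 102, 103]  # hypothetical seeds, one per subprocess

# Stand-in for sending each seed to a separate subprocess.
results = [worker(seed) for seed in worker_seeds]

for seed, values in zip(worker_seeds, results):
    print(seed, values)
```

Because each worker builds its own `RandomState` from the seed it receives, the streams differ between workers but stay reproducible from run to run.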
Since some parts of scikit-learn can run in parallel using joblib, you will see that some classes and functions have an option to pass either a seed or an `np.random.RandomState` instance (e.g. the `random_state=` parameter to `sklearn.decomposition.MiniBatchSparsePCA`). I tend to use a single global seed for a script, then generate new random seeds based on the global seed for any parallel functions.
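One possible way to derive per-task seeds from a single global seed is sketched below. `GLOBAL_SEED`, `derive_seeds`, and the number of tasks are assumptions made for the example, not anything prescribed by scikit-learn.

```python
import numpy as np

GLOBAL_SEED = 42  # hypothetical value; pick one per script or notebook

def derive_seeds(global_seed, n_tasks):
    """Derive one integer seed per parallel task from a single global seed."""
    master = np.random.RandomState(global_seed)
    # Plain Python ints in [0, 2**31 - 1), usable as seeds or random_state values.
    return [int(s) for s in master.randint(0, 2**31 - 1, size=n_tasks)]

task_seeds = derive_seeds(GLOBAL_SEED, n_tasks=4)

# Each task gets its own generator; a seed could instead be passed as
# random_state= to an estimator that accepts that parameter.
task_rngs = [np.random.RandomState(seed) for seed in task_seeds]
print(task_seeds)
```

Changing `GLOBAL_SEED` changes every derived seed at once, so the whole script stays reproducible from one number.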