What exactly does Pandas random_state do?

Question

Asked by Newskooler

I have the following code, where I use the Pandas random_state parameter:

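import pandas as pd

randomState = 123
sampleSize = 750
# filePath (defined elsewhere) points to a whitespace-delimited text file
df = pd.read_csv(filePath, delim_whitespace=True)
# Draw sampleSize rows; random_state seeds the random selection
df_s = df.sample(n=sampleSize, random_state=randomState)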

This generates a sample dataframe df_s. Every time I run the code with the same randomState, I get the same sample df_s. When I change the value from 123 to 12, the sample changes as well, so I guess that's what random_state does.

My silly question: How does changing the number affect the sample? I read the Pandas documentation and the Numpy documentation, but could not get a clear picture.

Any straightforward explanation with an example would be much appreciated.

Answer 1

Accepted answer by jotasi

As described in the documentation of pandas.DataFrame.sample, the random_state parameter accepts either an integer (as in your case) or a numpy.random.RandomState, which is a container for a Mersenne Twister pseudorandom number generator.

If you pass it an integer, it will use this as a seed for a pseudorandom number generator. As the name says, the generator does not produce true randomness. Rather, it has an internal state which is initialized based on a seed (for NumPy's global generator, you can inspect this state by calling np.random.get_state()). When initialized with the same seed, it will reproduce the same sequence of "random numbers".
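
To make the effect of the seed concrete, here is a minimal sketch. The small DataFrame is a hypothetical stand-in for the file read in the question, and the variable names are made up for illustration. It shows that two generators seeded alike produce the same numbers, and that df.sample therefore returns the same rows for the same integer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the question reads its frame from a file instead.
df = pd.DataFrame({"value": np.arange(1000)})

# Two generators initialized with the same seed start in the same
# internal state, so they emit the same sequence of numbers.
rng_a = np.random.RandomState(123)
rng_b = np.random.RandomState(123)
same_sequence = np.array_equal(
    rng_a.randint(0, 100, size=5),
    rng_b.randint(0, 100, size=5),
)

# df.sample with an integer seed behaves the same way:
# same seed, same rows in the same order.
first = df.sample(n=5, random_state=123)
second = df.sample(n=5, random_state=123)
same_sample = first.equals(second)

# A different seed starts the generator in a different state, so the
# selected rows will almost certainly differ.
other = df.sample(n=5, random_state=12)
different_sample = not first.equals(other)

print(same_sequence, same_sample, different_sample)
```

The number itself has no meaning beyond selecting the starting state; 123 is not "more random" than 12, it just leads to a different (but equally repeatable) sequence.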

If you pass it a RandomState, it will use this (already initialized/seeded) RandomState to generate pseudorandom numbers. This also allows you to get reproducible results by setting a fixed seed when initializing the RandomState and then passing this RandomState around. In fact, you should prefer this over setting the seed of numpy's internal RandomState. The reasoning is explained in this answer by Robert Kern and the comments on it. The idea is to have an independent stream, which prevents other parts of the program from messing up your reproducibility by changing the seed of numpy's internal RandomState.
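
A rough sketch of the independent-stream idea follows. The DataFrame and variable names are hypothetical, and the call to np.random.seed(999) stands in for "some other part of the program" touching numpy's global generator. Because the samples are drawn from a dedicated RandomState, reseeding the global one in between does not change them.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data for illustration.
df = pd.DataFrame({"value": np.arange(1000)})

# Run 1: draw two consecutive samples from a dedicated, seeded generator.
rng = np.random.RandomState(42)
a1 = df.sample(n=5, random_state=rng)
a2 = df.sample(n=5, random_state=rng)

# Run 2: same dedicated seed, but unrelated code reseeds and uses
# numpy's global generator between the two draws.
rng = np.random.RandomState(42)
b1 = df.sample(n=5, random_state=rng)
np.random.seed(999)   # stand-in for another part of the program
np.random.rand(10)    # consumes numbers from the global generator only
b2 = df.sample(n=5, random_state=rng)

# The dedicated stream is unaffected by the global generator.
reproducible = a1.equals(b1) and a2.equals(b2)

# Each draw advances the generator's state, so consecutive samples
# from the same RandomState will almost certainly differ.
state_advances = not a1.equals(a2)

print(reproducible, state_advances)
```

Note the difference from passing an integer: an integer reseeds on every call and so repeats the same sample each time, whereas a shared RandomState object moves forward with each draw while the run as a whole stays repeatable.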
