Original source: http://stackoverflow.com/questions/30327417/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas: create new column in df with random integers from range
Asked by screechOwl
I have a pandas data frame with 50k rows. I'm trying to add a new column that is a randomly generated integer from 1 to 5.
If I want 50k random numbers I'd use:
df1['randNumCol'] = random.sample(range(50000), len(df1))  # xrange in Python 2
but for this I'm not sure how to do it.
Side note: in R, I'd do:
sample(1:5, 50000, replace = TRUE)
Any suggestions?
Accepted answer by Matt
One solution is to use numpy.random.randint:
import numpy as np
df1['randNumCol'] = np.random.randint(1, 6, df1.shape[0])
Or, if the numbers are non-consecutive, you can use this (albeit slower):
df1['randNumCol'] = np.random.choice([1, 9, 20], df1.shape[0])
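As a quick check of the two calls above, here is a minimal sketch on a small stand-in frame. The 10-row frame, its "x" column, and the second column name "randChoiceCol" are assumptions for illustration only; the frame in the question has 50k rows.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 50k-row frame in the question.
df1 = pd.DataFrame({"x": range(10)})

# randint's upper bound is exclusive, so (1, 6) draws from 1, 2, 3, 4, 5.
df1["randNumCol"] = np.random.randint(1, 6, df1.shape[0])

# choice samples with replacement by default, like R's sample(..., replace = TRUE).
df1["randChoiceCol"] = np.random.choice([1, 9, 20], df1.shape[0])

print(df1["randNumCol"].between(1, 5).all())        # True
print(df1["randChoiceCol"].isin([1, 9, 20]).all())  # True
```

Both calls return one value per row, so the result can be assigned directly as a column.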
To make the results reproducible, you can set the seed with numpy.random.seed (e.g. np.random.seed(42)).
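A small sketch of what seeding buys you: resetting the seed before each draw yields the same numbers. The row count n is a hypothetical placeholder for df1.shape[0]. The last lines show an alternative that is not part of the answer, NumPy's newer Generator API, which keeps the random state local instead of global.

```python
import numpy as np

n = 5  # hypothetical row count; in the answer this would be df1.shape[0]

# Seeding the legacy global generator makes the next draws repeatable.
np.random.seed(42)
first = np.random.randint(1, 6, n)
np.random.seed(42)
second = np.random.randint(1, 6, n)
print(np.array_equal(first, second))  # True

# Alternative (not from the answer): a local Generator avoids global state.
rng = np.random.default_rng(42)
local_draws = rng.integers(1, 6, size=n)  # upper bound is exclusive here too
```

The seeded legacy functions and the Generator use different algorithms, so the same seed value should not be expected to give the same numbers across the two approaches.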
Answer by smci
To add a column of random integers, use randint(low, high, size). There's no need to waste memory allocating range(low, high); that could be a lot of memory if high is large.
df1['randNumCol'] = np.random.randint(0, 5, size=len(df1))  # high is exclusive: this yields 0..4; use (1, 6) for 1..5
(Note also that when we're just adding a single column, size is just an integer. In general, if we want to generate an array/dataframe of randint()s, size can be a tuple, as in Pandas: How to create a data frame of random integers?)
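To illustrate the integer versus tuple distinction, here is a minimal sketch. The shape (4, 3) and the column names A, B, C are arbitrary choices for the example, not taken from the linked question.

```python
import numpy as np
import pandas as pd

# size as an integer: a 1-D array, suitable for a single new column.
col = np.random.randint(0, 5, size=4)

# size as a (rows, columns) tuple: a 2-D array, suitable for a whole frame.
# The shape (4, 3) and column names are arbitrary choices for this sketch.
df = pd.DataFrame(np.random.randint(0, 5, size=(4, 3)), columns=list("ABC"))

print(col.shape)  # (4,)
print(df.shape)   # (4, 3)
```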