How can I insert random values into a SQL Server table?
Original question: http://stackoverflow.com/questions/1468159/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Dan Herbert
I'm trying to randomly insert values from a list of pre-defined values into a table for testing. I tried using the solution found on this StackOverflow question:
stackoverflow.com/.../update-sql-table-with-random-value-from-other-table
When I tried this, all of the "random" values that were inserted were exactly the same for all 3000 records.
When I run the part of the query that actually selects the random row, it does select a random record every time I run it by hand, so I know the query works. My best guesses as to what is happening are:
- SQL Server is optimizing the SELECT somehow, not allowing the subquery to be evaluated more than once
- The random value's seed is the same on every record the query updates
I'm stuck on what my options are. Am I doing something wrong, or is there another way I should be doing this?
This is the code I'm using:
DECLARE @randomStuff TABLE ([id] INT, [val] VARCHAR(100))
INSERT INTO @randomStuff ([id], [val])
VALUES ( 1, 'Test Value 1' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 2, 'Test Value 2' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 3, 'Test Value 3' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 4, 'Test Value 4' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 5, 'Test Value 5' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 6, null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 7, null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 8, null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 9, null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 10, null )
UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM @randomStuff ORDER BY NEWID())
Answered by Alex Papadimoulis
When the query engine sees this...
(SELECT TOP 1 [val] FROM @randomStuff ORDER BY NEWID())
... it's all like, "ooooh, a cacheable scalar subquery, I'm gonna cache that!"
You need to trick the query engine into thinking it's non-cacheable. jfar's answer was close, but the query engine was smart enough to see the tautology of MyTable.MyColumn = MyTable.MyColumn, but it ain't smart enough to see through this:
UPDATE MyTable
SET MyColumn = (SELECT TOP 1 val
FROM @randomStuff r
INNER JOIN MyTable _MT
ON M.Id = _MT.Id
ORDER BY NEWID())
FROM MyTable M
By bringing the outer table (aliased M) into the subquery, the query engine assumes the subquery will need to be re-evaluated for each row. Any correlation will work, really, but I went with the (assumed) primary key of MyTable.Id since it'd be indexed and would add very little overhead.
A cursor would probably be just as fast, but is most certainly not as fun.
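For comparison, the cursor alternative mentioned above might look roughly like the sketch below. It is not part of the original answer: @MyTable, its Id key column, and the sample rows are assumptions made so the batch can run on its own, and the multi-row VALUES syntax assumes SQL Server 2008 or later.

-- Sample data (assumed, for illustration only)
DECLARE @randomStuff TABLE ([id] INT, [val] VARCHAR(100));
INSERT INTO @randomStuff ([id], [val])
VALUES (1, 'Test Value 1'), (2, 'Test Value 2'), (3, NULL);

-- Hypothetical stand-in for MyTable
DECLARE @MyTable TABLE (Id INT PRIMARY KEY, MyColumn VARCHAR(100));
INSERT INTO @MyTable (Id) VALUES (1), (2), (3), (4), (5);

DECLARE @id INT;

-- STATIC takes a snapshot of the keys, so the updates below do not affect the iteration
DECLARE row_cursor CURSOR LOCAL STATIC FOR
    SELECT Id FROM @MyTable;

OPEN row_cursor;
FETCH NEXT FROM row_cursor INTO @id;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Each UPDATE is a separate statement, so the subquery is evaluated again every time
    UPDATE @MyTable
    SET MyColumn = (SELECT TOP 1 [val] FROM @randomStuff ORDER BY NEWID())
    WHERE Id = @id;

    FETCH NEXT FROM row_cursor INTO @id;
END

CLOSE row_cursor;
DEALLOCATE row_cursor;

SELECT Id, MyColumn FROM @MyTable;

Because every row gets its own UPDATE statement, there is nothing for the engine to cache across rows; the cost is one statement per row.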
Answered by Dan Herbert
Use a cross join to generate random data.
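The answer gives no code, so the following is only one possible reading of it, not the author's method. It cross joins every target row with every candidate value, numbers the candidates in random order per target row with ROW_NUMBER(), and keeps the first one. @MyTable, its Id column, and the sample rows are assumptions for illustration; the multi-row VALUES syntax assumes SQL Server 2008 or later.

-- Sample data (assumed, for illustration only)
DECLARE @randomStuff TABLE ([id] INT, [val] VARCHAR(100));
INSERT INTO @randomStuff ([id], [val])
VALUES (1, 'Test Value 1'), (2, 'Test Value 2'), (3, NULL);

-- Hypothetical stand-in for MyTable
DECLARE @MyTable TABLE (Id INT PRIMARY KEY, MyColumn VARCHAR(100));
INSERT INTO @MyTable (Id) VALUES (1), (2), (3), (4), (5);

-- Pair every target row with every candidate value, then rank the
-- candidates in a random order separately for each target row
WITH ranked AS (
    SELECT M.Id,
           r.[val],
           ROW_NUMBER() OVER (PARTITION BY M.Id ORDER BY NEWID()) AS rn
    FROM @MyTable AS M
    CROSS JOIN @randomStuff AS r
)
UPDATE M
SET MyColumn = ranked.[val]
FROM @MyTable AS M
INNER JOIN ranked
    ON ranked.Id = M.Id
WHERE ranked.rn = 1;  -- keep one randomly ranked candidate per row

SELECT Id, MyColumn FROM @MyTable;

The intermediate result has (rows in the target table) x (rows in @randomStuff) rows, which is fine for test data of this size but grows quickly with larger lists.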
Answered by Cowan
I've had a play with this, and found a rather hacky way to do it with the use of an intermediate table variable.
Once @randomStuff is set up, we do this (note that in my case @MyTable is a table variable; adjust accordingly for your normal table):
DECLARE @randomMappings TABLE (id INT, val VARCHAR(100), sorter UNIQUEIDENTIFIER)
INSERT INTO @randomMappings
SELECT M.id, val, NEWID() AS sort
FROM @MyTable AS M
CROSS JOIN @randomstuff
So at this point, we have an intermediate table with every combination of (MyTable id, random value), plus a random sort value specific to each combination. Then:
DELETE others FROM @randomMappings AS others
INNER JOIN @randomMappings AS lower
ON (lower.id = others.id) AND (lower.sorter < others.sorter)
This is an old trick which deletes all rows for a given MyTable.id except the one with the lowest sort value: join the table to itself where the sort value is smaller, and delete every row for which such a join succeeded. This leaves behind only the lowest value. So for each MyTable.id, we have just one (random) value left. Then we plug it back into the table:
UPDATE @MyTable
SET MyColumn = random.val
FROM @MyTable m, @randomMappings AS random
WHERE (random.id = m.id)
And you're done!
I said it was hacky...
Answered by Dan Herbert
I came up with a solution which is a bit of a hack and very inefficient (~10 seconds to update 3000 records). However, because this is being used to generate test data, I don't have to be concerned about speed.
In this solution, I iterate over every row in the table and update the values one row at a time. It seems to work:
DECLARE @rows INT
DECLARE @currentRow INT
SELECT @rows = COUNT(*) FROM dbo.MyTable
SET @currentRow = 1
WHILE @currentRow <= @rows -- <= so that the last row is updated as well
BEGIN
UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM @randomStuff ORDER BY NEWID())
WHERE MyPrimaryKey = (SELECT b.MyPrimaryKey
FROM(SELECT a.MyPrimaryKey, ROW_NUMBER() OVER (ORDER BY MyPrimaryKey) AS rownumber
FROM MyTable a) AS b
WHERE @currentRow = b.rownumber
)
SET @currentRow = @currentRow + 1
END
Answered by tster
I don't have time to check this right now, but my gut tells me that if you were to create a function on the server to get the random value, the engine would not optimize it out.
Then you would have:
UPDATE MyTable
Set MyColumn = dbo.RANDOM_VALUE()
Answered by John Farrell
There is no optimization going on here.
You're using a subquery that selects a single value, so there is nothing to optimize.
You can also try putting a column from the table you're updating into the select and see if that changes anything. That may trigger an evaluation for every row in MyTable:
UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM @randomStuff
WHERE MyTable.MyColumn = MyTable.MyColumn
ORDER BY NEWID())