SQL: How can I update rows at random?
Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/2380514/
How can I update rows at random?
Asked by jcollum
I'd like to go through a table and randomly wipe out some of the data. I'm doing some data randomizing, turning real first names into fake ones, etc. One of the tables involved has a column that is null about 40% of the time. My name randomizer app could do a coin toss when it assigns new first names, but I'd prefer to do it at the end: randomly prune some of the data.
I have this code, which doesn't work, but sure looks like it should to me:
Use MyDb
go
CREATE VIEW vRandNumber
AS
SELECT RAND() as RandNumber
go
CREATE FUNCTION RandNumber()
RETURNS float
AS
BEGIN
RETURN (SELECT RandNumber FROM vRandNumber)
END
go
select dbo.RandNumber()
update names set nickname = null
where ((select dbo.RandNumber()) > 0.5)
When I run the RandNumber function it's fine, plenty random. But when I do the update it's updating all the rows half the time and none of the rows the other half of the time.
I want it to update a random number of rows every time I run the script. I really thought that a function like RandNumber would be run once for every row in the table. Apparently not.
Is this possible without a loop and without a console app?
Edit: I also tried a couple of variations of RAND() directly in the WHERE clause and got the same results.
Answered by Ian Nelson
Assuming your Names table has a primary key field called Id, this will nullify the nickname in a random 50 percent of the rows:
update dbo.Names set Nickname = null where Id in
(
select top 50 percent id from dbo.Names order by NEWID()
)
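Since the question mentions a column that is null about 40% of the time, the same pattern can be sketched with a different percentage. This is only an illustration under the answer's own assumption that dbo.Names has a primary key column called Id; the figure of 40 is taken from the question, not from the answer.

```sql
-- Sketch: NULL out the nickname in a random 40 percent of the rows.
-- Assumes dbo.Names has a primary key column Id, as in the answer above.
UPDATE dbo.Names
SET Nickname = NULL
WHERE Id IN
(
    -- ORDER BY NEWID() shuffles the rows; TOP keeps the first 40 percent.
    SELECT TOP (40) PERCENT Id
    FROM dbo.Names
    ORDER BY NEWID()
);
```

Because TOP ... PERCENT picks a fixed share of the shuffled rows, the number of affected rows is the same on every run; only which rows are chosen varies.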
Answered by David Pfeffer
RandNumber is a function. Functions in SQL must output the same result every time for the same input, unless underlying database data has changed. This is the mathematical definition of a function (as opposed to how a normal programming language treats a "function," which is more of a function-like construct).
Since the result of your function should never change during an update statement (which is an atomic operation), the query plan compiler for your query only calls RandNumber once and then caches the result.
You may be able to get away with just referencing RAND directly in your query, but if that still doesn't work, you'll have to do this iteratively in a stored procedure.
Answered by Quassnoi
RAND() is evaluated once per query, so its value persists across all rows.
SELECT RAND()
FROM names
will give you a set of equal numbers.
You need to do something like this:
WITH q AS
(
SELECT *,
ABS(CHECKSUM(NEWID())) % 2 AS r
FROM names
)
UPDATE q
SET nickname = NULL
WHERE r = 0
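The `% 2` test above nulls roughly half of the rows. As a hedged sketch, the same per-row NEWID() idea can be adapted to another share, such as the roughly 40% mentioned in the question. The threshold of 40 and the `% 100` bucket are assumptions for illustration, not part of the original answer.

```sql
-- Sketch: NULL out roughly 40% of nicknames, decided independently per row.
-- Taking % 100 before ABS keeps the value within -99..99, so ABS cannot
-- overflow on the most negative int that CHECKSUM may return.
WITH q AS
(
    SELECT nickname,
           ABS(CHECKSUM(NEWID()) % 100) AS r   -- a value from 0 to 99 per row
    FROM names
)
UPDATE q
SET nickname = NULL
WHERE r < 40;
```

Unlike a TOP ... PERCENT selection, this gives each row its own 40-in-100 chance, so the number of rows updated varies from run to run.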
Answered by Quassnoi
This solution distributes rows evenly rather than randomly. It assigns vehicles to branches according to Vehicle.ID % 10 + 1 = branch_number:
; WITH mytbl AS (
SELECT TOP 10 *, ROW_NUMBER() OVER (ORDER BY NEWID()) num
FROM Branch
ORDER BY num
)
UPDATE v
SET BranchID = mytbl.ID
FROM Vehicle v
INNER JOIN mytbl ON mytbl.num = v.ID % 10 + 1
SELECT BranchID, COUNT(*) FROM Vehicle GROUP BY BranchID
Answered by KM.
Try something like this:
WHERE DATEPART(ms,CreateDate)>500
where "CreateDate" is a column already in the table that holds an actual date and time. The millisecond part should be fairly random.
EDIT: Here's another approach:
DECLARE @YourTable table (RowID int, RowValue varchar(5))
INSERT INTO @YourTable VALUES (1,'one')
INSERT INTO @YourTable VALUES (2,'two')
INSERT INTO @YourTable VALUES (3,'three')
SELECT
RAND(row_number() over(order by RowID)+DATEPART(ms,GETDATE())),*
FROM @YourTable
OUTPUT run 1:
                       RowID       RowValue
---------------------- ----------- --------
0.716200609189072      1           one
0.71621924216033       2           two
0.716237875131588      3           three
(3 row(s) affected)
OUTPUT run 2:
                       RowID       RowValue
---------------------- ----------- --------
0.727007732518828      1           one
0.727026365490086      2           two
0.727044998461344      3           three
(3 row(s) affected)
Answered by Hogan
How about
update names set nickname = null
where abs(checksum(nickname) % 2) = 0
Answered by Damien_The_Unbeliever
RAND() (like GetDate/CURRENT_TIMESTAMP) is evaluated once per statement, so you need some way to get around that. One way, if you have a convenient integer value per row (e.g. an ID column), is to call RAND(ID) instead.