SQL: How can I update rows at random?

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/2380514/

Time: 2020-09-01 05:35:36  Source: igfitidea

How can I update rows at random?

Tags: sql, sql-server-2008

Asked by jcollum

I'd like to go through a table and randomly wipe out some of the data. I'm doing some data randomizing, turning real first names into fake ones etc. Well one of the tables involved has a column that is null about 40% of the time. My name randomizer app could do a coin toss somewhere in it when it assigns new first names. But I'd prefer to just do it at the end: randomly prune some of the data.

I have this code, which doesn't work, but sure looks like it should to me:

Use MyDb
go 

CREATE VIEW vRandNumber 
AS 
SELECT RAND() as RandNumber 

go  

CREATE FUNCTION RandNumber() 
RETURNS float 
AS 
  BEGIN 
  RETURN (SELECT RandNumber FROM vRandNumber) 
  END 

go  

select dbo.RandNumber()

update names set nickname = null 
where ((select dbo.RandNumber()) > 0.5)

When I run the RandNumber function it's fine, plenty random. But when I do the update it's updating all the rows half the time and none of the rows the other half of the time.

I want it to update a random number of rows every time I run the script. I really thought that a function like RandNumber would be run once for every row in the table. Apparently not.

Is this possible without a loop and without a console app?

Edit: I also tried it with a couple of variations of RAND() in the where directly and got the same results.

Answer by Ian Nelson

Assuming your Names table has a primary key field called Id, this will nullify the nickname in a random 50 percent of the rows:

update dbo.Names set Nickname = null where Id in
(
    select top 50 percent id from dbo.Names order by NEWID()
)
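
Note that TOP ... PERCENT always clears the same share of rows; only which rows get picked changes between runs. If a different share is wanted, such as the roughly 40% mentioned in the question, the percentage could be passed as a variable. A minimal sketch, assuming the same dbo.Names table and Id key as above; @Percent is a name introduced here purely for illustration:

```sql
-- Hypothetical parameter: share of rows whose nickname should be cleared.
DECLARE @Percent int = 40;

UPDATE dbo.Names
SET    Nickname = NULL
WHERE  Id IN
(
    -- NEWID() is generated per row, so the sort order is shuffled on every run.
    SELECT TOP (@Percent) PERCENT Id
    FROM   dbo.Names
    ORDER BY NEWID()
);
```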

Answer by David Pfeffer

RandNumber is a function. Functions in SQL must output the same result every time for the same input, unless underlying database data has changed. This is the mathematical definition of a function (as opposed to how a normal programming language treats a "function," which is more of a function-like construct).

Since the result of your function should never change during an update statement (which is an atomic operation), the query plan compiler for your query only calls RandNumber once and then caches the result.

You may be able to get away with just referencing RAND directly in your query, but if that still doesn't work, you'll have to do this iteratively in a stored procedure.
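
A rough sketch of what that iterative fallback could look like, assuming the names table has an integer key called Id (the question does not say so). Each pass of the loop runs RAND() in a separate statement, so every row gets its own coin toss:

```sql
DECLARE @Id int;

-- Start from the smallest key; @Id stays NULL if the table is empty.
SELECT @Id = MIN(Id) FROM names;

WHILE @Id IS NOT NULL
BEGIN
    -- RAND() is re-evaluated here because this IF is its own statement.
    IF RAND() > 0.5
        UPDATE names SET nickname = NULL WHERE Id = @Id;

    -- Advance to the next key; MIN over no rows yields NULL and ends the loop.
    SELECT @Id = MIN(Id) FROM names WHERE Id > @Id;
END
```

This touches one row per statement, so it is likely to be much slower than a set-based update on a large table.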

Answer by Quassnoi

RAND() persists within a query.

SELECT  RAND()
FROM    names

will give you a set of equal numbers.
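
To see the difference side by side, here is a small sketch that compares RAND() with a NEWID()-based value on a throwaway table variable (@Demo and the column aliases are made up for this illustration):

```sql
-- Throwaway table variable so the comparison does not depend on any real table.
DECLARE @Demo table (Id int);
INSERT INTO @Demo VALUES (1);
INSERT INTO @Demo VALUES (2);
INSERT INTO @Demo VALUES (3);

SELECT Id,
       RAND()                     AS SameForEveryRow,  -- evaluated once for the whole statement
       ABS(CHECKSUM(NEWID()) % 2) AS PerRowCoinToss    -- NEWID() is generated per row: 0 or 1
FROM   @Demo;
```

The first computed column should repeat one value on every row, while the second can change from row to row.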

You need to do something like this:

WITH    q AS
        (
        SELECT  *,
                ABS(CHECKSUM(NEWID())) % 2 AS r
        FROM    names
        )
UPDATE  q
SET     nickname = NULL
WHERE   r = 0
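
The same pattern can be tuned to a share other than one half. A sketch, assuming the goal is the roughly 40% mentioned in the question: take the checksum modulo 100 and compare it against a threshold. Applying the modulo before ABS also keeps ABS away from the smallest possible integer, which it cannot negate.

```sql
WITH    q AS
        (
        SELECT  *,
                -- CHECKSUM(NEWID()) % 100 lies in -99..99; ABS folds it to 0..99.
                ABS(CHECKSUM(NEWID()) % 100) AS r
        FROM    names
        )
UPDATE  q
SET     nickname = NULL
WHERE   r < 40;  -- matches about 40% of rows; the exact count varies per run
```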

Answer by Quassnoi

This solution distributes rows evenly rather than randomly. It assigns vehicles to branches according to Vehicle.ID % 10 + 1 = branch_number:

; WITH mytbl AS (
    SELECT TOP 10 *, ROW_NUMBER() OVER (ORDER BY NEWID()) num
    FROM Branch
    ORDER BY num
)

UPDATE v
SET BranchID = mytbl.ID
FROM Vehicle v
    INNER JOIN mytbl ON mytbl.num = v.ID % 10 + 1

SELECT BranchID, COUNT(*) FROM Vehicle GROUP BY BranchID

Answer by KM.

Try something like this:

WHERE DATEPART(ms,CreateDate)>500

where "CreateDate" is a column already in the table that has an actual date and time in it. The millisecond part should be fairly random.

EDIT: here's another approach:

DECLARE @YourTable table (RowID int, RowValue varchar(5))
INSERT INTO @YourTable VALUES (1,'one')
INSERT INTO @YourTable VALUES (2,'two')
INSERT INTO @YourTable VALUES (3,'three')

SELECT 
    RAND(row_number() over(order by RowID)+DATEPART(ms,GETDATE())),* 
    FROM @YourTable

OUTPUT run 1:

                       RowID       RowValue
---------------------- ----------- --------
0.716200609189072      1           one
0.71621924216033       2           two
0.716237875131588      3           three

(3 row(s) affected)

OUTPUT run 2:

                       RowID       RowValue
---------------------- ----------- --------
0.727007732518828      1           one
0.727026365490086      2           two
0.727044998461344      3           three

(3 row(s) affected)

Answer by Hogan

How about

update names set nickname = null
where  abs(checksum(nickname) % 2) = 0

Answer by Damien_The_Unbeliever

RAND() (like GetDate/CURRENT_TIMESTAMP) is evaluated once per statement. You need some way to get around that. One way (if you have a convenient row-valued integer, e.g. an ID column) is to call RAND(ID) instead.
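
One caveat, suggested by the sample output in KM.'s answer: consecutive seeds give RAND values that sit very close together, so RAND(ID) on a sequential key may put nearly every row on the same side of a threshold. A possible workaround, offered as a sketch rather than as this answer's method, is to seed RAND with a per-row value that is not sequential, such as CHECKSUM(NEWID()):

```sql
UPDATE names
SET    nickname = NULL
-- A fresh NEWID() per row gives RAND a different, non-sequential seed each time.
WHERE  RAND(CHECKSUM(NEWID())) > 0.5;
```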
