How to compare two columns to find unmatched records in MySQL

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/7628337/

Date: 2020-08-31 21:13:51  Source: igfitidea

mysql

Asked by ronniekap

I have a MySQL table with 2 columns, and each column has thousands of records.

For example, there are 15000 email addresses in column1 and 15005 email addresses in column2.

How can I find the 5 records out of the 15005 in column2 that have no match in column1?

I would like a MySQL query that compares both columns and returns only the 5 unmatched records.

Thanks

Answered by Mosty Mostacho

Not sure if I got it right... but would it be something like this?

select column2 from table
where column2 not in (select column1 from table)

Answered by Jamie McCarthy

Richard, it's highly unusual to find matching/missing rows from one column in a table compared against another column in the same table.

You can think of a table as being a collection of facts, with each row being one fact. Converting values into predicates is how we understand the data. The value "12" in one table may mean "there exists a day on which 12 widgets were made," or "12 people bought widgets on Jan. 1," or "on Jan. 12, no widgets were sold," but whatever the table's corresponding predicate is, "12" should represent a fact.

It's common to want to find the difference between two tables: "what facts are in B that aren't in A?" But in a table with two columns, each row should conceptually be a fact about that pair of values. Perhaps the predicate for the row (12, 13) might be "on Jan. 12, we sold 13 widgets." But in that case I doubt you'd be asking for this information.

So, if (12,13) is really two of the same predicate -- "someone in district 12 bought widgets, and also, someone in district 13 bought widgets" -- in the long run life will be easier if those are one column, not two. And if it's two different predicates, it would make more sense for them to be in two tables. SQL's flexible and can handle these situations, but you may run into more problems later. If you're interested in more about this subject, searching on "normalization" will find you way more than you want to know :)

Anyway, I think the query you're looking for uses a LEFT JOIN to compare the table against itself. I added the values 1-15000 to col1 and 1-15005 to col2 in this table:

CREATE TABLE `foo` (
  `col1` int(11) DEFAULT NULL,
  `col2` int(11) DEFAULT NULL,
  KEY `idx_col1` (`col1`),
  KEY `idx_col2` (`col2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

mysql> select count(distinct col1), count(distinct col2) from foo;
+----------------------+----------------------+
| count(distinct col1) | count(distinct col2) |
+----------------------+----------------------+
|                15000 |                15005 |
+----------------------+----------------------+
1 row in set (0.01 sec)
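
For anyone who wants to reproduce a table like this, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for MySQL (an assumption, chosen so the example runs without a server). It also assumes the 5 extra col2 values sit in rows where col1 is NULL. That matches the 5 NULLs mentioned later in this answer, but the exact row layout is a guess.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL table `foo`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (col1 INTEGER, col2 INTEGER)")
conn.execute("CREATE INDEX idx_col1 ON foo (col1)")
conn.execute("CREATE INDEX idx_col2 ON foo (col2)")

# 15005 rows: col2 holds 1..15005; col1 holds 1..15000 and is NULL
# in the last 5 rows (assumed layout, see the note above).
rows = [(n if n <= 15000 else None, n) for n in range(1, 15006)]
conn.executemany("INSERT INTO foo (col1, col2) VALUES (?, ?)", rows)

# COUNT(DISTINCT ...) ignores NULLs, so col1 reports 15000.
distinct_counts = conn.execute(
    "SELECT COUNT(DISTINCT col1), COUNT(DISTINCT col2) FROM foo"
).fetchone()
null_count = conn.execute(
    "SELECT COUNT(*) FROM foo WHERE col1 IS NULL"
).fetchone()[0]

print(distinct_counts, null_count)  # (15000, 15005) 5
```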

By giving the same table two names, I can compare its two columns against each other, and find the col2 values that have no corresponding col1 values -- in those cases, f1.col1 will be NULL:

mysql> select f2.col2
from foo as f2 left join foo as f1 on (f2.col2=f1.col1)
where f1.col1 is null;
+-------+
| col2  |
+-------+
| 15001 |
| 15002 |
| 15003 |
| 15004 |
| 15005 |
+-------+
5 rows in set (0.03 sec)
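
The same self-join can be tried on a scaled-down copy of the data. This is only a sketch: it runs the query through Python's sqlite3 module (SQLite standing in for MySQL is an assumption), with col1 holding 1..5 and col2 holding 1..8, so the expected unmatched values are 6, 7 and 8.

```python
import sqlite3

# Assumption: SQLite in memory instead of MySQL, and a tiny data set
# (col1 = 1..5 plus three NULLs, col2 = 1..8) instead of 15005 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO foo (col1, col2) VALUES (?, ?)",
    [(n if n <= 5 else None, n) for n in range(1, 9)],
)

# Anti-join: keep only the f2 rows for which the LEFT JOIN found no f1 row.
unmatched = [
    row[0]
    for row in conn.execute(
        """
        SELECT f2.col2
        FROM foo AS f2
        LEFT JOIN foo AS f1 ON f2.col2 = f1.col1
        WHERE f1.col1 IS NULL
        ORDER BY f2.col2
        """
    )
]

print(unmatched)  # [6, 7, 8]
```

The rows where col1 is NULL never satisfy the join condition, because comparing NULL with any value does not yield true, so they cannot hide a genuinely unmatched col2 value.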

Regarding Mosty's solution yesterday, I'm not sure it's correct. I try not to use subqueries, so I'm a little out of my depth here. But it doesn't seem to work, at least on my attempt to replicate your data set:

mysql> select col2 from foo where col2 not in
(select col1 from foo);
Empty set (0.02 sec)

It works if I exclude the 5 NULLs from the subquery, which suggests to me that "NOT IN (NULL)" doesn't necessarily work the way one might think it works:

mysql> select col2 from foo where col2 not in
(select col1 from foo where col1 is not null);
+-------+
| col2  |
+-------+
| 15001 |
| 15002 |
| 15003 |
| 15004 |
| 15005 |
+-------+
5 rows in set (0.02 sec)
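
The likely explanation is SQL's three-valued logic: when the list contains a NULL and the value matches none of the other entries, NOT IN evaluates to NULL (unknown) rather than true, so the row is filtered out. The sketch below illustrates this on a small data set. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption; SQLite follows the same standard NULL semantics here). The NOT EXISTS variant is an alternative I am adding for comparison, not something from the answers above.

```python
import sqlite3

# Assumption: SQLite in memory instead of MySQL; col1 = 1..5 plus
# three NULLs, col2 = 1..8, so the unmatched values are 6, 7, 8.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO foo (col1, col2) VALUES (?, ?)",
    [(n if n <= 5 else None, n) for n in range(1, 9)],
)

def column(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# A NULL in the list turns "no match" into NULL (None in Python), not true.
with_null = conn.execute("SELECT 6 NOT IN (1, 2, NULL)").fetchone()[0]
without_null = conn.execute("SELECT 6 NOT IN (1, 2)").fetchone()[0]

# NOT IN against a subquery that returns NULLs finds nothing.
not_in = column(
    "SELECT col2 FROM foo WHERE col2 NOT IN (SELECT col1 FROM foo) ORDER BY col2"
)

# Filtering the NULLs out of the subquery restores the expected rows.
not_in_filtered = column(
    "SELECT col2 FROM foo WHERE col2 NOT IN "
    "(SELECT col1 FROM foo WHERE col1 IS NOT NULL) ORDER BY col2"
)

# NOT EXISTS only asks whether a matching row exists, so NULLs do no harm.
not_exists = column(
    "SELECT f2.col2 FROM foo AS f2 WHERE NOT EXISTS "
    "(SELECT 1 FROM foo AS f1 WHERE f1.col1 = f2.col2) ORDER BY f2.col2"
)

print(with_null, without_null)  # None 1
print(not_in)                   # []
print(not_in_filtered)          # [6, 7, 8]
print(not_exists)               # [6, 7, 8]
```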

The main reason I avoid subqueries in MySQL is that they have unpredictable performance characteristics, or at least characteristics complex enough that I can't predict them. For more information, see the "O(MxN)" comment in http://dev.mysql.com/doc/refman/5.5/en/subquery-restrictions.html and the advice on the short webpage http://dev.mysql.com/doc/refman/5.5/en/rewriting-subqueries.html.