SQL: Prevent duplicate values in LEFT JOIN

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/30410622/

Time: 2020-09-01 03:40:10  Source: igfitidea

Prevent duplicate values in LEFT JOIN

sql, join

Asked by Gautam Kumar Samal

I ran into a situation where a LEFT JOIN returns duplicate values. I think this may be the expected behavior, but it is not what I want.

I have three tables: person, department and contact.

person:

id bigint,
person_name character varying(255)

department:

person_id bigint,
department_name character varying(255)

contact:

person_id bigint,
phone_number character varying(255)

SQL query:

SELECT p.id, p.person_name, d.department_name, c.phone_number 
FROM person p
  LEFT JOIN department d 
    ON p.id = d.person_id
  LEFT JOIN contact c 
    ON p.id = c.person_id;

Result:

id|person_name|department_name|phone_number
--+-----------+---------------+------------
1 |"John"     |"Finance"      |"023451"
1 |"John"     |"Finance"      |"99478"
1 |"John"     |"Finance"      |"67890"
1 |"John"     |"Marketing"    |"023451"
1 |"John"     |"Marketing"    |"99478"
1 |"John"     |"Marketing"    |"67890"
2 |"Barbara"  |"Finance"      |""
3 |"Michelle" |""             |"005634"

I know this is what joins do: each joined table multiplies the selected rows. But the result suggests that the phone numbers 023451, 99478 and 67890 belong to both departments, when they are related only to the person John. The unnecessary repeated values will make the problem worse with a larger data set.
So, here is what I want:

id|person_name|department_name|phone_number
--+-----------+---------------+------------
1 |"John"     |"Finance"      |"023451"
1 |"John"     |"Marketing"    |"99478"
1 |"John"     |""             |"67890"
2 |"Barbara"  |"Finance"      |""
3 |"Michelle" |""             |"005634"

This is a simplified sample of my situation; in practice I am working with a large set of tables and queries, so I need a generic solution.

Accepted answer by Erwin Brandstetter

I like to call this problem "cross join by proxy". Since there is no information (WHERE or JOIN condition) on how the tables department and contact are supposed to match up, they are cross-joined via the proxy table person, giving you the Cartesian product. Very similar to this one:

More explanation there.
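The multiplication described above can be checked on the question's sample data. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for the asker's database (the original uses PostgreSQL types such as bigint and character varying, which are simplified to INTEGER and TEXT here). It counts how many rows the two LEFT JOINs produce per person.

```python
import sqlite3

# In-memory SQLite database used as a stand-in (assumption: the original is PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person     (id INTEGER, person_name TEXT);
    CREATE TABLE department (person_id INTEGER, department_name TEXT);
    CREATE TABLE contact    (person_id INTEGER, phone_number TEXT);
    INSERT INTO person     VALUES (1, 'John'), (2, 'Barbara'), (3, 'Michelle');
    INSERT INTO department VALUES (1, 'Finance'), (1, 'Marketing'), (2, 'Finance');
    INSERT INTO contact    VALUES (1, '023451'), (1, '99478'), (1, '67890'), (3, '005634');
""")

# Count the joined rows per person: John has 2 departments and 3 phone numbers,
# so the two independent joins yield 2 * 3 = 6 rows for him.
rows_per_person = conn.execute("""
    SELECT p.id, COUNT(*) AS joined_rows
    FROM person p
    LEFT JOIN department d ON p.id = d.person_id
    LEFT JOIN contact c    ON p.id = c.person_id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

print(rows_per_person)  # [(1, 6), (2, 1), (3, 1)]
```

Nothing links a department row to a contact row, so every department of a person is combined with every phone number of that person.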

Solution for your query:

SELECT p.id, p.person_name, d.department_name, c.phone_number
FROM   person p
LEFT   JOIN (
  SELECT person_id, min(department_name) AS department_name
  FROM   department
  GROUP  BY person_id
  ) d ON d.person_id = p.id
LEFT   JOIN (
  SELECT person_id, min(phone_number) AS phone_number
  FROM   contact
  GROUP  BY person_id
  ) c ON c.person_id = p.id;

You did not define which department or phone number to pick, so I arbitrarily chose the first in sort order (using min()). You can have it any other way ...
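One "other way" is the layout the question asks for, where a person's departments and phone numbers are listed side by side without being multiplied. The sketch below is one possible approach, not necessarily what the answer's author had in mind: number the rows per person in each table with ROW_NUMBER() and join on that number. It assumes Python's sqlite3 (version 3.25 or later for window functions) as a stand-in for PostgreSQL, and the CTE names d, c and slots are illustrative. The ORDER BY inside each window is an arbitrary choice, so which phone number lands next to which department is arbitrary too.

```python
import sqlite3

# In-memory SQLite database used as a stand-in (assumption: the original is PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person     (id INTEGER, person_name TEXT);
    CREATE TABLE department (person_id INTEGER, department_name TEXT);
    CREATE TABLE contact    (person_id INTEGER, phone_number TEXT);
    INSERT INTO person     VALUES (1, 'John'), (2, 'Barbara'), (3, 'Michelle');
    INSERT INTO department VALUES (1, 'Finance'), (1, 'Marketing'), (2, 'Finance');
    INSERT INTO contact    VALUES (1, '023451'), (1, '99478'), (1, '67890'), (3, '005634');
""")

paired = conn.execute("""
    WITH d AS (   -- departments numbered 1, 2, ... per person
        SELECT person_id, department_name,
               ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY department_name) AS rn
        FROM department
    ), c AS (     -- phone numbers numbered 1, 2, ... per person
        SELECT person_id, phone_number,
               ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY phone_number) AS rn
        FROM contact
    ), slots AS ( -- every (person, row number) that occurs in either table
        SELECT person_id, rn FROM d
        UNION
        SELECT person_id, rn FROM c
    )
    SELECT p.id, p.person_name, d.department_name, c.phone_number
    FROM person p
    LEFT JOIN slots s ON s.person_id = p.id
    LEFT JOIN d ON d.person_id = s.person_id AND d.rn = s.rn
    LEFT JOIN c ON c.person_id = s.person_id AND c.rn = s.rn
    ORDER BY p.id, s.rn
""").fetchall()

for row in paired:
    print(row)
```

On the sample data this gives three rows for John (one per phone number, with no department on the third) and one row each for Barbara and Michelle. In PostgreSQL the slots step could likely be replaced by a FULL JOIN between the two numbered subqueries.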

Answer by alexpods

I think you just need the lists of departments and phone numbers for each person. So just use array_agg (or string_agg or json_agg):

SELECT
    p.id,
    p.person_name,
    array_agg(d.department_name) AS "department_names",
    array_agg(c.phone_number) AS "phone_numbers"
FROM person AS p
LEFT JOIN department AS d ON p.id = d.person_id
LEFT JOIN contact AS c ON p.id = c.person_id
GROUP BY p.id, p.person_name;
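Because the aggregation runs over the already multiplied join result, the aggregated lists can contain each value several times. The following sketch checks this on the question's sample data. It assumes Python's sqlite3 as a stand-in for PostgreSQL and swaps in SQLite's group_concat for array_agg (SQLite has no array type). In PostgreSQL the corresponding form would be array_agg(DISTINCT ...).

```python
import sqlite3

# In-memory SQLite database used as a stand-in (assumption: the original is PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person     (id INTEGER, person_name TEXT);
    CREATE TABLE department (person_id INTEGER, department_name TEXT);
    CREATE TABLE contact    (person_id INTEGER, phone_number TEXT);
    INSERT INTO person     VALUES (1, 'John'), (2, 'Barbara'), (3, 'Michelle');
    INSERT INTO department VALUES (1, 'Finance'), (1, 'Marketing'), (2, 'Finance');
    INSERT INTO contact    VALUES (1, '023451'), (1, '99478'), (1, '67890'), (3, '005634');
""")

JOINS = """
    FROM person p
    LEFT JOIN department d ON p.id = d.person_id
    LEFT JOIN contact c    ON p.id = c.person_id
    WHERE p.id = 1
    GROUP BY p.id
"""

# Plain aggregation: one entry per joined row, so John's departments repeat.
row = conn.execute("SELECT group_concat(d.department_name)" + JOINS).fetchone()
naive_depts = row[0].split(",")

# DISTINCT inside the aggregate removes the repeats.
row = conn.execute(
    "SELECT group_concat(DISTINCT d.department_name),"
    "       group_concat(DISTINCT c.phone_number)" + JOINS
).fetchone()
distinct_depts = sorted(row[0].split(","))
distinct_phones = sorted(row[1].split(","))

print(len(naive_depts))   # 6
print(distinct_depts)     # ['Finance', 'Marketing']
print(distinct_phones)    # ['023451', '67890', '99478']
```

Note that DISTINCT also collapses values that legitimately occur more than once for a person; aggregating each table in its own subquery before joining avoids that.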

Answer by KiloVoltaire

Although the tables are obviously simplified for discussion, they appear to be structurally flawed. Tables should be structured to show relationships between entities, rather than being mere lists of entities and/or attributes. In this case I would consider a phone number to be an attribute (of a person or department entity).

The first step would be to create tables with relationships, each having a primary key and possibly a foreign key. In this example, it would help to have the person table use person_id as its primary key and the department table use department_id as its primary key. Next, look for one-to-many or many-to-many relationships, and set your foreign keys accordingly:

  • If one person can only be in one department at a time, then you have a one(department)-to-many(persons) relationship. There is no foreign key in the department table, but department_id will be a foreign key in the persons table.
  • If one person can be in more than one department, then you have a many-to-many relationship, and you'll need an additional junction table with person_id and department_id as foreign keys.
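The many-to-many case from the list above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's sqlite3 as a stand-in database, and the junction table name person_department is hypothetical (the answer does not name it).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE person (
        person_id   INTEGER PRIMARY KEY,
        person_name TEXT NOT NULL
    );
    CREATE TABLE department (
        department_id   INTEGER PRIMARY KEY,
        department_name TEXT NOT NULL
    );
    -- Junction table (hypothetical name): one row per membership.
    CREATE TABLE person_department (
        person_id     INTEGER NOT NULL REFERENCES person(person_id),
        department_id INTEGER NOT NULL REFERENCES department(department_id),
        PRIMARY KEY (person_id, department_id)
    );
    INSERT INTO person     VALUES (1, 'John'), (2, 'Barbara');
    INSERT INTO department VALUES (1, 'Finance'), (2, 'Marketing');
    INSERT INTO person_department VALUES (1, 1), (1, 2), (2, 1);
""")

# Each department is stored once; memberships are resolved through the junction table.
memberships = conn.execute("""
    SELECT p.person_name, d.department_name
    FROM person p
    JOIN person_department pd ON pd.person_id = p.person_id
    JOIN department d         ON d.department_id = pd.department_id
    ORDER BY p.person_id, d.department_name
""").fetchall()
print(memberships)

# The foreign keys reject a membership that points at a non-existent department.
try:
    conn.execute("INSERT INTO person_department VALUES (1, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

For the one-to-many case, the junction table would be dropped and a department_id foreign key column added to the person table instead.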

To summarize, there should be only two tables in your scenario: one for the person and one for the department. Even allowing for personal phone numbers (a column in the persons table) and department phone numbers (a column in the department table), this would be a better approach.

The only caveat is when one department has many numbers (or more than one department shares a single phone number), but that is beyond the scope of the original question.

Answer by JumboClip

SELECT p.id, p.person_name, d.department_name, c.phone_number
FROM person p
  LEFT JOIN department d
    ON p.id = d.person_id
  LEFT JOIN contact c
    ON p.id = c.person_id
GROUP BY p.id, p.person_name, d.department_name, c.phone_number;
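Grouping by every selected column behaves like SELECT DISTINCT: it only merges rows that are identical in all columns. A quick check on the question's sample data, again assuming Python's sqlite3 as a stand-in for the asker's database, shows how many rows remain for John.

```python
import sqlite3

# In-memory SQLite database used as a stand-in (assumption: the original is PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person     (id INTEGER, person_name TEXT);
    CREATE TABLE department (person_id INTEGER, department_name TEXT);
    CREATE TABLE contact    (person_id INTEGER, phone_number TEXT);
    INSERT INTO person     VALUES (1, 'John'), (2, 'Barbara'), (3, 'Michelle');
    INSERT INTO department VALUES (1, 'Finance'), (1, 'Marketing'), (2, 'Finance');
    INSERT INTO contact    VALUES (1, '023451'), (1, '99478'), (1, '67890'), (3, '005634');
""")

john_rows = conn.execute("""
    SELECT p.id, p.person_name, d.department_name, c.phone_number
    FROM person p
    LEFT JOIN department d ON p.id = d.person_id
    LEFT JOIN contact c    ON p.id = c.person_id
    WHERE p.id = 1
    GROUP BY p.id, p.person_name, d.department_name, c.phone_number
""").fetchall()

# All six department/phone combinations differ from each other, so none are merged.
print(len(john_rows))  # 6
```

On this data the six rows for John are all distinct, so the grouping leaves the result from the question unchanged.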

Answer by shA.t

Use this type of query (SQL Server).
(You can change the id in ORDER BY id to any column you want to order by.)

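-- Keeps only the first department and the first phone number per person.
-- ORDER BY id assumes department and contact each have an id column,
-- which the question's schema does not show; substitute any existing column.
SELECT
    p.id,
    p.person_name,
    d.department_name,
    c.phone_number
FROM
    person p
    LEFT JOIN
    (SELECT *, ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY id) AS seq
     FROM department) d
    ON d.person_id = p.id AND d.seq = 1
    LEFT JOIN
    (SELECT *, ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY id) AS seq
     FROM contact) c
    ON c.person_id = p.id AND c.seq = 1;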