Original source: http://stackoverflow.com/questions/43383197/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
What are the differences between Data Lineage and Data Provenance?
Asked by CSY
From wiki,
Data lineage is defined as a data life cycle that includes the data's origins and where it moves over time. It describes what happens to data as it goes through diverse processes. It helps provide visibility into the analytics pipeline and simplifies tracing errors back to their sources.
Data provenance documents the inputs, entities, systems, and processes that influence data of interest, in effect providing a historical record of the data and its origins.
It seems that both concepts are about where the data comes from, but I'm still confused about the differences. Are the two concepts the same? If they are different, can someone share an example?
Thanks,
Answered by Jan Andrs
From our experience, data provenance includes only a high-level view of the system for business users, so they can roughly navigate where their data comes from. It's provided by a variety of modeling tools or just simple custom tables and charts. Data lineage is a more specific term and includes two sides: business (data) lineage and technical (data) lineage. Business lineage pictures data flows on a business-term level and is provided by solutions like Collibra, Alation and many others. Technical data lineage is created from actual technical metadata and tracks data flows on the lowest level: actual tables, scripts and statements. Technical data lineage is provided by solutions such as MANTA or Informatica Metadata Manager.
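As a rough illustration of what technical lineage tracks, the sketch below models table-level data flows as a directed graph and walks upstream from a report table to its sources. The table names and the `upstream` helper are hypothetical; this is not how MANTA or Informatica Metadata Manager expose lineage, just a minimal picture of the idea.

```python
from collections import deque

# Hypothetical table-level lineage: each target table maps to the
# tables it is loaded from (the kind of edge a tool might derive from SQL scripts).
LINEAGE = {
    "report.sales_summary": ["dwh.sales", "dwh.customers"],
    "dwh.sales": ["staging.orders"],
    "dwh.customers": ["staging.crm_export"],
    "staging.orders": [],
    "staging.crm_export": [],
}


def upstream(table, lineage):
    """Return every table feeding `table`, directly or indirectly, breadth-first."""
    seen = []
    queue = deque(lineage.get(table, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(lineage.get(current, []))
    return seen


sources = upstream("report.sales_summary", LINEAGE)
print(sources)
```

A business-level view, by contrast, would typically stop at named concepts such as "Sales" or "Customers" rather than individual tables and statements.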
Answered by Nicholas Car
See this section in the Wikipedia article on provenance: https://en.wikipedia.org/wiki/Provenance#Science. It links to collections of academic and industry work on provenance.
To succinctly answer your question: in general, there's not enough context known to differentiate between data lineage and data provenance. Within a specific context, you could look for, or create, specific and possibly different definitions.
Answered by Sam M
Data Provenance is,
data lineage (what is its genealogy, the history of its journey, where did it begin, how did it come into being, how did it change over time, where has it been, which systems has it traveled through, any loss or gain) (i.e. data oriented, metadata)
PLUS
the inputs, entities, systems and processes that influenced the data (i.e. process oriented) which can be used to reproduce the data.
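One way to picture this "lineage PLUS process" definition is as a record with two parts. The sketch below is only an illustration of that reading; the class names, fields, and sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LineageStep:
    """One hop in the data's journey (the data-oriented metadata)."""
    system: str
    change: str


@dataclass
class ProvenanceRecord:
    """Lineage plus the process-oriented details needed to reproduce the data."""
    dataset: str
    lineage: list = field(default_factory=list)    # where the data has been
    inputs: list = field(default_factory=list)     # what went into it
    processes: list = field(default_factory=list)  # what produced it

    def origin(self):
        """Where the data began: the first system in its lineage, if any."""
        return self.lineage[0].system if self.lineage else None


record = ProvenanceRecord(
    dataset="monthly_revenue",
    lineage=[
        LineageStep("billing_db", "rows created"),
        LineageStep("warehouse", "aggregated by month"),
    ],
    inputs=["billing_db.invoices"],
    processes=["aggregate_revenue.sql"],
)
print(record.origin())
```

Under this reading, dropping `inputs` and `processes` leaves plain lineage; keeping them is what would let someone rerun the process and reproduce the data.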
Answered by Jojhn A
I believe a simpler explanation is: who owns it, who touched it, and where it is going.
In a Business sense, that can be summed up in Data Flow Diagrams.
In a Technical sense, that's a whole lot of baggage to start adding onto data as it flows from system to system. There has to be some HUGE justification to carry that mountain around and for what purpose? To see some pretty graphs? Not going to happen in large real world environments. The justification in $$$ for what??
It's one thing to tag data with a simple 2 - 4 byte code of origin as it moves from system to system, but keeping all of that other technical baggage, at the cost of system performance degradation, DASD, backups, etc., just for a pretty graph? No way....
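For a sense of how small such an origin tag can be, here is a minimal sketch that prefixes each record with a 2-byte origin code using Python's `struct` module. The system names, code values, and the `tag`/`untag` helpers are made up for illustration.

```python
import struct

# Hypothetical 2-byte origin codes for source systems.
ORIGIN_CODES = {"CRM": 1, "BILLING": 2}
CODE_TO_ORIGIN = {code: name for name, code in ORIGIN_CODES.items()}


def tag(payload: bytes, origin: str) -> bytes:
    """Prefix the payload with a 2-byte big-endian origin code."""
    return struct.pack(">H", ORIGIN_CODES[origin]) + payload


def untag(tagged: bytes):
    """Split a tagged record back into (origin name, payload)."""
    (code,) = struct.unpack(">H", tagged[:2])
    return CODE_TO_ORIGIN[code], tagged[2:]


tagged = tag(b"order-1001", "BILLING")
origin, payload = untag(tagged)
print(origin, payload, len(tagged) - len(payload))
```

The overhead here is a fixed two bytes per record, which is the cheap end of the trade-off described above; a full transformation history per record would grow with every hop.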
Answered by Raj
Data provenance is the point of origin for the data term; data lineage is the complete data transformation journey from the point of origin to the current observation point in the system.

