Hadoop vs Teradata: what is the difference?
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/14621862/
Asked by John
I've worked with Teradata. I've never touched Hadoop, but since yesterday I have been doing some research on it. From the descriptions of both, they seem quite interchangeable, but some papers say that they serve different purposes. Everything I have found is vague, and I am confused.
Does anybody have experience with both of them? What are the significant differences between them?
Simple example: I want to build an ETL process that will transform billions of rows of raw data and organize them into a DWH, then run some resource-intensive analysis on them. Why use TD? Why Hadoop? Or why not?
Answer by ryanbwork
I think this article titled 'MapReduce and Parallel DBMSs: Friends or Foes' does quite a good job of describing the situations where each technology works best. In a nutshell, Hadoop is excellent for storing unstructured data and running parallel transformations to 'sanitize' incoming data, whereas DBMSs excel at executing complex queries quickly.
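To make that split concrete, here is a minimal single-machine sketch in Python. It is not Hadoop or Teradata code: a thread pool stands in for the parallel "sanitize" step, and an in-memory SQLite database stands in for the DBMS that answers the query. The record layout, `RAW_LINES`, and the function names are all made up for illustration.

```python
import sqlite3
from multiprocessing.dummy import Pool  # thread pool; stands in for parallel workers

# Hypothetical raw input: "day,user,amount" with messy spacing and bad records.
RAW_LINES = [
    " 2013-01-30 , alice ,  12.50 ",
    "2013-01-30,bob,not_a_number",   # malformed amount, dropped
    "2013-01-31,ALICE,7.50",
    "",                              # empty line, dropped
    "2013-01-31,bob,3.00",
]

def sanitize(line):
    """Return a clean (day, user, amount) tuple, or None for bad input."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 3:
        return None
    day, user, amount = parts
    try:
        return day, user.lower(), float(amount)
    except ValueError:
        return None

def clean_rows(lines, workers=4):
    """The 'Hadoop-style' step: apply the same transformation to every record in parallel."""
    with Pool(workers) as pool:
        return [row for row in pool.map(sanitize, lines) if row is not None]

def total_per_user(rows):
    """The 'DBMS-style' step: load structured rows and answer a declarative query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (day TEXT, user TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT user, SUM(amount) FROM sales GROUP BY user ORDER BY user"
    ).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    print(total_per_user(clean_rows(RAW_LINES)))
```

The point of the sketch is only the division of labour: the cleaning step needs no schema and treats each record independently, while the query step relies on the data already being structured.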
Answer by Yaniv
Hadoop, Hadoop with Extensions, RDBMS Feature/Property Comparison
I am not an expert in this area, but the coursera.com course "Introduction to Data Science" has a lecture titled "Comparing MapReduce and Databases", as well as a lecture on parallel databases, within the MapReduce section of the course.
Here is a summary from these lectures of the comparison between MapReduce and RDBMS (not necessarily parallel RDBMS). One point to remember is that the comparison changes if you include extensions to Hadoop such as Pig, Hive, etc. In parentheses I list the MapReduce extensions that add some of this functionality.
Some functionality/properties that RDBMS have but not native MapReduce:
- Declarative query languages (Pig, Hive)
- Schemas (Hive, Pig, DryadLINQ, Hadapt)
- Logical data independence
- Indexing (HBase)
- Algebraic optimization (Pig, Dryad, Hive)
- Caching/materialized views
- ACID/transactions
Some functionality/properties that MapReduce has relative to a regular RDBMS (not necessarily a parallel RDBMS):
- High Scalability
- Fault-tolerance
- “One-person deployment”
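As a rough illustration of the "declarative query language" item in the first list, the sketch below computes the same per-category total twice in Python: once as hand-written map, shuffle and reduce steps, and once as a single SQL statement run through SQLite. It is a toy on one machine, not Hadoop, Pig or Hive code, and `ROWS` and all function names are invented for the example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical input: (category, price) pairs.
ROWS = [("books", 12.0), ("games", 30.0), ("books", 8.0), ("music", 5.0)]

def map_phase(rows):
    """Emit one (key, value) pair per input record."""
    for category, price in rows:
        yield category, price

def shuffle_phase(pairs):
    """Group all values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Collapse each key's values into a single total."""
    return {key: sum(values) for key, values in groups.items()}

def totals_mapreduce(rows):
    """Procedural style: the programmer spells out how the result is computed."""
    return reduce_phase(shuffle_phase(map_phase(rows)))

def totals_sql(rows):
    """Declarative style: the query says what is wanted; the engine picks the plan."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (category TEXT, price REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = dict(
        conn.execute("SELECT category, SUM(price) FROM sales GROUP BY category").fetchall()
    )
    conn.close()
    return result

if __name__ == "__main__":
    print(totals_mapreduce(ROWS))
    print(totals_sql(ROWS))
```

Both functions return the same totals. The difference the lectures point at is that the second version leaves the execution strategy to the engine, which is what extensions such as Pig and Hive add on top of MapReduce.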
Answer by shazin
To begin with, vanilla Apache Hadoop is 100% open source. But if you need commercial support along with consultancy, there are companies like Cloudera, MapR, HortonWorks, etc.
Hadoop is backed by a growing community that fixes bugs and makes improvements on a consistent basis. Hadoop's storage model, HDFS, is based on Google's GFS architecture, which is proven to handle large quantities of data. Furthermore, Hadoop's analysis model, MapReduce, is based on Google's MapReduce model.
Hadoop is used by tech giants like Facebook, Yahoo, Twitter, eBay, etc. to store and analyze their high volumes of data, both in real time and passively.
For your question about ETL systems, read these slides.
OK, now why Hadoop?
- Open source
- Proven storage and analysis model for large quantities of data
- Minimal hardware requirements to set up and run
OK, now why TD?
- Commercial support
Answer by GMc
I've been asked this question several times. The answer I usually give is a car analogy (which is pretty silly because I'm not a car person, but it seems to work):
- Teradata is the car/dbms for the masses - it is reliable, mature, works well and is there when you need it. It is difficult (compared to Hadoop) to customise and add functionality to the base product.
- Hadoop is the car/dbms for the enthusiast - it isn't as reliable or mature, it works well so long as you attend to it. It is easy (compared to Teradata) to customise and add functionality to the base product.
Put another way, Teradata is the reliable workhorse where you put your mission-critical processes (operational reporting, enterprise reporting, decision support, etc.). Hadoop is a place where you can do a lot of this stuff, but don't be surprised if you come in one morning and find that your regulatory reports can't be produced because someone applied a patch, or because you've suddenly got a "too many small files" problem.
To loop back to the analogy: if you don't want to be too techy and the manufacturer's product (DBMS and/or car) works for you out of the box, Teradata is a good option. On the other hand, if you like to tinker under the hood, swap out the carburettor (or whatever), adjust the gear ratios, tweak the fuel-air mixture depending on whether you are country or city driving, bolt on a turbocharger, and/or your family complains about how long you spend in the garage on weekends, Hadoop is the place for you.
IMHO, most, if not all, organisations need both. I hope this helps :-)
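For readers who have not met the "too many small files" problem mentioned in this answer: the HDFS NameNode keeps metadata for every file and block in memory, so the number of files matters as well as the total data volume. The back-of-the-envelope Python sketch below is not from the answer. It assumes the commonly quoted rule of thumb of roughly 150 bytes of NameNode memory per file or block object, and a 128 MiB block size; both constants are assumptions, so check them against your own cluster.

```python
import math

BYTES_PER_OBJECT = 150        # assumed rule of thumb per file/block object, not a measured value
BLOCK_SIZE = 128 * 1024**2    # assumed HDFS block size (128 MiB)

def namenode_objects(file_count, file_size):
    """Count metadata objects: one per file plus one per block of each file."""
    blocks_per_file = max(1, math.ceil(file_size / BLOCK_SIZE))
    return file_count * (1 + blocks_per_file)

def namenode_bytes(file_count, file_size):
    """Rough NameNode memory estimate under the assumptions above."""
    return namenode_objects(file_count, file_size) * BYTES_PER_OBJECT

if __name__ == "__main__":
    # The same 1 TiB of data, stored two different ways.
    small = namenode_bytes(1024**2, 1024**2)   # 1,048,576 files of 1 MiB each
    large = namenode_bytes(1024, 1024**3)      # 1,024 files of 1 GiB each
    print(f"small files: {small / 1024**2:.1f} MiB of metadata")
    print(f"large files: {large / 1024**2:.1f} MiB of metadata")
```

Under these assumptions the small-file layout needs a few hundred times more NameNode memory for the same amount of data, which is why a job that suddenly starts writing many tiny files can hurt an otherwise healthy cluster.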

