java - What is an index in Elasticsearch
Original URL: http://stackoverflow.com/questions/15025876/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): Stack Overflow
Question by LuckyLuke
What is an index in Elasticsearch? Does one application have multiple indexes or just one? Let's say you built a system for some car manufacturer. It deals with people, cars, spare parts, etc. Do you have one index named manufacturer, or do you have one index for people, one for cars and a third for spare parts? Could someone explain?
Answer by Zach
Good question, and the answer is a lot more nuanced than one might expect. You can use indices for several different purposes.
Indices for Relations
The easiest and most familiar layout clones what you would expect from a relational database. You can (very roughly) think of an index like a database.
- MySQL => Databases => Tables => Rows/Columns
- ElasticSearch => Indices => Types => Documents with Properties
An ElasticSearch cluster can contain multiple Indices (databases), which in turn contain multiple Types (tables). These types hold multiple Documents (rows), and each document has Properties (columns).
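The analogy can be made concrete with a toy in-memory model of the hierarchy. This is only a sketch: nested dicts stand in for a cluster, and the names used (subarufactory, Cars, SubaruImprezza) are illustrative assumptions. The index name is lowercase because Elasticsearch rejects index names that contain uppercase letters.

```python
# Toy in-memory model of the pre-6.x hierarchy described above:
# cluster -> indices -> types -> documents -> properties.
# All names are hypothetical examples, not data from a real cluster.
cluster = {
    "subarufactory": {                 # index    (~ database)
        "Cars": {                      # type     (~ table)
            "SubaruImprezza": {        # document (~ row), keyed by its ID
                "make": "Subaru",      # properties (~ columns)
                "model": "Impreza",
            },
        },
        "People": {},
        "Spare_Parts": {},
    },
}


def get_document(cluster, index, doc_type, doc_id):
    """Walk the hierarchy one level at a time; raises KeyError if any level is missing."""
    return cluster[index][doc_type][doc_id]


print(get_document(cluster, "subarufactory", "Cars", "SubaruImprezza")["model"])
```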
ElasticSearch 集群可以包含多个Indices
(数据库),而这些数据库又包含多个Types
(表)。这些类型包含多个Documents
(行),每个文档都有Properties
(列)。
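So in your car manufacturing scenario, you may have a SubaruFactory index (Elasticsearch requires index names to be lowercase, so in practice it would be named something like subarufactory). Within this index, you have three different types: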
- People
- Cars
- Spare_Parts
Each type then contains documents that correspond to that type (e.g. a Subaru Impreza doc lives inside the Cars type; this doc contains all the details about that particular car).
Searching and querying take the format http://localhost:9200/[index]/[type]/[operation], where the last segment is either a document ID or an endpoint such as _search.
So to retrieve the Subaru document, I may do this:
$ curl -XGET localhost:9200/subarufactory/Cars/SubaruImprezza
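A small helper can assemble the typed document URL used above. This is a sketch that assumes the pre-7.x /[index]/[type]/[id] layout from this answer. The function name doc_url and its host default are hypothetical. It only builds the string and does not contact a cluster.

```python
from urllib.parse import quote


def doc_url(index, doc_type, doc_id, host="localhost:9200"):
    """Build a pre-7.x typed document URL: http://host/[index]/[type]/[id]."""
    # Elasticsearch rejects index names that contain uppercase letters.
    if index != index.lower():
        raise ValueError(f"index name must be lowercase: {index!r}")
    # Percent-encode each path segment so IDs with spaces or slashes stay intact.
    parts = (quote(part, safe="") for part in (index, doc_type, doc_id))
    return f"http://{host}/" + "/".join(parts)


print(doc_url("subarufactory", "Cars", "SubaruImprezza"))
```

Validating the index name on the client side surfaces the lowercase rule before a request is sent.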
Indices for Logging
Now, the reality is that Indices/Types are much more flexible than the Database/Table abstractions we are used to in RDBMSs. They can be considered convenient data organization mechanisms, with added performance benefits depending on how you set up your data.
To demonstrate a radically different approach, a lot of people use ElasticSearch for logging. A standard format is to assign a new index for each day. Your list of indices may look like this:
- logs-2013-02-22
- logs-2013-02-21
- logs-2013-02-20
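Daily index names like these are straightforward to generate from a date. The sketch below assumes the logs-YYYY-MM-DD naming shown in the list. daily_index_names is a hypothetical helper, not part of any Elasticsearch client.

```python
from datetime import date, timedelta


def daily_index_names(prefix, end, days):
    """Return names like 'logs-2013-02-22' for `days` consecutive days, newest first."""
    # date.isoformat() yields YYYY-MM-DD, matching the naming in the list above.
    return [f"{prefix}-{(end - timedelta(days=i)).isoformat()}" for i in range(days)]


print(daily_index_names("logs", date(2013, 2, 22), 3))
# ['logs-2013-02-22', 'logs-2013-02-21', 'logs-2013-02-20']
```

Because the names are derived from dates, a query for a time window can be narrowed to just the indices that cover it.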
ElasticSearch allows you to query multiple indices at the same time, so it isn't a problem to do:
$ curl -XGET 'localhost:9200/logs-2013-02-22,logs-2013-02-21/Errors/_search?q=%22Error%20Message%22'
This searches the logs from the last two days at the same time. This format has advantages due to the nature of logs: most logs are never looked at, and they are organized in a linear flow of time. Making an index per day is more logical and offers better performance for searching.
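The multi-index search URL can also be built programmatically. This sketch assumes the pre-7.x typed path used in this answer, and multi_index_search_url is a hypothetical helper. It joins the index names with commas and percent-encodes the q parameter, which is the part a hand-written URL most often gets wrong.

```python
from urllib.parse import quote


def multi_index_search_url(indices, doc_type, query, host="localhost:9200"):
    """Build a URI search over several indices at once (pre-7.x typed path)."""
    # Multiple index names are comma-separated in the first path segment.
    path = ",".join(indices) + "/" + doc_type + "/_search"
    # The query string goes in ?q=, percent-encoded (quotes -> %22, spaces -> %20).
    return f"http://{host}/{path}?q={quote(query, safe='')}"


print(multi_index_search_url(["logs-2013-02-22", "logs-2013-02-21"], "Errors", '"Error Message"'))
# http://localhost:9200/logs-2013-02-22,logs-2013-02-21/Errors/_search?q=%22Error%20Message%22
```

Elasticsearch also accepts wildcards in the index part of the path (for example logs-2013-02-*), which avoids listing every day explicitly.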
Indices for Users
Another radically different approach is to create an index per user. Imagine you have some social networking site, and each user has a large amount of random data. You can create a single index for each user. Your structure may look like:
- Zach's Index
  - Hobbies Type
  - Friends Type
  - Pictures Type
- Fred's Index
  - Hobbies Type
  - Friends Type
  - Pictures Type
Notice that this setup could just as easily be done in a traditional RDBMS fashion (e.g. a "Users" index, with hobbies/friends/pictures as types). All users would then be thrown into a single, giant index.
Instead, it sometimes makes sense to split data apart for data organization and performance reasons. In this scenario, we are assuming each user has a lot of data, and we want them separate. ElasticSearch has no problem letting us create an index per user.
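With one index per user, you need a deterministic way to turn a user into a valid index name. The sketch below is one hypothetical scheme: the user- prefix and the slug rules are assumptions. It lowercases the name, because Elasticsearch index names must be lowercase, and collapses other characters into hyphens. It does not enforce every naming rule, such as the 255-byte length limit.

```python
import re


def user_index_name(username, prefix="user"):
    """Map a display name to a hypothetical per-user index name, e.g. 'Zach' -> 'user-zach'."""
    # Lowercase first, then collapse every run of characters outside [a-z0-9_-] into '-'.
    slug = re.sub(r"[^a-z0-9_-]+", "-", username.lower()).strip("-")
    if not slug:
        raise ValueError(f"cannot derive an index name from {username!r}")
    # The fixed prefix also keeps the name from starting with '-', '_' or '+'.
    return f"{prefix}-{slug}"


print(user_index_name("Zach"), user_index_name("Fred O'Neil"))
```

The application can then route every read and write for a user through this function, so the same user always maps to the same index.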
Answer by Shirish Kadam
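@Zach's answer is valid for Elasticsearch 5.X and below. Since Elasticsearch 6.X, mapping types have been deprecated (an index created in 6.X can hold only one type). Types in requests remain deprecated throughout 7.X and are removed entirely in 8.X. Quoting the Elasticsearch docs: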
Initially, we spoke about an “index” being similar to a “database” in an SQL database, and a “type” being equivalent to a “table”. This was a bad analogy that led to incorrect assumptions.
To explain further: two columns with the same name in two different SQL tables can be independent of each other. In an Elasticsearch index that is not possible, because fields with the same name in different types are backed by the same Lucene field. Thus, an "index" in Elasticsearch is not quite the same as a "database" in SQL. If the same field name appears in several types within one index, their field types can end up conflicting. To avoid this, the Elasticsearch documentation recommends storing one document type per index.
See: Removal of mapping types
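The shared-field problem described in this answer can be checked before anything is indexed. This is a minimal sketch that assumes each type's mapping is a plain {field: field_type} dict, which simplifies real Elasticsearch mappings. find_field_conflicts is a hypothetical helper: it reports every field name that is mapped to different field types across the types that would share one index.

```python
def find_field_conflicts(mappings):
    """Given {type_name: {field_name: field_type}}, return fields whose types disagree.

    All types in one index share a single Lucene field per field name, so a field
    such as 'name' cannot be 'text' in one type and 'keyword' in another.
    """
    seen = {}  # field_name -> {field_type: [type_names that use it]}
    for type_name, fields in mappings.items():
        for field, field_type in fields.items():
            seen.setdefault(field, {}).setdefault(field_type, []).append(type_name)
    # Keep only fields that were declared with more than one field type.
    return {field: by_type for field, by_type in seen.items() if len(by_type) > 1}


mappings = {
    "People": {"name": "text", "age": "integer"},
    "Cars": {"name": "keyword", "year": "integer"},
}
print(find_field_conflicts(mappings))
# {'name': {'text': ['People'], 'keyword': ['Cars']}}
```

With one type per index, as the documentation recommends, this class of conflict cannot arise.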
Answer by Filipe Miguel Fonseca
An index is a data structure for storing the mapping of fields to the corresponding documents. The objective is to allow faster searches, often at the expense of increased memory usage and preprocessing time.
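The structure described here is essentially an inverted index (terms mapped to the documents that contain them), which is what Lucene builds under the hood. The toy version below is a sketch, not Lucene's implementation. It uses naive lowercase and whitespace tokenization, and Python sets in place of compressed postings lists. It still shows the trade-off: work and memory are spent up front so that a lookup becomes a dictionary access instead of a scan over every document.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased, whitespace-split term to the set of doc IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return the IDs of documents containing every query term (an AND query)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


docs = {1: "Subaru Impreza sedan", 2: "Subaru Outback wagon", 3: "Toyota Corolla sedan"}
idx = build_inverted_index(docs)
print(search_all(idx, "subaru sedan"))  # {1}
```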
The number of indexes you create is a design decision that you should make according to your application requirements. You can have an index for each business concept... You can have an index for each month of the year...
You should invest some time getting acquainted with Lucene and Elasticsearch concepts.
Take a look at the introductory video and at this one with some data design patterns.
Answer by TheExorcist
The answers above are very detailed. In short, an index could be defined as follows:
Index: a collection of different types of documents and their properties. An index also uses the concept of shards to improve performance. For example, a set of documents might contain the data of a social networking application. (Definition from tutorialspoint.com)
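The shard idea can be illustrated with a simplified form of Elasticsearch's routing rule: shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document ID. In the sketch below, zlib.crc32 stands in for the Murmur3 hash that Elasticsearch actually uses, so the shard numbers will not match a real cluster. pick_shard is a hypothetical name.

```python
import zlib


def pick_shard(doc_id, number_of_primary_shards):
    """Simplified routing: shard = hash(_routing) % number_of_primary_shards.

    Elasticsearch hashes the routing value (the document ID by default) with
    Murmur3; zlib.crc32 is only a deterministic stand-in for this sketch.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


shards = {}
for doc_id in ["user-1", "user-2", "user-3", "user-4", "user-5", "user-6"]:
    shards.setdefault(pick_shard(doc_id, 3), []).append(doc_id)
print(shards)  # which IDs land on which shard depends on the hash
```

Spreading documents across shards lets indexing and searching run in parallel on different nodes. Because each document's shard is derived from the primary shard count, changing that count would relocate documents, which is one reason the count is set when the index is created.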
Since an index is a collection of different types of documents, the answer to the question depends on how you want to categorize your data.
Do you have one index named manufacturer? Yes, we would keep the manufacturer's documents together in one index.
Do you have one index for people, one for cars and a third for spare parts? Could someone explain? Think, for instance, of a car from the same manufacturer being driven on the road by many people. So there could be many indices, depending on the number of uses.
If we think about it deeply, we find that all the questions except the first are invalid. Elasticsearch documents are very different from SQL records or CSV and spreadsheet rows: from one index, using its powerful query language, you can produce millions of differently categorized, CSV-style views of the data.
Because of its blazingly fast indexing capability, we create only one index per customer, and from it we create as many types of documents as we need. For example:
- All old people using the same model.
- Or one old person using all models.
The permutations are infinite.
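The two example questions map naturally onto query DSL bodies sent to an index's _search endpoint. This sketch rests on several assumptions. The field names age_group, model, and driver_id and their values are hypothetical. The terms aggregation assumes model is mapped as a keyword field. bool_filter is a hypothetical helper that only builds the JSON bodies.

```python
import json


def bool_filter(**terms):
    """Build a query DSL body with one exact-match `term` filter per keyword argument."""
    return {"query": {"bool": {"filter": [{"term": {f: v}} for f, v in terms.items()]}}}


# "All old people using the same model" (hypothetical field names and values).
same_model = bool_filter(age_group="senior", model="impreza")

# "One old person using all models": filter on the person, then group hits by model.
one_person = bool_filter(driver_id="fred")
one_person["aggs"] = {"models": {"terms": {"field": "model"}}}

print(json.dumps(same_model))
print(json.dumps(one_person))
```

Both questions run against the same index; only the query body changes.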