database - Elasticsearch query to return all records
Disclaimer: This page is a Chinese-English parallel translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you reuse it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute the content to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/8829468/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me):
StackOverFlow
Elasticsearch query to return all records
Asked by John Livermore
I have a small database in Elasticsearch and for testing purposes would like to pull all records back. I am attempting to use a URL of the form...
http://localhost:9200/foo/_search?pretty=true&q={'matchAll':{''}}
Can someone give me the URL you would use to accomplish this, please?
Answered by Steve Casey
I think Lucene syntax is supported, so:
http://localhost:9200/foo/_search?pretty=true&q=*:*
size defaults to 10, so you may also need &size=BIGNUMBER to get more than 10 items (where BIGNUMBER is a number you believe is bigger than your dataset).
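For example, assuming the index foo holds fewer than 10,000 documents (newer Elasticsearch versions cap from + size at the index.max_result_window setting, 10,000 by default):
http://localhost:9200/foo/_search?pretty=true&q=*:*&size=10000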
BUT, for large result sets, the Elasticsearch documentation suggests using the scan search type.
E.g.:
curl -XGET 'localhost:9200/foo/_search?search_type=scan&scroll=10m&size=50' -d '
{
"query" : {
"match_all" : {}
}
}'
and then keep requesting, as the documentation link above suggests.
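A rough sketch of such a follow-up request under the old scan API (the scroll id is taken from the previous response and, on those older versions, can be passed as the raw request body):
curl -XGET 'localhost:9200/_search/scroll?scroll=10m' -d '<_scroll_id from the previous response>'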
EDIT: scan was deprecated in 2.1.0.
scan does not provide any benefits over a regular scroll request sorted by _doc. Link to the elastic docs (spotted by @christophe-roussy).
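A minimal sketch of the equivalent scroll request sorted by _doc (the index name and size are illustrative):
curl -XGET 'localhost:9200/foo/_search?scroll=1m' -d '
{
  "sort": ["_doc"],
  "size": 1000,
  "query": { "match_all": {} }
}'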
Answered by lfender6445
http://127.0.0.1:9200/foo/_search/?size=1000&pretty=1
Note the size param, which increases the hits displayed from the default (10) to 1000 per shard.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-from-size.html
Answered by Prerak Diwan
Elasticsearch (ES) supports both GET and POST requests for getting the data from the ES cluster index.
When we do a GET:
http://localhost:9200/[your index name]/_search?size=[no of records you want]&q=*:*
When we do a POST:
http://localhost:9200/[your_index_name]/_search
{
  "size": [your value], //default 10
  "from": [your start index], //default 0
  "query":
  {
    "match_all": {}
  }
}
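A concrete sketch of that POST with curl (the index name foo and the values are illustrative; the Content-Type header is required on newer versions):
curl -XPOST 'localhost:9200/foo/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "size": 1000,
  "from": 0,
  "query": { "match_all": {} }
}'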
I would suggest using a UI plugin with Elasticsearch: http://mobz.github.io/elasticsearch-head/ This will help you get a better feel for the indices you create and also test your indices.
Answered by vjpandian
Note: this answer relates to an older version of Elasticsearch (0.90). Versions released since then have an updated syntax. Please refer to the other answers, which may provide a more accurate answer for the version you are using.
The query below would return the NO_OF_RESULTS you would like to be returned.
curl -XGET 'localhost:9200/foo/_search?size=NO_OF_RESULTS' -d '
{
"query" : {
"match_all" : {}
}
}'
Now, the question here is that you want all the records to be returned. So naturally, before writing a query, you won't know the value of NO_OF_RESULTS.
How do we know how many records exist in your document? Simply type the query below
curl -XGET 'localhost:9200/foo/_search' -d '
This would give you a result that looks like the one below
{
  "hits" : {
    "total" : 2357,
    "hits" : [
      {
        ..................
The result total tells you how many records are available in your document. So that's a nice way to know the value of NO_OF_RESULTS.
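Putting the two steps together, a sketch using the total from above (the index name foo and the count 2357 are just the example values):
curl -XGET 'localhost:9200/foo/_search?size=2357' -d '
{
  "query" : {
    "match_all" : {}
  }
}'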
curl -XGET 'localhost:9200/_search' -d '
Search all types in all indices
curl -XGET 'localhost:9200/foo/_search' -d '
Search all types in the foo index
curl -XGET 'localhost:9200/foo1,foo2/_search' -d '
Search all types in the foo1 and foo2 indices
curl -XGET 'localhost:9200/f*/_search'
Search all types in any indices beginning with f
curl -XGET 'localhost:9200/_all/type1,type2/_search' -d '
Search types type1 and type2 in all indices
Answered by HungUnicorn
This is the best solution I found using the Python client:
# A sketch assuming the official elasticsearch-py client and a local cluster;
# search_type='scan' only works on Elasticsearch versions before 5.0.
from elasticsearch import Elasticsearch

es = Elasticsearch()  # defaults to localhost:9200

# Initialize the scroll
page = es.search(
    index='yourIndex',
    doc_type='yourType',
    scroll='2m',
    search_type='scan',
    size=1000,
    body={
        # Your query's body
    })
sid = page['_scroll_id']
scroll_size = page['hits']['total']

# Start scrolling
while scroll_size > 0:
    print("Scrolling...")
    page = es.scroll(scroll_id=sid, scroll='2m')
    # Update the scroll ID
    sid = page['_scroll_id']
    # Get the number of results returned in the last scroll
    scroll_size = len(page['hits']['hits'])
    print("scroll size: " + str(scroll_size))
    # Do something with the obtained page
https://gist.github.com/drorata/146ce50807d16fd4a6aa
Using the Java client:
import static org.elasticsearch.index.query.QueryBuilders.*;

QueryBuilder qb = termQuery("multi", "test");

SearchResponse scrollResp = client.prepareSearch(test)
        .addSort(FieldSortBuilder.DOC_FIELD_NAME, SortOrder.ASC)
        .setScroll(new TimeValue(60000))
        .setQuery(qb)
        .setSize(100).execute().actionGet(); // 100 hits per shard will be returned for each scroll

// Scroll until no hits are returned
do {
    for (SearchHit hit : scrollResp.getHits().getHits()) {
        // Handle the hit...
    }
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
} while (scrollResp.getHits().getHits().length != 0); // Zero hits mark the end of the scroll and the while loop.
https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-search-scrolling.html
Answered by WoodyDRN
Elasticsearch will get significantly slower if you just add some big number as the size; one method to get all documents is using scan and scroll IDs.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
In Elasticsearch v7.2, you do it like this:
POST /foo/_search?scroll=1m
{
"size": 100,
"query": {
"match_all": {}
}
}
The results from this would contain a _scroll_id, which you have to query to get the next chunk of 100.
POST /_search/scroll
{
"scroll" : "1m",
"scroll_id" : "<YOUR SCROLL ID>"
}
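Once no more hits come back, it is good practice to free the search context; a sketch of that call (the scroll id is the one from the last response):
DELETE /_search/scroll
{
  "scroll_id" : "<YOUR SCROLL ID>"
}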
Answered by Oussama L.
You can also use server:9200/_stats to get statistics about all your aliases, like the size and number of elements per alias; that's very useful and provides helpful information.
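For example, against a local node (the pretty parameter just formats the JSON output):
curl -XGET 'localhost:9200/_stats?pretty'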
Answered by Somum
If you want to pull many thousands of records then... a few people gave the right answer of using 'scroll' (Note: some people also suggested using "search_type=scan". This was deprecated and removed in v5.0. You don't need it).
Start with a 'search' query, but specify a 'scroll' parameter (here I'm using a 1 minute timeout):
curl -XGET 'http://ip1:9200/myindex/_search?scroll=1m' -d '
{
"query": {
"match_all" : {}
}
}
'
That includes your first 'batch' of hits. But we are not done here. The output of the above curl command would be something like this:
{"_scroll_id":"c2Nhbjs1OzUyNjE6NU4tU3BrWi1UWkNIWVNBZW43bXV3Zzs1Mzc3OkhUQ0g3VGllU2FhemJVNlM5d2t0alE7NTI2Mjo1Ti1TcGtaLVRaQ0hZU0FlbjdtdXdnOzUzNzg6SFRDSDdUaWVTYWF6YlU2Uzl3a3RqUTs1MjYzOjVOLVNwa1otVFpDSFlTQWVuN211d2c7MTt0b3RhbF9oaXRzOjIyNjAxMzU3Ow==","took":109,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":22601357,"max_score":0.0,"hits":[]}}
It's important to keep the _scroll_id handy, as next you should run the following command:
curl -XGET 'localhost:9200/_search/scroll' -d'
{
"scroll" : "1m",
"scroll_id" : "c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1"
}
'
However, passing the scroll_id around is not something designed to be done manually. Your best bet is to write code to do it, e.g. in Java:
private TransportClient client = null;
private Settings settings = ImmutableSettings.settingsBuilder()
        .put(CLUSTER_NAME, "cluster-test").build();
private SearchResponse scrollResp = null;

this.client = new TransportClient(settings);
this.client.addTransportAddress(new InetSocketTransportAddress("ip", port));

QueryBuilder queryBuilder = QueryBuilders.matchAllQuery();
scrollResp = client.prepareSearch(index).setSearchType(SearchType.SCAN)
        .setScroll(new TimeValue(60000))
        .setQuery(queryBuilder)
        .setSize(100).execute().actionGet();

scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
        .setScroll(new TimeValue(timeVal))
        .execute()
        .actionGet();
Now LOOP on the last command, using the SearchResponse to extract the data.
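A rough sketch of that loop, reusing the variables from the snippet above (hit handling is left as a placeholder):
do {
    for (SearchHit hit : scrollResp.getHits().getHits()) {
        // extract your data from each hit here
    }
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
            .setScroll(new TimeValue(60000))
            .execute()
            .actionGet();
} while (scrollResp.getHits().getHits().length != 0); // an empty page marks the end of the scroll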
Answered by Aminah Nuraini
Simple! You can use the size and from parameters!
http://localhost:9200/[your index name]/_search?size=1000&from=0
Then you increase from gradually until you get all of the data.
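For example, with a page size of 1000 the successive requests would look like this (the index name foo is illustrative):
http://localhost:9200/foo/_search?size=1000&from=0
http://localhost:9200/foo/_search?size=1000&from=1000
http://localhost:9200/foo/_search?size=1000&from=2000
Note that from + size is capped by index.max_result_window (10,000 by default on newer versions), so for very large result sets scroll is a better fit.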

