database - Elasticsearch query to return all records
Disclaimer: This page is a Chinese-English parallel translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you reuse it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute the content to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/8829468/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me):
StackOverFlow
Elasticsearch query to return all records
Asked by John Livermore
I have a small database in Elasticsearch and for testing purposes would like to pull all records back. I am attempting to use a URL of the form...
http://localhost:9200/foo/_search?pretty=true&q={'matchAll':{''}}
Can someone give me the URL you would use to accomplish this, please?
Answered by Steve Casey
I think Lucene syntax is supported, so:
http://localhost:9200/foo/_search?pretty=true&q=*:*
size defaults to 10, so you may also need &size=BIGNUMBER to get more than 10 items (where BIGNUMBER is a number you believe is bigger than your dataset).
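For example, assuming the index foo holds fewer than 10,000 documents (newer Elasticsearch versions cap from + size at the index.max_result_window setting, 10,000 by default):
http://localhost:9200/foo/_search?pretty=true&q=*:*&size=10000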
BUT, for large result sets, the Elasticsearch documentation suggests using the scan search type.
E.g.:
curl -XGET 'localhost:9200/foo/_search?search_type=scan&scroll=10m&size=50' -d '
{
"query" : {
"match_all" : {}
}
}'
and then keep requesting, as the documentation link above suggests.
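A rough sketch of such a follow-up request under the old scan API (the scroll id is taken from the previous response and, on those older versions, can be passed as the raw request body):
curl -XGET 'localhost:9200/_search/scroll?scroll=10m' -d '<_scroll_id from the previous response>'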
EDIT: scan was deprecated in 2.1.0.
scan does not provide any benefits over a regular scroll request sorted by _doc. Link to the elastic docs (spotted by @christophe-roussy).
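A minimal sketch of the equivalent scroll request sorted by _doc (the index name and size are illustrative):
curl -XGET 'localhost:9200/foo/_search?scroll=1m' -d '
{
  "sort": ["_doc"],
  "size": 1000,
  "query": { "match_all": {} }
}'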
Answered by lfender6445
http://127.0.0.1:9200/foo/_search/?size=1000&pretty=1
Note the size param, which increases the hits displayed from the default (10) to 1000 per shard.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-from-size.html
Answered by Prerak Diwan
Elasticsearch (ES) supports both GET and POST requests for getting the data from the ES cluster index.
When we do a GET:
http://localhost:9200/[your index name]/_search?size=[no of records you want]&q=*:*
When we do a POST:
http://localhost:9200/[your_index_name]/_search
{
  "size": [your value], //default 10
  "from": [your start index], //default 0
  "query":
  {
    "match_all": {}
  }
}
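A concrete sketch of that POST with curl (the index name foo and the values are illustrative; the Content-Type header is required on newer versions):
curl -XPOST 'localhost:9200/foo/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "size": 1000,
  "from": 0,
  "query": { "match_all": {} }
}'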
I would suggest using a UI plugin with Elasticsearch: http://mobz.github.io/elasticsearch-head/ This will help you get a better feel for the indices you create and also test your indices.
Answered by vjpandian
Note: this answer relates to an older version of Elasticsearch (0.90). Versions released since then have an updated syntax. Please refer to the other answers, which may provide a more accurate answer for the version you are using.
The query below would return the NO_OF_RESULTS you would like to be returned.
curl -XGET 'localhost:9200/foo/_search?size=NO_OF_RESULTS' -d '
{
"query" : {
"match_all" : {}
}
}'
Now, the question here is that you want all the records to be returned. So naturally, before writing a query, you won't know the value of NO_OF_RESULTS.
How do we know how many records exist in your document? Simply type the query below
curl -XGET 'localhost:9200/foo/_search' -d '
This would give you a result that looks like the one below
{
  "hits" : {
    "total" : 2357,
    "hits" : [
      {
        ..................
The result total tells you how many records are available in your document. So that's a nice way to know the value of NO_OF_RESULTS.
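Putting the two steps together, a sketch using the total from above (the index name foo and the count 2357 are just the example values):
curl -XGET 'localhost:9200/foo/_search?size=2357' -d '
{
  "query" : {
    "match_all" : {}
  }
}'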
curl -XGET 'localhost:9200/_search' -d '
Search all types in all indices
curl -XGET 'localhost:9200/foo/_search' -d '
Search all types in the foo index
curl -XGET 'localhost:9200/foo1,foo2/_search' -d '
Search all types in the foo1 and foo2 indices
curl -XGET 'localhost:9200/f*/_search'
Search all types in any indices beginning with f
curl -XGET 'localhost:9200/_all/type1,type2/_search' -d '
Search types type1 and type2 in all indices
Answered by HungUnicorn
This is the best solution I found using the Python client:
# A sketch assuming the official elasticsearch-py client and a local cluster;
# search_type='scan' only works on Elasticsearch versions before 5.0.
from elasticsearch import Elasticsearch

es = Elasticsearch()  # defaults to localhost:9200

# Initialize the scroll
page = es.search(
    index='yourIndex',
    doc_type='yourType',
    scroll='2m',
    search_type='scan',
    size=1000,
    body={
        # Your query's body
    })
sid = page['_scroll_id']
scroll_size = page['hits']['total']

# Start scrolling
while scroll_size > 0:
    print("Scrolling...")
    page = es.scroll(scroll_id=sid, scroll='2m')
    # Update the scroll ID
    sid = page['_scroll_id']
    # Get the number of results returned in the last scroll
    scroll_size = len(page['hits']['hits'])
    print("scroll size: " + str(scroll_size))
    # Do something with the obtained page
https://gist.github.com/drorata/146ce50807d16fd4a6aa
Using the Java client:
import static org.elasticsearch.index.query.QueryBuilders.*;

QueryBuilder qb = termQuery("multi", "test");

SearchResponse scrollResp = client.prepareSearch(test)
        .addSort(FieldSortBuilder.DOC_FIELD_NAME, SortOrder.ASC)
        .setScroll(new TimeValue(60000))
        .setQuery(qb)
        .setSize(100).execute().actionGet(); // 100 hits per shard will be returned for each scroll

// Scroll until no hits are returned
do {
    for (SearchHit hit : scrollResp.getHits().getHits()) {
        // Handle the hit...
    }
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
} while (scrollResp.getHits().getHits().length != 0); // Zero hits mark the end of the scroll and the while loop.
https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-search-scrolling.html
Answered by WoodyDRN
Elasticsearch will get significantly slower if you just add some big number as the size; one method to get all documents is using scan and scroll IDs.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
In Elasticsearch v7.2, you do it like this:
POST /foo/_search?scroll=1m
{
"size": 100,
"query": {
"match_all": {}
}
}
The results from this would contain a _scroll_id, which you have to query to get the next chunk of 100.
POST /_search/scroll
{
"scroll" : "1m",
"scroll_id" : "<YOUR SCROLL ID>"
}
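Once no more hits come back, it is good practice to free the search context; a sketch of that call (the scroll id is the one from the last response):
DELETE /_search/scroll
{
  "scroll_id" : "<YOUR SCROLL ID>"
}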
Answered by Oussama L.
You can also use server:9200/_stats to get statistics about all your aliases, like the size and number of elements per alias; that's very useful and provides helpful information.
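For example, against a local node (the pretty parameter just formats the JSON output):
curl -XGET 'localhost:9200/_stats?pretty'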
Answered by Somum
If you want to pull many thousands of records then... a few people gave the right answer of using 'scroll' (Note: some people also suggested using "search_type=scan". This was deprecated and removed in v5.0. You don't need it).
Start with a 'search' query, but specify a 'scroll' parameter (here I'm using a 1 minute timeout):
curl -XGET 'http://ip1:9200/myindex/_search?scroll=1m' -d '
{
"query": {
"match_all" : {}
}
}
'
That includes your first 'batch' of hits. But we are not done here. The output of the above curl command would be something like this:
{"_scroll_id":"c2Nhbjs1OzUyNjE6NU4tU3BrWi1UWkNIWVNBZW43bXV3Zzs1Mzc3OkhUQ0g3VGllU2FhemJVNlM5d2t0alE7NTI2Mjo1Ti1TcGtaLVRaQ0hZU0FlbjdtdXdnOzUzNzg6SFRDSDdUaWVTYWF6YlU2Uzl3a3RqUTs1MjYzOjVOLVNwa1otVFpDSFlTQWVuN211d2c7MTt0b3RhbF9oaXRzOjIyNjAxMzU3Ow==","took":109,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":22601357,"max_score":0.0,"hits":[]}}
It's important to keep the _scroll_id handy, as next you should run the following command:
curl -XGET 'localhost:9200/_search/scroll' -d'
{
"scroll" : "1m",
"scroll_id" : "c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1"
}
'
However, passing the scroll_id around is not something designed to be done manually. Your best bet is to write code to do it, e.g. in Java:
private TransportClient client = null;
private Settings settings = ImmutableSettings.settingsBuilder()
        .put(CLUSTER_NAME, "cluster-test").build();
private SearchResponse scrollResp = null;

this.client = new TransportClient(settings);
this.client.addTransportAddress(new InetSocketTransportAddress("ip", port));

QueryBuilder queryBuilder = QueryBuilders.matchAllQuery();
scrollResp = client.prepareSearch(index).setSearchType(SearchType.SCAN)
        .setScroll(new TimeValue(60000))
        .setQuery(queryBuilder)
        .setSize(100).execute().actionGet();

scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
        .setScroll(new TimeValue(timeVal))
        .execute()
        .actionGet();
Now LOOP on the last command, using the SearchResponse to extract the data.
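A rough sketch of that loop, reusing the variables from the snippet above (hit handling is left as a placeholder):
do {
    for (SearchHit hit : scrollResp.getHits().getHits()) {
        // extract your data from each hit here
    }
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
            .setScroll(new TimeValue(60000))
            .execute()
            .actionGet();
} while (scrollResp.getHits().getHits().length != 0); // an empty page marks the end of the scroll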
Answered by Aminah Nuraini
Simple! You can use the size and from parameters!
http://localhost:9200/[your index name]/_search?size=1000&from=0
Then you increase from gradually until you get all of the data.
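For example, with a page size of 1000 the successive requests would look like this (the index name foo is illustrative):
http://localhost:9200/foo/_search?size=1000&from=0
http://localhost:9200/foo/_search?size=1000&from=1000
http://localhost:9200/foo/_search?size=1000&from=2000
Note that from + size is capped by index.max_result_window (10,000 by default on newer versions), so for very large result sets scroll is a better fit.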

