Notice: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/33340153/

Date: 2020-09-03 18:07:04  Source: igfitidea

Elasticsearch Bulk Index JSON Data

Tags: json, elasticsearch

Asked by Amit P

I am trying to bulk index a JSON file into a new Elasticsearch index and am unable to do so. The JSON file contains the following sample data:

[{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"},
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"},
{"Amount": "2107", "Quantity": "3", "Id": "974920111", "Client_Store_sk": "1109"},
{"Amount": "2115", "Quantity": "2", "Id": "975463798", "Client_Store_sk": "1109"},
{"Amount": "2116", "Quantity": "1", "Id": "975463827", "Client_Store_sk": "1109"},
{"Amount": "648", "Quantity": "3", "Id": "975464139", "Client_Store_sk": "1109"},
{"Amount": "2126", "Quantity": "2", "Id": "975464805", "Client_Store_sk": "1109"},
{"Amount": "2133", "Quantity": "1", "Id": "975464061", "Client_Store_sk": "1109"},
{"Amount": "1339", "Quantity": "4", "Id": "974919458", "Client_Store_sk": "1109"},
{"Amount": "1196", "Quantity": "5", "Id": "974920538", "Client_Store_sk": "1109"},
{"Amount": "1198", "Quantity": "4", "Id": "975463638", "Client_Store_sk": "1109"},
{"Amount": "1345", "Quantity": "4", "Id": "974919522", "Client_Store_sk": "1109"},
{"Amount": "1347", "Quantity": "2", "Id": "974919563", "Client_Store_sk": "1109"},
{"Amount": "673", "Quantity": "2", "Id": "975464359", "Client_Store_sk": "1109"},
{"Amount": "2153", "Quantity": "1", "Id": "975464511", "Client_Store_sk": "1109"},
{"Amount": "3896", "Quantity": "4", "Id": "977289342", "Client_Store_sk": "1109"},
{"Amount": "3897", "Quantity": "4", "Id": "974920602", "Client_Store_sk": "1109"}]

I am using:

 curl -XPOST localhost:9200/index_local/my_doc_type/_bulk --data-binary --data @/home/data1.json 

When I try to use the standard bulk index API from Elasticsearch, I get this error:

 error: {"message":"ActionRequestValidationException[Validation Failed: 1: no requests added;]"}

Can anyone help with indexing this type of JSON?

Answered by Val

What you need to do is read that JSON file and then build a bulk request in the format expected by the _bulk endpoint, i.e. one line for the command and one line for the document, each terminated by a newline character. Rinse and repeat for each document:

curl -XPOST localhost:9200/your_index/_bulk -d '
{"index": {"_index": "your_index", "_type": "your_type", "_id": "975463711"}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index": {"_index": "your_index", "_type": "your_type", "_id": "975463943"}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
... etc for all your documents
'

Just make sure to replace your_index and your_type with the actual index and type names you're using.
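
The request body above can also be generated rather than typed by hand. The following is a minimal Python sketch, not part of the original answer: the function name build_bulk_body and its id_field parameter are illustrative assumptions, and it assumes the documents are already loaded as a list of dictionaries.

```python
import json


def build_bulk_body(docs, index, doc_type, id_field="Id"):
    """Return a bulk body: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        # Action line: tells Elasticsearch what to do with the document that follows.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc[id_field]}}
        lines.append(json.dumps(action))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(doc))
    # Every line, including the last one, is terminated by a newline.
    return "".join(line + "\n" for line in lines)


docs = [
    {"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"},
    {"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"},
]
body = build_bulk_body(docs, "your_index", "your_type")
print(body, end="")
```

Parsing the array and re-serializing each object keeps every document on exactly one line, which is what the line-oriented bulk format relies on.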

UPDATE

Note that the command line can be shortened by removing _index and _type if those are specified in your URL. It is also possible to remove _id if you specify the path to your id field in your mapping (note that this feature will be deprecated in ES 2.0, though). At the very least, your command line can look like {"index":{}} for all documents, but it will always be mandatory in order to specify which kind of operation you want to perform (in this case, index the document).

UPDATE 2

curl -XPOST localhost:9200/index_local/my_doc_type/_bulk --data-binary  @/home/data1.json

/home/data1.json should look like this:

{"index":{}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index":{}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
{"index":{}}
{"Amount": "2107", "Quantity": "3", "Id": "974920111", "Client_Store_sk": "1109"}
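
If the data currently exists only as a JSON array, as in the question, a file in the layout above could be produced with a small script. This is a sketch under assumptions: convert_array_file is a hypothetical helper name, and the source file is assumed to hold a single top-level JSON array of objects.

```python
import json


def convert_array_file(src_path, dst_path):
    """Rewrite a JSON-array file as a bulk file with a bare index action per document."""
    with open(src_path, encoding="utf-8") as src:
        docs = json.load(src)  # expects a top-level JSON array of objects
    # newline="\n" keeps line endings as plain LF on every platform.
    with open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        for doc in docs:
            dst.write('{"index":{}}\n')        # action line; index and type come from the URL
            dst.write(json.dumps(doc) + "\n")  # source line, one document per line
    return len(docs)
```

Called as convert_array_file("array.json", "/home/data1.json"), where array.json is a placeholder name for the original array file, it returns the number of documents written. The resulting file can then be posted with the --data-binary command shown above.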

Answered by Thomas

As of today, 6.1.2 is the latest version of Elasticsearch, and the curl command that works for me on Windows (x64) is:

curl -s -XPOST localhost:9200/my_index/my_index_type/_bulk -H "Content-Type: application/x-ndjson" --data-binary @D:\data\mydata.json

The format of the data in mydata.json remains the same as shown in Val's answer.
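
For readers who prefer a script to curl, the same request can be prepared with Python's standard library. This is only a sketch: make_bulk_request is an illustrative name, the URL mirrors the one in the curl command, and the actual send is left commented out because it needs a running cluster.

```python
import urllib.request


def make_bulk_request(url, body):
    """Build (but do not send) a POST request carrying a newline-delimited bulk body."""
    if not body.endswith("\n"):
        body += "\n"  # the final line of a bulk body must be newline-terminated
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},  # same header as the curl -H option
        method="POST",
    )


req = make_bulk_request(
    "http://localhost:9200/my_index/my_index_type/_bulk",
    '{"index":{}}\n{"Amount": "480", "Quantity": "2"}\n',
)
# Sending requires a reachable Elasticsearch node:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```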

Answered by Tadej

A valid Elasticsearch bulk API request would look something like this (ending with a newline):

POST http://localhost:9200/products_slo_development_temp_2/productModel/_bulk

{ "index":{ } } 
{"RequestedCountry":"slo","Id":1860,"Title":"Stol"} 
{ "index":{ } } 
{"RequestedCountry":"slo","Id":1860,"Title":"Miza"} 

Elasticsearch bulk API documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
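
Before sending a body, it can help to check locally that it has the shape described here. The sketch below is an assumption-laden helper, not an Elasticsearch API: check_bulk_body is a hypothetical name, and it only understands index and create actions, each followed by one source line.

```python
import json


def check_bulk_body(body):
    """Return a list of problems found in an index/create-only bulk body."""
    problems = []
    if not body.endswith("\n"):
        problems.append("body does not end with a newline")
    lines = body.split("\n")
    if lines[-1] == "":
        lines.pop()  # drop the empty piece after the final newline
    if not lines:
        problems.append("no requests found")
    if len(lines) % 2 != 0:
        problems.append("expected action/source pairs, got an odd number of lines")
    for number, line in enumerate(lines, start=1):
        try:
            parsed = json.loads(line)
        except ValueError:
            problems.append(f"line {number} is not a complete JSON value")
            continue
        is_action_line = number % 2 == 1
        if is_action_line:
            keys = set(parsed) if isinstance(parsed, dict) else set()
            if len(keys) != 1 or not keys <= {"index", "create"}:
                problems.append(f"line {number} is not an index/create action")
    return problems


good = '{ "index":{ } } \n{"RequestedCountry":"slo","Id":1860,"Title":"Stol"} \n'
bad = '[{"Amount": "480"},\n{"Amount": "2105"}]'
print(check_bulk_body(good))
print(check_bulk_body(bad))
```

The second call illustrates why a plain JSON array is rejected: split on newlines, none of its lines is a standalone JSON object, so no action lines are found.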

This is how I do it

I send a POST HTTP request with the uri variable as the URI/URL of the request; the elasticsearchJson variable is the JSON sent in the body of the request, formatted for the Elasticsearch bulk API:

var uri = @"/" + indexName + "/productModel/_bulk";
var json = JsonConvert.SerializeObject(sqlResult);
var elasticsearchJson = GetElasticsearchBulkJsonFromJson(json, "RequestedCountry");

Helper method for generating the required JSON format for the Elasticsearch bulk API:

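// Turns a compact JSON array string into bulk format: strips the outer [ and ], then inserts an
// { "index":{ } } action line before each object. Assumes every object starts with the given first
// property name and that the text ,{"<name>" occurs only at object boundaries.
public string GetElasticsearchBulkJsonFromJson(string jsonStringWithArrayOfObjects, string firstParameterNameOfObjectInJsonStringArrayOfObjects)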
{
  return @"{ ""index"":{ } } 
" + jsonStringWithArrayOfObjects.Substring(1, jsonStringWithArrayOfObjects.Length - 2).Replace(@",{""" + firstParameterNameOfObjectInJsonStringArrayOfObjects + @"""", @" 
{ ""index"":{ } } 
{""" + firstParameterNameOfObjectInJsonStringArrayOfObjects + @"""") + @"
";
}

The first property/field in my JSON object is the RequestedCountry property; that's why I use it in this example.

productModel is my Elasticsearch document type. sqlResult is a C# generic list of products.