Elasticsearch Bulk Index JSON Data
Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/33340153/
Question by Amit P
I am trying to bulk index a JSON file into a new Elasticsearch index but am unable to do so. I have the following sample data inside the JSON file:
[{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"},
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"},
{"Amount": "2107", "Quantity": "3", "Id": "974920111", "Client_Store_sk": "1109"},
{"Amount": "2115", "Quantity": "2", "Id": "975463798", "Client_Store_sk": "1109"},
{"Amount": "2116", "Quantity": "1", "Id": "975463827", "Client_Store_sk": "1109"},
{"Amount": "648", "Quantity": "3", "Id": "975464139", "Client_Store_sk": "1109"},
{"Amount": "2126", "Quantity": "2", "Id": "975464805", "Client_Store_sk": "1109"},
{"Amount": "2133", "Quantity": "1", "Id": "975464061", "Client_Store_sk": "1109"},
{"Amount": "1339", "Quantity": "4", "Id": "974919458", "Client_Store_sk": "1109"},
{"Amount": "1196", "Quantity": "5", "Id": "974920538", "Client_Store_sk": "1109"},
{"Amount": "1198", "Quantity": "4", "Id": "975463638", "Client_Store_sk": "1109"},
{"Amount": "1345", "Quantity": "4", "Id": "974919522", "Client_Store_sk": "1109"},
{"Amount": "1347", "Quantity": "2", "Id": "974919563", "Client_Store_sk": "1109"},
{"Amount": "673", "Quantity": "2", "Id": "975464359", "Client_Store_sk": "1109"},
{"Amount": "2153", "Quantity": "1", "Id": "975464511", "Client_Store_sk": "1109"},
{"Amount": "3896", "Quantity": "4", "Id": "977289342", "Client_Store_sk": "1109"},
{"Amount": "3897", "Quantity": "4", "Id": "974920602", "Client_Store_sk": "1109"}]
I am using the following command:
curl -XPOST localhost:9200/index_local/my_doc_type/_bulk --data-binary --data @/home/data1.json
When I try to use the standard bulk index API from Elasticsearch, I get this error:
error: {"message":"ActionRequestValidationException[Validation Failed: 1: no requests added;]"}
Can anyone help with indexing this type of JSON?
Answer by Val
What you need to do is read that JSON file and then build a bulk request in the format expected by the _bulk endpoint, i.e. one line for the command and one line for the document, separated by a newline character... rinse and repeat for each document:
curl -XPOST localhost:9200/your_index/_bulk -d '
{"index": {"_index": "your_index", "_type": "your_type", "_id": "975463711"}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index": {"_index": "your_index", "_type": "your_type", "_id": "975463943"}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
... etc for all your documents
'
Just make sure to replace your_index and your_type with the actual index and type names you're using.
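If the documents already sit in a JSON array like the one in the question, the action/source pairs above can be generated rather than typed by hand. This is a minimal sketch, assuming jq is installed and that each object's Id field should become its _id; the function name to_bulk_with_ids is hypothetical. The _type entry only applies to older Elasticsearch versions (mapping types were removed in 8.0), so drop it from the filter on newer clusters.

```shell
# Minimal sketch (assumes jq is installed): emit one action line plus one
# document line per element of a JSON array, reusing each object's "Id"
# field as _id. Arguments: input file, index name, type name.
# to_bulk_with_ids is a hypothetical helper name.
to_bulk_with_ids() {
  # -c prints every result compactly on its own line; for each array element
  # the action object is emitted first, then the element itself.
  jq -c --arg idx "$2" --arg doctype "$3" \
    '.[] | {"index": {"_index": $idx, "_type": $doctype, "_id": .Id}}, .' "$1"
}
```

For example, to_bulk_with_ids /home/data1.json your_index your_type > bulk.json writes one action line followed by one document line for every element of the array.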
UPDATE
Note that the command line can be shortened by removing _index and _type if those are specified in your URL. It is also possible to remove _id if you specify the path to your id field in your mapping (note, though, that this feature will be deprecated in ES 2.0). At the very least, your command line can look like {"index":{}} for all documents, but it is always mandatory, because it specifies which kind of operation you want to perform (in this case, indexing the document).
UPDATE 2
curl -XPOST localhost:9200/index_local/my_doc_type/_bulk --data-binary @/home/data1.json
/home/data1.json should look like this:
{"index":{}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index":{}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
{"index":{}}
{"Amount": "2107", "Quantity": "3", "Id": "974920111", "Client_Store_sk": "1109"}
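A file in exactly this shape can be produced from the question's original JSON array with a jq one-liner and piped straight into curl. This is a minimal sketch, assuming jq is available; to_minimal_bulk and the input path in the usage line are hypothetical. Elasticsearch 6.0 and later also require an explicit Content-Type header such as application/x-ndjson.

```shell
# Minimal sketch (assumes jq is installed): turn a JSON array of documents
# into the {"index":{}} / document pairs shown above.
# -c prints each output compactly on its own line, and jq terminates every
# line, including the last one, with a newline, as _bulk requires.
# to_minimal_bulk is a hypothetical helper name.
to_minimal_bulk() {
  jq -c '.[] | {"index": {}}, .' "$1"
}
```

Usage (the input path is a placeholder): to_minimal_bulk /path/to/array.json | curl -s -XPOST localhost:9200/index_local/my_doc_type/_bulk -H 'Content-Type: application/x-ndjson' --data-binary @-

Here @- tells curl to read the request body from standard input.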
Answer by Thomas
As of today, 6.1.2 is the latest version of Elasticsearch, and the curl command that works for me on Windows (x64) is:
curl -s -XPOST localhost:9200/my_index/my_index_type/_bulk -H "Content-Type: application/x-ndjson" --data-binary @D:\data\mydata.json
The format of the data in mydata.json remains the same as shown in @Val's answer.
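Two details of this command matter. --data-binary sends the file byte for byte, whereas -d/--data @file strips carriage returns and newlines, which destroys the line structure the _bulk endpoint depends on. The body must also end with a newline. The sketch below checks a bulk file before posting it; check_bulk_file is a hypothetical helper, and it assumes every action is index or create (each followed by exactly one document line), so it does not apply to files that contain delete actions.

```shell
# Minimal sketch: sanity-check an NDJSON bulk file before POSTing it.
# Assumes only index/create actions, so lines must come in pairs.
# check_bulk_file is a hypothetical helper name.
check_bulk_file() {
  # Command substitution strips trailing newlines, so this is empty
  # only when the last byte of the file is a newline.
  if [ -n "$(tail -c 1 "$1")" ]; then
    echo "missing trailing newline: $1" >&2
    return 1
  fi
  # Every action line must be followed by its document line.
  lines=$(wc -l < "$1")
  if [ $((lines % 2)) -ne 0 ]; then
    echo "odd number of lines ($lines): $1" >&2
    return 1
  fi
  return 0
}
```

For example: check_bulk_file mydata.json && curl -s -XPOST localhost:9200/my_index/my_index_type/_bulk -H "Content-Type: application/x-ndjson" --data-binary @mydata.json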
Answer by Tadej
A valid Elasticsearch bulk API request would be something like (ending with a newline):
POST http://localhost:9200/products_slo_development_temp_2/productModel/_bulk
{ "index":{ } }
{"RequestedCountry":"slo","Id":1860,"Title":"Stol"}
{ "index":{ } }
{"RequestedCountry":"slo","Id":1860,"Title":"Miza"}
Elasticsearch bulk api documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
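Per the bulk API, a _bulk call can return HTTP 200 even when some documents were rejected: the response body carries a top-level errors flag and one result object per action. This is a minimal sketch for listing the failed items, assuming jq is installed; bulk_failures is a hypothetical helper name, and it reads the response JSON from standard input.

```shell
# Minimal sketch (assumes jq is installed): print one compact line per failed
# item in a _bulk response read from stdin. Each entry in "items" is keyed by
# its action (index, create, update or delete), so to_entries exposes that key.
# bulk_failures is a hypothetical helper name.
bulk_failures() {
  jq -c 'select(.errors)
         | .items[]
         | to_entries[]
         | select(.value.error)
         | {action: .key, id: .value._id, status: .value.status, error: .value.error}'
}
```

For example: curl -s -XPOST localhost:9200/my_index/_bulk -H 'Content-Type: application/x-ndjson' --data-binary @bulk.json | bulk_failures prints nothing when every item succeeded.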
This is how I do it
I send a POST HTTP request in which the uri variable holds the URI/URL of the request and the elasticsearchJson variable holds the request body, i.e. the JSON formatted for the Elasticsearch bulk API:
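// URI of the bulk endpoint for the target index and the productModel type
var uri = @"/" + indexName + "/productModel/_bulk";
// Serialize the list of products into a single JSON array string
var json = JsonConvert.SerializeObject(sqlResult);
// Convert the JSON array into the newline-delimited bulk format
var elasticsearchJson = GetElasticsearchBulkJsonFromJson(json, "RequestedCountry");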
Helper method for generating the JSON format required by the Elasticsearch bulk API:
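// Turns a JSON array such as [{"A":1},{"A":2}] into bulk format: strips the
// outer brackets and puts a { "index":{ } } action line before every object.
// Relies on the compact output of JsonConvert.SerializeObject (no space between
// "," and "{") and on every object starting with the same property.
public string GetElasticsearchBulkJsonFromJson(string jsonStringWithArrayOfObjects, string firstParameterNameOfObjectInJsonStringArrayOfObjects)
{
    // Verbatim strings: the line breaks inside them are the NDJSON separators
    return @"{ ""index"":{ } }
" + jsonStringWithArrayOfObjects.Substring(1, jsonStringWithArrayOfObjects.Length - 2).Replace(@",{""" + firstParameterNameOfObjectInJsonStringArrayOfObjects + @"""", @"
{ ""index"":{ } }
{""" + firstParameterNameOfObjectInJsonStringArrayOfObjects + @"""") + @"
";
}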
The first property/field in my JSON objects is RequestedCountry, which is why I use it in this example.
productModel is my Elasticsearch document type.
sqlResult is a C# generic list of products.

