How to delete documents from Elasticsearch in Python
Original question: http://stackoverflow.com/questions/30859142/
License: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by Jacobian
I can't find any example of deleting documents from Elasticsearch in Python. All I've seen so far are the definitions of the delete and delete_by_query functions. But for some reason the documentation does not provide even a microscopic example of using them. A bare list of parameters does not tell me much if I do not know how to feed them into the function call correctly. So, let's say I've just inserted one new doc like so:
doc = {'name':'Jacobian'}
db.index(index="reestr",doc_type="some_type",body=doc)
Who in the world knows how I can now delete this document using delete and delete_by_query?
Accepted answer by Serkan
Since you did not provide a document id when indexing your document, you have to get the auto-generated id from the return value and delete by that id. Alternatively, you can define the id yourself:
db.index(index="reestr",doc_type="some_type",id=1919, body=doc)
db.delete(index="reestr",doc_type="some_type",id=1919)
In the other case, where the id is auto-generated, you need to read it from the return value:
r = db.index(index="reestr",doc_type="some_type", body=doc)
# r = {u'_type': u'some_type', u'_id': u'AU36zuFq-fzpr_HkJSkT', u'created': True, u'_version': 1, u'_index': u'reestr'}
db.delete(index="reestr",doc_type="some_type",id=r['_id'])
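The two steps above (index, then delete by the returned id) can be wrapped in a small helper. This is only a minimal sketch: `index_then_delete` is a hypothetical name, `client` is assumed to behave like the `db` object in the answer, and the `doc_type` argument follows the older client version used there (newer client versions may not accept it).

```python
def index_then_delete(client, index, doc_type, doc):
    """Index `doc`, then delete it again using the auto-generated id.

    Hypothetical helper: `client` is assumed to expose the same
    index() and delete() methods as the `db` object in the answer.
    """
    created = client.index(index=index, doc_type=doc_type, body=doc)
    doc_id = created["_id"]  # id generated by Elasticsearch
    client.delete(index=index, doc_type=doc_type, id=doc_id)
    return doc_id
```

With the names from the question, this would be called as `index_then_delete(db, "reestr", "some_type", doc)`.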
Here is an example for delete_by_query. Let's say you have added several documents with name='Jacobian'; run the following to delete all of them:
# q takes a query in Lucene query string syntax, not a dict
db.delete_by_query(index='reestr', doc_type='some_type', q='name:Jacobian')
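The query can also be sent as a request body rather than through `q`. The sketch below only builds such a body. `match_query` is a hypothetical helper name, and whether `match` is the right query type depends on how the `name` field is mapped.

```python
def match_query(field, value):
    """Build a request body matching documents whose `field` matches `value`."""
    return {"query": {"match": {field: value}}}


body = match_query("name", "Jacobian")
# Assumed usage, with a client `db` whose delete_by_query accepts a body:
# db.delete_by_query(index="reestr", body=body)
```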
Answer by Chaoste
The Delete-By-Query API was removed from the ES core in version 2 for several reasons. This function became a plugin. You can look for more details here:
Why Delete-By-Query is a plugin
Because I didn't want to add another dependency (I need this to run in a Docker image later), I wrote my own function to solve this problem. My solution is to search for all documents with the specified index and type, and then remove them using the Bulk API:
import elasticsearch

def delete_es_type(es, index, type_):
    try:
        # Count first so that a single search can return every matching id.
        count = es.count(index, type_)['count']
        response = es.search(
            index=index,
            filter_path=["hits.hits._id"],
            body={"size": count, "query": {"filtered": {"filter": {
                "type": {"value": type_}}}}})
        # With filter_path, "hits" may be missing when nothing matched.
        ids = [x["_id"] for x in response.get("hits", {}).get("hits", [])]
        if not ids:
            return  # nothing to delete
        # One delete action per id, in the Bulk API's newline-delimited format.
        bulk_body = [
            '{{"delete": {{"_index": "{}", "_type": "{}", "_id": "{}"}}}}'
            .format(index, type_, x) for x in ids]
        es.bulk('\n'.join(bulk_body))
        # es.indices.flush_synced([index])
    except elasticsearch.exceptions.TransportError as ex:
        print("Elasticsearch error: " + ex.error)
        raise
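The bulk body above is assembled with string formatting, which would produce invalid JSON if an id contained a quote or a backslash. As a hedged alternative, the same body can be built with `json.dumps`. The helper name `bulk_delete_body` is hypothetical, and the `_type` field follows the older Elasticsearch version used in this answer.

```python
import json


def bulk_delete_body(index, type_, ids):
    """Return a newline-delimited JSON body with one delete action per id."""
    lines = [
        json.dumps({"delete": {"_index": index, "_type": type_, "_id": doc_id}})
        for doc_id in ids
    ]
    if not lines:
        return ""
    # The Bulk API expects every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n"
```

Inside `delete_es_type`, the result could then be passed on as `es.bulk(body=bulk_delete_body(index, type_, ids))`.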
I hope that helps future googlers ;)
Answer by Jay Patel
One can also do something like this:
from pprint import pprint

def delete_by_ids(index, ids):
    # `es` is an already created Elasticsearch client instance.
    query = {"query": {"terms": {"_id": ids}}}
    res = es.delete_by_query(index=index, body=query)
    pprint(res)

# Pass the index and the list of ids that you want to delete.
delete_by_ids('my_index', ['test1', 'test2', 'test3'])
This performs the delete operation on all the listed documents in a single request.
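For long id lists it may be worth splitting the work into batches, since Elasticsearch limits how many terms a single `terms` query may contain. A minimal sketch, assuming `client` is an Elasticsearch client whose `delete_by_query` accepts `index` and `body` as in the answer above and returns a dict with a `deleted` count. The function name and the default batch size are arbitrary choices.

```python
def delete_by_ids_in_batches(client, index, ids, batch_size=1000):
    """Delete documents by id in batches; return the total reported as deleted."""
    total = 0
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        query = {"query": {"terms": {"_id": batch}}}
        res = client.delete_by_query(index=index, body=query)
        # Fall back to 0 if the response carries no "deleted" field.
        total += res.get("deleted", 0)
    return total
```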