How to delete documents from Elasticsearch in Python

Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/30859142/

Time: 2020-08-19 09:05:29  Source: igfitidea

How to delete documents from Elasticsearch

Tags: python, elasticsearch

Asked by Jacobian

I can't find any example of deleting documents from Elasticsearch in Python. All I've seen so far are the definitions of the delete and delete_by_query functions. But for some reason the documentation does not provide even a microscopic example of using them. A bare list of parameters doesn't tell me much if I don't know how to feed them correctly into the function call. So, let's say I've just inserted one new doc like so:

doc = {'name':'Jacobian'}
db.index(index="reestr",doc_type="some_type",body=doc)

Who in the world knows how I can now delete this document using delete and delete_by_query?

Accepted answer by Serkan

Since you are not giving a document id while indexing your document, you have to get the auto-generated document id from the return value and delete according to the id. Or you can define the id yourself, try the following:

 db.index(index="reestr",doc_type="some_type",id=1919, body=doc)

 db.delete(index="reestr",doc_type="some_type",id=1919)

Otherwise, if you let Elasticsearch generate the id, you need to read it from the return value:

 r = db.index(index="reestr",doc_type="some_type", body=doc)
 # r = {u'_type': u'some_type', u'_id': u'AU36zuFq-fzpr_HkJSkT', u'created': True, u'_version': 1, u'_index': u'reestr'}

 db.delete(index="reestr",doc_type="some_type",id=r['_id'])
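
The two cases above can be folded into one helper. This is only a minimal sketch, not part of the original answer: `index_and_delete` is a hypothetical name, and `client` is assumed to be any object that behaves like the `db` client above, that is, `index(...)` returns a dict containing '_id' and `delete(...)` accepts `index`, `doc_type` and `id` (as older elasticsearch-py clients that still accept `doc_type` do).

```python
def index_and_delete(client, index, doc_type, doc, doc_id=None):
    """Index `doc`, then delete it again by id; return the id that was used.

    Hypothetical helper: `client` is assumed to behave like the `db` object
    above, i.e. `index(...)` returns a dict that contains the key '_id'.
    """
    kwargs = {"index": index, "doc_type": doc_type, "body": doc}
    if doc_id is not None:
        kwargs["id"] = doc_id  # caller-chosen id
    result = client.index(**kwargs)
    used_id = result["_id"]  # auto-generated when no id was passed
    client.delete(index=index, doc_type=doc_type, id=used_id)
    return used_id
```

Called as `index_and_delete(db, "reestr", "some_type", doc)`, it deletes exactly the document it has just created, whether or not an id was supplied.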

Here is another example, this time for delete_by_query. Suppose you have added several documents with name='Jacobian'; run the following to delete all of them:

 db.delete_by_query(index='reestr',doc_type='some_type', q='name:Jacobian')
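
The `q` parameter takes a Lucene query string; the same deletion can also be expressed as a query DSL body. The following is a rough sketch under assumptions: `match_query_body` and `delete_matching` are hypothetical names, and `client` is assumed to expose `delete_by_query(index=..., body=...)`, which depends on the client and server version in use.

```python
def match_query_body(field, value):
    """Build a query DSL body matching documents whose `field` matches `value`."""
    return {"query": {"match": {field: value}}}


def delete_matching(client, index, field, value):
    """Delete every document in `index` that matches field == value.

    Hypothetical helper: `client` is assumed to provide
    delete_by_query(index=..., body=...).
    """
    return client.delete_by_query(index=index, body=match_query_body(field, value))
```

For the example above this would be called as `delete_matching(db, 'reestr', 'name', 'Jacobian')`.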

Answer by Chaoste

The Delete-By-Query API was removed from the ES core in version 2 for several reasons. This function became a plugin. You can look for more details here:

Why Delete-By-Query is a plugin

Delete By Query Plugin

Because I didn't want to add another dependency (I need this to run in a Docker image later), I wrote my own function to solve this problem. My solution is to search for all documents with the specified index and type, and then remove them using the Bulk API:

import elasticsearch


def delete_es_type(es, index, type_):
    try:
        # Number of documents of this type, used as the search page size.
        count = es.count(index=index, doc_type=type_)['count']
        response = es.search(
            index=index,
            filter_path=["hits.hits._id"],
            body={"size": count, "query": {"filtered": {"filter": {
                  "type": {"value": type_}}}}})
        # With filter_path, an empty result may omit "hits" entirely.
        ids = [x["_id"] for x in response.get("hits", {}).get("hits", [])]
        if len(ids) == 0:
            return  # nothing to delete
        # One newline-delimited delete action per document id.
        bulk_body = [
            '{{"delete": {{"_index": "{}", "_type": "{}", "_id": "{}"}}}}'
            .format(index, type_, x) for x in ids]
        es.bulk('\n'.join(bulk_body))
        # es.indices.flush_synced([index])
    except elasticsearch.exceptions.TransportError as ex:
        print("Elasticsearch error: " + str(ex.error))
        raise
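
The bulk body above is assembled with string formatting, which would produce invalid JSON if an id contained a quote or a backslash. As a hedged alternative, the action lines can be serialized with the standard `json` module. `build_bulk_delete_body` is a hypothetical helper name, not part of the answer or of the Elasticsearch client:

```python
import json


def build_bulk_delete_body(index, doc_type, ids):
    """Return a newline-delimited Bulk API body with one delete action per id."""
    lines = [
        json.dumps({"delete": {"_index": index, "_type": doc_type, "_id": doc_id}})
        for doc_id in ids
    ]
    # The Bulk API expects every line, including the last one, to end with a newline.
    return "".join(line + "\n" for line in lines)
```

The resulting string could then be passed to `es.bulk(...)` in place of `'\n'.join(bulk_body)`.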

I hope that helps future googlers ;)

Answer by Jay Patel

One can also do something like this:

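from pprint import pprint

# `es` is assumed to be an already-created Elasticsearch client instance.
def delete_by_ids(index, ids):
    # Match every document whose _id is in the given list.
    query = {"query": {"terms": {"_id": ids}}}
    res = es.delete_by_query(index=index, body=query)
    pprint(res)

# Pass the index and the list of ids that you want to delete.
delete_by_ids('my_index', ['test1', 'test2', 'test3'])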

This deletes all of the listed documents in a single request.
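
For very long id lists, a single terms query may run into request-size or term-count limits, depending on the cluster settings. One possible workaround is to send the ids in batches. This is a sketch under assumptions: `chunked` and `delete_by_ids_in_batches` are hypothetical names, the default batch size of 1000 is arbitrary, and `client` is assumed to expose `delete_by_query(index=..., body=...)` like the `es` object above.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_by_ids_in_batches(client, index, ids, batch_size=1000):
    """Issue one delete_by_query per batch of ids; return the list of responses."""
    responses = []
    for batch in chunked(list(ids), batch_size):
        query = {"query": {"terms": {"_id": batch}}}
        responses.append(client.delete_by_query(index=index, body=query))
    return responses
```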
