java elasticsearch - return the tokens of a field

Question

Asked by Kennedy

How can I have the tokens of a particular field returned in the result?


For example, a GET request


curl -XGET 'http://localhost:9200/twitter/tweet/1'

returns


{
    "_index" : "twitter",
    "_type" : "tweet",
    "_id" : "1", 
    "_source" : {
        "user" : "kimchy",
        "postDate" : "2009-11-15T14:12:12",
        "message" : "trying out Elastic Search"
    } 
}

I would like to have the tokens of the '_source.message' field included in the result.


Answer 1

Answered by imotov

There is also another way to do it using the following script_fields script:


curl 'http://localhost:9200/test-idx/_search?pretty=true' -d '{
    "query" : {
        "match_all" : { }
    },
    "script_fields": {
        "terms" : {
            "script": "doc[field].values",
            "params": {
                "field": "message"
            }
        }

    }
}'

Note that while this script returns the actual terms that were indexed, it also loads all of the field's values into the cache, which can use a lot of memory on large indices. On large indices it may therefore be better to retrieve the field values from stored fields or from _source and re-analyze them on the fly with the following MVEL script:


import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import java.io.StringReader;

// Cache the type's index analyzer for further use (looked up only if not already defined)
cachedAnalyzer=(isdef cachedAnalyzer)?cachedAnalyzer:doc.mapperService().documentMapper(doc._type.value).mappers().indexAnalyzer();

terms=[];
// Alternative: read the value from stored fields via the Fields Lookup
//val=_fields[field].values;

// Get the value from the Source Lookup (_source)
val=_source[field];

if(val != null) {
  // Re-analyze the raw value with the index analyzer
  tokenStream=cachedAnalyzer.tokenStream(field, new StringReader(val));
  CharTermAttribute termAttribute = tokenStream.addAttribute(CharTermAttribute);
  tokenStream.reset();
  while(tokenStream.incrementToken()) {
    // Collect the text of each token
    terms.add(termAttribute.toString());
  }
  tokenStream.end();
  tokenStream.close();
}
// The last expression is the script's return value
terms

This MVEL script can be stored as config/scripts/analyze.mvel and used with the following query:


curl 'http://localhost:9200/test-idx/_search?pretty=true' -d '{
    "query" : {
        "match_all" : { }
    },
    "script_fields": {
        "terms" : {
            "script": "analyze",
            "params": {
                "field": "message"
            }
        }

    }
}'
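To see the idea behind the analyze script without a running cluster, here is a plain-Java sketch of the same loop: walk through the value, emit a token at each boundary, and collect the tokens into a list. The class name SimpleAnalyzerSketch and its tokenization rules (split on anything that is not a letter or digit, then lowercase) are assumptions for illustration. This is not Lucene's StandardAnalyzer or the analyzer from your mapping, so real tokens may differ (stop words, stemming, etc.).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified stand-in for an analyzer: splits on any character
// that is not a letter or digit and lowercases each token.
// NOT Lucene's StandardAnalyzer: no stop words, no stemming, BMP characters only.
public class SimpleAnalyzerSketch {

    /** Returns the tokens this simplified analyzer would emit for a field value. */
    static List<String> analyze(String value) {
        List<String> terms = new ArrayList<>();
        if (value == null) {
            return terms; // mirrors the script's "if(val != null)" guard
        }
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                // token boundary: emit the token, like one incrementToken() step
                terms.add(current.toString().toLowerCase(Locale.ROOT));
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            terms.add(current.toString().toLowerCase(Locale.ROOT)); // trailing token
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [trying, out, elastic, search]
        System.out.println(analyze("trying out Elastic Search"));
    }
}
```

Running it on the sample tweet's message gives one token per word, lowercased, which is roughly what the script_fields "terms" entry would contain for a simple analyzer.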

Answer 2

Answered by javanna

If you mean the tokens that have been indexed, you can run a terms facet on the message field. Increase the size value to get more entries back, or set it to 0 to get all terms.

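A terms facet reports the field's indexed terms together with how often they occur, most frequent first. The plain-Java sketch below is a hypothetical model of that computation under simplifying assumptions: the token lists per document are given directly (in elasticsearch they come from the field's analyzer at index time), each term is counted at most once per document, and ties are broken alphabetically, which may not match elasticsearch's own ordering. The size argument mirrors the facet's size value, with 0 meaning "return every term". TermsFacetSketch and termsFacet are illustrative names, not elasticsearch APIs.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical model of a terms facet: term -> number of documents containing it,
// most frequent first, truncated to `size` entries (0 = all terms).
public class TermsFacetSketch {

    static LinkedHashMap<String, Integer> termsFacet(List<List<String>> docsTokens, int size) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> tokens : docsTokens) {
            // count each distinct term once per document
            for (String term : new HashSet<>(tokens)) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        // highest count first; ties broken alphabetically (an assumption)
        Stream<Map.Entry<String, Integer>> sorted = counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        if (size > 0) {
            sorted = sorted.limit(size); // size 0 keeps every term
        }
        return sorted.collect(Collectors.toMap(
                Map.Entry::getKey, Map.Entry::getValue, (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("trying", "out", "elastic", "search"),
                List.of("elastic", "search", "rocks"),
                List.of("search", "again"));
        // prints {search=3, elastic=2}
        System.out.println(termsFacet(docs, 2));
        // prints {search=3, elastic=2, again=1, out=1, rocks=1, trying=1}
        System.out.println(termsFacet(docs, 0));
    }
}
```

Note that a facet aggregates over all matching documents, so it tells you which tokens exist in the index rather than which tokens belong to one particular document.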

Lucene can store term vectors, but as far as I know there is currently no way to access them through elasticsearch.


Why do you need that? If you only want to check what you're indexing, have a look at the analyze API.

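A call to the analyze API only needs the index, the field whose analyzer you want to test, and the text. The sketch below just builds the request URI. It assumes the older query-string form GET /{index}/_analyze?field=...&text=..., where field selects that field's analyzer from the mapping; newer elasticsearch versions expect a JSON request body instead, so check the documentation for your version. AnalyzeUriSketch and analyzeUri are hypothetical names.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds an analyze API request URI.
// ASSUMPTION: the older query-string form GET /{index}/_analyze?field=...&text=...
public class AnalyzeUriSketch {

    static URI analyzeUri(String baseUrl, String index, String field, String text) {
        // URL-encode the parameters (spaces become "+")
        String query = "field=" + URLEncoder.encode(field, StandardCharsets.UTF_8)
                + "&text=" + URLEncoder.encode(text, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/" + index + "/_analyze?" + query);
    }

    public static void main(String[] args) {
        // prints http://localhost:9200/twitter/_analyze?field=message&text=trying+out+Elastic+Search
        System.out.println(analyzeUri("http://localhost:9200", "twitter", "message", "trying out Elastic Search"));
    }
}
```

The resulting URI can then be requested with curl -XGET or sent with java.net.http.HttpClient; the response lists the tokens the field's analyzer produces for that text.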
