
        Lucene index problems with the "-" character


                  Problem description

                  I'm having trouble with a Lucene index whose indexed words contain "-" characters.

                  It works for some words that contain "-" but not for all of them, and I can't find the reason why.

                  The field I'm searching is analyzed and contains versions of the word with and without the "-" character.

                  I'm using the analyzer: org.apache.lucene.analysis.standard.StandardAnalyzer

                  Here is an example:

                  If I search for "gsx-*" I get a result; the indexed field contains "SUZUKI GSX-R 1000 GSX-R1000 GSXR".

                  But if I search for "v-*" I get no result. The indexed field of the expected result contains: "SUZUKI DL 1000 V-STROM DL1000V-STROMVSTROM V STROM"

                  If I search for "v-strom" without "*" it works, but if I search for just "v-str", for example, I don't get the result. (There should be a result, because this is for a live search in a webshop.)

                  So what's the difference between the two expected results? Why does it work for "gsx-" but not for "v-"?

                  Recommended answer

                  I believe StandardAnalyzer treats the hyphen as whitespace. So it turns your query "gsx-*" into "gsx*" and "v-*" into nothing, because it also eliminates single-letter tokens. What you see as the field contents in the search result is the stored value of the field, which is completely independent of the terms that were indexed for that field.
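The explanation above can be checked against the four searches from the question with a small toy model. This is only a sketch in plain Java, not Lucene code: the class `HyphenSplitModel` and its methods are hypothetical, and the rules it applies (split on anything that is not a letter or digit, lowercase, drop one-character tokens) simply encode the answer's hypothesis about StandardAnalyzer, which has not been verified against any particular Lucene version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical toy model of the behaviour described in the answer; not Lucene's implementation.
class HyphenSplitModel {

    // Split on anything that is not an ASCII letter or digit, lowercase,
    // and drop one-character tokens (assumption taken from the answer).
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.length() > 1) {
                terms.add(token);
            }
        }
        return terms;
    }

    // A trailing '*' turns the last analyzed token into a prefix;
    // every other token has to equal an indexed term exactly.
    static boolean matches(String fieldText, String query) {
        List<String> indexed = analyze(fieldText);
        boolean prefix = query.endsWith("*");
        List<String> wanted = analyze(prefix ? query.substring(0, query.length() - 1) : query);
        if (wanted.isEmpty()) {
            return false; // the whole query was analyzed away
        }
        for (int i = 0; i < wanted.size(); i++) {
            String term = wanted.get(i);
            boolean asPrefix = prefix && i == wanted.size() - 1;
            boolean found = false;
            for (String candidate : indexed) {
                if (asPrefix ? candidate.startsWith(term) : candidate.equals(term)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String gsx = "SUZUKI GSX-R 1000 GSX-R1000 GSXR";
        String strom = "SUZUKI DL 1000 V-STROM DL1000V-STROMVSTROM V STROM";
        System.out.println(matches(gsx, "gsx-*"));     // true:  "gsx" is used as a prefix
        System.out.println(matches(strom, "v-*"));     // false: "v" is dropped, nothing is left
        System.out.println(matches(strom, "v-strom")); // true:  only "strom" is looked up
        System.out.println(matches(strom, "v-str"));   // false: "str" is not a complete term
    }
}
```

Under these assumed rules the model reproduces all four observations from the question, which is consistent with the answer's reading. The terms a real analyzer emits should still be inspected directly.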

                  So what you want is for "v-strom" as a whole to be an indexed term. StandardAnalyzer is not suited to this kind of text. Maybe have a go with the WhitespaceAnalyzer or SimpleAnalyzer. If that still doesn't cut it, you also have the option of putting together your own analyzer, or of starting with one of the two mentioned and composing it with further TokenFilters. A very good explanation is given in the Lucene Analysis package Javadoc.

                  BTW, there's no need to enter all the variants in the index, like V-strom, V-Strom, etc. The idea is for the same analyzer to normalize all these variants to the same string, both in the index and while parsing the query.
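The two ideas in the answer (keep "v-strom" as one term, and normalize index and query text the same way) can be sketched as follows. This is again a plain-Java toy model rather than Lucene code: `WholeTermModel` and its methods are hypothetical, and the rule "split on whitespace only, then lowercase" is an assumption that stands in for a whitespace-based tokenizer followed by a lowercasing filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical toy model: whitespace splitting plus lowercasing, used for both index and query.
class WholeTermModel {

    // Split on whitespace only and lowercase, so "V-STROM" stays a single term "v-strom".
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Live-search style lookup: normalize the user input with the same rule,
    // then test it as a prefix of every indexed term.
    static boolean prefixMatches(String fieldText, String userInput) {
        List<String> wanted = analyze(userInput);
        if (wanted.size() != 1) {
            return false; // this sketch only handles single-word input
        }
        String prefix = wanted.get(0);
        for (String term : analyze(fieldText)) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // All spelling variants collapse to the same term, so only one needs to be indexed.
        System.out.println(analyze("V-Strom").equals(analyze("v-strom"))); // true
        System.out.println(analyze("V-STROM"));                            // [v-strom]

        String field = "SUZUKI DL 1000 V-STROM";
        System.out.println(prefixMatches(field, "v-str")); // true
        System.out.println(prefixMatches(field, "V-"));    // true
    }
}
```

Because the hyphen is never treated as a separator here, the field no longer needs the extra "VSTROM" and "V STROM" spellings for hyphenated input to match. Whether unhyphenated input such as "vstrom" should also match is a separate design decision that this sketch does not cover.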

