


Get term frequencies in Lucene





Question

Is there a fast and easy way of getting term frequencies from a Lucene index, without doing it through the TermVectorFrequencies class, since that takes an awful lot of time for large collections?

What I mean is, is there something like TermEnum which has not just the document frequency but the term frequency as well?

UPDATE: Using TermDocs is way too slow.

Recommended answer

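Use TermDocs to get the term frequency for a given document. As with the document frequency, you get the TermDocs from an IndexReader, using the term of interest.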

You won't find a faster method than TermDocs without losing some generality. TermDocs reads directly from the ".frq" file in an index segment, where each term frequency is listed in document order.
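To make the "listed in document order" point concrete, here is a minimal sketch in plain Java. It does not use Lucene: `FreqPostings`, `FreqPostingsDemo` and `totalTermFrequency` are hypothetical names made up for this illustration, and the two parallel arrays merely stand in for one term's entries in the ".frq" file. In the Lucene 2.x/3.x API the equivalent loop would, as far as I know, call `next()`, `doc()` and `freq()` on the `TermDocs` returned by `IndexReader.termDocs(Term)`.

```java
// Toy model only: these classes are hypothetical and are NOT part of Lucene.
class FreqPostingsDemo {
    public static void main(String[] args) {
        // One term's postings: doc IDs in ascending order, plus the term's count in each doc.
        FreqPostings postings = new FreqPostings(new int[] {0, 3, 7}, new int[] {2, 1, 5});
        System.out.println("document frequency   = " + postings.size());
        System.out.println("total term frequency = " + totalTermFrequency(postings));
    }

    // Sums the per-document frequencies in a single forward pass over the postings.
    static long totalTermFrequency(FreqPostings postings) {
        long total = 0;
        while (postings.next()) {
            total += postings.freq();
        }
        return total;
    }
}

// A forward-only cursor over (docId, freq) pairs stored in document order.
final class FreqPostings {
    private final int[] docs;
    private final int[] freqs;
    private int pos = -1; // positioned before the first entry

    FreqPostings(int[] docs, int[] freqs) {
        if (docs.length != freqs.length) {
            throw new IllegalArgumentException("docs and freqs must have the same length");
        }
        this.docs = docs;
        this.freqs = freqs;
    }

    // Number of documents containing the term (the document frequency).
    int size() {
        return docs.length;
    }

    // Moves to the next entry; returns false once the list is exhausted.
    boolean next() {
        if (pos + 1 >= docs.length) {
            return false;
        }
        pos++;
        return true;
    }

    int doc() {
        return docs[pos];
    }

    int freq() {
        return freqs[pos];
    }
}
```

For the sample data the document frequency is 3 while the summed term frequency is 2 + 1 + 5 = 8, which is the distinction the question is asking about. The cost of the loop is one sequential pass per term, so it grows with the number of documents that contain the term.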

If that's "too slow", make sure that you've optimized your index to merge multiple segments into a single segment. Iterate over the documents in order (skips are alright, but you can't jump back and forth in the document list efficiently).
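The "skips are alright, but no jumping back" advice can be sketched the same way. Again this is a toy model in plain Java, not Lucene code: `SkippingPostings`, `SkipForwardDemo` and `freqFor` are hypothetical names, and the `skipTo` here is a simple linear scan that only imitates the forward-only behaviour (Lucene's own `TermDocs.skipTo` has slightly different advance semantics and uses skip data to jump ahead).

```java
// Toy model only: these classes are hypothetical and are NOT part of Lucene.
class SkipForwardDemo {
    public static void main(String[] args) {
        SkippingPostings postings = new SkippingPostings(
                new int[] {1, 4, 9, 12, 20}, new int[] {3, 1, 2, 7, 1});
        int[] wanted = {4, 10, 20}; // doc IDs of interest, sorted ascending
        for (int target : wanted) {
            System.out.println("doc " + target + " -> freq " + freqFor(postings, target));
        }
    }

    // Returns the term's frequency in `target`, or 0 if the term does not occur there.
    // Targets must be requested in ascending order: the cursor never moves backwards.
    static int freqFor(SkippingPostings postings, int target) {
        if (postings.skipTo(target) && postings.doc() == target) {
            return postings.freq();
        }
        return 0;
    }
}

// A forward-only cursor over (docId, freq) pairs stored in document order.
final class SkippingPostings {
    private final int[] docs;
    private final int[] freqs;
    private int pos = 0; // index of the current entry

    SkippingPostings(int[] docs, int[] freqs) {
        if (docs.length != freqs.length) {
            throw new IllegalArgumentException("docs and freqs must have the same length");
        }
        this.docs = docs;
        this.freqs = freqs;
    }

    // Advances to the first entry whose doc ID is >= target.
    // Returns false if the list is exhausted before such an entry is found.
    boolean skipTo(int target) {
        while (pos < docs.length && docs[pos] < target) {
            pos++;
        }
        return pos < docs.length;
    }

    int doc() {
        return docs[pos];
    }

    int freq() {
        return freqs[pos];
    }
}
```

Sorting the document IDs you care about before the loop keeps every lookup a forward move. Asking this cursor for an earlier document after it has passed it simply finds nothing; with a real index the only remedy is to reposition on the term and read from the start again, which is the inefficiency the answer warns about.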

Your next step might be additional processing to create an even more specialized file structure that leaves out the SkipData. Personally I would look for a better algorithm to achieve my objective, or provide better hardware: lots of memory, either to hold a RAMDirectory, or to give to the OS for use on its own file-caching system.


