      How to find collocations in text, python

                Problem description

                How do you find collocations in text? A collocation is a sequence of words that occurs together unusually often. NLTK (not core Python) provides a bigrams function that returns adjacent word pairs:

                >>> from nltk import bigrams
                >>> list(bigrams(['more', 'is', 'said', 'than', 'done']))  # bigrams() returns a generator in NLTK 3
                [('more', 'is'), ('is', 'said'), ('said', 'than'), ('than', 'done')]
                

                What's left is to find the bigrams that occur more often than the frequencies of their individual words would suggest. Any ideas on how to put that into code?
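One common way to make "more often than the individual word frequencies would suggest" concrete is pointwise mutual information (PMI), which compares how often a pair is observed with how often it would be expected if the two words were independent. The sketch below uses only the standard library; the function name pmi_bigrams and the min_count threshold are illustrative assumptions, not something from the question.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration: pairs seen fewer than
    `min_count` times are skipped, because PMI overrates rare pairs.
    """
    word_counts = Counter(tokens)
    pair_counts = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_pairs = max(n_tokens - 1, 1)

    scores = {}
    for (w1, w2), count in pair_counts.items():
        if count < min_count:
            continue
        p_pair = count / n_pairs              # observed pair probability
        p_w1 = word_counts[w1] / n_tokens     # individual word probabilities
        p_w2 = word_counts[w2] / n_tokens
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring pairs first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    words = "new york is big and new york is busy".split()
    for pair, score in pmi_bigrams(words):
        print(pair, round(score, 3))
```

A positive score means the pair appears together more often than chance would predict. On real text a higher min_count is usually needed before the ranking becomes meaningful.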

                Recommended answer

                Try NLTK. You will mostly be interested in nltk.collocations.BigramCollocationFinder, but here is a quick demonstration to show you how to get started:

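                >>> import nltk
                >>> # The tokenizers need NLTK's Punkt data, e.g. nltk.download('punkt');
                >>> # recent NLTK releases use 'punkt_tab' instead.
                >>> def tokenize(sentences):
                ...     """Yield lowercased word tokens, one sentence at a time."""
                ...     for sent in nltk.sent_tokenize(sentences.lower()):
                ...         for word in nltk.word_tokenize(sent):
                ...             yield word
                ...
                >>> nltk.Text(tkn for tkn in tokenize('mary had a little lamb.'))
                <Text: mary had a little lamb ....>
                >>> text = nltk.Text(tkn for tkn in tokenize('mary had a little lamb.'))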
                

                This small sample contains no collocations, but the call looks like this:

                >>> text.collocations(num=20)
                Building collocations list
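To work with the BigramCollocationFinder mentioned above directly, something along these lines may be a useful starting point. It is a minimal sketch, not the only way to use the class: the helper name top_collocations, the sample sentence, and the choice of the likelihood-ratio measure are assumptions made for illustration, and str.split() stands in for a real tokenizer so that no NLTK data download is needed.

```python
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder


def top_collocations(tokens, n=5, min_freq=2):
    """Return up to `n` bigrams ranked by a likelihood-ratio score.

    Hypothetical helper: bigrams seen fewer than `min_freq` times are
    dropped before scoring.
    """
    measures = BigramAssocMeasures()
    finder = BigramCollocationFinder.from_words(tokens)
    finder.apply_freq_filter(min_freq)  # discard rare pairs
    return finder.nbest(measures.likelihood_ratio, n)


# Whitespace tokenization keeps the example free of NLTK data downloads.
sample = (
    "new york is big . she moved to new york . "
    "the city is big . new york is busy ."
).split()

if __name__ == "__main__":
    print(top_collocations(sample))
```

Other measures on BigramAssocMeasures, such as pmi or chi_sq, can be passed to nbest in the same way, and finder.score_ngrams(...) returns the scores alongside the pairs.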
                
