Ngram Extraction
Introduction
The ngram_extraction module uses nltk to split a given block of text into ngrams.
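Usage
To use the module, import it and call the function you need (function and to_use below are placeholders for a real function name and its argument):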
>>> import coast_core
>>> coast_core.ngram_extraction.function(to_use)
or:
>>> from coast_core import ngram_extraction
>>> ngram_extraction.function(to_use)
Functions
A collection of functions that can be used for splitting the article into ngrams.
coast_core.ngram_extraction.generate_ngrams(article_text)
Split the given text into ngrams, returning an object that contains the ngrams of every length from one to six.
Parameters: article_text – the block of text to operate on.
Returns: An object containing all ngrams up to 6 in the following structure:
    {
        "unigrams": [list of unigrams],
        "bigrams": [list of bigrams],
        "trigrams": [list of trigrams],
        "fourgrams": [list of fourgrams],
        "fivegrams": [list of fivegrams],
        "sixgrams": [list of sixgrams]
    }
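The shape of the returned object can be illustrated with a small standalone sketch. This is not the coast_core implementation: the function name sketch_generate_ngrams is hypothetical, tokens are produced by plain whitespace splitting rather than by nltk, and representing each ngram as a tuple of tokens is an assumption. Only the six dictionary keys come from the documentation above.

```python
# Hypothetical stand-in for generate_ngrams, for illustration only.
# Assumptions: whitespace tokenization (the real module uses nltk),
# and each ngram represented as a tuple of tokens.
NGRAM_KEYS = ["unigrams", "bigrams", "trigrams",
              "fourgrams", "fivegrams", "sixgrams"]


def sketch_generate_ngrams(article_text):
    """Return a dict mapping each documented key to a list of ngrams."""
    tokens = article_text.split()
    result = {}
    for n, key in enumerate(NGRAM_KEYS, start=1):
        # Zipping n staggered slices of the token list yields every
        # run of n consecutive tokens, in order of appearance.
        staggered = (tokens[i:] for i in range(n))
        result[key] = list(zip(*staggered))
    return result


example = sketch_generate_ngrams("the quick brown fox")
print(example["bigrams"])
```

In this sketch, a text with fewer than n tokens simply produces an empty list for the corresponding key; whether the real function behaves the same way is not stated in the documentation.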