Ngram Extraction

Introduction

The ngram_extraction module uses nltk to split a given block of text into ngrams.

Usage

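To use the module, import it in one of the following ways, where function(to_use) stands for whichever function you want to call: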

>>> import coast_core
>>> coast_core.ngram_extraction.function(to_use)

or:

>>> from coast_core import ngram_extraction
>>> ngram_extraction.function(to_use)

Functions

A collection of functions that can be used for splitting the article into ngrams.

coast_core.ngram_extraction.generate_ngrams(article_text)

Split the given text into ngrams, returning an object that contains the ngrams of every size from one (unigrams) to six (sixgrams).

Parameters: article_text – the block of text to operate on.
Returns: An object containing all ngrams of sizes one to six, in the following structure:
{
    "unigrams": [list of unigrams],
    "bigrams": [list of bigrams],
    "trigrams": [list of trigrams],
    "fourgrams": [list of fourgrams],
    "fivegrams": [list of fivegrams],
    "sixgrams": [list of sixgrams]
}
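
To illustrate the shape of the returned object, here is a minimal sketch that builds the same six keys in plain Python. It is not the module's actual implementation: sketch_generate_ngrams is a hypothetical name, whitespace splitting stands in for whatever nltk tokenizer the module uses, and representing each ngram as a tuple of tokens is an assumption (the format of the list elements is not specified above).

```python
def sketch_generate_ngrams(article_text):
    """Hypothetical stand-in for generate_ngrams; not the module's real code."""
    # Assumption: plain whitespace tokenization instead of an nltk tokenizer.
    tokens = article_text.split()
    keys = ["unigrams", "bigrams", "trigrams",
            "fourgrams", "fivegrams", "sixgrams"]
    result = {}
    for n, key in enumerate(keys, start=1):
        # Slide a window of size n over the tokens. The range is empty
        # when the text has fewer than n tokens, giving an empty list.
        result[key] = [tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1)]
    return result


ngrams = sketch_generate_ngrams("the quick brown fox jumps")
print(ngrams["bigrams"])
# [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox'), ('fox', 'jumps')]
print(len(ngrams["sixgrams"]))
# 0
```

In this sketch, a text with fewer tokens than a given ngram size simply produces an empty list for that key, so all six keys are always present. Whether the real function behaves the same way on short input is not stated above.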