Ngram Extraction
Introduction
The ngram_extraction module uses nltk to split a given block of text into ngrams.
Usage
To use the module:
>>> import coast_core
>>> coast_core.ngram_extraction.function(to_use)
or:
>>> from coast_core import ngram_extraction
>>> ngram_extraction.function(to_use)
Functions
A collection of functions that can be used for splitting the article into ngrams.
coast_core.ngram_extraction.calculate_ngram_frequency_count(article_text, ngram_size, stop_list=None)
Calculate the frequency of occurrence of ngrams of a given size in the article text.
Parameters:
article_text – the block of text to operate on.
ngram_size – the degree of ngram to be returned (e.g. 3 for trigrams).
stop_list – list of words to be excluded from the frequency count.
Returns: An object containing the frequency count of the ngrams, without the ngrams included in the stop list.
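To make the idea of a frequency count concrete, here is a minimal standalone sketch. It is not the coast_core implementation and does not use nltk: the function name sketch_ngram_frequency_count is hypothetical, tokenization by whitespace is an assumption, and so is the rule that an ngram is dropped when its space-joined text appears in the stop list.

```python
from collections import Counter


def sketch_ngram_frequency_count(article_text, ngram_size, stop_list=None):
    """Hypothetical stand-in for illustration; not the coast_core function."""
    stop = set(stop_list or [])
    # Assumption: plain whitespace tokenization (the real module uses nltk).
    tokens = article_text.split()
    # Slide a window of ngram_size tokens across the text.
    windows = zip(*(tokens[i:] for i in range(ngram_size)))
    joined = (" ".join(window) for window in windows)
    # Assumption: an ngram is excluded when its joined text is in the stop list.
    return Counter(ngram for ngram in joined if ngram not in stop)


counts = sketch_ngram_frequency_count(
    "the cat sat on the cat", 2, stop_list=["sat on"]
)
print(counts.most_common())
```

In this sketch the bigram "the cat" is counted twice and "sat on" is left out of the result entirely. The actual return type and stop-list behaviour of calculate_ngram_frequency_count may differ.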
coast_core.ngram_extraction.generate_ngrams(article_text)
Split the given text into ngrams, returning an object that contains the ngrams of sizes one to six.
Parameters:
article_text – the block of text to operate on.
Returns: An object containing all ngrams up to size 6 in the following structure:
    {
        "unigrams": [list of unigrams],
        "bigrams": [list of bigrams],
        "trigrams": [list of trigrams],
        "fourgrams": [list of fourgrams],
        "fivegrams": [list of fivegrams],
        "sixgrams": [list of sixgrams]
    }
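The returned structure can be illustrated with a short standalone sketch. It is an assumption-laden illustration rather than the module's own code: sketch_generate_ngrams is a hypothetical name, tokens are split on whitespace instead of with nltk, and representing each ngram as a tuple of tokens is a guess at the element type.

```python
NGRAM_KEYS = ["unigrams", "bigrams", "trigrams", "fourgrams", "fivegrams", "sixgrams"]


def sketch_generate_ngrams(article_text):
    """Hypothetical stand-in for illustration; not the coast_core function."""
    # Assumption: plain whitespace tokenization (the real module uses nltk).
    tokens = article_text.split()
    result = {}
    for size, key in enumerate(NGRAM_KEYS, start=1):
        # Each ngram is a tuple of `size` consecutive tokens; the list is
        # empty when the text has fewer than `size` tokens.
        result[key] = list(zip(*(tokens[i:] for i in range(size))))
    return result


ngrams = sketch_generate_ngrams("a b c")
print(ngrams["bigrams"])
```

For a three-token input, the sketch produces three unigrams, two bigrams, one trigram, and empty lists for the larger sizes.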