Utils

Introduction

The utils module contains helper functions that are used by the other modules. They are also available for you to call directly.

Usage

To use the utils module:

>>> import coast_core
>>> coast_core.utils.function(to_use)

or:

>>> from coast_core import utils
>>> utils.function(to_use)

Functions

A collection of generic utility functions used throughout coast by the modules that handle NLP tasks, reporting and performance measures.

coast_core.utils.get_from_file(path)

Reads a file and returns its lines as a list of strings.

Notes:

  1. All double quotes are replaced with single quotes.
  2. New line characters are removed.

Parameters:
  • path – The path to the file you wish to read.
Returns:

A list of strings, where each string is a line in the file.

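
The behaviour described above can be approximated in a few lines. This is a minimal sketch, not the library's implementation: read_lines_sketch is a hypothetical name, and the UTF-8 encoding and the temporary file are assumptions made so the example runs on its own.

```python
import os
import tempfile


def read_lines_sketch(path):
    """Hypothetical stand-in for get_from_file (not the library's code)."""
    with open(path, encoding="utf-8") as handle:
        # Swap double quotes for single quotes and drop the line ending.
        return [line.replace('"', "'").rstrip("\r\n") for line in handle]


# Write a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('She said "hello"\nsecond line\n')

lines = read_lines_sketch(tmp.name)
os.remove(tmp.name)
print(lines)
```

Each element of the result is one line of the file, with no trailing new line character.
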
coast_core.utils.get_json_from_file(path)

Reads a JSON file and returns its contents as an object.

Parameters:
  • path – The path to the JSON file you wish to read.
Returns:

A JSON object generated from the contents of the file. In the event of an error, the error is printed to stdout.
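
As an illustration of the read-or-print-the-error behaviour, here is a minimal sketch built on the standard json module. The name read_json_sketch is hypothetical, and returning None after an error is an assumption, since the value returned on failure is not documented here.

```python
import json
import os
import tempfile


def read_json_sketch(path):
    """Hypothetical stand-in for get_json_from_file (not the library's code)."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except (OSError, json.JSONDecodeError) as error:
        # Mirror the documented behaviour: report the error on stdout.
        print(error)
        return None  # assumption: the real return value on error is undocumented


# Write a small throwaway JSON file so the example is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    json.dump({"source": "blog", "tags": ["nlp", "report"]}, tmp)

data = read_json_sketch(tmp.name)
os.remove(tmp.name)

# The file no longer exists, so this call prints the error instead of raising.
missing = read_json_sketch(tmp.name)
```

A JSON object in the file becomes a Python dict, and a JSON array becomes a list.
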

coast_core.utils.get_ngrams(text, number)

Split a given body of text into ngrams.

Parameters:
  • text – The body of text to operate on.
  • number – The size of each ngram (e.g. 1 for unigrams, 2 for bigrams).
Returns:

A list of ngrams.
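
To make the idea concrete, the sketch below builds ngrams with a sliding window. It is not the library's implementation: ngrams_sketch is a hypothetical name, and both the whitespace tokenisation and the tuple return type are assumptions, as the documentation above does not specify either.

```python
def ngrams_sketch(text, number):
    """Hypothetical stand-in for get_ngrams (not the library's code)."""
    if number < 1:
        raise ValueError("number must be at least 1")
    # Assumption: tokens are whitespace-separated words.
    tokens = text.split()
    # Slide a window of `number` tokens across the text, one step at a time.
    return [
        tuple(tokens[i:i + number])
        for i in range(len(tokens) - number + 1)
    ]


bigrams = ngrams_sketch("the quick brown fox", 2)
print(bigrams)
```

A text with fewer tokens than the requested size yields an empty list under these assumptions.
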

coast_core.utils.import_punkt()

Imports punkt, the sentence tokenizer data used by NLTK.

coast_core.utils.penn_treebank_filter(article_text, filter_list, exception_list=[])

Returns a list of tuples that are tagged with any Penn Treebank tag from the filter list.

Parameters:
  • article_text – The text to analyse.
  • filter_list – The tags to return.
  • exception_list – A list of exceptions.
Returns:

A list of words tagged with any of the tags in the filter list.
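
The filtering step can be sketched on its own, without a tagger. The example below is not the library's implementation: filter_tagged_sketch is a hypothetical name, it takes words that have already been tagged rather than raw article text, and it assumes that exception_list holds words to leave out of the result, which the documentation above does not confirm.

```python
def filter_tagged_sketch(tagged_words, filter_list, exception_list=None):
    """Hypothetical illustration of filtering by tag (not the library's code)."""
    wanted = set(filter_list)
    # Assumption: exceptions are words that should be dropped from the result.
    exceptions = set(exception_list or [])
    return [
        (word, tag)
        for word, tag in tagged_words
        if tag in wanted and word not in exceptions
    ]


# Hand-tagged sample data using Penn Treebank tags (DT, NN, VBD, IN).
tagged = [
    ("The", "DT"), ("cat", "NN"), ("sat", "VBD"),
    ("on", "IN"), ("the", "DT"), ("mat", "NN"),
]

nouns = filter_tagged_sketch(tagged, ["NN", "NNS"], exception_list=["mat"])
print(nouns)
```

Using None as the default for exception_list in the sketch avoids sharing one mutable list between calls.
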