Citations

Introduction

The citations module contains functions that do the following:

  • Given a block of HTML, extract all of the URLs by analysing the anchor <a> tags.
  • Given the URL of the HTML being analysed, determine which of the citations found are external resources.
  • Given a JSON file of classifications, classify each of the external URLs accordingly.

Usage

To use the citations module:

>>> import coast_core
>>> coast_core.citations.function(to_use)

or:

>>> from coast_core import citations
>>> citations.function(to_use)

Functions

A collection of functions for analysing the citations that results make to other resources.

See the documentation and sample_data for examples (https://coast-core.readthedocs.io).

coast_core.citations.classify_citations(external_uris, classification_config_file)

Given a file containing a JSON object of {classification: [patterns]} key-value pairs, classify each of the citations for each article.

For example, given the following JSON:

{
  "research": ["researchgate", "ieee.", "dx.doi.", "acm", "sciencedirect"]
}

All citations that contain any substring in the list are classified as ‘research’ citations. A more detailed JSON example can be found in our test_data: https://github.com/zedrem/coast_core/blob/master/coast_core/resources/data/citations_classification.json

Parameters:
  • external_uris – A list of URIs to classify.
  • classification_config_file – A config file containing all classifications.
Returns:

A list of objects containing all classifications.
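
The substring matching described above can be sketched roughly as follows. This is a minimal illustration and not the library's implementation: the helper name classify_by_substring, the use of an in-memory dict instead of a config file, and the shape of the returned objects are all assumptions.

```python
def classify_by_substring(uris, classifications):
    """Hypothetical helper: label each URI with every classification
    whose pattern list contains a substring of that URI."""
    results = []
    for uri in uris:
        matched = [
            label
            for label, patterns in classifications.items()
            if any(pattern in uri for pattern in patterns)
        ]
        # The {"uri": ..., "classifications": [...]} shape is assumed for illustration.
        results.append({"uri": uri, "classifications": matched})
    return results


config = {"research": ["researchgate", "ieee.", "dx.doi.", "acm", "sciencedirect"]}
uris = ["https://dx.doi.org/10.1000/182", "https://example.com/blog"]
classified = classify_by_substring(uris, config)
```

Because matching is by plain substring, short patterns can also match URLs that were not intended, so patterns are best kept as specific as possible.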

coast_core.citations.compute_citation_binary_counts(classified_external_uris, classification_config_file)

Take binary counts of each citation type.

Parameters:
  • classified_external_uris – A list of objects containing all classifications.
  • classification_config_file – A config file containing all classifications.
Returns:

An object containing a binary count of each classification type.
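
One plausible reading of a "binary count" is a 1 when at least one citation of a type is present and a 0 otherwise. The sketch below assumes that reading; the helper name binary_counts and the input shape are hypothetical and may differ from what the library actually uses.

```python
def binary_counts(classified_uris, labels):
    """Hypothetical helper: 1 if any URI carries the label, else 0."""
    counts = {label: 0 for label in labels}
    for item in classified_uris:
        for label in item["classifications"]:
            if label in counts:
                counts[label] = 1  # presence flag, not a running total
    return counts


classified = [
    {"uri": "https://dx.doi.org/10.1000/182", "classifications": ["research"]},
    {"uri": "https://example.com/blog", "classifications": []},
]
counts = binary_counts(classified, ["research", "news"])
```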

coast_core.citations.execute_full_citation_analysis(html, link, classification_config_file)

Runs a complete end-to-end analysis of citations using all other functions.

Parameters:
  • html – The HTML to operate on.
  • link – The link of the article being analysed.
  • classification_config_file – A config file containing all classifications.
Returns:

An object containing the results of the full analysis.

coast_core.citations.get_all_citations(html)

Extract citations from a single article's HTML.

Parameters:
  • html – The HTML to operate on.
Returns:

A list of all URIs, in lowercase form, found in the article.
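
Extracting URLs from anchor tags can be approximated with Python's standard html.parser module. This is only a sketch of the idea; the class AnchorCollector and the function extract_hrefs are hypothetical names, and the library may use a different parser.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag, lowercased."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href, or with an empty one.
            if name == "href" and value:
                self.hrefs.append(value.lower())


def extract_hrefs(html):
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


sample = '<p><a href="https://Example.org/Paper">paper</a> and <a name="x">no link</a></p>'
links = extract_hrefs(sample)
```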

coast_core.citations.get_an_articles_domain(link)

For a given URL, parse and return the article's top-level domain name (TLDN) as a string.

Parameters:
  • link – The link to parse.
Returns:

The domain of the link.
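
As a rough illustration of domain extraction, the standard urllib.parse module can split a URL into its parts. The helper domain_of is a hypothetical name, and returning the full lowercase host name is an assumption; the library's notion of a domain may differ (for example, in how it treats a leading "www.").

```python
from urllib.parse import urlparse


def domain_of(link):
    """Hypothetical helper: return the lowercase host name of a URL."""
    # hostname is lowercased and excludes any port; it is None for relative links.
    return urlparse(link).hostname or ""


domain = domain_of("https://www.Example.com/articles/42?ref=home")
```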

coast_core.citations.select_external_citations(link, all_uris)

From a list of URIs, return those that are external to the domain of the link.

Parameters:
  • link – The link of the article being analysed.
  • all_uris – A list of all URIs found in the article.
Returns:

A list of URIs that are external to the domain of the link being analysed.
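
Filtering for external citations can be sketched by comparing host names. The helper external_only is hypothetical, and two choices here are assumptions rather than documented behaviour: relative links are treated as internal, and any differing host name (including a different subdomain) counts as external.

```python
from urllib.parse import urlparse


def external_only(link, uris):
    """Hypothetical helper: keep URIs whose host differs from the article's host."""
    own_host = urlparse(link).hostname
    external = []
    for uri in uris:
        host = urlparse(uri).hostname
        # Relative links have no host and are treated as internal (assumption).
        if host is not None and host != own_host:
            external.append(uri)
    return external


article = "https://blog.example.com/post/1"
found = [
    "https://blog.example.com/about",
    "/contact",
    "https://dx.doi.org/10.1000/182",
]
external = external_only(article, found)
```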