Citations
Introduction
The citations module contains functions for the following tasks:
- Given a block of HTML, extract all of the URLs by analysing the anchor (<a>) tags.
- Given the URL of the HTML being analysed, determine which of the extracted citations are external resources.
- Given a JSON file of classifications, classify each of the external URLs accordingly.
Usage
To use the citations module:
>>> import coast_core
>>> coast_core.citations.function(to_use)
or:
>>> from coast_core import citations
>>> citations.function(to_use)
Functions
A collection of functions for analysing the citations (links to other resources) found within results.
See the documentation and sample_data for examples (https://coast-core.readthedocs.io).
coast_core.citations.classify_citations(external_uris, classification_config_file)
Given a file containing a JSON object of {classification: [patterns]} key-value pairs, classify each of the citations for each article.
For example, given the following JSON:
{ "research": ["researchgate", "ieee.", "dx.doi.", "acm", "sciencedirect"] }
All citations that contain any sub-string within the list will be classified as ‘research’ citations. A more detailed JSON example can be found in our test_data: https://github.com/zedrem/coast_core/blob/master/coast_core/resources/data/citations_classification.json
Parameters:
- external_uris – A list of URIs to classify.
- classification_config_file – A config file containing all classifications.
Returns: A list of objects containing all classifications.
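The sub-string matching described above can be sketched as follows. This is a minimal illustration, not the coast_core implementation: the function name classify_uris is hypothetical, it takes an already-parsed dictionary rather than a config file path, and the output shape ({"uri": ..., "classifications": [...]}) is an assumption, since the documentation only says "a list of objects".

```python
import json
from typing import Dict, List


def classify_uris(uris: List[str], classifications: Dict[str, List[str]]) -> List[dict]:
    """Hypothetical sketch: tag each URI with every classification whose patterns match.

    A URI matches a classification if any of that classification's patterns
    appears as a sub-string of the URI. The output shape is an assumption.
    """
    results = []
    for uri in uris:
        matched = [
            label
            for label, patterns in classifications.items()
            if any(pattern in uri for pattern in patterns)
        ]
        results.append({"uri": uri, "classifications": matched})
    return results


# The config would normally be read from a file with json.load; here it is
# parsed from a string so the example is self-contained.
config = json.loads(
    '{"research": ["researchgate", "ieee.", "dx.doi.", "acm", "sciencedirect"]}'
)
classified = classify_uris(
    ["https://dx.doi.org/10.1000/xyz123", "https://example.com/blog"],
    config,
)
```

Because matching is by plain sub-string, short patterns such as "acm" can also match unrelated URLs that happen to contain those letters, so patterns are best kept as specific as possible.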
coast_core.citations.compute_citation_binary_counts(classified_external_uris, classification_config_file)
Compute binary counts of each citation type.
Parameters:
- classified_external_uris – A list of objects containing all classifications.
- classification_config_file – A config file containing all classifications.
Returns: An object containing a binary count of each classification type.
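One plausible reading of a "binary count" is presence or absence: 1 if at least one external citation carries a classification, 0 otherwise. The sketch below assumes that reading and the same hypothetical {"uri": ..., "classifications": [...]} input shape; binary_counts is an illustrative name, and the labels list stands in for the keys that would be read from the classification config file.

```python
from typing import Dict, List


def binary_counts(classified_uris: List[dict], labels: List[str]) -> Dict[str, int]:
    """Hypothetical sketch: 1 if at least one URI carries the label, else 0."""
    # Start every known classification at 0 so absent types still appear.
    counts = {label: 0 for label in labels}
    for item in classified_uris:
        for label in item["classifications"]:
            if label in counts:
                counts[label] = 1  # presence only; repeated matches do not add up
    return counts


classified = [
    {"uri": "https://dx.doi.org/10.1000/xyz123", "classifications": ["research"]},
    {"uri": "https://example.com/blog", "classifications": []},
]
counts = binary_counts(classified, ["research", "news"])
```

Initialising every label to 0 keeps the output keys identical across articles, which makes the per-article results easy to tabulate.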
coast_core.citations.execute_full_citation_analysis(html, link, classification_config_file)
Run a complete end-to-end analysis of citations using all of the other functions.
Parameters:
- html – The HTML to operate on.
- link – The link of the article being analysed.
- classification_config_file – A config file containing all classifications.
Returns: An object containing all analysis.
coast_core.citations.get_all_citations(html)
Extract the citations from a single article's HTML.
Parameters: html – The HTML to operate on.
Returns: A list of all URIs found in the article, in lowercase form.
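Extracting the href of every anchor tag can be sketched with the standard library's html.parser. This is an illustration rather than the module's own code; extract_citations and _AnchorCollector are hypothetical names, and the choice of parser is an assumption.

```python
from html.parser import HTMLParser
from typing import List


class _AnchorCollector(HTMLParser):
    """Collects the lowercased href attribute of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value.lower())


def extract_citations(html: str) -> List[str]:
    """Hypothetical sketch: return every anchor href, lowercased."""
    parser = _AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


uris = extract_citations(
    '<p>See <A HREF="https://Example.com/Paper">this</A> and <a>no link</a>.</p>'
)
```

Lowercasing the whole URI, as the documentation describes, simplifies pattern matching, but note that URL paths can be case-sensitive, so the lowercased form is best used for analysis rather than for fetching.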
coast_core.citations.get_an_articles_domain(link)
For a given URL, parse and return the article's TLDN as a string.
Parameters: link – The link to parse.
Returns: The domain of the link.
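A simple way to obtain a comparable domain is to take the host from urllib.parse and drop a leading "www.". This is only a sketch under that assumption (article_domain is a hypothetical name); it does not reduce other sub-domains or handle multi-part suffixes such as ".co.uk", which would require consulting the Public Suffix List.

```python
from urllib.parse import urlparse


def article_domain(link: str) -> str:
    """Hypothetical sketch: lowercased host of link without a leading 'www.'."""
    # .hostname is already lowercased and excludes any port or credentials.
    host = urlparse(link).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host


domain = article_domain("https://www.Example.com/2019/01/post.html")
```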
coast_core.citations.select_external_citations(link, all_uris)
From a list of URIs, return those that are external to the domain of the link.
Parameters:
- link – The link of the article being analysed.
- all_uris – A list of all URIs found in the article.
Returns: A list of URIs that are external to the domain of the link being analysed.
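Selecting external citations amounts to comparing each URI's host with the article's host. The sketch below is one way to do this and is not taken from coast_core: select_external and _host are hypothetical names, relative links are resolved against the article URL with urljoin (and so count as internal), and links without a host, such as mailto: addresses, are skipped. Both of these behaviours are assumptions.

```python
from typing import List
from urllib.parse import urljoin, urlparse


def _host(url: str) -> str:
    """Lowercased host without a leading 'www.' (empty if there is no host)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def select_external(link: str, all_uris: List[str]) -> List[str]:
    """Hypothetical sketch: keep URIs whose host differs from the article's host."""
    article_host = _host(link)
    external = []
    for uri in all_uris:
        # Resolve relative links (e.g. "/about") against the article URL first,
        # so they are treated as internal.
        host = _host(urljoin(link, uri))
        # Skip host-less links (e.g. mailto:) and links on the article's own host.
        if host and host != article_host:
            external.append(uri)
    return external


external = select_external(
    "https://www.example.com/post",
    ["/about", "https://example.com/other", "https://ieee.org/paper", "mailto:a@b.com"],
)
```

Comparing exact hosts means a link from example.com to blog.example.com counts as external; whether that is desirable depends on how the analysis defines an article's domain.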