Code detection

Introduction

The code detection module identifies and extracts code examples within text. Regular expressions are used to identify the following features:

Feature                                                   Regular Expression                          Example
Arrow functions                                           .(-|=)>.                                    Funct funct = ()-> { console.log("Hello"); }
Full stops that don’t have a space character either side  \w\.\w                                      my_list.append(a_value)
Camel case                                                [A-Z][a-z0-9]+[A-Z][a-z0-9]+                MyFirstClass(args)
Code comments                                             \"\"\"|/(\*+)|//|\*+/|#|<!--|-->            # Here is a Python comment
Curly brackets                                            {|}                                         my_function(){...}
Brackets that don’t have a space either side              \w\(.*?\)                                   my_function(type arg, type arg)
Semi-colons                                               .;                                          int i = 0;
Uncommon characters                                       (!|\+|-)=|\+|(\*|&|\||=|<|>){1,2}|(_|:){2}  if my_int > 0 and 'a' in __special_file.py:
Words that are separated by an underscore                 [[:alnum:]]_[[:alnum:]]                     some_words_separate_by_underscore = 5
Square brackets that don’t have a space either side       \w\[.*?\]                                   for object in my_database['my_collection_name']
Keywords                                                  (^|\s)" + keyword + "(\s|\(|\{|:|$)         if while else for each elif
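
As a rough illustration of how patterns like these can be applied to a single word or line, here is a minimal sketch using Python's standard re module. It is not the coast_core implementation: the dictionary keys are illustrative labels (not the keys used in patterns.json), detect_features is a hypothetical helper, and only a subset of the patterns is included.

```python
import re

# Subset of the patterns from the feature table. The keys are illustrative
# labels chosen for this sketch, not the names used in patterns.json.
PATTERNS = {
    "arrow_function": r".(-|=)>.",
    "full_stop_no_space": r"\w\.\w",
    "camel_case": r"[A-Z][a-z0-9]+[A-Z][a-z0-9]+",
    "curly_bracket": r"{|}",
    "bracket_no_space": r"\w\(.*?\)",
    "semi_colon": r".;",
    # Python's re module has no POSIX [[:alnum:]] class, so an explicit
    # character range is substituted here (an assumption of this sketch).
    "underscore_word": r"[A-Za-z0-9]_[A-Za-z0-9]",
    "square_bracket_no_space": r"\w\[.*?\]",
}


def detect_features(word):
    """Return the sorted labels of every pattern found in the given text."""
    return sorted(
        name for name, pattern in PATTERNS.items() if re.search(pattern, word)
    )


print(detect_features("my_list.append(a_value)"))
print(detect_features("MyFirstClass(args)"))
print(detect_features("hello"))
```

A plain English word such as "hello" matches none of the patterns, while the example strings from the table each trigger one or more of them.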

These default features (and keywords) are pulled from the patterns.json and keywords.txt files in coast_core/resources/data (https://github.com/zedrem/coast_core/tree/master/coast_core/resources/data). To add more features or keywords, add them to these files. Note that a keyword can consist of multiple words.
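
The keyword pattern in the feature table is built by concatenating a keyword between two fixed fragments. A minimal sketch of that construction follows, assuming Python's re module. The helpers keyword_pattern and find_keywords are hypothetical names, and the call to re.escape is an addition of this sketch rather than something the source describes.

```python
import re


def keyword_pattern(keyword):
    """Build the keyword regex from the feature table for one keyword."""
    # re.escape protects against keywords that contain regex metacharacters;
    # this escaping is an assumption of the sketch.
    return re.compile(r"(^|\s)" + re.escape(keyword) + r"(\s|\(|\{|:|$)")


def find_keywords(line, keywords):
    """Return the keywords (single or multi-word) that occur in the line."""
    return [kw for kw in keywords if keyword_pattern(kw).search(line)]


keywords = ["if", "while", "for each", "else", "elif"]
print(find_keywords("for each item in items:", keywords))
print(find_keywords("elif x > 0:", keywords))
```

Because the pattern requires the keyword to start at the beginning of the line or after whitespace, the "if" inside "elif" is not reported as a separate keyword.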

Usage

To use the module:

>>> import coast_core
>>> coast_core.code_detection.function(to_use)

or:

>>> from coast_core import code_detection
>>> code_detection.function(to_use)

Functions

A collection of functions that detect and analyse code in an article.

coast_core.code_detection.execute_code_detection(text, granularity='ALL')
Execute all the code detection analysis functions. The granularity argument selects what is returned:
  • ALL returns all the available data. This is the default value.
  • BASIC returns the binary and the absolute data.
  • FEATURES returns the features detected in the text.
  • LINES returns the lines data.
Parameters:
  • text – The text to operate on
  • granularity – Selects the returned data: ALL, BASIC, FEATURES or LINES
Returns:

The returned data depends on the granularity.
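
Since the granularity is passed as a plain string, a caller may want to check it before running the analysis. The following is a minimal sketch of such a wrapper. The name run_code_detection and the ValueError are assumptions of this sketch; how coast_core itself reacts to an unknown granularity is not documented here.

```python
# The four granularity values listed in the documentation.
GRANULARITIES = ("ALL", "BASIC", "FEATURES", "LINES")


def run_code_detection(text, granularity="ALL"):
    """Validate the granularity, then delegate to coast_core (hypothetical wrapper)."""
    if granularity not in GRANULARITIES:
        raise ValueError(
            f"granularity must be one of {GRANULARITIES}, got {granularity!r}"
        )
    # Imported lazily so the validation above works even where coast_core
    # is not installed.
    from coast_core import code_detection

    return code_detection.execute_code_detection(text, granularity=granularity)
```

With coast_core installed, run_code_detection(article_text, granularity="LINES") would return only the lines data, where article_text is any string to analyse.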

coast_core.code_detection.extract_absolute_data(lines_list)

Extract the absolute data from lines.

Parameters: lines_list – The list of lines to operate on, as returned by the extract_lines_data function.
Returns: An object containing the absolute data of the lines.

coast_core.code_detection.extract_binary_data(lines_list)

Extract the binary data from lines.

Parameters: lines_list – The list of lines to operate on, as returned by the extract_lines_data function.
Returns: An object containing the binary data of the lines.

coast_core.code_detection.extract_features_by_words(text)

Extract the features from the words of a text.

Parameters: text – The text to operate on
Returns: A list of feature objects

coast_core.code_detection.extract_lines_data(text)

Extract the lines data from a text.

Parameters: text – The text to operate on
Returns: A list of line objects

coast_core.code_detection.extract_text_data(text)

Extract the data of the text: the total number of characters, words and lines.

Parameters: text – The text to operate on
Returns: An object containing the text data
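
To make the three totals concrete, here is a minimal sketch of how they could be computed in plain Python. It is not the coast_core implementation: text_totals and the dictionary keys are hypothetical, and the counting rules (newlines counted as characters, words split on whitespace) are assumptions.

```python
def text_totals(text):
    """Count characters, words and lines in a text (hypothetical helper)."""
    return {
        # Assumption: every character counts, including newlines.
        "total_characters": len(text),
        # Assumption: words are whitespace-separated tokens.
        "total_words": len(text.split()),
        "total_lines": len(text.splitlines()),
    }


print(text_totals("int i = 0;\nprint(i)"))
```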

coast_core.code_detection.features_detection(word)

Detect features in a word. Features come from the patterns.json file in the resources directory.

Parameters: word – The word to operate on
Returns: The list of features of the word

coast_core.code_detection.percentage(string, total)

Calculate the percentage of code in a string.

Parameters:
  • string – The string to operate on
  • total – The total on which the percentage is based
Returns:

The percentage of code in the string
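
The underlying arithmetic is a simple ratio. The sketch below shows it under stated assumptions: percentage_of is a hypothetical helper that takes an already computed count of code items (for example, lines flagged as code) rather than a string, because how coast_core derives that count from the string argument is not described here. The zero-total guard is also an assumption.

```python
def percentage_of(count, total):
    """Return count as a percentage of total (hypothetical helper)."""
    if total == 0:
        # Assumption: an empty base yields 0 rather than raising an error.
        return 0.0
    return 100.0 * count / total


# Example: 3 lines flagged as code out of 12 lines in total.
print(percentage_of(3, 12))
```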