Code detection
Introduction
The code detection module identifies and extracts code examples within text. Regular expressions are used to identify the following features:
Feature | Regular Expression | Example |
Arrow functions | .(-|=)>. | Funct funct = ()-> { console.log("Hello"); } |
Full stops that don’t have a space character either side | \w\.\w | my_list.append(a_value) |
Camel case | [A-Z][a-z0-9]+[A-Z][a-z0-9]+ | MyFirstClass(args) |
Code comments | \"\"\"|/(\*+)|//|\*+/|#|<!--|--> | # Here is a Python comment |
Curly brackets | {|} | my_function(){...} |
Brackets that don’t have a space either side | \w\(.*?\) | my_function(type arg, type arg) |
Semi-colons | .; | int i = 0; |
Uncommon characters | (!|\+|-)=|\+|(\*|&|\||=|<|>){1,2}|(_|:){2} | if my_int > 0 and 'a' in __special_file.py: |
Words that are separated by an underscore | [[:alnum:]]_[[:alnum:]] | some_words_separate_by_underscore = 5 |
Square brackets that don’t have a space either side | \w\[.*?\] | for object in my_database['my_collection_name'] |
Keywords | (^|\s)" + keyword + "(\s|\(|\{|:|$) | if while else for each elif |
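As a rough illustration of how the patterns in the table behave, the sketch below applies a handful of them to sample strings with Python's re module. The dictionary keys are illustrative labels only and are not necessarily the names used in patterns.json, and the helper matching_features is hypothetical rather than part of coast_core. The underscore pattern is left out because Python's re module does not support POSIX classes such as [[:alnum:]].

```python
import re

# Regular expressions copied from the table; the keys are hypothetical labels.
PATTERNS = {
    "full_stop": r"\w\.\w",
    "camel_case": r"[A-Z][a-z0-9]+[A-Z][a-z0-9]+",
    "bracket": r"\w\(.*?\)",
    "semi_colon": r".;",
    "square_bracket": r"\w\[.*?\]",
}


def matching_features(text):
    """Return the sorted labels of every pattern found anywhere in text."""
    return sorted(
        name for name, pattern in PATTERNS.items() if re.search(pattern, text)
    )


print(matching_features("my_list.append(a_value)"))
print(matching_features("MyFirstClass(args);"))
print(matching_features("This is an ordinary sentence."))
```

A sentence of ordinary prose matches none of these patterns, while the code-like strings each match several.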
These default features (and keywords) are pulled from the patterns.json and keywords.txt files in coast_core/resources/data (https://github.com/zedrem/coast_core/tree/master/coast_core/resources/data). To add more features or keywords, add them to these files. Note that a keyword can consist of multiple words.
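The keyword expression in the table is assembled by string concatenation around each keyword. A minimal sketch of that assembly is shown here; the helper names keyword_pattern and contains_keyword are hypothetical, and the call to re.escape is an added assumption so that keywords containing regex metacharacters are matched literally.

```python
import re


def keyword_pattern(keyword):
    """Build the keyword regex from the table for a single keyword."""
    # re.escape is an assumption of this sketch, not taken from coast_core.
    return re.compile(r"(^|\s)" + re.escape(keyword) + r"(\s|\(|\{|:|$)")


def contains_keyword(text, keyword):
    """Return True if keyword appears in text as a standalone keyword."""
    return keyword_pattern(keyword).search(text) is not None


print(contains_keyword("while(x > 0) {", "while"))
print(contains_keyword("meanwhile, nothing", "while"))
print(contains_keyword("for each item in items:", "for each"))
```

The leading and trailing groups keep a keyword from matching inside a longer word, and a multi-word keyword such as "for each" is treated as a single unit.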
Usage
To use the module:
>>> import coast_core
>>> coast_core.code_detection.function(to_use)
or:
>>> from coast_core import code_detection
>>> code_detection.function(to_use)
Functions
A collection of functions that detect and analyse code within an article.
coast_core.code_detection.execute_code_detection(text, granularity='ALL')
Execute all the code detection analysis functions. The granularity argument selects what is returned:
- ALL returns all the data available. This is the default value.
- BASIC returns the binary and the absolute data.
- FEATURES returns the features detected in the text.
- LINES returns the lines data.
Parameters: - text – The text to operate on
- granularity – Selects the returned data: ALL, BASIC, FEATURES or LINES
Returns: The returned data depends on the granularity
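The granularity options can be compared by calling the function once per value. The sketch below assumes coast_core is installed and uses only the signature documented above; the wrapper run_all_granularities is hypothetical, and because the shape of the returned data is not described here the results are simply printed. If the package cannot be imported, the wrapper returns an empty dictionary.

```python
# The four granularity values listed in the documentation.
GRANULARITIES = ("ALL", "BASIC", "FEATURES", "LINES")


def run_all_granularities(text):
    """Call execute_code_detection once per granularity value.

    Returns an empty dict when coast_core is not installed.
    """
    try:
        from coast_core import code_detection
    except ImportError:
        return {}
    return {
        granularity: code_detection.execute_code_detection(
            text, granularity=granularity
        )
        for granularity in GRANULARITIES
    }


sample = "my_list.append(a_value);\nThis line is plain prose."
for name, result in run_all_granularities(sample).items():
    print(name, result)
```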
coast_core.code_detection.extract_absolute_data(lines_list)
Extract the absolute data from lines.
Parameters: lines_list – The list of lines to operate on, as returned by the extract_lines_data function.
Returns: An object containing the absolute data of the lines.
coast_core.code_detection.extract_binary_data(lines_list)
Extract the binary data from lines.
Parameters: lines_list – The list of lines to operate on, as returned by the extract_lines_data function.
Returns: An object containing the binary data of the lines.
coast_core.code_detection.extract_features_by_words(text)
Extract the features from words.
Parameters: text – The text to operate on
Returns: A list of feature objects
coast_core.code_detection.extract_lines_data(text)
Extract the lines data from a text.
Parameters: text – The text to operate on
Returns: A list of line objects
coast_core.code_detection.extract_text_data(text)
Extract the data of the text: total number of characters, words and lines.
Parameters: text – The text to operate on
Returns: An object containing the text data
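To make the three totals concrete, here is a standalone sketch of the same kind of counting. It is not the coast_core implementation: the function name text_stats, the dictionary keys, and the choice to split words on whitespace are all assumptions made for illustration.

```python
def text_stats(text):
    """Count characters, whitespace-separated words, and lines in text."""
    return {
        "characters": len(text),
        "words": len(text.split()),
        "lines": len(text.splitlines()),
    }


print(text_stats("int i = 0;\nprint(i)"))
```

Totals like these are the natural denominators when expressing how much of a text looks like code.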
coast_core.code_detection.features_detection(word)
Detect features in a word. Features come from the pattern.json file in the resources directory.
Parameters: word – The word to operate on
Returns: The list of features of a word
coast_core.code_detection.percentage(string, total)
Calculate the percentage of code in a string.
Parameters: - string – The string to operate on
- total – The total on which the percentage is based
Returns: The percentage of code in the string