skillNer.skill_extractor_class.SkillExtractor

class skillNer.skill_extractor_class.SkillExtractor(nlp, skills_db, phraseMatcher, tranlsator_func=False)

Main class to annotate skills in a given text and visualize them.

Constructor of the class.

Parameters
  • nlp ([type]) – NLP object loaded from spaCy.

  • skills_db ([type]) – A skill database used as a lookup table to annotate skills.

  • phraseMatcher ([type]) – A PhraseMatcher loaded from spaCy.

  • tranlsator_func (Callable, optional) – A function that translates text from the source language to English, with the signature tranlsator_func(text_input: str) -> str. By default False.
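To illustrate the shape expected of tranlsator_func, here is a minimal sketch of a callable that takes a string and returns a string. The glossary FRENCH_TO_ENGLISH and the function name translate_to_english are hypothetical and exist only for this illustration; a real project would likely wrap a translation library or service instead.

```python
# Hypothetical word-level glossary, used only to keep the sketch self-contained.
FRENCH_TO_ENGLISH = {
    "maîtrise": "proficiency",
    "de": "in",
    "et": "and",
}


def translate_to_english(text_input: str) -> str:
    """Toy translator with the str -> str shape described for tranlsator_func."""
    words = text_input.lower().split()
    # Words missing from the glossary are passed through unchanged.
    return " ".join(FRENCH_TO_ENGLISH.get(word, word) for word in words)


print(translate_to_english("Maîtrise de Python et SQL"))

# Assumed usage, following the constructor signature above (not executed here):
# skill_extractor = SkillExtractor(
#     nlp, SKILL_DB, PhraseMatcher, tranlsator_func=translate_to_english
# )
```

Any callable with the same input and output types should fit this parameter in the same way.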

Methods

__init__(nlp, skills_db, phraseMatcher[, ...])

Constructor of the class.

annotate(text[, tresh])

To annotate a given text and thereby extract skills from it.

describe(annotations)

To display more details about the annotated skills.

display(results)

To display the annotated skills.

annotate(text: str, tresh: float = 0.5) → dict

To annotate a given text and thereby extract skills from it.

Parameters
  • text (str) – The target text.

  • tresh (float, optional) – A threshold used to select skills in case of confusion, by default 0.5.

Returns

A dictionary with the text that was used and the annotated skills (see example).

Return type

dict

Examples

>>> import spacy
>>> from spacy.matcher import PhraseMatcher
>>> from skillNer.skill_extractor_class import SkillExtractor
>>> from skillNer.general_params import SKILL_DB
>>> nlp = spacy.load('en_core_web_sm')
>>> skill_extractor = SkillExtractor(nlp, SKILL_DB, PhraseMatcher)
loading full_matcher ...
loading abv_matcher ...
loading full_uni_matcher ...
loading low_form_matcher ...
loading token_matcher ...
>>> text = "Fluency in both english and french is mandatory"
>>> skill_extractor.annotate(text)
{'text': 'fluency in both english and french is mandatory',
'results': {'full_matches': [],
'ngram_scored': [{'skill_id': 'KS123K75YYK8VGH90NCS',
    'doc_node_id': [3],
    'doc_node_value': 'english',
    'type': 'lowSurf',
    'score': 1,
    'len': 1},
{'skill_id': 'KS1243976G466GV63ZBY',
    'doc_node_id': [5],
    'doc_node_value': 'french',
    'type': 'lowSurf',
    'score': 1,
    'len': 1}]}}
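
The returned dictionary can be post-processed with plain Python. The sketch below copies the output shown above and collects each skill's identifier and matched text. The helper collect_skills and its min_score parameter are hypothetical, and the code assumes that entries under 'full_matches' carry the same 'skill_id' and 'doc_node_value' keys as those under 'ngram_scored', which the example does not show.

```python
# Output copied from the annotate() example above.
annotations = {
    "text": "fluency in both english and french is mandatory",
    "results": {
        "full_matches": [],
        "ngram_scored": [
            {"skill_id": "KS123K75YYK8VGH90NCS", "doc_node_id": [3],
             "doc_node_value": "english", "type": "lowSurf", "score": 1, "len": 1},
            {"skill_id": "KS1243976G466GV63ZBY", "doc_node_id": [5],
             "doc_node_value": "french", "type": "lowSurf", "score": 1, "len": 1},
        ],
    },
}


def collect_skills(annotations: dict, min_score: float = 0.0) -> list:
    """Hypothetical helper: return (skill_id, matched_text) pairs."""
    results = annotations["results"]
    # Assumption: both lists hold dicts with 'skill_id' and 'doc_node_value'.
    matches = results["full_matches"] + results["ngram_scored"]
    return [
        (match["skill_id"], match["doc_node_value"])
        for match in matches
        # Entries without a 'score' key are kept (treated as score 1).
        if match.get("score", 1) >= min_score
    ]


for skill_id, value in collect_skills(annotations):
    print(skill_id, value)
```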
display(results: dict)

To display the annotated skills. This method renders the annotated text with displacy, the visualizer built into spaCy.

Parameters
  • results (dict) – The dictionary returned by applying .annotate() to a text.

Returns

Renders the text with the annotated skills.

Return type

None
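
For readers who want to see what such a rendering needs as input, the sketch below converts an annotation dictionary into the structure accepted by displacy in manual mode (a dict with "text", "ents" and "title"). This is not necessarily how display() works internally. The helper to_displacy_ents and the "SKILL" label are hypothetical, and locating each skill with str.find is a simplification that only picks the first occurrence of the matched text.

```python
def to_displacy_ents(annotations: dict, label: str = "SKILL") -> dict:
    """Hypothetical helper: build a displacy manual-mode input dict."""
    text = annotations["text"]
    results = annotations["results"]
    ents = []
    for match in results["full_matches"] + results["ngram_scored"]:
        value = match["doc_node_value"]
        start = text.find(value)  # simplification: first occurrence only
        if start == -1:
            continue  # matched text not found verbatim, so skip it
        ents.append({"start": start, "end": start + len(value), "label": label})
    # displacy expects entity spans ordered by their start offset.
    ents.sort(key=lambda ent: ent["start"])
    return {"text": text, "ents": ents, "title": None}


# Abbreviated copy of the annotate() output from the example in this page.
sample = {
    "text": "fluency in both english and french is mandatory",
    "results": {
        "full_matches": [],
        "ngram_scored": [
            {"skill_id": "KS123K75YYK8VGH90NCS", "doc_node_value": "english"},
            {"skill_id": "KS1243976G466GV63ZBY", "doc_node_value": "french"},
        ],
    },
}
rendered_input = to_displacy_ents(sample)
print(rendered_input["ents"])

# With spaCy installed, the dict could then be rendered (not executed here):
# from spacy import displacy
# displacy.render(rendered_input, style="ent", manual=True)
```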