skillNer.text_class.Text

class skillNer.text_class.Text(text: str, nlp)

The main object used to store and preprocess a raw text. It behaves like a list of the words in the text.
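
"Behaves like a list of words" means the object supports the usual Python sequence protocol: len(), indexing, slicing and iteration. Here is a minimal sketch of that idea. It assumes plain whitespace tokenization, and the class name WordSequence is hypothetical. It does not reproduce skillNer's Text internals.

```python
from typing import Iterator, List


class WordSequence:
    """Hypothetical list-like text container (not skillNer's Text)."""

    def __init__(self, text: str) -> None:
        # Assumption: naive whitespace tokenization; skillNer may tokenize differently.
        self.words: List[str] = text.split()

    def __len__(self) -> int:
        return len(self.words)

    def __getitem__(self, index):
        # Delegating to the underlying list supports both int indices and slices.
        return self.words[index]

    def __iter__(self) -> Iterator[str]:
        return iter(self.words)


seq = WordSequence("Fluency in both English and French is mandatory")
print(len(seq))   # 8
print(seq[0])     # Fluency
print(seq[1:3])   # ['in', 'both']
```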

Constructor of the class

Parameters

text (str) – The raw text, for instance a job description.
nlp (spacy.language.Language) – An NLP object instantiated with spaCy, for example via spacy.load().

Examples

>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')
>>> from skillNer.text_class import Text
>>> text_obj = Text("Fluency in both English and French is mandatory", nlp)

Methods

`__init__`(text, nlp)	Constructor of the class
`lemmed`([as_list])	Return the lemmatized version of the text
`stemmed`([as_list])	Return the stemmed version of the text
`words_start_end_position`(text)	Return the starting and ending index of each word in a text

lemmed(as_list: bool = False)

Return the lemmatized version of the text.

Parameters: as_list (bool (default False)) – If True, return a list of lemmatized words; if False, return the lemmatized text as a single string.
Returns: the lemmatized text, in the form selected by as_list.
Return type: str | List[str]

Examples

>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')
>>> from skillNer.text_class import Text
>>> text_obj = Text("Fluency in both English and French is mandatory", nlp)
>>> text_obj.lemmed()
'fluency in both english and french be mandatory'
>>> text_obj.lemmed(as_list=True)
['fluency', 'in', 'both', 'english', 'and', 'french', 'be', 'mandatory']
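
The two return forms differ only in whether the lemmas are joined into one string. The sketch below shows that as_list pattern with a toy lookup table standing in for a real lemmatizer. lemmed_words and toy_lemmas are hypothetical names. Real lemmatization, presumably done here through the nlp object passed to the constructor, depends on context and is not a fixed dictionary lookup.

```python
from typing import Dict, List, Union


def lemmed_words(words: List[str], lemma_of: Dict[str, str],
                 as_list: bool = False) -> Union[str, List[str]]:
    """Hypothetical helper illustrating the as_list switch.

    lemma_of is a toy lookup table standing in for a real lemmatizer.
    """
    # Lowercase each word, then map it to its lemma (unknown words are kept as-is).
    lemmas = [lemma_of.get(w.lower(), w.lower()) for w in words]
    # as_list=True returns the tokens; otherwise they are joined into one string.
    return lemmas if as_list else " ".join(lemmas)


toy_lemmas = {"is": "be", "are": "be"}
words = "Fluency in both English and French is mandatory".split()
print(lemmed_words(words, toy_lemmas))
# fluency in both english and french be mandatory
print(lemmed_words(words, toy_lemmas, as_list=True))
```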

stemmed(as_list: bool = False)

Return the stemmed version of the text.

Parameters: as_list (bool (default False)) – If True, return a list of stemmed words; if False, return the stemmed text as a single string.
Returns: the stemmed text, in the form selected by as_list.
Return type: str | List[str]

Examples

>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')
>>> from skillNer.text_class import Text
>>> text_obj = Text("Fluency in both English and French is mandatory", nlp)
>>> text_obj.stemmed()
'fluenci in both english and french is mandatori'
>>> text_obj.stemmed(as_list=True)
['fluenci', 'in', 'both', 'english', 'and', 'french', 'is', 'mandatori']
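
Stems such as 'fluenci' and 'mandatori' look odd, but they match the Porter stemmer (for example NLTK's PorterStemmer). Step 1c of that stemmer rewrites a final y to i when the rest of the word contains a vowel. This page does not say which stemmer Text uses, so treat that as an assumption. The sketch below implements only that one rule, not a full stemmer, and porter_step_1c is a hypothetical name.

```python
VOWELS = set("aeiou")


def porter_step_1c(word: str) -> str:
    """Step 1c of the original Porter stemmer: (*v*) Y -> I.

    Simplification: only a, e, i, o, u count as vowels here; the full
    algorithm also treats 'y' after a consonant as a vowel.
    """
    if word.endswith("y"):
        stem = word[:-1]
        # Replace the final 'y' only if the remaining stem contains a vowel.
        if any(ch in VOWELS for ch in stem):
            return stem + "i"
    return word


for w in ["fluency", "mandatory", "by"]:
    print(w, "->", porter_step_1c(w))
# fluency -> fluenci
# mandatory -> mandatori
# by -> by
```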

static words_start_end_position(text: str) → List[skillNer.text_class.Word]

Return the starting and ending character index of each word in a text.

Parameters: text (str) – The input text.
Returns: a list of Word objects whose start and end attributes hold the character positions where each word starts and ends (end is exclusive, so "Hello" at the beginning of the text spans 0 to 5).
Return type: List[Word]

Examples

>>> from skillNer.text_class import Text
>>> list_words = Text.words_start_end_position("Hello World I am SkillNer")
>>> word_1 = list_words[0]
>>> print(word_1.start, word_1.end)
0 5
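
The same kind of character spans can be computed with the standard library. The sketch below assumes that words are whitespace-delimited runs of characters. The Word dataclass and word_positions function are hypothetical stand-ins, not skillNer's own classes, and skillNer's tokenization may differ.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    # Hypothetical stand-in for skillNer's Word; only the fields used here.
    text: str
    start: int
    end: int  # exclusive: one past the last character


def word_positions(text: str) -> List[Word]:
    """Return each whitespace-delimited word with its character span."""
    # re.finditer yields match objects whose .start()/.end() are character offsets.
    return [Word(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


for w in word_positions("Hello World I am SkillNer"):
    print(w.text, w.start, w.end)
# Hello 0 5
# World 6 11
# I 12 13
# am 14 16
# SkillNer 17 25
```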