skillNer.text_class.Text

class skillNer.text_class.Text(text: str, nlp)

The main object for storing and preprocessing raw text. The object behaves like a list of its words.
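
The list-like behaviour described above can be pictured with a minimal, self-contained sketch. WordSequence is a hypothetical stand-in, not SkillNer's implementation, and the whitespace tokenisation is an assumption made only for illustration. It shows how a class that defines __len__ and __getitem__ supports len(), indexing and iteration over its words.

```python
class WordSequence:
    """Hypothetical stand-in for a list-like text container (not SkillNer code)."""

    def __init__(self, text: str):
        # Assumption: words come from lowercasing and splitting on whitespace.
        self._words = text.lower().split()

    def __len__(self) -> int:
        return len(self._words)

    def __getitem__(self, index):
        return self._words[index]


seq = WordSequence("Fluency in both English and French is mandatory")
n_words = len(seq)            # number of words
first_word = seq[0]           # indexing
all_words = [w for w in seq]  # iteration falls back to __getitem__
```

Defining only these two methods is enough for Python's sequence protocol, which is why no explicit __iter__ is needed in the sketch.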

Constructor of the class

Parameters
  • text (str) – The raw text. It might be for instance a job description.

  • nlp ([type]) – An NLP object instantiated from spaCy.

Examples

>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')
>>> from skillNer.text_class import Text
>>> text_obj = Text("Fluency in both English and French is mandatory", nlp)

Methods

__init__(text, nlp)

Constructor of the class

lemmed([as_list])

Returns the lemmed version of the text

stemmed([as_list])

Returns the stemmed version of the text

words_start_end_position(text)

Returns the starting and ending index of each word in text

lemmed(as_list: bool = False)

Returns the lemmed version of the text

Parameters

as_list (bool (default False)) – If True, return a list of the lemmed words in the text. If False, return the lemmed text as a single string.

Returns

The lemmed text, in the form selected by the as_list argument.

Return type

str | List[str]

Examples

>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')
>>> from skillNer.text_class import Text
>>> text_obj = Text("Fluency in both English and French is mandatory", nlp)
>>> text_obj.lemmed()
'fluency in both english and french be mandatory'
>>> text_obj.lemmed(as_list=True)
['fluency', 'in', 'both', 'english', 'and', 'french', 'be', 'mandatory']
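
The two return forms selected by as_list can be sketched without spaCy. The render helper below is hypothetical and not part of SkillNer. It assumes the lemmas are already available as a list (here copied from the example output above) and only illustrates the str | List[str] switch.

```python
from typing import List, Union


def render(words: List[str], as_list: bool = False) -> Union[str, List[str]]:
    """Hypothetical helper: return words as a list or as one space-joined string."""
    if as_list:
        return list(words)  # copy so callers cannot mutate the source list
    return " ".join(words)


# Lemmas taken from the documented example output.
lemmas = ["fluency", "in", "both", "english", "and", "french", "be", "mandatory"]
as_string = render(lemmas)
as_tokens = render(lemmas, as_list=True)
```
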
stemmed(as_list: bool = False)

Returns the stemmed version of the text

Parameters

as_list (bool (default False)) – If True, return a list of the stemmed words in the text. If False, return the stemmed text as a single string.

Returns

The stemmed text, in the form selected by the as_list argument.

Return type

str | List[str]

Examples

>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')
>>> from skillNer.text_class import Text
>>> text_obj = Text("Fluency in both English and French is mandatory", nlp)
>>> text_obj.stemmed()
'fluenci in both english and french is mandatori'
>>> text_obj.stemmed(as_list=True)
['fluenci', 'in', 'both', 'english', 'and', 'french', 'is', 'mandatori']
static words_start_end_position(text: str) → List[skillNer.text_class.Word]

Returns the starting and ending index of each word in text

Parameters

text (str) – The input text

Returns

A list of Word objects, one per word in text, whose start and end properties hold the starting and ending character positions of that word.

Return type

List[Word]

Examples

>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')
>>> from skillNer.text_class import Text
>>> list_words = Text.words_start_end_position("Hello World I am SkillNer")
>>> word_1 = list_words[0]
>>> print(word_1.start, word_1.end)
0 5
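
How such offsets can be computed is sketched below in plain Python. This is an illustration under stated assumptions, not SkillNer's code: Span is a hypothetical stand-in for the library's Word class, words are assumed to be separated by single spaces, and end is treated as exclusive, which matches the 0 5 shown for "Hello".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    """Hypothetical stand-in for a word with character offsets."""
    word: str
    start: int
    end: int  # exclusive


def word_spans(text: str) -> List[Span]:
    # Assumption: words are separated by single spaces.
    spans = []
    pointer = 0
    for raw in text.split(" "):
        spans.append(Span(raw, pointer, pointer + len(raw)))
        pointer += len(raw) + 1  # move past the word and the following space
    return spans


text = "Hello World I am SkillNer"
spans = word_spans(text)
offsets = [(s.start, s.end) for s in spans]
```

With these offsets, slicing text[s.start:s.end] recovers each word.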