skillNer.text_class.Text
- class skillNer.text_class.Text(text: str, nlp)
The main object used to store and preprocess a raw text. It behaves like a list of the words in the text.
Constructor of the class
- Parameters
text (str) – The raw text, for instance a job description.
nlp (spacy.language.Language) – An NLP object instantiated from spaCy, e.g. with spacy.load().
Examples
>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')
>>> from skillNer.text_class import Text
>>> text_obj = Text("Fluency in both English and French is mandatory", nlp)
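To illustrate what "behaves like a list of words" means, here is a small stand-alone class implementing the same kind of access. This is a minimal sketch and not skillNer's implementation. The class name WordList is hypothetical, and it splits on whitespace only, whereas the real Text also takes a spaCy nlp object.

```python
from typing import Iterator, List, Union


class WordList:
    """Hypothetical stand-in showing list-like access to the words of a text."""

    def __init__(self, text: str) -> None:
        self.text = text
        # Assumption: a word is any run of non-whitespace characters.
        self._words: List[str] = text.split()

    def __len__(self) -> int:
        # len(obj) gives the number of words.
        return len(self._words)

    def __getitem__(self, index: Union[int, slice]) -> Union[str, List[str]]:
        # obj[i] and obj[i:j] index into the words, like a list.
        return self._words[index]

    def __iter__(self) -> Iterator[str]:
        # "for word in obj" iterates word by word.
        return iter(self._words)


words = WordList("Fluency in both English and French is mandatory")
print(len(words))   # 8
print(words[0])     # Fluency
print(words[1:3])   # ['in', 'both']
```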
- __init__(text: str, nlp)
Constructor of the class. It takes the same text and nlp parameters as the class itself.
Methods
__init__(text, nlp): Constructor of the class.
lemmed([as_list]): Get the lemmatized version of the text.
stemmed([as_list]): Get the stemmed version of the text.
words_start_end_position(text): Get the start and end index of each word in the text.
- lemmed(as_list: bool = False)
Get the lemmatized version of the text.
- Parameters
as_list (bool, default False) – If True, return a list of lemmatized words. If False, return the lemmatized text as a single string.
- Returns
The lemmatized text, in the form specified by as_list.
- Return type
str | List[str]
Examples
>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')
>>> from skillNer.text_class import Text
>>> text_obj = Text("Fluency in both English and French is mandatory", nlp)
>>> text_obj.lemmed()
'fluency in both english and french be mandatory'
>>> text_obj.lemmed(as_list=True)
['fluency', 'in', 'both', 'english', 'and', 'french', 'be', 'mandatory']
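The as_list switch only changes the shape of the result. The same lemmas come back either joined by spaces or as a list. The sketch below shows that contract without spaCy. The lookup table LEMMAS and the function lemmed_words are hypothetical stand-ins for spaCy's lemmatizer, and they cover only the words of the example.

```python
from typing import Dict, List, Union

# Hypothetical lemma table standing in for spaCy's lemmatizer (assumption).
LEMMAS: Dict[str, str] = {"is": "be", "are": "be", "was": "be"}


def lemmed_words(text: str, as_list: bool = False) -> Union[str, List[str]]:
    # Lower-case each whitespace-separated word, then map it to its lemma
    # (words missing from the table are kept unchanged).
    lemmas = [LEMMAS.get(word.lower(), word.lower()) for word in text.split()]
    # as_list=True returns the list; otherwise join the lemmas with spaces.
    return lemmas if as_list else " ".join(lemmas)


sentence = "Fluency in both English and French is mandatory"
print(lemmed_words(sentence))
print(lemmed_words(sentence, as_list=True))
```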
- stemmed(as_list: bool = False)
Get the stemmed version of the text.
- Parameters
as_list (bool, default False) – If True, return a list of stemmed words. If False, return the stemmed text as a single string.
- Returns
The stemmed text, in the form specified by as_list.
- Return type
str | List[str]
Examples
>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')
>>> from skillNer.text_class import Text
>>> text_obj = Text("Fluency in both English and French is mandatory", nlp)
>>> text_obj.stemmed()
'fluenci in both english and french is mandatori'
>>> text_obj.stemmed(as_list=True)
['fluenci', 'in', 'both', 'english', 'and', 'french', 'is', 'mandatori']
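The forms in the example output (fluency → fluenci, mandatory → mandatori) look consistent with the Porter stemming algorithm. Its step 1c rewrites a final y as i when the rest of the word contains a vowel. The sketch below implements only that single rule to show where those forms come from. It is not a full stemmer, it is not necessarily what skillNer uses, and the function names porter_step_1c and stem_step_1c_only are hypothetical.

```python
from typing import List


def porter_step_1c(word: str) -> str:
    """Apply only Porter's step 1c rule: (*v*) Y -> I."""
    # Simplification: only a, e, i, o, u count as vowels here. Full Porter
    # also treats y as a vowel when it follows a consonant.
    if word.endswith("y") and any(ch in "aeiou" for ch in word[:-1]):
        return word[:-1] + "i"
    return word


def stem_step_1c_only(text: str) -> List[str]:
    # Lower-case and split on whitespace, then apply the single rule.
    return [porter_step_1c(word.lower()) for word in text.split()]


sentence = "Fluency in both English and French is mandatory"
print(" ".join(stem_step_1c_only(sentence)))
print(porter_step_1c("sky"))  # no vowel before the final y, so unchanged
```

A complete Porter implementation, such as NLTK's PorterStemmer, applies many more suffix rules. This sketch only accounts for the two words that change in the example.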
- static words_start_end_position(text: str) → List[skillNer.text_class.Word]
Get the start and end index of each word in the text.
- Parameters
text (str) – The input text
- Returns
A list of Word objects whose start and end attributes are set to the starting and ending character positions of each word in text.
- Return type
List[Word]
Examples
>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')
>>> from skillNer.text_class import Text
>>> list_words = Text.words_start_end_position("Hello World I am SkillNer")
>>> word_1 = list_words[0]
>>> print(word_1.start, word_1.end)
0 5
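The example ("Hello" → 0 5) suggests a half-open range, where end is one past the last character, so text[start:end] recovers the word. Below is a minimal sketch of the same computation using re.finditer. The Word dataclass is a hypothetical stand-in for skillNer.text_class.Word that keeps only the attributes used here, and words are assumed to be runs of non-whitespace characters.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    # Hypothetical stand-in for skillNer.text_class.Word.
    text: str
    start: int  # index of the first character
    end: int    # index one past the last character (exclusive)


def words_start_end_position(text: str) -> List[Word]:
    # Each match of \S+ is a maximal run of non-whitespace characters.
    # match.start()/match.end() give its half-open [start, end) span.
    return [Word(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


text = "Hello World I am SkillNer"
for word in words_start_end_position(text):
    # The half-open span slices the original word back out of the text.
    assert text[word.start:word.end] == word.text
    print(word.text, word.start, word.end)
```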