skillNer.cleaner.Cleaner
- class skillNer.cleaner.Cleaner(to_lowercase: bool = True, include_cleaning_functions: List[str] = ['remove_punctuation', 'remove_redundant', 'stem_text', 'lem_text', 'remove_extra_space'], exclude_cleaning_function: List[str] = [])
- A class to build pipelines to clean text.
- The constructor of the class.
- Parameters
- to_lowercase (bool, optional) – whether to lowercase the text before cleaning it, by default True
- include_cleaning_functions (List[str], optional) – list of cleaning operations to include in the pipeline, by default all_cleaning (the five operations listed in the signature)
- exclude_cleaning_function (List[str], optional) – list of cleaning operations to exclude from the pipeline, by default []
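
One plausible reading of how the two list parameters combine is sketched below: start from the included names and drop any that also appear in the exclusion list. This is an illustration only, not skillNer's actual code. `resolve_steps` and `ALL_CLEANING` are hypothetical names, and the order of precedence between the two lists is an assumption.

```python
from typing import List, Optional

# Operation names copied from the default value shown in the signature.
ALL_CLEANING: List[str] = [
    "remove_punctuation",
    "remove_redundant",
    "stem_text",
    "lem_text",
    "remove_extra_space",
]


def resolve_steps(
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: return the names of the steps that would run.

    Assumption: exclusion wins over inclusion, and the order of the
    include list is preserved.
    """
    include = ALL_CLEANING if include is None else include
    exclude = [] if exclude is None else exclude
    return [name for name in include if name not in exclude]


# Keep everything except stemming and lemmatization.
print(resolve_steps(exclude=["stem_text", "lem_text"]))
# ['remove_punctuation', 'remove_redundant', 'remove_extra_space']
```

Using `None` as the default in the sketch avoids sharing one mutable list between calls, which is a common precaution when a parameter defaults to a list.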
 
 
- Methods
- __init__([to_lowercase, ...]) – the constructor of the class.
- __call__(text: str) -> str
- Apply the initialized cleaning pipeline to a given text.
- Parameters
- text (str) – text to clean 
- Returns
- the text after all cleaning operations have been applied to it.
- Return type
- str 
- Examples
>>> from skillNer.cleaner import Cleaner
>>> cleaner = Cleaner(
...     to_lowercase=True,
...     include_cleaning_functions=["remove_punctuation", "remove_extra_space"]
... )
>>> text = " I am sentence with a lot of annoying extra spaces , and !! some ,., meaningless punctuation ?! .! AH AH AH"
>>> cleaner(text)
'i am sentence with a lot of annoying extra spaces and some meaningless punctuation ah ah ah'
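
To make the callable-pipeline idea concrete without installing the library, here is a minimal self-contained sketch. `MiniCleaner`, `REGISTRY`, and the two stand-in functions are hypothetical and are not skillNer's implementation. They only mimic the documented behaviour of lowercasing first and then applying the selected operations in order.

```python
import string
from typing import Callable, Dict, List, Optional


def remove_punctuation(text: str) -> str:
    # Stand-in: replace every ASCII punctuation character with a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return text.translate(table)


def remove_extra_space(text: str) -> str:
    # Stand-in: collapse runs of whitespace and trim both ends.
    return " ".join(text.split())


# Hypothetical registry mapping operation names to functions.
REGISTRY: Dict[str, Callable[[str], str]] = {
    "remove_punctuation": remove_punctuation,
    "remove_extra_space": remove_extra_space,
}


class MiniCleaner:
    """Toy pipeline; an instance is called like a function on a string."""

    def __init__(
        self,
        to_lowercase: bool = True,
        include_cleaning_functions: Optional[List[str]] = None,
        exclude_cleaning_function: Optional[List[str]] = None,
    ) -> None:
        include = list(REGISTRY) if include_cleaning_functions is None else include_cleaning_functions
        exclude = exclude_cleaning_function or []
        self.to_lowercase = to_lowercase
        # Assumption: excluded names are dropped, include order is kept.
        self.steps = [name for name in include if name not in exclude]

    def __call__(self, text: str) -> str:
        if self.to_lowercase:
            text = text.lower()
        for name in self.steps:
            text = REGISTRY[name](text)
        return text


cleaner = MiniCleaner()
print(cleaner(" Extra   spaces , and !! punctuation ?! AH"))
# extra spaces and punctuation ah
```

Defining `__call__` lets the configured object be passed anywhere a plain `str -> str` function is expected, for example to `map` over a list of documents.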