skillNer.cleaner.Cleaner
- class skillNer.cleaner.Cleaner(to_lowercase: bool = True, include_cleaning_functions: List[str] = ['remove_punctuation', 'remove_redundant', 'stem_text', 'lem_text', 'remove_extra_space'], exclude_cleaning_function: List[str] = [])
A class to build pipelines to clean text.
The constructor of the class.
- Parameters
to_lowercase (bool, optional) – whether to lowercase the text before cleaning it, by default True
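include_cleaning_functions (List, optional) – List of cleaning operations to include in the pipeline, by default all five operations shown in the signature ('remove_punctuation', 'remove_redundant', 'stem_text', 'lem_text', 'remove_extra_space')
exclude_cleaning_function (List, optional) – List of cleaning operations to exclude from the pipeline, by default []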
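One plausible reading of how the two list parameters combine is sketched below. It does not import skillNer. The helper `resolve_pipeline` is hypothetical, and the assumption that exclusion acts as a simple filter over the included names (order preserved) is ours, not something this page confirms.

```python
from typing import List

# Default operation names, copied from the class signature.
ALL_CLEANING = [
    "remove_punctuation",
    "remove_redundant",
    "stem_text",
    "lem_text",
    "remove_extra_space",
]


def resolve_pipeline(include: List[str], exclude: List[str]) -> List[str]:
    """Hypothetical helper: keep the included names, in order, minus the excluded ones."""
    return [name for name in include if name not in exclude]


# Keep the defaults but skip stemming and lemmatization.
steps = resolve_pipeline(ALL_CLEANING, ["stem_text", "lem_text"])
print(steps)
```

Under that assumption, passing `exclude_cleaning_function=["stem_text", "lem_text"]` to `Cleaner` would leave the remaining three default operations in the pipeline.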
Methods
__init__([to_lowercase, ...]) – The constructor of the class.
- __call__(text: str) → str
Apply the initialized cleaning pipeline to a given text.
- Parameters
text (str) – text to clean
- Returns
The text after all cleaning operations have been applied to it.
- Return type
str
Examples
>>> from skillNer.cleaner import Cleaner
>>> cleaner = Cleaner(
...     to_lowercase=True,
...     include_cleaning_functions=["remove_punctuation", "remove_extra_space"]
... )
>>> text = " I am sentence with a lot of annoying extra spaces , and !! some ,., meaningless punctuation ?! .! AH AH AH"
>>> cleaner(text)
'i am sentence with a lot of annoying extra spaces and some meaningless punctuation ah ah ah'
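To make the callable-pipeline idea concrete, here is a minimal stand-alone sketch that does not use skillNer. `SketchCleaner`, `REGISTRY`, and the two stand-in functions are hypothetical names for illustration; the real library's cleaning functions may behave differently (for example in how they treat punctuation inside words).

```python
import string
from typing import Callable, Dict, List


def remove_punctuation(text: str) -> str:
    # Stand-in: replace every ASCII punctuation character with a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return text.translate(table)


def remove_extra_space(text: str) -> str:
    # Stand-in: collapse whitespace runs and trim both ends.
    return " ".join(text.split())


# Hypothetical name-to-function lookup used by the sketch.
REGISTRY: Dict[str, Callable[[str], str]] = {
    "remove_punctuation": remove_punctuation,
    "remove_extra_space": remove_extra_space,
}


class SketchCleaner:
    """Illustrative callable pipeline; not the skillNer implementation."""

    def __init__(self, to_lowercase: bool = True, steps: List[str] = None):
        self.to_lowercase = to_lowercase
        self.steps = list(steps) if steps is not None else list(REGISTRY)

    def __call__(self, text: str) -> str:
        if self.to_lowercase:
            text = text.lower()
        # Apply each named operation in order, feeding the result forward.
        for name in self.steps:
            text = REGISTRY[name](text)
        return text


cleaner = SketchCleaner(steps=["remove_punctuation", "remove_extra_space"])
print(cleaner("  Hello ,, World !! "))  # prints "hello world"
```

Order matters in this sketch: punctuation is replaced with spaces first, so the whitespace pass that follows also collapses the gaps it leaves behind.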