Glossary Extractor
class etk.extractors.glossary_extractor.GlossaryExtractor(glossary: List[str], extractor_name: str, tokenizer: etk.tokenizer.Tokenizer, ngrams: int = 2, case_sensitive=False)
Bases: etk.extractor.Extractor
- Description
- This class takes a glossary (a list of reference terms) and extracts every n-gram in the tokenized input string that matches a glossary term.
Examples
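from etk.tokenizer import Tokenizer
from etk.extractors.glossary_extractor import GlossaryExtractor
tokenizer = Tokenizer()
glossary = ['Beijing', 'Los Angeles', 'New York', 'Shanghai']
glossary_extractor = GlossaryExtractor(glossary=glossary, extractor_name='glossary_extractor', tokenizer=tokenizer, ngrams=3, case_sensitive=True)
glossary_extractor.extract(tokens=tokenizer.tokenize(input_text))  # assumes Tokenizer.tokenize() returns the List[Token] that extract() expects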
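To make the matching behaviour described above concrete, here is a minimal standalone sketch of n-gram glossary matching. It is not ETK's implementation: the function name `match_glossary`, its return shape, and the use of plain string tokens instead of spaCy tokens are assumptions made purely for illustration. Only the `glossary`, `ngrams`, and `case_sensitive` parameters mirror the class signature.

```python
from typing import List, Set, Tuple


def match_glossary(tokens: List[str], glossary: List[str],
                   ngrams: int = 2,
                   case_sensitive: bool = False) -> List[Tuple[str, int, int]]:
    """Hypothetical helper: return (text, start, end) for each matching n-gram."""
    def norm(s: str) -> str:
        return s if case_sensitive else s.lower()

    # Normalise glossary entries once so each lookup is a set membership test.
    terms: Set[str] = {norm(" ".join(entry.split())) for entry in glossary}

    matches = []
    for start in range(len(tokens)):
        # Try every window of 1..ngrams tokens beginning at `start`.
        for size in range(1, ngrams + 1):
            end = start + size
            if end > len(tokens):
                break
            candidate = " ".join(tokens[start:end])
            if norm(candidate) in terms:
                matches.append((candidate, start, end))
    return matches


tokens = "I flew from Los Angeles to New York".split()
glossary = ["Beijing", "Los Angeles", "New York", "Shanghai"]
print(match_glossary(tokens, glossary, ngrams=3, case_sensitive=True))
# prints [('Los Angeles', 3, 5), ('New York', 6, 8)]
```

In this sketch, `ngrams` is the upper bound on window size, so a glossary term with more tokens than `ngrams` can never match. With `case_sensitive=True`, the tokens `los angeles` would not match `Los Angeles`.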
extract(tokens: List[spacy.tokens.token.Token]) → List[etk.extraction.Extraction]
Extracts glossary matches from a list of tokens (the tokenized input text) with this GlossaryExtractor instance.
Parameters: tokens (List[Token]) – list of spaCy tokens to be processed.
Returns: the list of extractions, or an empty list if there are no matches.
Return type: List[Extraction]