Glossary Extractor
class etk.extractors.glossary_extractor.GlossaryExtractor(glossary: List[str], extractor_name: str, tokenizer: etk.tokenizer.Tokenizer, ngrams: int = 2, case_sensitive=False)
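- Bases: etk.extractor.Extractor
- Description: This class takes a list of glossary terms as a reference and extracts the n-grams in the tokenized input string that match a glossary entry. The ngrams parameter sets the maximum number of tokens in a candidate n-gram, and case_sensitive controls whether matching respects letter case.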
- Examples

      glossary = ['Beijing', 'Los Angeles', 'New York', 'Shanghai']
      glossary_extractor = GlossaryExtractor(glossary=glossary, ngrams=3, case_sensitive=True)
      glossary_extractor.extract(tokens=Tokenizer(input_text))

extract(tokens: List[spacy.tokens.token.Token]) → List[etk.extraction.Extraction]
- Extracts information from a string (TEXT) with the GlossaryExtractor instance.
- Parameters: tokens (List[Token]) – list of spaCy tokens to be processed.
- Returns: the list of extractions, or an empty list if there are no matches.
- Return type: List[Extraction]
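
The n-gram matching described above can be illustrated with a minimal, standalone sketch. This is not ETK's implementation: the function name `extract_glossary_matches`, the plain-string tokens, and the `(text, start, end)` result tuples are all assumptions made for illustration, and the real extractor works on spaCy tokens and returns `Extraction` objects. The sketch only shows the general idea of sliding windows of up to `ngrams` tokens over the input and looking each one up in the glossary.

```python
from typing import List, Tuple


def extract_glossary_matches(
    tokens: List[str],
    glossary: List[str],
    ngrams: int = 2,
    case_sensitive: bool = False,
) -> List[Tuple[str, int, int]]:
    """Hypothetical helper: return (matched_text, start, end) for each
    window of 1..ngrams tokens that equals a glossary entry.
    `start` and `end` are token indices, with `end` exclusive."""

    def normalize(text: str) -> str:
        return text if case_sensitive else text.lower()

    # Normalize whitespace and (optionally) case once, then use a set
    # so that each candidate lookup is a constant-time operation.
    entries = {normalize(" ".join(term.split())) for term in glossary}

    matches = []
    for start in range(len(tokens)):
        # Try every window size from 1 token up to `ngrams` tokens.
        for size in range(1, ngrams + 1):
            end = start + size
            if end > len(tokens):
                break
            candidate = " ".join(tokens[start:end])
            if normalize(candidate) in entries:
                matches.append((candidate, start, end))
    return matches


# Whitespace splitting stands in for a real tokenizer here (an assumption).
tokens = "I flew from New York to Los Angeles via beijing".split()
glossary = ["Beijing", "Los Angeles", "New York", "Shanghai"]
print(extract_glossary_matches(tokens, glossary, ngrams=2))
```

In this sketch, setting `ngrams=1` would miss the two-token entries such as 'New York', and setting `case_sensitive=True` would skip the lowercase 'beijing'. That mirrors the roles the `ngrams` and `case_sensitive` parameters play in the class signature.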
 
