Spacy Rule Extractor

class etk.extractors.spacy_rule_extractor.Pattern(d: Dict, nlp)[source]

Bases: object

A Pattern represents a single token.

For each token, the user can specify constraints. Some attributes are spaCy built-in attributes, which can be used directly with rule-based matching: https://spacy.io/usage/linguistic-features#section-rule-based-matching. Others are custom attributes, which require further filtering after the matches are returned.
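The split between built-in and custom attributes can be pictured as a two-stage check. The sketch below is only an illustration in plain Python, not ETK's implementation: tokens are plain strings rather than spaCy tokens, `LOWER` and `IS_DIGIT` stand in for attributes the matcher would handle itself, and `min_length` and `prefix` are hypothetical custom constraints invented for this example.

```python
from typing import Any, Dict, List, Tuple

# Keys assumed (for this sketch only) to be handled by the matcher itself.
BUILTIN_KEYS = {"LOWER", "IS_DIGIT"}


def split_constraints(constraints: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate matcher-level constraints from those needing post-filtering."""
    builtin = {k: v for k, v in constraints.items() if k in BUILTIN_KEYS}
    custom = {k: v for k, v in constraints.items() if k not in BUILTIN_KEYS}
    return builtin, custom


def builtin_match(token: str, builtin: Dict[str, Any]) -> bool:
    """Stage 1: check the constraints a rule-based matcher would evaluate."""
    values = {"LOWER": token.lower(), "IS_DIGIT": token.isdigit()}
    return all(values[key] == expected for key, expected in builtin.items())


def custom_match(token: str, custom: Dict[str, Any]) -> bool:
    """Stage 2: apply the hypothetical custom constraints to a candidate."""
    if "min_length" in custom and len(token) < custom["min_length"]:
        return False
    if "prefix" in custom and not token.startswith(custom["prefix"]):
        return False
    return True


def match_tokens(tokens: List[str], constraints: Dict[str, Any]) -> List[str]:
    """Run stage 1 on all tokens, then filter the candidates with stage 2."""
    builtin, custom = split_constraints(constraints)
    candidates = [t for t in tokens if builtin_match(t, builtin)]
    return [t for t in candidates if custom_match(t, custom)]


tokens = ["2019", "42", "etk", "7"]
result = match_tokens(tokens, {"IS_DIGIT": True, "min_length": 2})
print(result)
```

Here the digit check plays the role of the built-in stage and the length check is applied afterwards, so only the candidates that survive both stages are kept.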

class etk.extractors.spacy_rule_extractor.Rule(d: Dict, nlp)[source]

Bases: object

A Rule represents a single matching rule; each rule can contain multiple patterns.

class etk.extractors.spacy_rule_extractor.SpacyRuleExtractor(nlp, rules: Dict, extractor_name: str)[source]

Bases: etk.extractor.Extractor

Description
This extractor takes a spaCy rule as a reference and extracts the substrings that match the given rule.

Examples

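import json
with open('path_to_spacy_rules.json', "r") as f:
    rules = json.load(f)
sample_rules = rules["test_SpacyRuleExtractor_word_1"]
# nlp: a loaded spaCy pipeline; text: the input string (both assumed to be defined)
spacy_rule_extractor = SpacyRuleExtractor(nlp=nlp,
                                         rules=sample_rules,
                                         extractor_name="spacy_rule_extractor")  # example name
spacy_rule_extractor.extract(text=text)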
extract(text: str) → List[etk.extraction.Extraction][source]

Extract matches from the given text.

Parameters: text (str) – the input string to extract from.
Returns: a list of extractions, or an empty list if there are no matches.
Return type: List[Extraction]