Inferlink Extractor
class etk.extractors.inferlink_extractor.InferlinkExtractor(rule_set: etk.extractors.inferlink_extractor.InferlinkRuleSet)
Bases: etk.extractor.Extractor
Description
This class extracts segments from an HTML page using rules created by the Inferlink web wrapper.
Examples
    inferlink_extractor = InferlinkExtractor(rule_set)
    inferlink_extractor.extract(html_text=input_doc, threshold=0.8)
extract(html_text: str, threshold=0.5) → List[etk.extraction.Extraction]
Parameters:
- html_text (str) – the HTML page to extract from, as a string
- threshold (float) – the minimum fraction of rules that must extract something. If the ratio of rules that successfully extracted something to all rules is greater than or equal to the threshold, the results are returned; otherwise an empty list is returned
Returns: a list of Extractions; each Extraction includes the extracted value, the rule name, the provenance, etc.
Return type: List[Extraction]
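The threshold rule described for extract can be illustrated with a small standalone sketch. This is not ETK's implementation: the function filter_by_threshold, the dictionary of per-rule results, and the output dictionaries are hypothetical stand-ins, assumed here only to show how a "ratio of successful rules" check might behave at the boundary.

```python
from typing import Dict, List, Optional


def filter_by_threshold(rule_results: Dict[str, Optional[str]],
                        threshold: float = 0.5) -> List[dict]:
    """Hypothetical helper: return results only if enough rules extracted a value.

    rule_results maps a rule name to its extracted value (None or "" if the
    rule extracted nothing). This mirrors the documented behaviour, not the
    library's actual code.
    """
    if not rule_results:
        return []
    # Keep only the rules that extracted a non-empty value.
    hits = {name: value for name, value in rule_results.items() if value}
    ratio = len(hits) / len(rule_results)
    # Below the threshold, discard everything and return an empty list.
    if ratio < threshold:
        return []
    return [{"rule_name": name, "value": value} for name, value in hits.items()]


# Two of four rules succeed, so the ratio is 0.5.
results = {"title": "Blue Widget", "price": "$9.99", "seller": None, "phone": None}
print(filter_by_threshold(results, threshold=0.5))  # ratio equals threshold: kept
print(filter_by_threshold(results, threshold=0.8))  # ratio below threshold: []
```

Note that a ratio exactly equal to the threshold passes, matching the "higher than or equal to" wording in the parameter description.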