Inferlink Extractor

class etk.extractors.inferlink_extractor.InferlinkExtractor(rule_set: etk.extractors.inferlink_extractor.InferlinkRuleSet)

Bases: etk.extractor.Extractor

Description: This class extracts segments from an HTML page using rules created by the Inferlink web wrapper.

Examples

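# rule_set is an InferlinkRuleSet prepared beforehand; input_doc is the page's HTML as a str
inferlink_extractor = InferlinkExtractor(rule_set)
inferlink_extractor.extract(html_text=input_doc,
                            threshold=0.8)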

extract(html_text: str, threshold=0.5) → List[etk.extraction.Extraction]

Parameters:	
    html_text (str) – the HTML of the page to extract from, as a string
    threshold (float) – minimum fraction of rules that must succeed (default 0.5): if the ratio of rules that extracted something to all rules is greater than or equal to the threshold, the results are returned; otherwise an empty list is returned
Returns:	a list of Extractions; each Extraction includes the extracted value, the rule name, the provenance, etc.
Return type:	List[Extraction]
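
The threshold behaviour described above can be illustrated with a minimal, standalone sketch. This is not the ETK implementation: the function name filter_by_rule_coverage, the rule_results mapping, and the dictionary output format are all hypothetical and exist only to show the ratio check (successful rules over all rules, compared with threshold).

```python
from typing import Dict, List, Optional


def filter_by_rule_coverage(rule_results: Dict[str, Optional[str]],
                            threshold: float = 0.5) -> List[dict]:
    """Hypothetical helper: keep results only if enough rules matched.

    rule_results maps a rule name to the value that rule extracted,
    or to None / "" when the rule extracted nothing.
    """
    if not rule_results:
        # No rules at all: nothing to return (and avoids dividing by zero).
        return []

    # Rules that successfully extracted something.
    hits = {name: value for name, value in rule_results.items() if value}

    # Ratio of successful rules over all rules.
    ratio = len(hits) / len(rule_results)

    # Below the threshold the whole result is discarded.
    if ratio < threshold:
        return []

    return [{"rule": name, "value": value} for name, value in hits.items()]


# Three of four rules matched, so the ratio is 0.75.
sample = {"title": "Blue Bike", "price": None, "phone": "555-0100", "city": "LA"}
strict = filter_by_rule_coverage(sample, threshold=0.8)    # 0.75 < 0.8
lenient = filter_by_rule_coverage(sample, threshold=0.75)  # 0.75 >= 0.75
```

With these inputs, strict is an empty list and lenient contains the three successful rules, which mirrors the "higher than or equal to" wording of the threshold parameter.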
