Inferlink Extractor

class etk.extractors.inferlink_extractor.InferlinkExtractor(rule_set: etk.extractors.inferlink_extractor.InferlinkRuleSet)

Bases: etk.extractor.Extractor

Description
This class extracts segments from an HTML page using rules created by the Inferlink web wrapper.

Examples

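# rule_set is an InferlinkRuleSet holding the rules created by the Inferlink web wrapper
inferlink_extractor = InferlinkExtractor(rule_set)
inferlink_extractor.extract(html_text=input_doc,
                            threshold=0.8)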

extract(html_text: str, threshold=0.5) → List[etk.extraction.Extraction]
Parameters:
  • html_text (str) – the HTML of the page to extract from, as a string
  • threshold (float) – the minimum ratio of rules that successfully extracted something to all rules; if the ratio is greater than or equal to the threshold, the results are returned, otherwise an empty list is returned (default 0.5)
Returns:
  A list of Extractions. Each Extraction includes the extracted value, the rule name, the provenance, etc.
Return type:
  List[Extraction]
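
The threshold rule described for extract can be illustrated with a small standalone sketch. It does not use ETK and is not the library's implementation; the function name apply_threshold and the rule_results dictionary (rule name mapped to extracted text, or None when a rule found nothing) are hypothetical names chosen for illustration only.

```python
from typing import Dict, List, Optional


def apply_threshold(rule_results: Dict[str, Optional[str]],
                    threshold: float = 0.5) -> List[str]:
    """Return the extracted values only if enough rules extracted something.

    rule_results is a hypothetical mapping from rule name to the text that
    rule extracted, or None / "" when the rule extracted nothing.
    """
    if not rule_results:
        return []
    # A rule "successfully extracted something" if its value is non-empty.
    hits = [value for value in rule_results.values() if value]
    ratio = len(hits) / len(rule_results)
    # Keep the results when the ratio is >= threshold, else return an empty list.
    return hits if ratio >= threshold else []


results = {"title": "Example page", "price": "$10", "seller": None, "phone": ""}
print(apply_threshold(results, threshold=0.5))  # ['Example page', '$10']
print(apply_threshold(results, threshold=0.8))  # []
```

With two of four rules succeeding, the ratio is 0.5, so the default threshold returns both values while a threshold of 0.8 returns nothing. A higher threshold is therefore a stricter check that the rule set actually fits the page.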

©2018, USC/ISI.