URL Extractor
class etk.extractors.url_extractor.URLExtractor(allow_missing_http: bool = False)
Bases: etk.extractors.regex_extractor.RegexExtractor
Description: This class inherits from RegexExtractor and predefines a URL pattern as its regex pattern.
Example
    url_extractor = URLExtractor(allow_missing_http=True)
    extractions = url_extractor.extract(text=text)
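URLExtractor's job is to supply a URL pattern to RegexExtractor. The sketch below imitates that idea using only the standard library. The patterns and the find_urls helper are hypothetical and are not etk's actual regex. It also assumes that allow_missing_http makes the http:// or https:// scheme optional. The parameter name suggests this, but this page does not state it.

```python
import re
from typing import List, Tuple

# Hypothetical URL patterns for illustration only; NOT etk's actual pattern.
_SCHEME = r"https?://"
_HOST_PATH = r"(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}(?:/[^\s]*)?"


def find_urls(text: str, allow_missing_http: bool = False) -> List[Tuple[str, int, int]]:
    """Return (url, start, end) for each URL-like match in text."""
    if allow_missing_http:
        # Assumption: the scheme becomes optional, so bare hosts also match.
        pattern = re.compile(r"(?:" + _SCHEME + r")?" + _HOST_PATH)
    else:
        # A scheme is required.
        pattern = re.compile(_SCHEME + _HOST_PATH)
    # Record the matched text together with its character span.
    return [(m.group(0), m.start(), m.end()) for m in pattern.finditer(text)]
```

On the text "See https://example.com/docs or example.org for details.", the default call finds only the URL with a scheme. With allow_missing_http=True, it also returns the bare host example.org.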
extract(text: str, flags=0, mode: etk.extractors.regex_extractor.MatchMode = MatchMode.FINDALL) → List[etk.extraction.Extraction]
Extracts information from a text using the given regex. If the pattern has no groups, it returns a list with a single Extraction. If the pattern has groups, it returns a list of Extraction objects, one for each group. Each extraction records the start and end character positions of its match.
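The grouping rule above can be illustrated with plain re. SimpleExtraction and extract_with_spans are hypothetical names and are not part of etk. The sketch assumes the rule applies to each match found. It also skips groups that did not take part in a match, which is a choice made here and not something this page documents.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class SimpleExtraction:
    """Hypothetical stand-in for etk's Extraction: a value plus its char span."""
    value: str
    start_char: int
    end_char: int


def extract_with_spans(pattern: str, text: str) -> List[SimpleExtraction]:
    compiled = re.compile(pattern)
    results: List[SimpleExtraction] = []
    for match in compiled.finditer(text):
        if compiled.groups == 0:
            # No groups: one extraction covering the whole match.
            results.append(SimpleExtraction(match.group(0), match.start(), match.end()))
        else:
            # Groups present: one extraction per participating group.
            for i in range(1, compiled.groups + 1):
                if match.group(i) is not None:
                    start, end = match.span(i)
                    results.append(SimpleExtraction(match.group(i), start, end))
    return results
```

For example, the pattern (\w+)@(\w+) on "mail bob@home now" yields two extractions, "bob" at characters 5 to 8 and "home" at 9 to 13.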
Parameters: - text (str) – the text to extract from.
- flags (enum['a', 'i', 'L', 'm', 's', 'u', 'x']) – flags given to the search or match. The value should be one or more letters from the set ‘a’, ‘i’, ‘L’, ‘m’, ‘s’, ‘u’, ‘x’. Each letter sets the corresponding flag for the entire regular expression: re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode matching), and re.X (verbose).
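- mode (enum[MatchMode.MATCH, MatchMode.SEARCH, MatchMode.FINDALL, MatchMode.SPLIT]) – the matching strategy. MatchMode.MATCH and MatchMode.SEARCH behave like re.match() and re.search(). MatchMode.FINDALL finds all non-overlapping matches, like re.findall(). MatchMode.SPLIT splits the text on the pattern, like re.split(). Defaults to MatchMode.FINDALL.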
Returns: a list of Extraction objects, or an empty list if there are no matches.
Return type: List[Extraction]
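The flags and mode parameters map onto Python's re module. The sketch below shows one way to turn the flag letters into re flags and to dispatch on a match mode. letters_to_flags, run, and the local Mode enum are hypothetical stand-ins and are not etk's MatchMode or its internal code.

```python
import re
from enum import Enum
from typing import List

# Hypothetical mapping from the documented flag letters to re flags.
FLAG_LETTERS = {
    "a": re.A,  # ASCII-only matching
    "i": re.I,  # ignore case
    "L": re.L,  # locale dependent (bytes patterns only in Python 3)
    "m": re.M,  # multi-line
    "s": re.S,  # dot matches all
    "u": re.U,  # Unicode matching
    "x": re.X,  # verbose
}


def letters_to_flags(letters: str) -> int:
    """OR together the re flags named by each letter, e.g. "im" -> re.I | re.M."""
    flags = 0
    for letter in letters:
        flags |= FLAG_LETTERS[letter]
    return flags


class Mode(Enum):
    """Local stand-in for etk's MatchMode (not the etk class)."""
    MATCH = "match"
    SEARCH = "search"
    FINDALL = "findall"
    SPLIT = "split"


def run(pattern: str, text: str, mode: Mode = Mode.FINDALL, letters: str = "") -> List[str]:
    compiled = re.compile(pattern, letters_to_flags(letters))
    if mode is Mode.MATCH:
        found = compiled.match(text)  # only at the very start of text
        return [found.group(0)] if found else []
    if mode is Mode.SEARCH:
        found = compiled.search(text)  # first match anywhere in text
        return [found.group(0)] if found else []
    if mode is Mode.FINDALL:
        return [m.group(0) for m in compiled.finditer(text)]  # every match
    return compiled.split(text)  # the pieces between matches


text = "Cat cat CAT dog"
print(run("cat", text, Mode.MATCH))               # []
print(run("cat", text, Mode.MATCH, letters="i"))  # ['Cat']
print(run("cat", text, Mode.SEARCH))              # ['cat']
print(run("cat", text, letters="i"))              # ['Cat', 'cat', 'CAT']
print(run(r"\s+", text, Mode.SPLIT))              # ['Cat', 'cat', 'CAT', 'dog']
```

Python 3 accepts re.L only for bytes patterns. Passing 'L' with a str pattern therefore makes re.compile raise ValueError.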