URL Extractor

class etk.extractors.url_extractor.URLExtractor(allow_missing_http: bool = False)

Bases: etk.extractors.regex_extractor.RegexExtractor

Description: This class inherits from RegexExtractor and predefines a URL pattern as its regex pattern.

Example

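from etk.extractors.url_extractor import URLExtractor

url_extractor = URLExtractor(allow_missing_http=True)
extractions = url_extractor.extract(text=text)  # text is the input string to scan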

extract(text: str, flags=0, mode: etk.extractors.regex_extractor.MatchMode = MatchMode.FINDALL) → List[etk.extraction.Extraction]

Extracts information from a text using the given regex. If the pattern has no groups, it returns a list with a single Extraction. If the pattern has groups, it returns a list of Extractions, one for each group. Each Extraction records the start and end character positions of its match.
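The recorded character positions and the MATCH, SEARCH and FINDALL modes can be illustrated with Python's standard re module. This is a minimal sketch under stated assumptions, not ETK's implementation: URL_PATTERN, Mode and find_spans are hypothetical names, the pattern is a deliberately simple stand-in for the library's URL pattern, and SPLIT is left out.

```python
import re
from enum import Enum, auto
from typing import List, Tuple

# Hypothetical, simplified URL pattern (assumption; not the pattern ETK ships).
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


class Mode(Enum):
    """Hypothetical stand-in for three of the MatchMode members."""
    MATCH = auto()
    SEARCH = auto()
    FINDALL = auto()


def find_spans(text: str, mode: Mode = Mode.FINDALL) -> List[Tuple[str, int, int]]:
    """Return (value, start, end) tuples, or an empty list if nothing matches."""
    if mode is Mode.MATCH:
        # re.match only succeeds when the pattern matches at the start of the text.
        m = URL_PATTERN.match(text)
        matches = [m] if m else []
    elif mode is Mode.SEARCH:
        # re.search returns the first match anywhere in the text.
        m = URL_PATTERN.search(text)
        matches = [m] if m else []
    else:
        # finditer yields every non-overlapping match together with its span.
        matches = list(URL_PATTERN.finditer(text))
    return [(m.group(0), m.start(), m.end()) for m in matches]


sample = "Docs at https://example.org/etk and http://example.com/a?b=1 today"
spans = find_spans(sample)
first_only = find_spans(sample, Mode.SEARCH)
at_start = find_spans(sample, Mode.MATCH)  # empty: the sample does not begin with a URL
```

Slicing the input with a recorded span, as in sample[start:end], gives back the matched value, which is what makes the stored positions useful for highlighting or provenance.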

Parameters:
    text (str) – the text to extract from.
    flags (enum['a', 'i', 'L', 'm', 's', 'u', 'x']) – flags given to search or match; defaults to 0. The value should be one or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x'. The letters set the corresponding flags for the entire regular expression: re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode matching), and re.X (verbose).
    mode (enum[MatchMode.MATCH, MatchMode.SEARCH, MatchMode.FINDALL, MatchMode.SPLIT]) – which matching operation to use, for example re.search() or re.match(); defaults to MatchMode.FINDALL.
Returns:	a list of Extractions, or an empty list if there are no matches.
Return type:	List[Extraction]
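
The flag letters listed under Parameters correspond to constants in Python's re module. As a rough sketch of that correspondence, assuming a caller wants to turn a string of letters into a combined flag value, the hypothetical helper letters_to_flags below uses only the standard library. It is not part of ETK, and whether extract accepts letters or an integer directly should be checked against the library source.

```python
import re

# Letter-to-constant table, mirroring the letters listed in the parameter description.
FLAG_LETTERS = {
    "a": re.A,  # ASCII-only matching
    "i": re.I,  # ignore case
    "L": re.L,  # locale dependent
    "m": re.M,  # multi-line
    "s": re.S,  # dot matches all
    "u": re.U,  # Unicode matching
    "x": re.X,  # verbose
}


def letters_to_flags(letters: str) -> int:
    """Combine flag letters into one value; hypothetical helper, not an ETK function."""
    combined = 0
    for letter in letters:
        # An unknown letter raises KeyError rather than being silently ignored.
        combined |= FLAG_LETTERS[letter]
    return combined


# Case-insensitive search with a simplified URL pattern (assumption, not ETK's pattern).
matches = re.findall(r"https?://\S+", "see HTTP://EXAMPLE.ORG now", flags=letters_to_flags("i"))
```

An empty string yields 0, which matches the default value of flags in the signature.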
