Regular expression functions#
Примечание
Ниже приведена оригинальная документация Trino. Скоро мы ее переведем на русский язык и дополним полезными примерами.
All of the regular expression functions use the Java pattern syntax, with a few notable exceptions:
When using multi-line mode (enabled via the
(?m)
flag), only\n
is recognized as a line terminator. Additionally, the(?d)
flag is not supported and must not be used.Case-insensitive matching (enabled via the
(?i)
flag) is always performed in a Unicode-aware manner. However, context-sensitive and local-sensitive matching is not supported. Additionally, the(?u)
flag is not supported and must not be used.Surrogate pairs are not supported. For example,
\uD800\uDC00
is not treated asU+10000
and must be specified as\x{10000}
.Boundaries (
\b
) are incorrectly handled for a non-spacing mark without a base character.\Q
and\E
are not supported in character classes (such as[A-Z123]
) and are instead treated as literals.Unicode character classes (
\p{prop}
) are supported with the following differences:All underscores in names must be removed. For example, use
OldItalic
instead ofOld_Italic
.Scripts must be specified directly, without the
Is
,script=
orsc=
prefixes. Example:\p{Hiragana}
Blocks must be specified with the
In
prefix. Theblock=
andblk=
prefixes are not supported. Example:\p{Mongolian}
Categories must be specified directly, without the
Is
,general_category=
orgc=
prefixes. Example:\p{L}
Binary properties must be specified directly, without the
Is
. Example:\p{NoncharacterCodePoint}
- regexp_count(string, pattern) bigint #
Returns the number of occurrence of
pattern
instring
:SELECT regexp_count('1a 2b 14m', '\s*[a-z]+\s*'); -- 3
- regexp_extract_all(string, pattern)#
Returns the substring(s) matched by the regular expression
pattern
instring
:SELECT regexp_extract_all('1a 2b 14m', '\d+'); -- [1, 2, 14]
- regexp_extract_all(string, pattern, group)
Finds all occurrences of the regular expression
pattern
instring
and returns the capturing group numbergroup
:SELECT regexp_extract_all('1a 2b 14m', '(\d+)([a-z]+)', 2); -- ['a', 'b', 'm']
- regexp_extract(string, pattern) varchar #
Returns the first substring matched by the regular expression
pattern
instring
:SELECT regexp_extract('1a 2b 14m', '\d+'); -- 1
- regexp_extract(string, pattern, group) varchar
Finds the first occurrence of the regular expression
pattern
instring
and returns the capturing group numbergroup
:SELECT regexp_extract('1a 2b 14m', '(\d+)([a-z]+)', 2); -- 'a'
- regexp_like(string, pattern) boolean #
Evaluates the regular expression
pattern
and determines if it is contained withinstring
.The
pattern
only needs to be contained withinstring
, rather than needing to match all ofstring
. In other words, this performs a contains operation rather than a match operation. You can match the entire string by anchoring the pattern using^
and$
:SELECT regexp_like('1a 2b 14m', '\d+b'); -- true
- regexp_position(string, pattern) integer #
Returns the index of the first occurrence (counting from 1) of
pattern
instring
. Returns -1 if not found:SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b'); -- 8
- regexp_position(string, pattern, start) integer
Returns the index of the first occurrence of
pattern
instring
, starting fromstart
(includestart
). Returns -1 if not found:SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b', 5); -- 8 SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b', 12); -- 19
- regexp_position(string, pattern, start, occurrence) integer
Returns the index of the nth
occurrence
ofpattern
instring
, starting fromstart
(includestart
). Returns -1 if not found:SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b', 12, 1); -- 19 SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b', 12, 2); -- 31 SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b', 12, 3); -- -1
- regexp_replace(string, pattern) varchar #
Removes every instance of the substring matched by the regular expression
pattern
fromstring
:SELECT regexp_replace('1a 2b 14m', '\d+[ab] '); -- '14m'
- regexp_replace(string, pattern, replacement) varchar
Replaces every instance of the substring matched by the regular expression
pattern
instring
withreplacement
. Capturing groups can be referenced inreplacement
using$g
for a numbered group or${name}
for a named group. A dollar sign ($
) may be included in the replacement by escaping it with a backslash (\$
):SELECT regexp_replace('1a 2b 14m', '(\d+)([ab]) ', '3c$2 '); -- '3ca 3cb 14m'
- regexp_replace(string, pattern, function) varchar
Replaces every instance of the substring matched by the regular expression
pattern
instring
usingfunction
. The lambda expressionfunction
is invoked for each match with the capturing groups passed as an array. Capturing group numbers start at one; there is no group for the entire match (if you need this, surround the entire expression with parenthesis).SELECT regexp_replace('new york', '(\w)(\w*)', x -> upper(x[1]) || lower(x[2])); --'New York'
- regexp_split(string, pattern)#
Splits
string
using the regular expressionpattern
and returns an array. Trailing empty strings are preserved:SELECT regexp_split('1a 2b 14m', '\s*[a-z]+\s*'); -- [1, 2, 14, ]