# Row pattern recognition in window structures#

Примечание

Ниже приведена оригинальная документация Trino. Скоро мы ее переведем на русский язык и дополним полезными примерами.

A window structure can be defined in the `WINDOW`

clause or in the `OVER`

clause of a window operation. In both cases, the window specification can
include row pattern recognition clauses. They are part of the window frame. The
syntax and semantics of row pattern recognition in window are similar to those
of the MATCH_RECOGNIZE clause.

This section explains the details of row pattern recognition in window structures, and highlights the similarities and the differences between both pattern recognition mechanisms.

## Window with row pattern recognition#

**Window specification:**

```
(
[ existing_window_name ]
[ PARTITION BY column [, ...] ]
[ ORDER BY column [, ...] ]
[ window_frame ]
)
```

**Window frame:**

```
[ MEASURES measure_definition [, ...] ]
frame_extent
[ AFTER MATCH skip_to ]
[ INITIAL | SEEK ]
[ PATTERN ( row_pattern ) ]
[ SUBSET subset_definition [, ...] ]
[ DEFINE variable_definition [, ...] ]
```

Generally, a window frame specifies the `frame_extent`

, which defines the
«sliding window» of rows to be processed by a window function. It can be
defined in terms of `ROWS`

, `RANGE`

or `GROUPS`

.

A window frame with row pattern recognition involves many other syntactical
components, mandatory or optional, and enforces certain limitations on the
`frame_extent`

.

**Window frame with row pattern recognition:**

```
[ MEASURES measure_definition [, ...] ]
ROWS BETWEEN CURRENT ROW AND frame_end
[ AFTER MATCH skip_to ]
[ INITIAL | SEEK ]
PATTERN ( row_pattern )
[ SUBSET subset_definition [, ...] ]
DEFINE variable_definition [, ...]
```

## Description of the pattern recognition clauses#

The `frame_extent`

with row pattern recognition must be defined in terms of
`ROWS`

. The frame start must be at the `CURRENT ROW`

, which limits the
allowed frame extent values to the following:

```
ROWS BETWEEN CURRENT ROW AND CURRENT ROW
ROWS BETWEEN CURRENT ROW AND <expression> FOLLOWING
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
```

For every input row processed by the window, the portion of rows enclosed by
the `frame_extent`

limits the search area for row pattern recognition. Unlike
in `MATCH_RECOGNIZE`

, where the pattern search can explore all rows until the
partition end, and all rows of the partition are available for computations, in
window structures the pattern matching can neither match rows nor retrieve
input values outside the frame.

Besides the `frame_extent`

, pattern matching requires the `PATTERN`

and
`DEFINE`

clauses.

The `PATTERN`

clause specifies a row pattern, which is a form of a regular
expression with some syntactical extensions. The row pattern syntax is similar
to the row pattern syntax in MATCH_RECOGNIZE.
However, the anchor patterns `^`

and `$`

are not allowed in a window
specification.

The `DEFINE`

clause defines the row pattern primary variables in terms of
boolean conditions that must be satisfied. It is similar to the
DEFINE clause of MATCH_RECOGNIZE.
The only difference is that the window syntax does not support the
`MATCH_NUMBER`

function.

The `MEASURES`

clause is syntactically similar to the
MEASURES clause of MATCH_RECOGNIZE. The only
limitation is that the `MATCH_NUMBER`

function is not allowed. However, the
semantics of this clause differs between `MATCH_RECOGNIZE`

and window.
While in `MATCH_RECOGNIZE`

every measure produces an output column, the
measures in window should be considered as **definitions** associated with the
window structure. They can be called over the window, in the same manner as
regular window functions:

```
SELECT cust_key, value OVER w, label OVER w
FROM orders
WINDOW w AS (
PARTITION BY cust_key
ORDER BY order_date
MEASURES
RUNNING LAST(total_price) AS value,
CLASSIFIER() AS label
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
PATTERN (A B+ C+)
DEFINE
B AS B.value < PREV (B.value),
C AS C.value > PREV (C.value)
)
```

Measures defined in a window can be referenced in the `SELECT`

clause and in
the `ORDER BY`

clause of the enclosing query.

The `RUNNING`

and `FINAL`

keywords are allowed in the `MEASURES`

clause.
They can precede a logical navigation function `FIRST`

or `LAST`

, or an
aggregate function. However, they have no effect. Every computation is
performed from the position of the final row of the match, so the semantics is
effectively `FINAL`

.

The `AFTER MATCH SKIP`

clause has the same syntax as the
AFTER MATCH SKIP clause of MATCH_RECOGNIZE.

The `INITIAL`

or `SEEK`

modifier is specific to row pattern recognition in
window. With `INITIAL`

, which is the default, the pattern match for an input
row can only be found starting from that row. With `SEEK`

, if there is no
match starting from the current row, the engine tries to find a match starting
from subsequent rows within the frame. As a result, it is possible to associate
an input row with a match which is detached from that row.

The `SUBSET`

clause is used to define union variables as sets of primary pattern variables. You can
use union variables to refer to a set of rows matched to any primary pattern
variable from the subset:

```
SUBSET U = (A, B)
```

The following expression returns the `total_price`

value from the last row
matched to either `A`

or `B`

:

```
LAST(U.total_price)
```

If you want to refer to all rows of the match, there is no need to define a
`SUBSET`

containing all pattern variables. There is an implicit *universal
pattern variable* applied to any non prefixed column name and any
`CLASSIFIER`

call without an argument. The following expression returns the
`total_price`

value from the last matched row:

```
LAST(total_price)
```

The following call returns the primary pattern variable of the first matched row:

```
FIRST(CLASSIFIER())
```

In window, unlike in `MATCH_RECOGNIZE`

, you cannot specify `ONE ROW PER MATCH`

or `ALL ROWS PER MATCH`

. This is because all calls over window,
whether they are regular window functions or measures, must comply with the
window semantics. A call over window is supposed to produce exactly one output
row for every input row. And so, the output mode of pattern recognition in
window is a combination of `ONE ROW PER MATCH`

and `WITH UNMATCHED ROWS`

.

## Processing input with row pattern recognition#

Pattern recognition in window processes input rows in two different cases:

upon a row pattern measure call over the window:

some_measure OVER w

upon a window function call over the window:

sum(total_price) OVER w

The output row produced for each input row, consists of:

all values from the input row

the value of the called measure or window function, computed with respect to the pattern match associated with the row

Processing the input can be described as the following sequence of steps:

Partition the input data accordingly to

`PARTITION BY`

Order each partition by the

`ORDER BY`

expressions- For every row of the ordered partition:
- If the row is „skipped“ by a match of some previous row:
For a measure, produce a one-row output as for an unmatched row

For a window function, evaluate the function over an empty frame and produce a one-row output

- Otherwise:
Determine the frame extent

Try match the row pattern starting from the current row within the frame extent

If no match is found, and

`SEEK`

is specified, try to find a match starting from subsequent rows within the frame extent

- If no match is found:
For a measure, produce a one-row output for an unmatched row

For a window function, evaluate the function over an empty frame and produce a one-row output

- Otherwise:
For a measure, produce a one-row output for the match

For a window function, evaluate the function over a frame limited to the matched rows sequence and produce a one-row output

Evaluate the

`AFTER MATCH SKIP`

clause, and mark the „skipped“ rows

## Empty matches and unmatched rows#

If no match can be associated with a particular input row, the row is
*unmatched*. This happens when no match can be found for the row. This also
happens when no match is attempted for the row, because it is skipped by the
`AFTER MATCH SKIP`

clause of some preceding row. For an unmatched row,
every row pattern measure is `null`

. Every window function is evaluated over
an empty frame.

An *empty match* is a successful match which does not involve any pattern
variables. In other words, an empty match does not contain any rows. If an
empty match is associated with an input row, every row pattern measure for that
row is evaluated over an empty sequence of rows. All navigation operations and
the `CLASSIFIER`

function return `null`

. Every window function is evaluated
over an empty frame.

In most cases, the results for empty matches and unmatched rows are the same. A constant measure can be helpful to distinguish between them:

The following call returns `'matched'`

for every matched row, including empty
matches, and `null`

for every unmatched row:

```
matched OVER (
...
MEASURES 'matched' AS matched
...
)
```