Version: 7.0

GROUP operator

The GROUP operator creates a property implementing grouping.

Syntax

GROUP 
type [expr1, ..., exprN]
[orderClause]
[TOP topExpr] [OFFSET offsetExpr]
[WHERE whereExpr]
[BY groupExpr1, ..., groupExprM]

Where type is defined as:

SUM
MAX
MIN
AGGR
NAGGR
EQUAL
CONCAT
LAST
CUSTOM [NULL] [className] aggrFunc

And orderClause is defined as:

[WITHIN] ORDER [DESC] orderExpr1, ..., orderExprK

Description

The GROUP operator creates a property implementing grouping. The type of grouping is determined by the type of the aggregate function. This operator differs from others in that it can implicitly declare its parameters in the expressions used (by analogy with the = statement when the parameters are not defined explicitly). At the same time, it is important to understand that these "implicitly declared" parameters are not parameters of the created property (which are actually determined by the BY block and / or the upper parameters used)

The BY block describes group expressions. Each expression corresponds to a parameter of the property being created. As in other operators, upper parameters can be used in this operator, and the used parameters also implicitly become groups of the created property. If the BY block is omitted and no upper parameters are used, all matching object collections form a single group and the created property has no parameters. Accordingly, when using the operator in the = statement and explicitly defining the parameters on the left, the expressions from the BY block are mapped only for unused parameters. Moreover, if the classes or the number of these parameters do not match the number / classes of BY expressions then the platform will throw an error.

info

If a BY block is defined, this operator cannot be used inside expressions: it is only allowed as the entire definition of a property — the right side of the = statement — or as an inline definition in the brackets of the JOIN operator. In the latter case the applied form [GROUP ... BY ...](...) is an ordinary expression again. Another way to use such a value inside an expression is to first write it into a separate (for example, local) property and use that property instead.

The ORDER block defines the order in which the aggregate function will be calculated. It is mandatory for CONCAT and LAST. For commutative aggregators (SUM, MAX, MIN, EQUAL, AGGR, NAGGR) it may also be specified; in that case the order does not change the final aggregate value by itself, but it affects which records participate in aggregation when TOP / OFFSET are used. If the function is non-commutative, the order should be specified so that it is uniquely determined. If a new parameter is declared in the expressions specifying the order (i.e. parameter is not used in the remaining blocks or in the upper context), the condition of non-NULLness of all these expressions is automatically added.

For CUSTOM aggregates, the WITHIN keyword does not introduce some separate "extra ordering"; instead, it selects the SQL call form of the user-defined or DBMS built-in aggregate function. Without WITHIN, the order is passed inside the aggregate call, so a construction like GROUP CUSTOM ... 'aggrFunc' expr1, ..., exprN ORDER orderExpr1, ... corresponds to aggrFunc(expr1, ..., exprN ORDER BY orderExpr1, ...). With WITHIN, the ordered-set form is used instead, so GROUP CUSTOM ... 'aggrFunc' expr1, ..., exprN WITHIN ORDER orderExpr1, ... corresponds to aggrFunc(expr1, ..., exprN) WITHIN GROUP (ORDER BY orderExpr1, ...). This matters for aggregates that are defined in the DBMS specifically via WITHIN GROUP, such as percentile_cont.

From the result perspective, the difference is which values are treated as the aggregate input. Without WITHIN, the aggregate computes its result from the values of expr1, ..., exprN, while ORDER only specifies the processing order of those values. With WITHIN, the result is computed from the ordered set of values coming from ORDER: these values form the sample over which the aggregate is calculated, while expr1, ..., exprN become parameters of the aggregate function itself. For example, in GROUP CUSTOM ... 'percentile_cont' 0.9 WITHIN ORDER value(i), the expression 0.9 specifies which percentile is requested, and the result is the 90th percentile of value(i) values inside the group.

The TOP and OFFSET blocks restrict the subset of records already selected for aggregation inside each group: first OFFSET skips the leading records, then the next TOP records are taken in the specified order. Either block may be specified independently: if TOP is omitted, all records after the offset participate; if OFFSET is omitted, the first TOP records are taken; if both are omitted, all records of the group participate. If the ORDER block is omitted, these records are selected in arbitrary order.

The WHERE block defines the condition under which object collections will participate in the group operation. It can only be used with the aggregate functions AGGR, NAGGR, CONCAT, and LAST. For LAST, if WHERE is omitted, non-NULLness of the aggregated expression itself is used as the condition.

For AGGR and NAGGR, when more than one object collection falls into the same group — so the single aggregated object assumed per group does not hold — the aggregated object is chosen deterministically, as the maximum of the candidates, instead of an arbitrary one. For EQUAL, the same deterministic choice, as the maximum, is used when the operand values within a group are not all equal. In both cases, for non-comparable value types, such as files, the choice stays arbitrary.

info

For AGGR and NAGGR using this block explicitly (and not, say, an IF operator in GROUP and BY blocks) only makes sense from the perspective of being able to change the created property to non-NULL in some automatic mechanisms of the platform (for example, automatic resolution of simple constraints).

Parameters

type

Type of aggregate function. Can be one of: SUM, MAX, MIN, AGGR, NAGGR, EQUAL, CONCAT, LAST, CUSTOM.
NULL

Keyword. Used only for CUSTOM aggregates and specifies that the aggregate may return NULL even when all parameter values are non-NULL.
className

Name of the built-in class of the returned value. Used only for CUSTOM aggregates and allows the result type to be set explicitly. If it is omitted, the result type is inferred from the main expression of the aggregate: usually from the first main operand, and for the WITHIN form or the form without main operands from the first ORDER expression.
aggrFunc

String literal containing the name of a user-defined or DBMS built-in aggregate function. Used only for CUSTOM aggregates.
expr1, ..., exprN

A list of expressions whose values are passed to the aggregate function as operands. The number of expressions should correspond to the number of operands of the function used. SUM, MAX, MIN, EQUAL, AGGR, and NAGGR support one expression. For SUM, this expression must have a numeric class from the IntegralClass family, for example INTEGER, LONG, NUMERIC, or DOUBLE. For AGGR and NAGGR, the operand must be a simple object parameter rather than an arbitrary expression. CONCAT supports either two expressions, the aggregated value and the separator, or one JSON / JSONTEXT expression. LAST uses one expression, while the filtering condition is defined either by WHERE or automatically by non-NULLness of that expression. For CUSTOM aggregates, the list may be empty, but then the ORDER / WITHIN ORDER block is mandatory.
groupExpr1, ..., groupExprM

List of group expressions. If the list is omitted and no upper parameters are used, all matching object collections form a single group.
WITHIN

Keyword. Used only for CUSTOM aggregates and enables the ordered-set form of the aggregate function call. Without WITHIN, expressions from ORDER are passed inside the aggregate call: aggrFunc(... ORDER BY ...). With WITHIN, they are moved into a separate SQL clause: aggrFunc(...) WITHIN GROUP (ORDER BY ...). Functionally, this means that without WITHIN the aggregate works on the values of the main expressions, while with WITHIN it works on the ordered values from ORDER, and the main expressions become parameters of the function itself. It is not applicable to other aggregate types.
DESC

Keyword. Specifies a reverse iteration order for object collections.
orderExpr1, ..., orderExprK

A list of expressions that define the order in which object collections will be iterated over when calculating the aggregate function. To determine the order, first the value of the first expression is used; then, if equal, the value of the second is used, etc. This list is mandatory for CONCAT and LAST. For CUSTOM aggregates it is mandatory if the main expression list is omitted.
TOP topExpr

Only the first n records will participate in aggregation inside each group, where n is the value of expression topExpr.
OFFSET offsetExpr

Only records starting from offset m will participate in aggregation inside each group, where m is the value of expression offsetExpr.
whereExpr

Filtering expression. Only object groups for which the value of the filtering expression is not NULL will participate in the grouping.

Examples

CLASS Game;
CLASS Team;
hostGoals = DATA INTEGER (Game);
hostTeam = DATA Team (Game);
date = DATA DATE (Game);
hostGoalsScored(team) = GROUP SUM hostGoals(Game game) BY hostTeam(game);
last3HostGoalsScored(team) = GROUP SUM hostGoals(Game game) ORDER DESC date(game), game TOP 3 BY hostTeam(game);

name = DATA STRING[100] (Country);
// property (STRING[100]) -> Country is obtained
countryName = GROUP AGGR Country country BY name(country); 

CLASS Book;
CLASS Tag;
name = DATA STRING[100] (Tag);
in = DATA BOOLEAN (Book, Tag);

tags(Book b) = GROUP CONCAT name(Tag t) IF in(b, t), ', ' ORDER name(t), t TOP 10 OFFSET 0;

// CONCAT with an explicit WHERE block — filtering whole object collections, as an alternative to filtering the operand with IF
selected = DATA BOOLEAN (Tag);
selectedBookTags(Book b) = GROUP CONCAT name(Tag t), ', ' ORDER name(t), t WHERE in(b, t) AND selected(t);

// max and min host goals per team
maxHostGoals(team) = GROUP MAX hostGoals(Game game) BY hostTeam(game);
minHostGoals(team) = GROUP MIN hostGoals(Game game) BY hostTeam(game);

// host goals scored in the latest home game of each team
latestHostGoals(team) = GROUP LAST hostGoals(Game game) ORDER date(game), game BY hostTeam(game);

// host city of the team, taken as the common value across all home games
// (with an implicit constraint that it is the same)
city = DATA STRING[100] (Game);
teamCity(team) = GROUP EQUAL city(Game game) BY hostTeam(game);

// reverse lookup STRING[100] -> Country without a uniqueness constraint
// (uniqueness is assumed to follow from how Country.name is filled)
countryByName = GROUP NAGGR Country country BY name(country);

value = DATA NUMERIC[14,2] (INTEGER);
// 90th percentile of value(i)
percentile90() = GROUP CUSTOM NUMERIC[14,2] 'percentile_cont' 0.9 WITHIN ORDER value(INTEGER i);

Syntax​

Description​

Parameters​

Examples​

Syntax

Description

Parameters

Examples