ArangoDB v3.8 reached End of Life (EOL) and is no longer supported.

This documentation is outdated. Please see the most recent version at docs.arangodb.com

WINDOW

Aggregate adjacent documents or value ranges with a sliding window to calculate running totals, rolling averages, and other statistical properties

The WINDOW operation can be used for aggregations over adjacent documents, or preceding and / or following rows in other words. It can also aggregate based on a value or duration range relative to a document attribute.

The operation performs a COLLECT AGGREGATE-like operation on a set of query rows. However, whereas a COLLECT operation groups multiple query rows into a single result group, a WINDOW operation produces a result for each query row:

  • The row for which function evaluation occurs is called the current row.
  • The query rows related to the current row over which function evaluation occurs, comprise the window frame for the current row.

Window frames are determined with respect to the current row:

  • By defining a window frame to be all rows from the query start to the current row, you can compute running totals for each row.
  • By defining a frame as extending N rows on either side of the current row, you can compute rolling averages.

SyntaxPermalink

There are two syntax variants for WINDOW operations.

Row-based (adjacent documents):

WINDOW { preceding: numPrecedingRows, following: numFollowingRows } AGGREGATE variableName = aggregateExpression

Range-based (value or duration range):

WINDOW rangeValue WITH { preceding: offsetPreceding, following: offsetFollowing } AGGREGATE variableName = aggregateExpression

Calls to the following functions are supported in aggregation expressions:

  • LENGTH() / COUNT()
  • MIN()
  • MAX()
  • SUM()
  • AVERAGE() / AVG()
  • STDDEV_POPULATION() / STDDEV()
  • STDDEV_SAMPLE()
  • VARIANCE_POPULATION() / VARIANCE()
  • VARIANCE_SAMPLE()
  • UNIQUE()
  • SORTED_UNIQUE()
  • COUNT_DISTINCT() / COUNT_UNIQUE()
  • BIT_AND()
  • BIT_OR()
  • BIT_XOR()

Row-based AggregationPermalink

The first syntax form of WINDOW allows aggregating over a fixed number of rows, following or preceding the current row. It is also possible to define that all preceding or following rows should be aggregated ("unbounded"). The number of rows has to be determined at query compile time.

Below query demonstrates the use of window frames to compute running totals as well as rolling averages computed from the current row and the rows that immediately precede and follow it:

Permalink
FOR t IN observations
    SORT t.time
    WINDOW { preceding: 1, following: 1 }
    AGGREGATE rollingAverage = AVG(t.val), rollingSum = SUM(t.val)
    WINDOW { preceding: "unbounded", following: 0}
    AGGREGATE cumulativeSum = SUM(t.val)
    RETURN {
      time: t.time,
      subject: t.subject,
      val: t.val,
      rollingAverage, // average of the window's values
      rollingSum,     // sum of the window's values
      cumulativeSum   // running total
    }
Show query results
Hide query results
[
  {
    "time": "2021-05-25 07:00:00",
    "subject": "st113",
    "val": 10,
    "rollingAverage": 5,
    "rollingSum": 10,
    "cumulativeSum": 10
  },
  {
    "time": "2021-05-25 07:00:00",
    "subject": "xh458",
    "val": 0,
    "rollingAverage": 6.333333333333333,
    "rollingSum": 19,
    "cumulativeSum": 10
  },
  {
    "time": "2021-05-25 07:15:00",
    "subject": "st113",
    "val": 9,
    "rollingAverage": 6.333333333333333,
    "rollingSum": 19,
    "cumulativeSum": 19
  },
  {
    "time": "2021-05-25 07:15:00",
    "subject": "xh458",
    "val": 10,
    "rollingAverage": 14.666666666666666,
    "rollingSum": 44,
    "cumulativeSum": 29
  },
  {
    "time": "2021-05-25 07:30:00",
    "subject": "st113",
    "val": 25,
    "rollingAverage": 13.333333333333334,
    "rollingSum": 40,
    "cumulativeSum": 54
  },
  {
    "time": "2021-05-25 07:30:00",
    "subject": "xh458",
    "val": 5,
    "rollingAverage": 16.666666666666668,
    "rollingSum": 50,
    "cumulativeSum": 59
  },
  {
    "time": "2021-05-25 07:45:00",
    "subject": "st113",
    "val": 20,
    "rollingAverage": 18.333333333333332,
    "rollingSum": 55,
    "cumulativeSum": 79
  },
  {
    "time": "2021-05-25 07:45:00",
    "subject": "xh458",
    "val": 30,
    "rollingAverage": 25,
    "rollingSum": 75,
    "cumulativeSum": 109
  },
  {
    "time": "2021-05-25 08:00:00",
    "subject": "xh458",
    "val": 25,
    "rollingAverage": 27.5,
    "rollingSum": 55,
    "cumulativeSum": 134
  }
]

The row order is controlled by the SORT operation on the time attribute.

The first WINDOW operation aggregates the previous, current, and next row (preceding and following is set to 1) and calculates the average and sum of these three values. In case of the first row, there is no preceding row but a following row, hence the values 10 and 0 are added up to calculate the sum, which is divided by 2 to compute the average. For the second row, the values 10, 0 and 9 are summed up and divided by 3, and so on.

The second WINDOW operation aggregates all previous values (unbounded) to calculate a running sum. For the first row, that is just 10, for the second row it is 10 + 0, for the third 10 + 0 + 9, and so on.

time subject val rollingAverage rollingSum cumulativeSum
2021-05-25 07:00:00 st113 10 5 10 10
2021-05-25 07:00:00 xh458 0 6.333… 19 10
2021-05-25 07:15:00 st113 9 6.333… 19 19
2021-05-25 07:15:00 xh458 10 14.666… 44 29
2021-05-25 07:30:00 st113 25 13.333… 40 54
2021-05-25 07:30:00 xh458 5 16.666… 50 59
2021-05-25 07:45:00 st113 20 18.333… 55 79
2021-05-25 07:45:00 xh458 30 25 75 109
2021-05-25 08:00:00 xh458 25 27.5 55 134

The below query demonstrates the use of window frames to compute running totals within each subject group of time-ordered query rows, as well as rolling sums and averages computed from the current row and the rows that immediately precede and follow it, also per subject group and sorted by time:

Permalink
FOR t IN observations
    COLLECT subject = t.subject INTO group = t
    LET subquery = (FOR t2 IN group
      SORT t2.time
      WINDOW { preceding: 1, following: 1 }
      AGGREGATE rollingAverage = AVG(t2.val), rollingSum = SUM(t2.val)
      WINDOW { preceding: "unbounded", following: 0 }
      AGGREGATE cumulativeSum = SUM(t2.val)
      RETURN {
        time: t2.time,
        subject: t2.subject,
        val: t2.val,
        rollingAverage,
        rollingSum,
        cumulativeSum
      }
    )
    // flatten subquery result
    FOR t2 IN subquery
      RETURN t2
Show query results
Hide query results
[
  {
    "time": "2021-05-25 07:00:00",
    "subject": "st113",
    "val": 10,
    "rollingAverage": 9.5,
    "rollingSum": 19,
    "cumulativeSum": 10
  },
  {
    "time": "2021-05-25 07:15:00",
    "subject": "st113",
    "val": 9,
    "rollingAverage": 14.666666666666666,
    "rollingSum": 44,
    "cumulativeSum": 19
  },
  {
    "time": "2021-05-25 07:30:00",
    "subject": "st113",
    "val": 25,
    "rollingAverage": 18,
    "rollingSum": 54,
    "cumulativeSum": 44
  },
  {
    "time": "2021-05-25 07:45:00",
    "subject": "st113",
    "val": 20,
    "rollingAverage": 22.5,
    "rollingSum": 45,
    "cumulativeSum": 64
  },
  {
    "time": "2021-05-25 07:00:00",
    "subject": "xh458",
    "val": 0,
    "rollingAverage": 5,
    "rollingSum": 10,
    "cumulativeSum": 0
  },
  {
    "time": "2021-05-25 07:15:00",
    "subject": "xh458",
    "val": 10,
    "rollingAverage": 5,
    "rollingSum": 15,
    "cumulativeSum": 10
  },
  {
    "time": "2021-05-25 07:30:00",
    "subject": "xh458",
    "val": 5,
    "rollingAverage": 15,
    "rollingSum": 45,
    "cumulativeSum": 15
  },
  {
    "time": "2021-05-25 07:45:00",
    "subject": "xh458",
    "val": 30,
    "rollingAverage": 20,
    "rollingSum": 60,
    "cumulativeSum": 45
  },
  {
    "time": "2021-05-25 08:00:00",
    "subject": "xh458",
    "val": 25,
    "rollingAverage": 27.5,
    "rollingSum": 55,
    "cumulativeSum": 70
  }
]

If you look at the first row with the subject xh458, then you can see the cumulative sum reset and that the rolling average and sum does not take the previous row into account that belongs to subject st113.

time subject val rollingAverage rollingSum cumulativeSum
2021-05-25 07:00:00 st113 10 9.5 19 10
2021-05-25 07:15:00 st113 9 14.666… 44 19
2021-05-25 07:30:00 st113 25 18 54 44
2021-05-25 07:45:00 st113 20 22.5 45 64
2021-05-25 07:00:00 xh458 0 5 10 0
2021-05-25 07:15:00 xh458 10 5 15 10
2021-05-25 07:30:00 xh458 5 15 45 15
2021-05-25 07:45:00 xh458 30 20 60 45
2021-05-25 08:00:00 xh458 25 27.5 55 70

Range-based AggregationPermalink

The second syntax form of WINDOW allows aggregating over a all documents within a value range. Offsets are differences in attribute values from the current document.

Attribute values have to be numeric. The offset calculations are performed by adding or subtracting the numeric offsets specified in the following and preceding attribute. The offset numbers have to be positive and have to be determined at query compile time. The default offset is 0.

The range based window syntax requires the input rows to be sorted by the row value. To ensure correctness of the result, the AQL optimizer will automatically insert a SORT statement into the query in front of the WINDOW statement. The optimizer may be able to optimize away that SORT statement later if a sorted index is present on the group criteria.

The following query demonstrates the use of window frames to compute totals as well as averages computed from the current document and the documents that have attribute values in t.val in the range of [-10, +5] (inclusive), preceding and following:

Permalink
FOR t IN observations
    WINDOW t.val WITH { preceding: 10, following: 5 }
    AGGREGATE rollingAverage = AVG(t.val), rollingSum = SUM(t.val)
    RETURN {
      time: t.time,
      subject: t.subject,
      val: t.val,
      rollingAverage,
      rollingSum
    }
Show query results
Hide query results
[
  {
    "time": "2021-05-25 07:00:00",
    "subject": "xh458",
    "val": 0,
    "rollingAverage": 2.5,
    "rollingSum": 5
  },
  {
    "time": "2021-05-25 07:30:00",
    "subject": "xh458",
    "val": 5,
    "rollingAverage": 6.8,
    "rollingSum": 34
  },
  {
    "time": "2021-05-25 07:15:00",
    "subject": "st113",
    "val": 9,
    "rollingAverage": 6.8,
    "rollingSum": 34
  },
  {
    "time": "2021-05-25 07:00:00",
    "subject": "st113",
    "val": 10,
    "rollingAverage": 6.8,
    "rollingSum": 34
  },
  {
    "time": "2021-05-25 07:15:00",
    "subject": "xh458",
    "val": 10,
    "rollingAverage": 6.8,
    "rollingSum": 34
  },
  {
    "time": "2021-05-25 07:45:00",
    "subject": "st113",
    "val": 20,
    "rollingAverage": 18,
    "rollingSum": 90
  },
  {
    "time": "2021-05-25 07:30:00",
    "subject": "st113",
    "val": 25,
    "rollingAverage": 25,
    "rollingSum": 100
  },
  {
    "time": "2021-05-25 08:00:00",
    "subject": "xh458",
    "val": 25,
    "rollingAverage": 25,
    "rollingSum": 100
  },
  {
    "time": "2021-05-25 07:45:00",
    "subject": "xh458",
    "val": 30,
    "rollingAverage": 25,
    "rollingSum": 100
  }
]

The value range of the first row is [-10, 5] since val is 0, thus the values from the first and second row are added up to 5 with the average being 2.5. The value range of the last row is [20, 35] as val is 30, which means that the last four rows get aggregated to a sum of 100 and an average of 25 (the range is inclusive, i.e. val falls within the range with a value of 20).

time subject val rollingAverage rollingSum
2021-05-25 07:00:00 xh458 0 2.5 5
2021-05-25 07:30:00 xh458 5 6.8 34
2021-05-25 07:15:00 st113 9 6.8 34
2021-05-25 07:00:00 st113 10 6.8 34
2021-05-25 07:15:00 xh458 10 6.8 34
2021-05-25 07:45:00 st113 20 18 90
2021-05-25 07:30:00 st113 25 25 100
2021-05-25 08:00:00 xh458 25 25 100
2021-05-25 07:45:00 xh458 30 25 100

Duration-based AggregationPermalink

Aggregating by time intervals is a subtype of range-based aggregation that uses the second syntax form of WINDOW but with ISO durations.

To support WINDOW frames over time-series data the WINDOW operation may calculate timestamp offsets using positive ISO 8601 duration strings, like P1Y6M (1 year and 6 months) or PT12H30M (12 hours and 30 minutes). Also see Date functions. In contrast to the ISO 8601 standard, week components may be freely combined with other components. For example, P1WT1H and P1M1W are both valid. Fractional values are only supported for seconds, and only with up to three decimals after the separator, i.e., millisecond precision. For example, PT0.123S is a valid duration while PT0.5H and PT0.1234S are not.

Durations can be specified separately in following and preceding. If such a duration is used, then the attribute value of the current document must be a number and is treated as numeric timestamp in milliseconds. The range is inclusive. If either bound is not specified, it is treated as an empty duration (i.e., P0D).

The following query demonstrates the use of window frames to compute rolling sums and averages over observations in the last 30 minutes (inclusive), based on the document attribute time that is converted from a datetime string to a numeric timestamp:

Permalink
FOR t IN observations
    WINDOW DATE_TIMESTAMP(t.time) WITH { preceding: "PT30M" }
    AGGREGATE rollingAverage = AVG(t.val), rollingSum = SUM(t.val)
    RETURN {
      time: t.time,
      subject: t.subject,
      val: t.val,
      rollingAverage,
      rollingSum
    }
Show query results
Hide query results
[
  {
    "time": "2021-05-25 07:00:00",
    "subject": "st113",
    "val": 10,
    "rollingAverage": 5,
    "rollingSum": 10
  },
  {
    "time": "2021-05-25 07:00:00",
    "subject": "xh458",
    "val": 0,
    "rollingAverage": 5,
    "rollingSum": 10
  },
  {
    "time": "2021-05-25 07:15:00",
    "subject": "st113",
    "val": 9,
    "rollingAverage": 7.25,
    "rollingSum": 29
  },
  {
    "time": "2021-05-25 07:15:00",
    "subject": "xh458",
    "val": 10,
    "rollingAverage": 7.25,
    "rollingSum": 29
  },
  {
    "time": "2021-05-25 07:30:00",
    "subject": "st113",
    "val": 25,
    "rollingAverage": 9.833333333333334,
    "rollingSum": 59
  },
  {
    "time": "2021-05-25 07:30:00",
    "subject": "xh458",
    "val": 5,
    "rollingAverage": 9.833333333333334,
    "rollingSum": 59
  },
  {
    "time": "2021-05-25 07:45:00",
    "subject": "st113",
    "val": 20,
    "rollingAverage": 16.5,
    "rollingSum": 99
  },
  {
    "time": "2021-05-25 07:45:00",
    "subject": "xh458",
    "val": 30,
    "rollingAverage": 16.5,
    "rollingSum": 99
  },
  {
    "time": "2021-05-25 08:00:00",
    "subject": "xh458",
    "val": 25,
    "rollingAverage": 21,
    "rollingSum": 105
  }
]

With a time of 07:30:00, everything from 07:00:00 to 07:30:00 on the same day falls within the duration range with preceding: "PT30M", thus aggregating the top six rows to a sum of 59 and an average of 9.8333….

time subject val rollingAverage rollingSum
2021-05-25 07:00:00 st113 10 5 10
2021-05-25 07:00:00 xh458 0 5 10
2021-05-25 07:15:00 st113 9 7.25 29
2021-05-25 07:15:00 xh458 10 7.25 29
2021-05-25 07:30:00 st113 25 9.8333… 59
2021-05-25 07:30:00 xh458 5 9.8333… 59
2021-05-25 07:45:00 st113 20 16.5 99
2021-05-25 07:45:00 xh458 30 16.5 99
2021-05-25 08:00:00 xh458 25 21 105