Enterprise Search Engines - Aggregations

Reading material

  1. https://logz.io/blog/elasticsearch-aggregations/
  2. https://opensearch.org/docs/latest/aggregations/
  3. https://stackoverflow.com/questions/56955558/elasticsearch-aggregate-on-nested-json-data
  4. https://opensearch.org/docs/latest/aggregations/metric/scripted-metric/

Aggregations

Aggregations let you tap into OpenSearch’s powerful analytics engine to analyze your data and extract statistics from it.

Use cases range from analyzing data in real time to trigger an action, to building visualization dashboards in OpenSearch Dashboards.

OpenSearch can perform aggregations on massive datasets in milliseconds. Compared to queries, aggregations consume more CPU cycles and memory.

Elasticsearch aggregations let you group your data and compute statistics on it (such as sums and averages) as part of an ordinary search request. An aggregation can be viewed as a unit of work that builds analytical information over a set of documents. OpenSearch was forked from Elasticsearch 7.10, so the aggregation syntax is largely the same in both. To run an aggregation, send a GET request from the Dev Tools console in the Kibana UI, or use curl or a client library from your own code; Elasticsearch runs the query and returns the aggregated result.
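To show what "using the API from your code" involves, here is a minimal sketch using only the Python standard library. The helper make_search_request, the https://localhost:9200 URL and the elastic/changeme credentials are hypothetical placeholders. The request carries the same pieces as the curl calls below (JSON body, Content-Type header, HTTP basic auth as sent by curl's --user), but uses POST, which the _search endpoint also accepts. It only builds the request; the commented lines show how it could be sent.

```python
import base64
import json
import urllib.request


def make_search_request(base_url, index, body, user, password):
    """Build (but do not send) a _search request carrying an aggregation body.

    Hypothetical helper: mirrors the curl calls in these notes with a JSON
    body, a Content-Type header and HTTP basic auth.
    """
    url = f"{base_url.rstrip('/')}/{index}/_search"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        # POST instead of GET: not every HTTP client sends a body with GET,
        # and the _search endpoint accepts both methods.
        method="POST",
    )


body = {"size": 0, "aggs": {"cityName": {"terms": {"field": "geoip.city_name.keyword"}}}}
# Placeholder host and credentials, not a real cluster.
req = make_search_request("https://localhost:9200", "logstash", body, "elastic", "changeme")

# To actually run it against a cluster:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["aggregations"])
```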

Syntax

https://opensearch.org/docs/latest/aggregations/#general-aggregation-structure
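A minimal sketch of that general structure, assuming a hypothetical helper agg_request: every aggregation has a name you choose, a single type key (terms, avg, ...) holding its parameters, and an optional "aggs" object next to the type key for sub-aggregations. The top-level "size": 0 skips the search hits so only aggregation results come back.

```python
import json


def agg_request(name, agg_type, params, sub_aggs=None, size=0):
    """Wrap one aggregation in the general request structure (hypothetical helper):

    {"size": 0, "aggs": {NAME: {AGG_TYPE: {...params}, "aggs": {...}}}}
    """
    agg = {agg_type: dict(params)}
    if sub_aggs:
        # Sub-aggregations sit beside the type key, under their own "aggs".
        agg["aggs"] = sub_aggs
    # size=0 means no search hits are returned, only aggregation results.
    return {"size": size, "aggs": {name: agg}}


request = agg_request("cityName", "terms", {"field": "geoip.city_name.keyword", "size": 50})
print(json.dumps(request, indent=2))
```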

Types

https://opensearch.org/docs/latest/aggregations/#types-of-aggregations
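The linked page groups aggregations into metric, bucket and pipeline types. As a sketch, one request body can combine all three: a terms bucket per city, an avg metric inside each bucket, and a max_bucket pipeline that reads the per-city averages through a buckets_path. The "bytes" field is an assumption about the nginx log mapping, not something taken from these notes.

```python
import json

# One request combining the three aggregation types.
# "bytes" is an ASSUMED numeric field in the nginx log index.
body = {
    "size": 0,
    "aggs": {
        # Bucket aggregation: one bucket per city.
        "city": {
            "terms": {"field": "geoip.city_name.keyword"},
            "aggs": {
                # Metric aggregation: computed inside each city bucket.
                "avg_bytes": {"avg": {"field": "bytes"}},
            },
        },
        # Pipeline aggregation: works on the output of other aggregations,
        # here reporting the city bucket(s) with the highest avg_bytes.
        "busiest_city": {"max_bucket": {"buckets_path": "city>avg_bytes"}},
    },
}
print(json.dumps(body, indent=2))
```

The ">" in buckets_path steps from a multi-bucket aggregation (city) into the sub-aggregation it contains (avg_bytes).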

Aggregation and Bucket Aggregation

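For an nginx web server log, this terms aggregation counts web hits per user city, returning up to 50 cities. The top-level "size": 0 suppresses the individual search hits, so the response contains only the aggregation buckets. The $pwd variable is expected to hold credentials in the user:password form that curl's --user option takes, and the URL is quoted so the shell does not treat ? as a glob character.

curl -XGET --user "$pwd" --header 'Content-Type: application/json' 'https://58571402f5464923883e7be42a037917.eu-central-1.aws.cloud.es.io:9243/logstash/_search?pretty' -d '{
    "size": 0,
    "aggs": {
        "cityName": {
            "terms": {
                "field": "geoip.city_name.keyword",
                "size": 50
            }
        }
    }
}'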
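To make the bucket counts concrete, here is a minimal in-memory sketch of what a terms aggregation computes: one bucket per distinct value, ordered by doc_count descending, cut to the top size buckets, with the remainder reported as sum_other_doc_count. The helper terms_agg and the sample documents are hypothetical (flat dotted keys stand in for the real geoip object), and breaking ties by key ascending is this sketch's choice. A real cluster collects top terms per shard and merges them, so its counts can be approximate, which is what doc_count_error_upper_bound in the response reports.

```python
from collections import Counter


def terms_agg(docs, field, size=10):
    """Mimic a terms aggregation on in-memory docs (hypothetical helper, exact counts).

    Buckets are ordered by doc_count descending, ties broken by key ascending;
    only the top `size` are kept, the rest are summed into sum_other_doc_count.
    """
    # Documents without the field fall into no bucket.
    counts = Counter(doc[field] for doc in docs if field in doc)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {
        "buckets": [{"key": k, "doc_count": c} for k, c in ordered[:size]],
        "sum_other_doc_count": sum(c for _, c in ordered[size:]),
    }


# Hand-made sample log lines, not real data.
logs = [
    {"geoip.city_name.keyword": "Berlin", "response": 200},
    {"geoip.city_name.keyword": "Berlin", "response": 404},
    {"geoip.city_name.keyword": "Paris", "response": 200},
    {"geoip.city_name.keyword": "Oslo", "response": 200},
    {"response": 200},  # no city: not counted in any bucket
]
result = terms_agg(logs, "geoip.city_name.keyword", size=2)
print(result)
```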

This expands that to produce response-code counts per city from an nginx web server log. The "responses" aggregation nested under "city" counts codes within each city bucket, while the top-level "responses" aggregation is a sibling that counts codes across all matching documents.

curl -XGET --user "$pwd" --header 'Content-Type: application/json' 'https://58571402f5464923883e7be42a037917.eu-central-1.aws.cloud.es.io:9243/logstash/_search?pretty' -d '{
    "size": 0,
    "aggs": {
        "city": {
            "terms": {
                "field": "geoip.city_name.keyword"
            },
            "aggs": {
                "responses": {
                    "terms": {
                        "field": "response"
                    }
                }
            }
        },
        "responses": {
            "terms": {
                "field": "response"
            }
        }
    }
}'
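Reading the result of that query means walking each city bucket into its own "responses" sub-aggregation. A minimal sketch, assuming a hypothetical helper city_response_rows and a trimmed, hand-written response in the shape this query returns (fields such as doc_count_error_upper_bound are omitted, and the numeric keys assume "response" is mapped as a number):

```python
def city_response_rows(response):
    """Flatten the city -> responses sub-aggregation into (city, code, count) rows."""
    rows = []
    for city in response["aggregations"]["city"]["buckets"]:
        # Each city bucket carries its own "responses" sub-aggregation result.
        for code in city["responses"]["buckets"]:
            rows.append((city["key"], code["key"], code["doc_count"]))
    return rows


# Hand-written sample response, not real cluster output.
sample = {
    "aggregations": {
        "city": {"buckets": [
            {"key": "Berlin", "doc_count": 3, "responses": {"buckets": [
                {"key": 200, "doc_count": 2}, {"key": 404, "doc_count": 1}]}},
            {"key": "Paris", "doc_count": 1, "responses": {"buckets": [
                {"key": 200, "doc_count": 1}]}},
        ]},
        # The sibling top-level aggregation: codes across all documents.
        "responses": {"buckets": [
            {"key": 200, "doc_count": 3}, {"key": 404, "doc_count": 1}]},
    }
}
for row in city_response_rows(sample):
    print(row)
```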

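Nested aggregations

https://opensearch.org/docs/latest/aggregations/#nested-aggregations

An aggregation placed inside another aggregation, like "responses" inside "city" above, is called a nested aggregation or sub-aggregation. It is computed separately for each parent bucket, over only the documents in that bucket. This is different from the nested aggregation type, which aggregates over fields mapped as nested objects.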
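The recursion behind sub-aggregations can be mimicked in memory: group the documents by the first field, then run the next level only on each group's documents. This is a sketch with a hypothetical helper nested_terms and hypothetical flat documents; unlike a real request it names each sub-aggregation after its field and applies no size cutoff.

```python
from collections import defaultdict


def nested_terms(docs, fields):
    """Mimic terms aggregations nested one level per field (hypothetical helper).

    Each bucket's sub-aggregation runs only over the docs in that bucket.
    Requires at least one field.
    """
    field, rest = fields[0], fields[1:]
    groups = defaultdict(list)
    for doc in docs:
        if field in doc:
            groups[doc[field]].append(doc)
    buckets = []
    # Order buckets by doc_count descending, then key ascending.
    for key, members in sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        bucket = {"key": key, "doc_count": len(members)}
        if rest:
            # Recurse on this bucket's documents only.
            bucket[rest[0]] = nested_terms(members, rest)
        buckets.append(bucket)
    return {"buckets": buckets}


# Hand-made sample documents, not real data.
logs = [
    {"city": "Berlin", "response": 200},
    {"city": "Berlin", "response": 404},
    {"city": "Berlin", "response": 200},
    {"city": "Paris", "response": 200},
]
result = nested_terms(logs, ["city", "response"])
print(result)
```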