Enterprise Search Engines
Table of Contents
TODO
The Complete Guide to the ELK Stack https://logz.io/learn/complete-guide-elk-stack/
Enterprise Search Engines Concepts
Analogy for easy understanding
Enterprise Search Engines | Database |
---|---|
Index | Database |
Document | Row |
Field | Column |
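
As a small illustration of these terms, the sketch below models one document as a Python dict and serializes it to JSON, the format in which documents are indexed. The index name `app-logs` and the field names are hypothetical and chosen only for the example.

```python
import json

# Hypothetical index name; any valid lowercase name would do.
INDEX_NAME = "app-logs"

# One document: a JSON object whose keys are the fields.
# The field names and values are illustrative only.
document = {
    "timestamp": "2024-01-15T10:30:00Z",
    "level": "ERROR",
    "message": "Connection timed out",
}


def field_names(doc):
    """Return the document's field names in a stable order."""
    return sorted(doc)


# Serialize the document the way it would be sent to the search engine.
body = json.dumps(document)

print(INDEX_NAME, field_names(document))
print(body)
```

An index would hold many such documents, and each key in the dict corresponds to one field.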
OpenSearch/Elasticsearch Terminology | Description |
---|---|
Cluster | A cluster is a collection of one or more nodes (servers) that together hold all of your data and provide federated indexing and search capabilities across all nodes. |
A cluster is identified by a unique name, which by default is ‘opensearch’ for OpenSearch clusters and ‘elasticsearch’ for Elasticsearch clusters. | |
The cluster name is important because a node can only be part of a cluster if the node is set up to join the cluster by its name. | |
Make sure that you don’t reuse the same cluster names in different environments, otherwise you might end up with nodes joining the wrong cluster. | |
For instance you could use logging-dev, logging-stage, and logging-prod for the development, staging, and production clusters. | |
An OpenSearch cluster is one or more OpenSearch nodes with the same cluster identification. | |
Node | A node is a single server that is part of your cluster, stores your data, and participates in the cluster’s indexing and search capabilities. |
Just like a cluster, a node is identified by a name. | |
You can define any node name you want if you do not want the default. | |
This name is important for administration purposes where you want to identify which servers in your network correspond to which nodes in your OpenSearch/Elasticsearch cluster. | |
An OpenSearch node is a single OpenSearch process, and the minimum number of nodes for a highly available OpenSearch cluster is three. | |
Index | An index is a collection of documents that have somewhat similar characteristics. |
In a single cluster, you can define as many indexes as you want. | |
An index is an equivalent of a relational database. | |
An OpenSearch index is a collection of documents in OpenSearch. Each index is split into shards. | |
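Type | A type is the OpenSearch/Elasticsearch meta object where the mapping for an index is stored. Mapping types are deprecated in recent releases and were removed in Elasticsearch 8.0 and OpenSearch 2.0. |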
Alias | Alias is a reference to an OpenSearch/Elasticsearch index. An alias can be mapped to more than one index. |
Document | A document is a basic unit of information that can be indexed. |
This document is expressed in JSON format. | |
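When a document is built from a connected query (a PeopleSoft concept, see the Oracle links under Reading material), the query returns parent and child rows; the child information is attached to the main query and sent as one document. | |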
Shard | OpenSearch/Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. |
OpenSearch shards enable parallelization of data processing across both single and multiple OpenSearch nodes. | |
By default, OpenSearch automatically manages shard allocation within the node(s). | |
Optimizing shards is an important component of improving OpenSearch performance. | |
When you create an index, you can simply define the number of shards that you want. | |
Each shard is in itself a fully-functional and independent ‘index’ that can be hosted on any node in the cluster. | |
Replica | OpenSearch/Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for short. |
OpenSearch replicas serve as a backup for shards and also aid in search performance by providing additional capacity. | |
By default, OpenSearch creates one primary shard and one replica for every index (Elasticsearch versions before 7.0 defaulted to five primary shards). | |
You can add or remove replicas at any time to scale out query processing. | |
After the index is created, you can change the number of replicas dynamically at any time, but you cannot change the number of primary shards in place; doing so requires reindexing or the split/shrink APIs. | |
Port | The default OpenSearch HTTP port is 9200/tcp. |
The OpenSearch port can be modified with the http.port setting in the configuration file, opensearch.yml. | |
Query | OpenSearch queries are sub-divided into two categories: leaf queries and compound queries. |
OpenSearch leaf queries search for specific values within a field or field(s). | |
OpenSearch compound queries combine multiple queries together. | |
Pagination | OpenSearch pagination controls how many results a search request returns and where the result window starts. |
By default, a search request returns the first 10 results. | |
The page size can be changed by adding a size parameter to the search request, and the starting offset by adding a from parameter. | |
Managed OpenSearch | Managed OpenSearch provides 24/7 monitoring, support, and maintenance to maximize performance and uptime. |
Managed OpenSearch is typically provided by a team of engineers who have extensive experience with OpenSearch management. | |
Hosted OpenSearch | Hosted OpenSearch is a type of Managed OpenSearch where the service provider hosts their clients’ clusters in the service provider’s own environment. |
Hosted OpenSearch tends to increase latency and cost compared with running OpenSearch in the client’s own environment. It also exposes the client to additional security risk. | |
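
The shard, replica, query, and pagination entries above can be tied together with a short sketch. It only builds the JSON request bodies and does not contact a cluster; in practice they would be sent with an HTTP client to `PUT /<index>`, `PUT /<index>/_settings`, and `GET /<index>/_search` on port 9200. The helper names and the fields `message` and `level` are hypothetical, and the `term` filter assumes `level` is mapped as a keyword field.

```python
import json


def index_settings(primary_shards, replicas):
    """Body for creating an index with an explicit shard and replica count."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,  # fixed once the index exists
                "number_of_replicas": replicas,      # can be changed later
            }
        }
    }


def replica_update(replicas):
    """Body for changing the replica count of an existing index."""
    return {"index": {"number_of_replicas": replicas}}


def search_body(text, level, page=0, page_size=10):
    """Compound (bool) query built from two leaf queries, with pagination."""
    return {
        "from": page * page_size,  # offset of the first hit to return
        "size": page_size,         # maximum number of hits in this page
        "query": {
            "bool": {
                # Leaf query: full-text match on a hypothetical text field.
                "must": [{"match": {"message": text}}],
                # Leaf query: exact match on a hypothetical keyword field.
                "filter": [{"term": {"level": level}}],
            }
        },
    }


print(json.dumps(index_settings(3, 1), indent=2))
print(json.dumps(replica_update(2)))
print(json.dumps(search_body("timeout", "ERROR", page=2, page_size=20), indent=2))
```

Keeping the page size small and advancing `from` is the simplest way to page through results; raising the replica count is a settings update, whereas changing the primary shard count is not.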
Reading material
- https://en.wikipedia.org/wiki/Apache_Lucene
- https://docs.oracle.com/cd/F44947_01/pt858pbr3/eng/pt/tpst/concept_ElasticsearchConceptsAndTerminology.html
- https://docs.oracle.com/cd/F88569_01/pt861pbr1/eng/pt/tpst/ConceptsAndTerminology.html
- https://dattell.com/data-architecture-blog/opensearch-terms-and-definitions/