Switching from using Elasticsearch to OpenSearch
https://forum.opensearch.org/t/spring-data-integration/6239/15
It looks like the smoothest transition from Elasticsearch to OpenSearch happens when the application makes plain HTTP calls, as it would to any other external service, instead of using a framework (such as Jest) to connect to Elasticsearch directly.
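As a minimal sketch of that approach (an illustration, not the setup from the linked thread), the JDK's own java.net.http client can call the OpenSearch REST API like any other HTTP service. The base URL and index name are placeholders, and authentication (for example, request signing on a managed domain) is left out.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PlainHttpSearch {

    // Builds a _search call as an ordinary HTTP POST with a JSON body.
    static HttpRequest buildSearchRequest(String baseUrl, String index, String queryJson) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(queryJson))
                .build();
    }

    // Sends the request and returns the raw JSON response body.
    static String send(HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) {
        // Placeholder endpoint and index; this main only builds the request, it does not send it.
        HttpRequest request = buildSearchRequest(
                "http://localhost:9200", "my-index", "{\"query\":{\"match_all\":{}}}");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Because the application only deals with URLs and JSON bodies, switching the backend mostly means changing the base URL, as long as the REST endpoints in use behave the same on both sides.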
Issues faced
Request size exceeded 10485760 bytes
I saw this error when trying to insert a large document from the OpenSearch workbench in the UI (10485760 bytes is 10 MiB).
Workaround: create a REST endpoint in the Java application that performs the insert with the opensearch-java client. Put the JSON content in a file under resources, read it in the Java application, and insert it into OpenSearch.
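The reading side of this workaround needs only the standard library. A minimal sketch, assuming the JSON lives in a hypothetical resource named large-document.json; it also reports the payload size in UTF-8 bytes, the unit of the 10485760-byte figure in the error. The actual insert with the opensearch-java client is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class LargeDocumentLoader {

    // Figure quoted in the error message: 10485760 bytes = 10 * 1024 * 1024 (10 MiB).
    static final long REQUEST_LIMIT_BYTES = 10_485_760L;

    // Reads the whole JSON payload, e.g. from getResourceAsStream("/large-document.json").
    static String readJson(InputStream in) throws IOException {
        try (in) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Size of the payload as sent over the wire (UTF-8 bytes, not characters).
    static long payloadBytes(String json) {
        return json.getBytes(StandardCharsets.UTF_8).length;
    }

    static boolean exceedsLimit(String json) {
        return payloadBytes(json) > REQUEST_LIMIT_BYTES;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical resource name; fall back to a tiny document if it is not on the classpath.
        InputStream in = LargeDocumentLoader.class.getResourceAsStream("/large-document.json");
        if (in == null) {
            in = new ByteArrayInputStream("{\"docName\":\"example\"}".getBytes(StandardCharsets.UTF_8));
        }
        String json = readJson(in);
        System.out.println(payloadBytes(json) + " bytes, over 10 MiB: " + exceedsLimit(json));
    }
}
```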
Request failed: [mapper_parsing_exception] failed to parse field [perStartDate] of type [date]
With Elasticsearch, the model object used java.util.Date as the type of a specific field. It looks like Elasticsearch converted it into the format "yyyy-MM-dd hh:mm:ss a" without any issues, and when the application retrieved the document, it read and parsed the date field without any issues.
OpenSearch does not do this. Inserting an object with a Date field through the opensearch-java client fails with errors like these:
org.opensearch.client.opensearch._types.OpenSearchException: Request failed: [mapper_parsing_exception] failed to parse field [createdOn] of type [date] in document with id 'M7f5E5EBLWvOEL9OZOnz'. Preview of field's value: '1722616997084'
org.opensearch.client.opensearch._types.OpenSearchException: Request failed: [mapper_parsing_exception] failed to parse field [perStartDate] of type [date] in document with id '74ZUSJEBU-iosdOntBJ1'. Preview of field's value: '1569888000000'
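The previewed values are epoch milliseconds, which is how Jackson writes a java.util.Date by default, and a bare number cannot match the custom "yyyy-MM-dd hh:mm:ss a" mapping. The sketch below checks a value against that pattern and renders the two preview values in it; treating them as UTC is an assumption, and the class name is illustrative.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

public class EpochPreviewCheck {

    // The custom pattern from the mapping; Locale.US keeps the AM/PM marker in English.
    static final DateTimeFormatter MAPPING_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd hh:mm:ss a", Locale.US);

    // True if the value can be read with the mapping's custom pattern.
    static boolean matchesMappingFormat(String value) {
        try {
            LocalDateTime.parse(value, MAPPING_FORMAT);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Renders an epoch-millis value in the mapping's pattern, assuming UTC.
    static String inMappingFormat(long epochMillis) {
        return MAPPING_FORMAT.format(Instant.ofEpochMilli(epochMillis).atOffset(ZoneOffset.UTC));
    }

    public static void main(String[] args) {
        System.out.println(matchesMappingFormat("1722616997084")); // false: a bare number
        System.out.println(inMappingFormat(1722616997084L));       // createdOn preview
        System.out.println(inMappingFormat(1569888000000L));       // perStartDate preview
    }
}
```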
I saw the error when the mapping for the field in the JSON file looked like this:
"createdOn": {
"type": "date",
"format": "yyyy-MM-dd hh:mm:ss a"
}
If we change the mapping for the field to the following, the Date fields are indexed correctly, because epoch_millis accepts the numeric timestamps that the client sends:
"createdOn": {
"type": "date",
"format" : "strict_date_optional_time||epoch_millis"
}
But the team is concerned about changing the format of the field. In addition to this Java application, many other components work with data from this OpenSearch domain (for example, some Redshift jobs and other applications), so changing the format of the field is not a good idea. How do we solve this?
How to fix this error?
Option 1: Customize the format using com.fasterxml.jackson.annotation.JsonFormat
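This is the best solution. It leaves the mapping untouched: when the client serializes documents with Jackson, the annotation makes it write the Date field as a string in the same "yyyy-MM-dd hh:mm:ss a" format that the mapping expects, instead of as epoch milliseconds.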
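import com.fasterxml.jackson.annotation.JsonFormat;
import java.util.Date;
import lombok.Data;

// Lombok generates getters, setters, equals/hashCode and toString.
@Data
public class MyDocument {
    private String docId;
    private String docName, userName, status, s3Objectkey;

    // Serialize as a string matching the mapping's custom format instead of epoch millis.
    // Jackson formats in UTC by default; add timezone = "..." if the existing data uses another zone.
    // The AM/PM marker follows the mapper's default locale unless locale = "en_US" is set.
    @JsonFormat(pattern = "yyyy-MM-dd hh:mm:ss a")
    private Date createdOn;

    private byte[] fileBytes;
    private String reqJson;
}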
Option 2: Use the default format for date instead of using custom ones
(Do not use this option. Use Option 1)
https://opensearch.org/docs/latest/field-types/supported-field-types/date/
https://www.elastic.co/guide/en/elasticsearch/reference/current/date.html
Use the default format:
"format" : "strict_date_optional_time||epoch_millis"
Or, better yet, do not specify a format at all; OpenSearch then uses this default.
Advantages with this approach:
- The code is simple. There is no need for parsing the values of dates.
- All the built-in date features, such as sorting and range queries, work without any issues.
GET /documents2/_search
{
  "sort": [
    { "createdOn": { "order": "desc" } }
  ],
  "query": {
    "match_all": {}
  }
}
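For reference, the two value shapes that the default format accepts can be produced from a java.time.Instant as sketched below: epoch_millis is the number Jackson writes for a Date by default, and strict_date_optional_time accepts an ISO-8601 string such as the one Instant.toString() returns. The class name is illustrative.

```java
import java.time.Instant;

public class DefaultDateFormatValues {

    // epoch_millis: the numeric form, e.g. what Jackson writes for java.util.Date by default.
    static long asEpochMillis(Instant instant) {
        return instant.toEpochMilli();
    }

    // strict_date_optional_time: an ISO-8601 string in UTC ("Z" suffix).
    static String asIsoString(Instant instant) {
        return instant.toString();
    }

    public static void main(String[] args) {
        Instant createdOn = Instant.ofEpochMilli(1722616997084L);
        System.out.println(asEpochMillis(createdOn)); // 1722616997084
        System.out.println(asIsoString(createdOn));   // 2024-08-02T16:43:17.084Z
    }
}
```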
Option 3: Store it as a string
(Do not use this option. Use Option 1)
Store the field as a formatted date string in OpenSearch, formatting the value before every insertion and parsing it after every retrieval. This is painful to deal with on both paths.
To do this, change the type of the field in the model object to String (or a generic type):
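import lombok.Data;

// createdOn holds the already-formatted string, so no date conversion happens during serialization.
// Lombok's @Data provides the setCreatedOn(...) setter used below.
@Data
public class MyDocument {
    private Long id;
    private String docName;
    private String userName;
    private String status;
    private String createdOn;
    private String reqJson;
}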
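Before indexing with OpenSearch, set the field to the formatted value:
// Needs java.text.SimpleDateFormat, java.util.Date and java.util.Locale; formats in the JVM's default time zone.
// SimpleDateFormat is not thread-safe, so create one per use instead of sharing an instance.
myDocument.setCreatedOn(new SimpleDateFormat("yyyy-MM-dd hh:mm:ss a", Locale.US).format(new Date()));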
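If this option were used anyway, the conversion could live in one place. A minimal sketch with java.time, assuming the values are written and read in UTC (adjust to whatever zone the existing data uses); the class and helper names are hypothetical. The pattern has no milliseconds, so they are lost on the round trip.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;

public class StringDateCodec {

    // Same pattern as the mapping; java.time formatters are immutable and thread-safe.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd hh:mm:ss a", Locale.US);

    // Before insertion: Date -> string stored in createdOn (UTC assumed).
    static String toStored(Date date) {
        return FORMAT.format(date.toInstant().atOffset(ZoneOffset.UTC));
    }

    // After retrieval: stored string -> Date (UTC assumed, milliseconds become 0).
    static Date fromStored(String stored) {
        return Date.from(LocalDateTime.parse(stored, FORMAT).toInstant(ZoneOffset.UTC));
    }

    public static void main(String[] args) {
        String stored = toStored(new Date(1722616997084L));
        System.out.println(stored);
        System.out.println(fromStored(stored).getTime());
    }
}
```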
UnexpectedJsonEventException while parsing an OpenSearch request object
org.opensearch.client.json.UnexpectedJsonEventException: Unexpected JSON event 'START_ARRAY' instead of '[START_OBJECT, KEY_NAME, VALUE_STRING, VALUE_TRUE, VALUE_FALSE]'
Fix: change the opensearch-java version in pom.xml to 2.13.0:
<!-- https://mvnrepository.com/artifact/org.opensearch.client/opensearch-java -->
<dependency>
<groupId>org.opensearch.client</groupId>
<artifactId>opensearch-java</artifactId>
<version>2.13.0</version>
</dependency>
Breaking Painless scripts
Painless scripts that worked with Elasticsearch break in OpenSearch.
To debug errors like this, run the queries in the UI workbench and read the error message that OpenSearch returns to find the root cause.
Working in Elasticsearch:
GET /my-index/_search
{
"from": 0,
"size": 0,
"query": {
"terms": {
"progAcronym.keyword": [
"A-PROGRAM-ACRONYM"
]
}
},
"aggs": {
"df": {
"terms": {
"field": "formShortTitle.keyword",
"size": 100000,
"order": {
"_term": "asc"
}
},
"aggs": {
"form_id": {
"terms": {
"field": "formId",
"size": 100000,
"order": {
"edo": "desc"
}
},
"aggs": {
"fst": {
"terms": {
"script": {
"lang": "painless",
"source": """
def efMonth = doc.effectiveDate.date.monthOfYear;
def efYear = doc.effectiveDate.date.year;
def exMonth = doc.expirationDate.date.monthOfYear;
def exYear = doc.expirationDate.date.year;
def efFiscalYear = efMonth >=9 ?efYear+1: efYear;
def exFiscalYear = exMonth>9 ?exYear-1:exYear;
return doc['formShortTitle.keyword'].value
+' Ver: '+doc['formVersionNumber'].value
+' (FY: '+efFiscalYear+' To '
+ exFiscalYear+')'
"""
},
"size": 1
}
},
"edo": {
"max": {
"field": "effectiveDate"
}
}
}
}
}
}
}
}
Changed version that works in OpenSearch.
The reason: the Joda-style accessors on .date (monthOfYear, year) used above appear not to be supported in OpenSearch, while Painless date values support the java.time API:
- https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-datetime.html
- https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/time/LocalDate.html
Following the documentation, the script converts each date to a LocalDate and reads the month and year from it:
GET /my-index/_search
{
"from": 0,
"size": 0,
"query": {
"terms": {
"progAcronym.keyword": [
"A-PROGRAM-ACRONYM"
]
}
},
"aggs": {
"df": {
"terms": {
"field": "formShortTitle.keyword",
"size": 100000,
"order": {
"_term": "asc"
}
},
"aggs": {
"form_id": {
"terms": {
"field": "formId",
"size": 100000,
"order": {
"edo": "desc"
}
},
"aggs": {
"fst": {
"terms": {
"script": {
"lang": "painless",
"source": """
def efMonth = doc.effectiveDate.value.toLocalDate().getMonth().getValue();
def efYear = doc.effectiveDate.value.toLocalDate().getYear();
def exMonth = doc.expirationDate.value.toLocalDate().getMonth().getValue();
def exYear = doc.expirationDate.value.toLocalDate().getYear();
def efFiscalYear = efMonth >=9 ?efYear+1: efYear;
def exFiscalYear = exMonth>9 ?exYear-1:exYear;
return doc['formShortTitle.keyword'].value
+' Ver: '+doc['formVersionNumber'].value
+' (FY: '+efFiscalYear+' To '
+ exFiscalYear+')'
"""
},
"size": 1
}
},
"edo": {
"max": {
"field": "effectiveDate"
}
}
}
}
}
}
}
}
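To sanity-check the fiscal-year arithmetic outside the cluster, the script logic can be mirrored in plain Java. This is a sketch, assuming the date doc values behave like java.time.ZonedDateTime as the Painless datetime documentation describes; the class name and the sample form values are made up.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class FiscalYearLabel {

    // Mirrors the Painless script with the same java.time calls, so it can run locally.
    static String label(String formShortTitle, long formVersionNumber,
                        ZonedDateTime effectiveDate, ZonedDateTime expirationDate) {
        int efMonth = effectiveDate.toLocalDate().getMonth().getValue();
        int efYear = effectiveDate.toLocalDate().getYear();
        int exMonth = expirationDate.toLocalDate().getMonth().getValue();
        int exYear = expirationDate.toLocalDate().getYear();
        // Same thresholds as the script: >= 9 for the effective date, > 9 for the expiration date.
        int efFiscalYear = efMonth >= 9 ? efYear + 1 : efYear;
        int exFiscalYear = exMonth > 9 ? exYear - 1 : exYear;
        return formShortTitle + " Ver: " + formVersionNumber
                + " (FY: " + efFiscalYear + " To " + exFiscalYear + ")";
    }

    public static void main(String[] args) {
        ZonedDateTime effective = ZonedDateTime.of(2019, 10, 1, 0, 0, 0, 0, ZoneOffset.UTC);
        ZonedDateTime expiration = ZonedDateTime.of(2020, 9, 30, 0, 0, 0, 0, ZoneOffset.UTC);
        System.out.println(label("Sample Form", 3, effective, expiration));
    }
}
```

Running the same dates through both this sketch and the aggregation is a quick way to confirm the rewritten script still produces the labels the Elasticsearch version did.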