Elasticsearch replica shards
Resources utilized:
https://stackoverflow.com/questions/15694724/shards-and-replicas-in-elasticsearch
https://www.elastic.co/guide/en/elasticsearch/reference/current/elasticsearch-intro.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/diagnose-unassigned-shards.html
https://opster.com/guides/elasticsearch/operations/elasticsearch-cat-shards/
https://bigdataboutique.com/blog/fixing-elasticsearch-unassigned-shards-644152
https://discuss.elastic.co/t/why-is-replica-shard-in-unassigned-state-if-it-exists-on-disk/118001/2
https://docs.securityonion.net/en/2.4/release-notes.html#known-issues
https://www.elastic.co/guide/en/cloud/current/ec-api-deployment-crud.html#ec_update_a_deployment
https://www.elastic.co/guide/en/elasticsearch/reference/8.13/modules-node.html
https://www.elastic.co/guide/en/elasticsearch/reference/8.13/migrate-index-allocation-filters.html
Error photo in Grid section of security onion console:
Location of log file - /opt/so/log/elasticsearch/
Log output in text:
[2024-03-29T11:44:19.913+00:00][WARN ][plugins.licensing] License information could not be obtained from Elasticsearch due to ConnectionError: connect ECONNREFUSED 192.168.80.50:9200 error
[2024-03-29T11:44:47.384+00:00][WARN ][plugins.licensing] License information could not be obtained from Elasticsearch due to ConnectionError: connect ECONNREFUSED 192.168.80.50:9200 error
[2024-03-29T11:44:49.912+00:00][WARN ][plugins.licensing] License information could not be obtained from Elasticsearch due to ConnectionError: connect ECONNREFUSED 192.168.80.50:9200 error
[2024-03-29T11:45:07.398+00:00][ERROR][plugins.security.authentication] License is not available, authentication is not possible.
[2024-03-31T23:59:57,631][WARN ][rest.suppressed ] path: /.ds-logs-*/_eql/search, params: {ignore_unavailable=true, index=.ds-logs-*}
org.elasticsearch.action.search.SearchPhaseExecutionException: Partial shards failure
at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:729) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:433) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:761) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:513) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:350) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:62) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.action.ActionListenerImplementations.safeOnFailure(ActionListenerImplementations.java:73) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.action.DelegatingActionListener.onFailure(DelegatingActionListener.java:27) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:54) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:630) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.transport.TransportService$UnregisterChildTransportResponseHandler.handleException(TransportService.java:1707) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1424) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1560) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1535) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:51) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.action.support.ChannelActionListener.onFailure(ChannelActionListener.java:37) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.action.ActionRunnable.onFailure(ActionRunnable.java:124) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:28) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.10.4.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.lang.Thread.run(Thread.java:1583) ~[?:?]
Caused by: org.elasticsearch.search.query.QueryPhaseExecutionException: Time exceeded
at org.elasticsearch.search.query.QueryPhase.addCollectorsAndSearch(QueryPhase.java:214) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.search.query.QueryPhase.executeQuery(QueryPhase.java:134) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:63) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:516) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:668) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:541) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.action.ActionRunnable$2.accept(ActionRunnable.java:51) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.action.ActionRunnable$2.accept(ActionRunnable.java:48) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.action.ActionRunnable$3.doRun(ActionRunnable.java:73) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.10.4.jar:?]
Log output via screenshot with error of shards
Elasticsearch error in the Kibana dashboard located in Security Onion:
Go to your Security Onion console > Kibana > the 3 horizontal lines in the top left corner > Search > Elasticsearch
Select Indices
Notice the indices whose health status is Yellow because their shards are unassigned and not allocating:
Can’t see all of your indices? Select the Show hidden indices toggle at the top
Instead of diving straight into the fix, let’s start from the beginning and understand what Elasticsearch is.
What is Elasticsearch:
Elasticsearch is a distributed search and analytics engine. Data collected by Logstash and Beats is stored in Elasticsearch, and Kibana can then display it as pie charts, graphs, or however you wish to visualize your data sets. Elasticsearch itself is where the indexing, searching, and analysis happen.
Bringing Elasticsearch, Logstash and Kibana together gives us the ELK stack.
Elasticsearch stores and indexes all types of data, including unstructured text, numerical data, and geospatial data, in a way that supports fast searches.
Elasticsearch has a wide range of uses, such as:
- Search box in an app or website
- Store and analyze logs, metrics, and security event data.
- Use machine learning to automatically model the behavior of your data in real time. This machine learning technology uses unsupervised machine learning models and integrates with vector search and embeddings
Source - https://www.elastic.co/elasticsearch/machine-learning
- Automate business workflows using Elasticsearch as a storage engine
- Manage, integrate and analyze spatial information using Elasticsearch as a geographic information system (GIS)
- Store and process genetic data using Elasticsearch as a bioinformatics research tool
Geographic information system output from Elasticsearch example with the Kibana dashboard using the Access map logs
Analysis
The index analysis module acts as a configurable registry of analyzers that convert a string field into individual terms, which are:
- Added to the inverted index in order to make the document searchable
- Used by high level queries such as the match query to generate search terms
Source of match query - https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
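To make the idea of analysis more concrete, here is a minimal Python sketch of a toy analyzer and inverted index. This is not how Elasticsearch implements it: the function names (analyze, build_inverted_index, match) and the lowercase-and-split rule are my own assumptions for illustration, and the two sample documents are made up from messages seen in this write-up.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def match(index, query):
    """Return IDs of documents containing at least one analyzed query term."""
    hits = set()
    for term in analyze(query):
        hits.update(index.get(term, []))
    return sorted(hits)


# Hypothetical documents, used only for illustration.
docs = {
    1: "Partial shards failure",
    2: "Replica shard is UNASSIGNED",
}
index = build_inverted_index(docs)
print(index["shards"])
print(match(index, "Shards unassigned"))
```

The same analyzer runs at index time and at query time, which is why a query typed as "Shards" can still find a document that contains "shards". The toy analyzer does no stemming, so "shard" and "shards" stay separate terms here.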
Index-level shard allocation filtering
Shard allocation filters control where Elasticsearch allocates the shards of a particular index. These per-index filters are applied in conjunction with cluster-wide allocation filtering, which controls where Elasticsearch allocates shards from any index, and with allocation awareness, which uses awareness attributes so that Elasticsearch takes your physical hardware configuration into account when allocating shards.
Shard allocation filters can be based on custom node attributes or on the built-in attributes _name, _host_ip, _publish_ip, _ip, _host, _id, _tier and _tier_preference.
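As a rough illustration, an index-level filter is just an index setting named index.routing.allocation.include, require, or exclude, followed by the attribute name. The Python sketch below only builds the JSON body you would send to an index's _settings endpoint. The helper name allocation_filter is my own, and using aaa004 as the value of _host is an assumption based on the hostname mentioned later in this write-up.

```python
import json

VALID_KINDS = ("include", "require", "exclude")


def allocation_filter(kind, attribute, values):
    """Build an index settings body for one shard allocation filter."""
    if kind not in VALID_KINDS:
        raise ValueError(f"kind must be one of {VALID_KINDS}")
    key = f"index.routing.allocation.{kind}.{attribute}"
    # Multiple values are given to Elasticsearch as a comma-separated string.
    return {key: ",".join(values)}


# Assumption: aaa004 is the hostname of the only node in this environment.
body = allocation_filter("require", "_host", ["aaa004"])
print(json.dumps(body))
```

I did not need an allocation filter to fix my problem. This is only here to show what the setting looks like.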
Delaying allocation when a node leaves
When a node leaves the cluster for whatever reason, intentional or otherwise, the master reacts by doing the following:
- Promoting a replica shard to primary to replace any primaries that were on the node.
- Allocating replica shards to replace the missing replicas (assuming there are enough nodes)
- Rebalancing shards evenly across the remaining nodes
These actions are intended to protect the cluster against data loss by ensuring that every shard is fully replicated as soon as possible.
The error I showed at the start is an example of a shard failure where allocation in the cluster didn’t resolve itself as intended.
Sample error:
[2024-03-31T23:59:57,631][WARN ][rest.suppressed] path: /.ds-logs-*/_eql/search, params: {ignore_unavailable=true, index=.ds-logs-*} org.elasticsearch.action.search.SearchPhaseExecutionException: Partial shards failure
A sample scenario described by the Elastic team:
- Node 5 loses network connectivity.
- The master promotes a replica shard to primary for each primary that was on Node 5.
- The master allocates new replicas to other nodes in the cluster
- Each new replica makes an entire copy of the primary shard across the network.
- More shards are moved to different nodes to re-balance the cluster.
- Node 5 returns after a few minutes.
- The master re-balances the cluster by allocating shards to Node 5
If the master had waited a few minutes longer, the missing shards could have been re-allocated to Node 5 with minimal network traffic. The process is even quicker for idle shards (shards that are not receiving indexing requests) that have already been flushed. A flush makes sure that any data currently stored only in the transaction log is also permanently stored in the Lucene index.
Flush API - https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-flush.html
Transaction logs - https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-translog.html
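The waiting period described above is controlled by the index setting index.unassigned.node_left.delayed_timeout, which defaults to 1m. The Python sketch below builds the settings body for a longer delay and uses a deliberately simplified model of the decision the master makes. The function names are my own, and the model ignores everything except the length of the outage.

```python
import json


def delayed_timeout_body(timeout="5m"):
    """Settings body that delays replica re-allocation after a node leaves."""
    return {"settings": {"index.unassigned.node_left.delayed_timeout": timeout}}


def replicas_rebuilt_elsewhere(outage_seconds, delayed_timeout_seconds=60):
    """Simplified model: replacement replicas are only allocated to other
    nodes if the node is still missing when the delay expires."""
    return outage_seconds > delayed_timeout_seconds


print(json.dumps(delayed_timeout_body("5m")))
# Node 5 comes back after 3 minutes with a 5 minute delay configured.
print(replicas_rebuilt_elsewhere(180, delayed_timeout_seconds=300))
# The same outage with the default 1 minute delay.
print(replicas_rebuilt_elsewhere(180))
```

With the longer delay the model returns False, meaning the cluster simply waits for Node 5. With the default delay it returns True, which corresponds to the expensive copy-and-rebalance scenario listed above.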
Here is how you can query for the health of the cluster
Command – sudo so-elasticsearch-query _cluster/health
This shows the following output
Full query:
{"cluster_name":"securityonion","status":"yellow","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":155,"active_shards":155,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":3,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":98.10126582278481}
What does this output mean?
{"cluster_name":"securityonion",
This is the name of the cluster, which is securityonion. It is NOT the node's hostname, aaa004
"status":"yellow",
This is the status of the cluster, yellow, which means that one or more replica shards in the Elasticsearch cluster aren’t allocated to a node, which is a problem.
The cluster health statuses are described as follows:
Red – At least one primary shard is not allocated in the cluster
Yellow – All primary shards are allocated but one or more replicas are not
Green – All shards are allocated
"timed_out":false,
This shows whether the health request timed out before a response was returned. false means the response came back within the request timeout, which defaults to 30 seconds.
"number_of_nodes":1,
This shows the number of nodes in the cluster which is 1.
"number_of_data_nodes":1,
This is the number of nodes that are dedicated data nodes which is 1.
"active_primary_shards":155,
This shows the number of active primary shards which is 155
"active_shards":155,
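This shows the total number of active shards, primaries and replicas combined, which is 155. Because this equals the number of active primary shards, no replica shards are active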
"relocating_shards":0,
This shows the number of shards that are currently being relocated which is 0
"initializing_shards":0,
This shows the number of shards that are attempting to be initialized which is 0
"unassigned_shards":3,
This shows the number of unassigned shards which are not allocated at 3
"delayed_unassigned_shards":0,
This shows the number of shards whose allocation is delayed due to timeout settings which is 0
"number_of_pending_tasks":0,
This shows the number of cluster-level changes that have not been executed which is 0
"number_of_in_flight_fetch":0,
This shows the number of unfinished fetches which is 0
"task_max_waiting_in_queue_millis":0,
This shows the time, in milliseconds, that the earliest initiated task has been waiting to be performed, which is 0
"active_shards_percent_as_number":98.10126582278481}
This shows the ratio of active shards in the cluster expressed as a percentage, which is about 98.1
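If you would rather not read the JSON by eye, here is a small Python sketch that parses the health output shown earlier and pulls out the fields that matter for this problem. The summarize function and the STATUS_MEANING table are my own helpers, not part of Elasticsearch or Security Onion. The percentage formula (active divided by active plus initializing plus unassigned) is my assumption, although it does reproduce the number in this output.

```python
import json

# Health output copied from the full query shown earlier.
HEALTH_JSON = (
    '{"cluster_name":"securityonion","status":"yellow","timed_out":false,'
    '"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":155,'
    '"active_shards":155,"relocating_shards":0,"initializing_shards":0,'
    '"unassigned_shards":3,"delayed_unassigned_shards":0,'
    '"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,'
    '"task_max_waiting_in_queue_millis":0,'
    '"active_shards_percent_as_number":98.10126582278481}'
)

STATUS_MEANING = {
    "green": "all shards are allocated",
    "yellow": "all primaries are allocated, one or more replicas are not",
    "red": "at least one primary shard is not allocated",
}


def summarize(health):
    """Extract the fields that matter when chasing unassigned shards."""
    active = health["active_shards"]
    total = active + health["initializing_shards"] + health["unassigned_shards"]
    return {
        "status": health["status"],
        "meaning": STATUS_MEANING[health["status"]],
        "active_replica_shards": active - health["active_primary_shards"],
        "unassigned_shards": health["unassigned_shards"],
        "percent_active": 100.0 * active / total,
    }


summary = summarize(json.loads(HEALTH_JSON))
for key, value in summary.items():
    print(f"{key}: {value}")
```

The interesting line is active_replica_shards, which comes out as 0. Every active shard is a primary, so the unassigned shards have to be replicas.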
As noted above, I have unassigned shards, which is not a good thing for Elasticsearch
"unassigned_shards":3,
The unassigned shards are replicas. Elasticsearch never allocates a replica to the same node as its primary, and I only have one node, so these replicas can never be allocated. A single-node cluster should not have any replicas configured.
We are going to ssh into the management node of security onion
Command – ssh sdick@192.168.80.50
After signing in, we are going to query for unassigned shards from the API within Security Onion. The path is quoted so that the shell does not treat the & characters as background operators
Command – sudo so-elasticsearch-query '_cat/shards?v=true&h=index,shard,prirep,state,node,unassigned.reason&s=state'
Output query from all elasticsearch shards from this Node
Per the documentation within Security onion, we can query for unassigned shards from elasticsearch indices with the following command to shorten the list above.
Source of the query - https://docs.securityonion.net/en/2.4/release-notes.html#known-issues
Command - sudo so-elasticsearch-query _cat/shards | grep UN
Output given:
Index Shard Prirep State
String 1 - .ds-.logs-endpoint.diagnostic.collection-default-2024.03.23-000010 0 r UNASSIGNED
String 2 - .items-default-000001 0 r UNASSIGNED
String 3 - .ds-metrics-elastic_agent.filebeat_input-default-2024.04.07-000001 0 r UNASSIGNED
String 4 - .ds-metrics-elastic_agent.osquerybeat-default-2024.04.07-000001 0 r UNASSIGNED
String 5 - .lists-default-000001 0 r UNASSIGNED
String 6 - .ds-metrics-elastic_agent.filebeat-default-2024.04.07-000001 0 r UNASSIGNED
String 7 - .ds-metrics-elastic_agent.elastic_agent-default-2024.04.07-000001 0 r UNASSIGNED
Separated:
String 1:
Index - .ds-.logs-endpoint.diagnostic.collection-default-2024.03.23-000010
Shard – 0
Prirep – r
State – UNASSIGNED
String 2:
Index - .items-default-000001
Shard – 0
Prirep – r
State – UNASSIGNED
String 3:
Index - .ds-metrics-elastic_agent.filebeat_input-default-2024.04.07-000001
Shard – 0
Prirep – r
State – UNASSIGNED
String 4:
Index - .ds-metrics-elastic_agent.osquerybeat-default-2024.04.07-000001
Shard – 0
Prirep – r
State - UNASSIGNED
String 5:
Index - .lists-default-000001
Shard – 0
Prirep – r
State – UNASSIGNED
String 6:
Index - .ds-metrics-elastic_agent.filebeat-default-2024.04.07-000001
Shard – 0
Prirep – r
State – UNASSIGNED
String 7:
Index - .ds-metrics-elastic_agent.elastic_agent-default-2024.04.07-000001
Shard – 0
Prirep – r
State - UNASSIGNED
As you can see, the r in the Prirep column is the shard type: r means replica (p would mean primary)
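The same filtering can be done in a few lines of Python if you save the _cat/shards output to text. This is only a sketch: the function name unassigned_replicas is my own, it assumes the default column order of index, shard, prirep, state, and the STARTED line in the sample is made up to show that assigned shards are skipped.

```python
def unassigned_replicas(lines):
    """Return the index names of replica shards in the UNASSIGNED state."""
    names = []
    for line in lines:
        fields = line.split()
        # Expected column order: index, shard, prirep, state, ...
        if len(fields) >= 4 and fields[2] == "r" and fields[3] == "UNASSIGNED":
            names.append(fields[0])
    return names


sample_lines = [
    ".ds-.logs-endpoint.diagnostic.collection-default-2024.03.23-000010 0 r UNASSIGNED",
    ".items-default-000001 0 r UNASSIGNED",
    # Hypothetical assigned primary; the doc count and size are invented.
    ".items-default-000001 0 p STARTED 1 5kb 192.168.80.50 aaa004",
    ".lists-default-000001 0 r UNASSIGNED",
]

for name in unassigned_replicas(sample_lines):
    print(name)
```

Checking the exact columns is a little stricter than grep UN, which would also match any index whose name happens to contain the letters UN.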
With the above information, I asked some friends about the primary and replica issue I was having with shards. It turned out that replicas were somehow configured on these indices, even though a replica can never be allocated with only 1 node in my environment.
So we will now set the number of replicas to 0 in Elasticsearch for the affected indices
Source answer from someone I asked:
“The problem is, you can't turn on replicas when you only have 1 node
If you create an index with anything other than 1 replica, it will never go green”
He also sourced the specific documentation here - https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#dynamic-index-settings
Configuring the number of replicas when using 1 node
Syntax - sudo so-elasticsearch-query $index/_settings -d '{"number_of_replicas":0}' -XPUT
What does this syntax mean?
sudo – Runs the command as an administrative user
so-elasticsearch-query – Security Onion's helper script that sends a request to the Elasticsearch API; the remaining options are passed through to curl
$index/_settings – The settings endpoint of a specific index; replace $index with the index name
-d – The curl-style option that supplies the request body (the data to send)
'{"number_of_replicas":0}' – The request body, which sets the number of replicas configured for the index to 0
-XPUT – Sends the request with the HTTP PUT method, which is what applies the settings change in Elasticsearch
Now we will set the number of replicas to 0 for each affected index:
String 1 - .ds-.logs-endpoint.diagnostic.collection-default-2024.03.23-000010
Command - sudo so-elasticsearch-query .ds-.logs-endpoint.diagnostic.collection-default-2024.03.23-000010/_settings -d '{"number_of_replicas":0}' -XPUT
String 2 - .items-default-000001
Command - sudo so-elasticsearch-query .items-default-000001/_settings -d '{"number_of_replicas":0}' -XPUT
String 3 - .ds-metrics-elastic_agent.filebeat_input-default-2024.04.07-000001
Command - sudo so-elasticsearch-query .ds-metrics-elastic_agent.filebeat_input-default-2024.04.07-000001/_settings -d '{"number_of_replicas":0}' -XPUT
String 4 - .ds-metrics-elastic_agent.osquerybeat-default-2024.04.07-000001
Command - sudo so-elasticsearch-query .ds-metrics-elastic_agent.osquerybeat-default-2024.04.07-000001/_settings -d '{"number_of_replicas":0}' -XPUT
String 5 - .lists-default-000001
Command - sudo so-elasticsearch-query .lists-default-000001/_settings -d '{"number_of_replicas":0}' -XPUT
String 6 - .ds-metrics-elastic_agent.filebeat-default-2024.04.07-000001
Command - sudo so-elasticsearch-query .ds-metrics-elastic_agent.filebeat-default-2024.04.07-000001/_settings -d '{"number_of_replicas":0}' -XPUT
String 7 - .ds-metrics-elastic_agent.elastic_agent-default-2024.04.07-000001
Command - sudo so-elasticsearch-query .ds-metrics-elastic_agent.elastic_agent-default-2024.04.07-000001/_settings -d '{"number_of_replicas":0}' -XPUT
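Typing seven nearly identical commands is error prone, so here is a Python sketch that generates them from the list of index names. It only prints the commands and does not run anything. The helper name replica_command is my own, and the index list is simply the one from my output, so yours will differ.

```python
import json
import shlex

# Index names taken from the unassigned shard output above.
UNASSIGNED_INDICES = [
    ".ds-.logs-endpoint.diagnostic.collection-default-2024.03.23-000010",
    ".items-default-000001",
    ".ds-metrics-elastic_agent.filebeat_input-default-2024.04.07-000001",
    ".ds-metrics-elastic_agent.osquerybeat-default-2024.04.07-000001",
    ".lists-default-000001",
    ".ds-metrics-elastic_agent.filebeat-default-2024.04.07-000001",
    ".ds-metrics-elastic_agent.elastic_agent-default-2024.04.07-000001",
]


def replica_command(index, replicas=0):
    """Build the argument list that sets number_of_replicas on one index."""
    body = json.dumps({"number_of_replicas": replicas}, separators=(",", ":"))
    return ["sudo", "so-elasticsearch-query", f"{index}/_settings",
            "-d", body, "-XPUT"]


for index in UNASSIGNED_INDICES:
    # shlex.join adds the single quotes the shell needs around the JSON body.
    print(shlex.join(replica_command(index)))
```

Each printed line has the same form as the commands above, so you can review the output first and then paste it into the shell on the management node.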
Next we will query the Elasticsearch cluster for the cluster health
Command – sudo so-elasticsearch-query _cluster/health
As we can see, 100 percent of the shards are active after pushing the configuration changes to the cluster
Green status means it’s a healthy cluster
We will now restart the Elasticsearch container with the following command
Command – sudo so-elasticsearch-restart
After restarting the Elasticsearch container, wait around 30 minutes (depending on your resources) for all primary shards to allocate properly, then check the cluster health and ensure it’s in a Green state, meaning all shards are allocated correctly on the management node
We will also confirm the Security Onion console is in a running state