Splunk is an incredibly robust tool that can scale depending on certain parameters:
- Number of users using the deployment
- Amount of data coming in
- Number of endpoints sending data to the deployment
Depending on these parameters, you can scale a deployment horizontally or vertically to accommodate your needs. In this blog, we will briefly discuss the following deployments:
- Standalone deployment
- Distributed deployment
- Clustered deployment
Before we dive into the various deployments, let us go over some of the widely used components in a Splunk deployment. Splunk comes out of the box with the following components, which can be tailored to suit your needs. Bear in mind that these components are used in all the deployments except “Standalone”; we will shed more light on this later.
A search head, as the name suggests, is used to search the data. Search heads receive all the traffic from the end users. End users log into the UI on the search head and run their searches, reports, alerts, dashboards and other knowledge objects.
An indexer is used to parse and index the data. Splunk uses its proprietary algorithm to store the data in a way that lets it be retrieved and searched quickly.
In a distributed deployment, the search head (where users search) and the indexer (where the data is stored) can be separated. This ensures that both functions, searching and storing, are performed quickly and efficiently. To understand how Splunk indexes data, you can follow this link.
A forwarder, as the name suggests, is used to forward the data to a specific target. There are two types of forwarders:
- Heavy Forwarders
- Lightweight Forwarders
The use case for the data and the available infrastructure determine which type of forwarder to select. To learn more about the difference, refer to this link.
Components above are represented diagrammatically as follows:
Now that we have covered the basic components, let’s go over the different deployments of Splunk.
A standalone deployment in Splunk means that all of Splunk’s functions are handled by a single instance. The functions that a standalone deployment performs include:
- Hold Knowledge Objects (This covers Reporting/Alerting/Dashboard Creation and many more)
When is this deployment type used?
This type of deployment is typically used when there are a limited number of users and a very limited amount of data flowing into Splunk.
Pros/Cons of this type of deployment:
| Criterion | Pros | Cons |
|---|---|---|
| Supportability | Very easy to manage and support as it has only one instance | NA |
| High Availability | NA | No high availability as it is a single point of failure |
| Disaster Recovery | NA | No disaster recovery as it is a single point of failure |
| Search Concurrency | NA | Low search concurrency as it is a single instance and can be overloaded easily |
There are a few drawbacks of a “Standalone” deployment for Splunk in terms of High Availability, Disaster Recovery and Search Concurrency. To overcome some of these, Splunk can be set up in a way to distribute the tasks to different instances within the platform.
In this deployment, the roles of the Search Head, Indexers and Forwarders are split across separate instances to create a distributed deployment.
To do this, we need to set up distributed search. To learn more about distributed search, click on this link.
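To make the split of roles concrete, here is a minimal Python sketch of the idea. It is a toy model and not Splunk code: the class names `Forwarder`, `Indexer` and `SearchHead` and the sample events are hypothetical, and the rotation between indexers is an assumption added for illustration. It only shows the shape of the flow, where forwarders send data to indexers and the search head fans a search out to every indexer and merges the results.

```python
from dataclasses import dataclass, field


@dataclass
class Indexer:
    """Toy stand-in for an indexer: stores events and answers searches locally."""
    name: str
    events: list = field(default_factory=list)

    def index(self, event: str) -> None:
        self.events.append(event)

    def search(self, term: str) -> list:
        return [e for e in self.events if term in e]


@dataclass
class Forwarder:
    """Toy forwarder: sends each event to one indexer, rotating between targets.

    The rotation is an illustrative assumption, not a description of Splunk's behavior.
    """
    targets: list
    _next: int = 0

    def forward(self, event: str) -> None:
        self.targets[self._next % len(self.targets)].index(event)
        self._next += 1


@dataclass
class SearchHead:
    """Toy search head: fans a search out to every indexer and merges the results."""
    peers: list

    def search(self, term: str) -> list:
        results = []
        for peer in self.peers:
            results.extend(peer.search(term))
        return sorted(results)


indexers = [Indexer("idx1"), Indexer("idx2")]
forwarder = Forwarder(targets=indexers)
for line in ["error disk full", "info login ok", "error timeout"]:
    forwarder.forward(line)

search_head = SearchHead(peers=indexers)
matches = search_head.search("error")
print(matches)  # ['error disk full', 'error timeout']
```

Because the search head holds no data of its own in this model, adding another indexer only means adding another entry to `peers`.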
We can see that we have now split the functions of each component to create a distributed environment. The pros and cons of this type of deployment are as follows:
| Criterion | Pros | Cons |
|---|---|---|
| Supportability | Easy to support as the components are separated by function | NA |
| High Availability | NA | Single point of failure. If the indexer goes down, then indexing stops. |
| Disaster Recovery | NA | Single point of failure. |
| Search Concurrency | Higher search concurrency compared to standalone as the search head is separated out | If there are many users, consider search head clustering for higher search concurrency |
A distributed deployment still cannot provide features like High Availability and Disaster Recovery, which mission-critical production deployments need. To address this, we need a clustered deployment, which looks as follows:
(Image source: Splunk)
In the above deployment, the indexers are in a cluster and there is a “Master Node”. This Master Node, or Cluster Master, manages the indexers and coordinates the replication of data across multiple indexers. This creates more than one copy of the data across the deployment, giving the users “High Availability” of the data.
Master Node Functions:
- Manages configurations/apps across all the Peer nodes/indexers
- Handles incoming search requests from the search heads by telling them which indexers to search
- Tracks all the copies of the indexed data and coordinates re-replication if any indexer goes out of service
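The last function above can be sketched as a small Python model. This is a simplified illustration and not how Splunk is implemented: the `ClusterMaster` class, the bucket names and the "least loaded indexer" placement rule are all assumptions. The one real concept it leans on is the replication factor, the number of copies of the data the cluster tries to maintain.

```python
from collections import defaultdict


class ClusterMaster:
    """Toy model: tracks which indexers hold a copy of each bucket of data."""

    def __init__(self, indexers, replication_factor):
        if replication_factor > len(indexers):
            raise ValueError("replication factor cannot exceed the number of indexers")
        self.live = set(indexers)
        self.replication_factor = replication_factor
        self.copies = defaultdict(set)  # bucket id -> indexers holding a copy

    def _load(self, indexer):
        # Number of bucket copies currently held by this indexer.
        return sum(indexer in holders for holders in self.copies.values())

    def _pick_target(self, exclude):
        # Hypothetical placement rule: the live indexer with the fewest copies.
        candidates = sorted(self.live - exclude)
        return min(candidates, key=self._load) if candidates else None

    def _fix_up(self, bucket):
        # Add copies until the bucket meets the replication factor, if possible.
        while len(self.copies[bucket]) < self.replication_factor:
            target = self._pick_target(exclude=self.copies[bucket])
            if target is None:
                break
            self.copies[bucket].add(target)

    def add_bucket(self, bucket):
        self._fix_up(bucket)

    def indexer_down(self, indexer):
        # Drop the failed indexer and restore the missing copies elsewhere.
        self.live.discard(indexer)
        for bucket, holders in self.copies.items():
            holders.discard(indexer)
            self._fix_up(bucket)

    def searchable(self, bucket):
        return len(self.copies[bucket]) > 0


master = ClusterMaster(["idx1", "idx2", "idx3"], replication_factor=2)
for bucket in ["b1", "b2", "b3"]:
    master.add_bucket(bucket)

master.indexer_down("idx1")
for bucket in ["b1", "b2", "b3"]:
    print(bucket, sorted(master.copies[bucket]), master.searchable(bucket))
```

In this model, a replication factor of 2 means the loss of any single indexer leaves at least one copy of every bucket available, which is the "High Availability" property described above.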
Search Head Captain Functions:
- Manages the search load coming in from users
- Delegates search jobs coming in from different search head members
- Replicates the knowledge bundle across the search head cluster.
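As a rough illustration of job delegation, the following Python sketch assigns each incoming job to the search head member that currently has the fewest jobs. The `delegate_jobs` function, the member names and the "fewest jobs so far" rule are assumptions made for this example; the captain's actual load heuristics are not covered here.

```python
import heapq


def delegate_jobs(members, jobs):
    """Assign each job to the member with the fewest jobs so far (ties by name)."""
    heap = [(0, name) for name in members]
    heapq.heapify(heap)
    assignment = {name: [] for name in members}
    for job in jobs:
        load, name = heapq.heappop(heap)   # least-loaded member
        assignment[name].append(job)
        heapq.heappush(heap, (load + 1, name))
    return assignment


assignment = delegate_jobs(
    ["sh1", "sh2", "sh3"],
    ["alert_a", "report_b", "alert_c", "report_d"],
)
print(assignment)
# {'sh1': ['alert_a', 'report_d'], 'sh2': ['report_b'], 'sh3': ['alert_c']}
```

Spreading jobs this way is what lets a cluster of search heads serve more concurrent searches than a single search head.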
We can also see that the Search Heads are now in a cluster. This gives us “High Search Concurrency”, which was a drawback of the distributed deployment described earlier.
| Criterion | Pros | Cons |
|---|---|---|
| Supportability | Supportability is challenging; however, with the Master and Captain nodes we can manage the Splunk configs and apps easily | NA |
| High Availability | Highly available: data is replicated across multiple nodes, so if a single indexer goes down the data is still searchable. If a search head goes down, the other search heads continue to provide the service | NA |
| Disaster Recovery | NA | No disaster recovery |
| Search Concurrency | High search concurrency as there is a cluster of Search Heads serving multiple clients | NA |
To learn more about how to size and scale a Splunk deployment, please refer to this link.
This blog has explained the different Splunk deployments and compared them. Once you understand the basic differences between them, it is easier to architect a deployment that fits your needs.