Centrally Monitor Splunk Platforms – A Pragmatic Solution

For large Splunk deployments, we often get asked how to centrally monitor the platform. The problem at hand is the monitoring of monitoring.

Let’s say we have production clusters spread across 100+ locations globally, and the goal is to provide visibility across all the Splunk environments for performance, errors, issues, etc. The options commonly considered are as follows:

Option 1: Add All Cluster Masters to a single Monitoring Console – Distributed Search

This is a feasible option that fulfils the purpose. However, complexity rises when there are issues around network connectivity, cross-data-centre chatter and security concerns about data egress, which make this kind of solution difficult to put in place.


Moreover, there can be limitations around the IP address, CIDR and hostname conventions followed across different sites.

Option 2: Setup Separate Monitoring Consoles

This idea is a no-go, since setting up and maintaining a separate monitoring console for every site is too much overhead.

(Bad Idea!! – No central visibility. Inconvenient)

Option 3: Index and Forward

Since both the usual suspects have been ruled out, it is time to think outside the box. This option uses Splunk’s built-in “indexAndForward” capability to deal with the problem. It makes sure that each cluster retains access to its own internal data AND sends a copy of that data to the Central Monitoring Console (yes, what an original name!).
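For reference, the capability the name refers to lives in outputs.conf. Below is a minimal sketch of an index-and-forward setup (the stanza and setting names come from the outputs.conf spec; whether you need the [indexAndForward] stanza explicitly depends on your Splunk version and instance type, and the exact filtering configuration used in this post follows later):

    # Keep a local copy of the data in this cluster's own indexes...
    [indexAndForward]
    index = true

    # ...while also forwarding a copy to the output group that points
    # at the Central Monitoring Console's indexers
    [tcpout]
    defaultGroup = primary_indexers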

 

Following are the default parameters for whitelist and blacklist in the $SPLUNK_HOME/etc/system/default/outputs.conf:

				
    forwardedindex.0.whitelist = .*   # Match all indexes
    forwardedindex.1.blacklist = _.*  # Blacklist all indexes starting with "_"
    forwardedindex.2.whitelist = (_audit|_internal|_introspection|_telemetry)

With the above default configuration, Splunk forwards all the data (both its own internal logs and the incoming data sources).

To address the problem, we have to whitelist only the internal indexes (_*) in outputs.conf on the indexers. After the change below, Splunk will only forward the data coming into _internal, _audit, _introspection and _telemetry.

On the indexers, make the following change to $SPLUNK_HOME/etc/system/local/outputs.conf:

				
    [tcpout]
    defaultGroup = primary_indexers
    forwardedindex.0.whitelist = (_audit|_internal|_introspection|_telemetry)
    forwardedindex.1.blacklist = _.*
    forwardedindex.2.whitelist = (_audit|_internal|_introspection|_telemetry)

    [tcpout:primary_indexers]
    server = <ip_addr>:9997,<ip_addr>:9997,<ip_addr>:9997,<ip_addr>:9997
Special things to consider:
  • The above configuration forwards data only from the internal indexes (_audit, _internal, _introspection, _telemetry)
  • The output group should point to the indexers connected to the Central Monitoring Console
  • A whitelist always takes precedence over a blacklist
  • The whitelist/blacklist order must be sequential: it must start at 0 and continue with positive integers in sequence
  • There can’t be any gaps in the sequence
For example:
				
    [tcpout]
    defaultGroup = primary_indexers
    forwardedindex.0.whitelist =
    forwardedindex.1.blacklist =
    forwardedindex.5.whitelist = (_audit|_internal|_introspection|_telemetry)
    server = 10.1.5.2:9997,10.1.5.3:9997,10.1.5.4:9997,10.1.5.5:9997

    [tcpout:primary_indexers]
    server = 10.1.5.2:9997,10.1.5.3:9997,10.1.5.4:9997,10.1.5.5:9997
This won’t work for the following reasons:
  • The numbers after forwardedindex.(x).whitelist and forwardedindex.(x).blacklist are not sequential; the sequence jumps from 1 to 5
  • The whitelist and blacklist values at positions 0 and 1 are left empty
Tests and Rollout:
  • Test the above configuration on a pre-production indexer cluster and verify that the internal data is flowing into the Monitoring Console (a quick sanity-check search is sketched below)
  • Once tested and all looks good, roll this out to all the indexer clusters and you should get the data in one central platform globally
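As a quick sanity check (illustrative only; adjust the time range and any host or source filters to your environment), a search like the following on the Central Monitoring Console should show internal events arriving from the remote clusters:

    index=_internal earliest=-15m
    | stats count by host

If the counts for the remote hosts keep growing, the forwarding is working end to end.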

You can get creative and tag the data using _meta in inputs.conf to differentiate the clusters based on metadata. You can then use these tags in the Central Monitoring Console for ease of use, which makes the otherwise challenging problem of isolating a single cluster’s issues much easier to tackle.
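For instance, a minimal sketch in inputs.conf on each cluster’s Splunk instances (the keys site and splunk_cluster and their values are made-up examples, not a convention from this post):

    [default]
    # Tag every event from this instance with cluster metadata
    _meta = site::emea-dc1 splunk_cluster::prod-emea

These become indexed fields on the forwarded events, so dashboards and searches on the Central Monitoring Console can slice the internal data per cluster.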

For all the Splunkers visiting Splunk Conf 2019 next week, we urge you to visit our Booth #160 at the event to find more such tips and tricks that help you manage Splunk better.