Bootstrap Splunk Search Head cluster captain with Chef

Whether you use Puppet, Chef or Ansible, the modules and cookbooks out there skip the bootstrapping of the search head cluster captain. Here is an implementation using Chef in AWS EC2.

Search Head Cluster Captain status

A helper function to get the captain election status.

# chef/cookbooks/splunk/libraries/helper.rb

require 'mixlib/shellout'

# Returns true if `splunk list shcluster-captain-info` reports an elected captain.
def is_captain_elected
  splunk_cmd = "#{node['splunk']['server_home']}/bin/splunk"
  # Run the command directly rather than inside a ruby_block: a ruby_block is
  # deferred to the converge phase, so its output would not yet be available
  # when this method returns.
  result = Mixlib::ShellOut.new("#{splunk_cmd} list shcluster-captain-info").run_command
  Chef::Log.info("shcluster-captain-info: #{result.stdout}")
  result.stdout.include?('elected_captain')
end

Cluster node status

A helper function to manage the EC2 tag used to indicate cluster node status.

# chef/cookbooks/splunk/libraries/helper.rb

# Update/insert an EC2 tag. Ensure the ohai ec2 hint is in place: touch /etc/chef/ohai/hints/ec2.json
# Requires the AWS SDK for Ruby to be loaded (require 'aws-sdk-ec2' on SDK v3, 'aws-sdk' on v2).
# tag key: shc-member
# tag values:
#  - unregistered: instance has been initialised but not added to the cluster, i.e. captain not bootstrapped
#  - registered: instance has been added to the cluster and captain bootstrapped
#
def upsert_ec2_tag(instance_id, tag_value)

    ec2 = Aws::EC2::Resource.new(region: "#{node['ec2']['placement_availability_zone'].chop}")

    ec2.create_tags({
      resources: ["#{instance_id}"],
      tags: [
        {
          key: "shc-member",
          value: "#{tag_value}",
        },
      ],
    })

end
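
The helper above derives the region by chopping the last character off the availability zone name reported by ohai. A minimal sketch of that step in isolation, where `region_from_az` is a hypothetical name used only for illustration and is not part of the cookbook:

```ruby
# Hypothetical helper (not part of the cookbook): derive the AWS region
# from an availability zone name by dropping the trailing zone letter.
def region_from_az(availability_zone)
  availability_zone.chop
end

puts region_from_az('us-east-1a')      # us-east-1
puts region_from_az('ap-southeast-2c') # ap-southeast-2
```

This relies on the standard zone naming scheme of region plus a single letter. Zone names that do not follow that pattern, such as Local Zones, would likely need different handling.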

Discovering cluster members

A helper function that returns the EC2 instance IDs and private DNS names of search head cluster members as an array of hashes. It filters instances on the value of the 'shc-member' tag and includes both running and stopped instances.

# chef/cookbooks/splunk/libraries/helper.rb

def get_members(tag_value)

  ec2 = Aws::EC2::Resource.new(region: "#{node['ec2']['placement_availability_zone'].chop}")
  members = []

  ec2.instances(
    {
      filters: [
        {
          name: 'tag:shc-member',
          values: ["#{tag_value}"]
        },
        {
          name: 'instance-state-name',
          values: ['running','stopped']
        },
      ]
    }
  ).each do |i|

    members << {
      "instance_id" => i.instance_id,
      "dns" => i.private_dns_name
    }
  end

  return members
end

Bootstrapping the captain

The search head recipe installs Splunk and starts the Splunk daemon. It then performs the following logic to attempt to bootstrap the captain:

  1. Performs cluster member initialisation
  2. Tags the current node as 'unregistered'
  3. If captain is already elected then register the current node via a registered member. Then tag the current node as 'registered'.
  4. If there is no captain elected and there are fewer than 3 unregistered nodes, then the Chef run is finished
  5. If there is no captain elected and there are 3 or more unregistered nodes, then execute the bootstrap command on the current node
  6. After bootstrapping, tag each of those nodes as 'registered'
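
The branching in the steps above can be sketched as a small pure function, which makes the three outcomes easy to test outside a Chef run. The names `shc_action` and `MIN_BOOTSTRAP_MEMBERS` are assumptions made for this illustration and do not appear in the recipe:

```ruby
# Hypothetical pure function mirroring the recipe's branching.
# Returns :join, :wait or :bootstrap.
MIN_BOOTSTRAP_MEMBERS = 3 # threshold used by the recipe

def shc_action(captain_elected, unregistered_count)
  # A captain already exists: join through a registered member.
  return :join if captain_elected

  # No captain yet: bootstrap only once enough members are waiting.
  unregistered_count >= MIN_BOOTSTRAP_MEMBERS ? :bootstrap : :wait
end

puts shc_action(true, 1)  # join
puts shc_action(false, 2) # wait
puts shc_action(false, 3) # bootstrap
```

Keeping the decision separate from the side effects (tagging, running splunk commands) is one way to keep the recipe's logic readable. The recipe itself inlines the same checks.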

# chef/cookbooks/splunk/recipes/search-head.rb

# Initialise this node as a search cluster member
if node['splunk']['search_cluster_deployment'] && node['splunk']['search_cluster_member']
  execute "Initialising search head cluster member" do
    command "#{splunk_cmd} init shcluster-config -auth #{node['splunk']['auth']} -mgmt_uri https://#{node['fqdn']}:#{node['splunk']['mgmt_server_port']} -replication_port #{node['splunk']['search_head_cluster_replication_port']} -replication_factor #{node['splunk']['search_factor']} -conf_deploy_fetch_url #{node['splunk']['deployer_url']}:#{node['splunk']['deployer_port']} -secret #{node['splunk']['pass4SymmKey']} -shcluster_label #{node['splunk']['shcluster_label']}"
    not_if "#{splunk_cmd} list shcluster-member-info -auth #{node['splunk']['auth']} | grep 'is_registered:1'"
    notifies :restart, "service[splunk]", :immediate
  end
end

# tag this node as unregistered
upsert_ec2_tag( node['ec2']['instance_id'], 'unregistered' )

# collect the unregistered members
unregistered_members = get_members("unregistered")
Chef::Log.info("Found #{unregistered_members.count} unregistered members")

captain_elected = is_captain_elected

if captain_elected
  # captain already elected, so add this node to the cluster via an existing member

  Chef::Log.info("SHC captain elected, adding this node via member")

  # discover registered members
  registered_members = get_members("registered")

  # add member to cluster using one of the registered members mgmt URI
  execute "Adding member to search head cluster" do
    command "#{splunk_cmd} add shcluster-member -auth #{node['splunk']['auth']} -current_member_uri https://#{registered_members[0]['dns']}:#{node['splunk']['mgmt_server_port']}"
  end

  # tag this node as registered member
  upsert_ec2_tag( node['ec2']['instance_id'], 'registered' )

elsif !captain_elected && unregistered_members.count < 3

  Chef::Log.info("SHC captain not elected and fewer than 3 members available to vote. Nothing to do")

elsif !captain_elected && unregistered_members.count >= 3

  Chef::Log.info("SHC captain not elected but #{unregistered_members.count} members ready to vote, bootstrapping captain")

  servers_list = []

  unregistered_members.each do |i|
    servers_list << "https://#{i['dns']}:#{node['splunk']['mgmt_server_port']}"
  end

  servers_list = servers_list.join(',')

  execute "Bootstrapping the search head cluster captain on #{node['fqdn']}" do
    command "#{splunk_cmd} bootstrap shcluster-captain -servers_list #{servers_list} -auth #{node['splunk']['auth']}"
  end

  # update ec2 tags
  unregistered_members.each do | i |
    upsert_ec2_tag( i["instance_id"], "registered" )
  end

else

  Chef::Log.info("SHC is in an unknown state")

end
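
The recipe builds management URIs from the member hashes in two places: the `-servers_list` argument for the bootstrap and the `-current_member_uri` argument for joining. A minimal sketch of that URI handling on its own, assuming the hash shape returned by `get_members`. The helper names, instance IDs and DNS names are made up for illustration, and 8089 is Splunk's default management port:

```ruby
# Hypothetical member list in the shape returned by get_members;
# the instance IDs and DNS names are invented for this example.
members = [
  { 'instance_id' => 'i-0aaa', 'dns' => 'ip-10-0-0-1.ec2.internal' },
  { 'instance_id' => 'i-0bbb', 'dns' => 'ip-10-0-0-2.ec2.internal' },
  { 'instance_id' => 'i-0ccc', 'dns' => 'ip-10-0-0-3.ec2.internal' }
]

# Management URI for a single member hash.
def mgmt_uri(member, port)
  "https://#{member['dns']}:#{port}"
end

# Comma-separated URI list, as passed to -servers_list.
def servers_list(members, port)
  members.map { |m| mgmt_uri(m, port) }.join(',')
end

# URI of the first member, or nil when the list is empty.
def first_member_uri(members, port)
  members.empty? ? nil : mgmt_uri(members.first, port)
end

puts servers_list(members, 8089)
puts first_member_uri(members, 8089)
```

The nil guard in `first_member_uri` matters for the join branch: if no instance carries the 'registered' tag, indexing `registered_members[0]` directly would raise a NoMethodError instead of producing a clear failure.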