There is strong consensus that backing up Splunk hot buckets using storage snapshots is reliable, but there are few, if any, complete examples of how to set this up. Here is an implementation using LVM and EBS, automated with CloudFormation and Puppet.
Backing up data that has been indexed by Splunk is critical to most organisations. Unfortunately, the most recently indexed data sits in the hot buckets, which cannot be backed up directly while Splunk is writing to them, and Splunk's own documentation recommends against trying. It is, however, possible to take snapshots, and the key is to back up those snapshots.
This project uses the Amazon Linux AMI with the XFS filesystem. LVM snapshots work together with an XFS freeze, so the filesystem is consistent at the moment the snapshot is taken. A backup of the LVM snapshot volume can therefore be mounted later without risk of data loss or corruption.
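A quick way to convince yourself of this is to mount a snapshot read-only next to the live volume. Because the snapshot carries the same XFS UUID as its origin, it must be mounted with the nouuid option. Here is a minimal sketch; the snapshot and mountpoint names are only examples following the naming conventions used later in this post:

# verify_snapshot_mount.yaml (illustrative consistency check only)
- name: verify an LVM snapshot mounts cleanly
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Create a mountpoint for the check
      file:
        path: /mnt/splunkhotlv_snapshot    # hypothetical mountpoint
        state: directory
    - name: Mount the snapshot read-only (nouuid avoids the duplicate XFS UUID error)
      command: mount -o ro,nouuid /dev/splunkhotvg/splunkhotlv_snapshot /mnt/splunkhotlv_snapshot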
Here is the inspiration for this implementation and an explanation of how the process works.
The context is Splunk running on AWS EC2 with EBS volumes for the indexer cluster. Since storage is cheap, both primary and replicated copies of the buckets are backed up.
Here is an excerpt from the CloudFormation template showing the disk configuration for Splunk storage. The template makes heavy use of parameters for reusability. The CloudFormation resource is actually an Auto Scaling launch configuration, which specifies the configuration for each member of the Auto Scaling group.
With regard to the various EBS volume size parameters, the Splunk engineers need to apply some extra logic: a reserve must be calculated to cater for LVM snapshot growth. In the extreme case, where the logical volume is extremely busy or the snapshot is left running longer than required, every block on the logical volume may be overwritten, so the snapshot volume has to be as large as the source logical volume. In other words, the EBS volume must be 200% of the storage Splunk itself needs: if Splunk needs 100GB for hot buckets, the EBS volume required is 200GB, plus a little extra for overheads such as LVM metadata.
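As a concrete illustration, the HotBucketsEbsVolumeSize parameter referenced in the excerpt below might default to something like this for 100GB of hot bucket storage; the numbers are only an example:

# splunk_index_cluster_template.yaml (parameter excerpt, illustrative sizing only)
HotBucketsEbsVolumeSize:
  Type: Number
  Description: Size in GB of the hot buckets EBS volume
  Default: 210    # 100GB hot buckets + 100GB LVM snapshot reserve + ~10GB for LVM metadata and overheads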
Notice that there is an option to provide an EBS snapshot ID for recovery purposes. When a snapshot ID is supplied, it dictates the volume size and the restored volume contains the LVM snapshot.
# splunk_index_cluster_template.yaml
IndexerLaunchConfig:
  Type: 'AWS::AutoScaling::LaunchConfiguration'
  Properties:
    BlockDeviceMappings:
      - DeviceName: /dev/xvda
        Ebs:
          VolumeSize: '40'
          VolumeType: gp2
      - DeviceName: /dev/xvdf
        Ebs:
          VolumeSize: !Ref SplunkEbsVolumeSize
          VolumeType: io1
          Iops: 1000
          Encrypted: true
          DeleteOnTermination: true
      - DeviceName: /dev/xvdg
        Ebs:
          VolumeSize: !If
            - UseHotBucketsSnapshot
            - !Ref AWS::NoValue
            - !Ref HotBucketsEbsVolumeSize
          VolumeType: io1
          Iops: 1000
          Encrypted: true
          DeleteOnTermination: true
          SnapshotId: !If
            - UseHotBucketsSnapshot
            - !Ref HotBucketsSnapshotId
            - !Ref AWS::NoValue
      - DeviceName: /dev/xvdh
        Ebs:
          VolumeSize: !If
            - UseColdBucketsSnapshot
            - !Ref AWS::NoValue
            - !Ref ColdBucketsEbsVolumeSize
          VolumeType: st1
          Encrypted: true
          DeleteOnTermination: true
          SnapshotId: !If
            - UseColdBucketsSnapshot
            - !Ref ColdBucketsSnapshotId
            - !Ref AWS::NoValue
      - DeviceName: /dev/xvdi
        Ebs:
          VolumeSize: !Ref FrozenEbsVolumeSize
          VolumeType: st1
          Encrypted: true
          DeleteOnTermination: true
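On the recovery side, one possible flow, not covered by the excerpt above, is to launch the instance with the EBS snapshot ID supplied and then merge the LVM snapshot back into the origin volume before anything mounts it. A rough sketch using plain LVM tooling driven from Ansible, with names matching the Hiera data further down:

# restore_hot_buckets.yaml (hypothetical recovery sketch)
- name: restore splunk hot buckets from the LVM snapshot
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Merge the consistent snapshot back into the origin logical volume
      # The merge starts immediately if the origin is not mounted; otherwise LVM
      # defers it until the volume is next activated.
      command: lvconvert --merge splunkhotvg/splunkhotlv_snapshot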
Here is an excerpt from the Puppet Hiera data for configuring LVM on the node. The Hiera keys are named after the relevant module classes so that automatic class parameter lookup keeps the Puppet code simple.
As per Puppet best practice, we use the roles and profiles pattern to separate Puppet code from the actual configuration, which further simplifies the Hiera data.
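For context, a Hiera 5 hierarchy along these lines would select the role file shown below via a role fact; the fact name and exact hierarchy are assumptions for illustration rather than the project's actual hiera.yaml:

# hiera.yaml (illustrative hierarchy only)
version: 5
defaults:
  datadir: data
  data_hash: yaml_data
hierarchy:
  - name: Per-node data
    path: "nodes/%{trusted.certname}.yaml"
  - name: Per-role data
    path: "roles/%{facts.role}.yaml"    # hypothetical custom role fact
  - name: Common defaults
    path: common.yaml

With something like this in place, an indexer node whose role fact resolves to indexer_cluster automatically receives the data below.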
# roles/indexer_cluster.yaml
profile::operating_system::storage::splunk_paths:
  splunkhotdb:
    path: /opt/splunk/db
  splunkcolddb:
    path: /opt/splunk/colddb
  splunkfrozendb:
    path: /opt/splunk/frozendb
lvm::volume_groups:
  splunkhotvg:
    createonly: true
    physical_volumes:
      - /dev/xvdg
    logical_volumes:
      splunkhotlv:
        mountpath: /opt/splunk/db
        mountpath_require: true
        tag: splunk_lv
  splunkcoldvg:
    createonly: true
    physical_volumes:
      - /dev/xvdh
    logical_volumes:
      splunkcoldlv:
        mountpath: /opt/splunk/colddb
        mountpath_require: true
        tag: splunk_lv
  splunkfrozenvg:
    createonly: true
    physical_volumes:
      - /dev/xvdi
    logical_volumes:
      splunkfrozenlv:
        mountpath: /opt/splunk/frozendb
        mountpath_require: true
        tag: splunk_lv
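Since the filesystems are XFS, the same Hiera data can also drive filesystem creation. With the puppetlabs/lvm module this should be a matter of adding fs_type to each logical volume, though that parameter is an assumption worth checking against the module version in use:

# roles/indexer_cluster.yaml (possible addition, illustrative)
lvm::volume_groups:
  splunkhotvg:
    logical_volumes:
      splunkhotlv:
        mountpath: /opt/splunk/db
        fs_type: xfs    # assumed module parameter; creates the XFS filesystem on the LV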
There are many ways of scheduling jobs in the cloud world, from third-party tools to AWS's ever-growing list of services. In this project, Rundeck is used to automate, and at the same time document, the various housekeeping tasks and system functions. That of course includes scheduling the LVM and EBS snapshots.
The benefits of using Rundeck include built-in scheduling, a history of job runs that doubles as documentation, and fine-grained control over which nodes each job targets.
Here is the configuration for the backup job. The job uses Ansible to take the LVM and EBS snapshots via an inline playbook written into the job spec. This setup aligns with the DevOps principle of versioning all artefacts, and the playbook itself reads as documentation. Rundeck can filter which nodes a job runs against using a regex, so it is easy to target only nodes of a certain type, such as indexers or search heads.
# backup_splunk_volume.yaml
- defaultTab: output
  description: |+
    Backup a splunk volume using LVM and EBS snapshots
  name: backup_splunk_volume
  options:
    - name: volume_group
      description: volume group name
    - name: logical_volume
      description: logical volume name to snapshot
    - name: snapshot_size
      description: Snapshot volume size
    - name: ebs_volume_id
      description: EBS volume ID to snapshot
    - name: region
      description: AWS region of the EBS volume
  nodefilters:
    filter: '"Index cluster .*|Search.*|Index Cluster Master.*|HeavyForwarder.*|.*API Collector.*"'
  sequence:
    commands:
      - configuration:
          ansible-become: 'false'
          ansible-disable-limit: 'true'
          ansible-playbook-inline: |
            ---
            - name: backup splunk volume
              hosts: localhost
              connection: local
              gather_facts: false
              tasks:
                - name: Create LVM snapshot
                  lvol:
                    vg: ${option.volume_group}
                    lv: ${option.logical_volume}
                    snapshot: "${option.logical_volume}_snapshot"
                    size: ${option.snapshot_size}
                  register: output
                - debug: msg="{{ output }}"
                - name: Create EBS snapshot
                  ec2_snapshot:
                    region: ${option.region}
                    description: "Splunk ${option.volume_group}/${option.logical_volume} backup"
                    volume_id: ${option.ebs_volume_id}
                    snapshot_tags:
                      Application: Splunk
                      Owner: 123123
                  register: output
                - debug: msg="{{ output }}"
                - name: Delete snapshot volume
                  lvol:
                    vg: ${option.volume_group}
                    lv: ${option.logical_volume}
                    snapshot: "${option.logical_volume}_snapshot"
                    state: absent
                    force: true
                  register: output
                - debug: msg="{{ output }}"
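The job definition above is shown without its schedule. In Rundeck's job YAML a nightly run is expressed with a schedule block roughly like the following; the exact keys are best obtained by exporting a scheduled job from your own Rundeck instance:

# backup_splunk_volume.yaml (schedule excerpt, illustrative)
  schedule:
    time:
      hour: '02'
      minute: '0'
      seconds: '0'
    month: '*'
    year: '*'
  scheduleEnabled: true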