Reference: TanOS backup and recovery

Use this guide to learn about available backup and recovery options in TanOS.

For backup procedures, see Reference: Appliance Maintenance menu.

Concepts and terminology

There are several concepts and terms that you need to know to plan for backup and recovery on a Tanium Appliance.

Tanium cluster

You can configure Tanium Appliances into a Tanium cluster, where the Tanium Servers are active-active and the database component is active-passive. The Tanium Servers read from and write to the database that is co-located on the first appliance.

  • The primary server is the server that hosts the read/write database.
  • The secondary server contains a read-only database that replicates from the database on the primary server.

For more information on how to use Tanium Appliances with a Tanium cluster, see Installing an Appliance Array.

Partitions

Each Tanium Appliance contains two partitions, a primary partition and a secondary partition.

In TanOS 1.6.1 and later, virtual appliances contain only one partition by default. You can add a secondary partition to perform a partition sync, or you can take a snapshot of the virtual image.

Boot options

When the Tanium Appliance boots, you are prompted to select a partition. Your choices are TanOS Active and TanOS Inactive.

  • The TanOS Active option boots into the active partition. By default, the primary partition is the active partition.
  • The TanOS Inactive option boots into the inactive partition. By default, the secondary partition is the inactive partition.

Partition swap

If you have multiple partitions, you can swap partitions through the Active Partition menu (A-X-3). For more information, see Change the active partition.

  • When you change the active partition, the partition that was set as the inactive partition becomes the active partition, and the partition that was set as the active partition becomes the inactive partition.
  • For example, in the default configuration, the primary partition is the active partition and the secondary partition is the inactive partition. After you change the active partition, the secondary partition becomes the active partition and the primary partition becomes the inactive partition.

Mounting states

  • In a normal mount, the active partition is set to the primary partition (root is mounted on /).
  • In an inverted mount, the active partition is set to the secondary partition (root is mounted on /altroot).

TanOS backup options

TanOS offers multiple options for backup. You can find the available options in the Backup menu (B-1). The following sections describe the backup options in detail.

As of TanOS 1.5.6, backup options are also available to schedule on a regular basis. See the Backup menu (B-1) for these options, or refer to Configure and run automatic backups.

In addition to the backup options that TanOS provides, you can also take a snapshot of the virtual image.

Partition sync

TanOS can have two partitions: an active partition and an inactive partition in case of failover or troubleshooting. A partition sync is a backup procedure that uses the rsync utility to copy the active partition to the inactive partition.
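
For illustration only, a partition sync is conceptually similar to the following rsync invocation. This is a sketch, not the exact job that TanOS runs: it assumes that the active partition is mounted at / and that the inactive partition is reachable at /altroot (see Mounting states), and it excludes virtual file systems from the copy. Do not run this command by hand on an appliance; use the Backup menu (B-1) instead.

    # Conceptual sketch only -- TanOS runs its own partition sync job.
    # The paths and exclude list are illustrative assumptions.
    rsync -aAXH --delete \
        --exclude=/dev --exclude=/proc --exclude=/sys \
        --exclude=/run --exclude=/tmp --exclude=/altroot \
        / /altroot/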

In TanOS 1.6.1 and later, virtual appliances contain only one partition by default. Appliances with only one partition do not contain the option to perform a partition sync.

Perform a partition sync before you upgrade TanOS or a Tanium Server component, so that you have an alternate partition in case issues occur during the upgrade process or the Tanium Server update. You can also use the inactive partition if the active partition fails to boot. During the TanOS boot process, you have the option to select the inactive partition if needed.

To protect data consistency, the partition sync job disables (shuts down) the Tanium Server, Tanium database server, and other related services for the duration of the partition sync. Make sure to set a partition sync schedule that does not disrupt solution processes.

Core backup

Perform a core backup to back up essential files that can help you quickly recover from failures. Tanium services do not stop during a core backup.

A core backup produces a core recovery bundle, which includes the following content:

  • Appliance, array and network settings
  • LDAP database contents (locally managed Tanium accounts)
  • Critical Tanium Server, Tanium Module Server, and Tanium Zone Server configuration and key material
  • LDAP and root CA certificates
  • Tanium Server database contents (primary Tanium Server only)

Beginning with TanOS 1.6.3, a core backup replaces a minimal backup and a Tanium database backup. If you previously scheduled a Tanium database backup through TanOS, the schedule will be reused by the core backup after upgrade.

A core backup is recommended in most situations for the Tanium Servers. You should run the core backup on each of your Tanium Servers. The Tanium Module Server core backup only includes the files to restore the appliance configuration; it does not include any module data. If you are using a Standby Tanium Module Server in your environment, the core backup is sufficient. Otherwise, consider a comprehensive backup instead.

You have the option to transfer the backup file to a remote location using SCP, or to copy the backup into the /outgoing directory for manual collection.
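
For example, if you copy the backup into the /outgoing directory, you can pull it from a backup host with SCP. The account, remote path, and file name shown here are assumptions based on the /opt/home/tancopy layout referenced later in this guide; substitute the values for your appliance and the actual bundle name.

    # Hedged example: pull a backup bundle off the appliance for manual collection.
    # The tancopy account, remote path, and bundle name are assumptions.
    scp tancopy@ts1.example.com:/opt/home/tancopy/outgoing/<backup>.tgz /srv/backups/tanium/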

Comprehensive backup

Perform a comprehensive backup to produce a comprehensive recovery bundle. A comprehensive recovery bundle includes the same content as a core recovery bundle, in addition to Tanium Server downloads and module data:

  • Appliance, array and network settings
  • LDAP database contents (locally managed Tanium accounts)
  • Critical Tanium Server, Tanium Module Server, and Tanium Zone Server configuration and key material
  • LDAP and root CA certificates
  • Tanium Server database contents (primary Tanium Server only)
  • Tanium Server downloads
  • Module data

Beginning with TanOS 1.6.3, a comprehensive backup replaces a full backup. If you previously scheduled a full backup through TanOS, the schedule will be reused by the comprehensive backup after upgrade. You should re-evaluate your need for a comprehensive backup; for Tanium Servers, the core backup might meet your needs. The comprehensive backup is more commonly used if you have a single Tanium Module Server in your environment.

You have the option to transfer the backup file to a remote location using SCP, or to drop the backup into the /outgoing directory for manual collection.

A comprehensive backup stops services on the Tanium Module Server to capture the module data. During this time, users can still access Tanium and ask questions, but the module workbenches are unavailable until the backup completes.

Backup exclusions

The core and comprehensive recovery bundles do not include the following items:

  • Generally, any TanOS configuration that is specific to that appliance. This includes:
    • Auth key file (SSH keys)
    • FQDN, IP, routes
  • TanOS local user accounts and their account policies (such as password policy)
  • Scheduled jobs (such as backups)

TanOS backup recommendations

The following sections describe standard guidance for backups. Your backup strategy might differ based on the specific needs of your organization.

These recommendations are for TanOS 1.6.3 and later. For earlier versions, reference the appropriate Tanium Appliance Deployment Guide.

Define a disaster recovery plan

Tanium recommends that you define a disaster recovery plan early during deployment. The restoration and backup options that fit your disaster recovery (DR) plan vary depending on your specific needs and requirements.

To properly define a DR plan, define specific recovery time objectives (RTO) and recovery point objectives (RPO).

  • RPO: The maximum acceptable amount of data loss, measured as time. For example, an RPO of one month means that you must be able to recover to a point within 30 days before the time of failure.
  • RTO: The amount of downtime that is acceptable while you recover the system.

Your backup frequency and recovery procedures might vary depending on your RPO and RTO. For example, an RPO of 24 hours implies at least daily backups.

General recommendations

Tanium recommends performing a partition sync or VM snapshot prior to any major changes, including:

  • TanOS upgrades
  • Tanium Server upgrades
  • Tanium Module upgrades

Tanium recommends that you use the automated backup features available in TanOS 1.5.6 and later to automate many of the backup options. For detailed steps, see Configure and run automatic backups.

Tanium Cloud Appliance

For cloud deployments, Tanium recommends that you back up the Tanium Cloud Appliance image file. You can use these image files to restore the Tanium Cloud Appliance to the specific time the backup occurred; this is the preferred method to restore the entire appliance after a major failure. Tanium recommends that you back up the image file once a day. You should store as many copies as you deem necessary and in accordance with your company’s RPO.

There are many third party tools on the market that you can use to back up the image file. Contact your virtual infrastructure team to determine if your organization already uses one of these solutions. Tanium does not allow third party backup agents on the appliance; however, open-vm-tools is installed on the Tanium Cloud Appliance to provide a method for applications to quiesce the file system before taking the snapshots that are used to save the image to persistent storage.

If you are not able to perform image-based backups and you do not use a Tanium cluster, Tanium recommends that you take a snapshot of the virtual image every two weeks and perform monthly comprehensive backups for all Tanium Servers.

For environments with Tanium deployed in a Tanium cluster, Tanium recommends monthly snapshots and quarterly comprehensive backups. You can restore the Tanium Server from a secondary Tanium Server in a Tanium cluster with minimal downtime, and the comprehensive backup is less critical in this scenario. It is only necessary to perform the comprehensive backup on the secondary Tanium Server of the cluster to reduce downtime.

Tanium Appliance - virtual

For virtual deployments, Tanium recommends that you back up the Tanium Appliance virtual appliance image file. You can use these image files to restore the Tanium Appliance to the specific time the backup occurred; this is the preferred method to restore the entire appliance after a major failure. Tanium recommends that you back up the image once a day. You should store as many copies as you deem necessary and in accordance with your company’s RPO.

There are a number of third party tools on the market that you can use to back up the image file. Contact your virtual infrastructure team to determine if your organization already uses one of these solutions. Tanium does not allow third party backup agents on the appliance; however, open-vm-tools is installed on the virtual appliance to provide a method for applications to quiesce the file system before taking the snapshots that are used to save the image to persistent storage.

If you are not able to perform image-based backups and you do not use a Tanium cluster, Tanium recommends a daily core backup for the Tanium Server and a weekly comprehensive backup for your Tanium Module Server. For additional protection, you can optionally enable the alternate partition and run a partition sync as needed.

In a Tanium cluster, you should schedule a core backup of both Tanium Servers to protect your deployment. If you have a single Tanium Module Server, schedule a comprehensive backup of that server. If you have a Standby Module Server that is actively syncing, a core backup of the Tanium Module Server is sufficient.

Tanium Appliance - physical

For physical appliances that do not use Tanium Servers in a cluster, Tanium recommends weekly partition syncs, a daily core backup for the Tanium Server, and a weekly comprehensive backup for the Tanium Module Server.

In a Tanium cluster, Tanium recommends that you schedule a core backup of both Tanium Servers to protect your deployment. If you have a single Tanium Module Server, schedule a comprehensive backup of that server. If you have a Standby Module Server that is actively syncing, a core backup of the Tanium Module Server is sufficient.

Backup automation

Automated backups with TanOS 1.5.6 and later

Use the automated backup options that are available in TanOS. For detailed steps, see Configure and run automatic backups.

To fully automate the transfer of backup files to an off-box location, you must set up an SCP destination that the appliance can access.
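
As a sketch only, preparing such a destination on a separate Linux host might look like the following. The account name, directory, and public key file are illustrative assumptions; supply the matching host, user, and path when you configure automatic backups.

    # Hedged sketch: prepare an SCP destination on a separate Linux host.
    # The user, directory, and key file names are illustrative assumptions.
    sudo useradd --create-home --shell /bin/bash taniumbackup
    sudo mkdir -p /srv/tanium-backups
    sudo chown taniumbackup:taniumbackup /srv/tanium-backups
    # If you use key-based authentication, install the appliance public key.
    sudo install -d -m 700 -o taniumbackup -g taniumbackup /home/taniumbackup/.ssh
    sudo install -m 600 -o taniumbackup -g taniumbackup \
        appliance_key.pub /home/taniumbackup/.ssh/authorized_keys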

TanOS restore options

The following sections detail specific options for Tanium Appliance restoration. The options are organized from least impactful to most impactful. Review each option sequentially to determine the correct procedures to follow for your restoration.

These steps are not designed to be run without the assistance of Tanium Support. Contact Tanium Support before you run any steps in this section. For more information, see Contact Tanium Support.

Option 1: Restore TanOS to a partition sync (physical or virtual)

The quickest way to restore to a known good restore point is to use the partition sync feature on TanOS.

Prerequisites

  • A previous partition sync that is within your RPO
  • The password that was in use at the time of the previous partition sync (needed if the TanOS password was changed after the partition sync)
  • Access to a user with the tanadmin role

Notes

  • Your partition syncs for each Tanium Server and the Module Server must be initiated within 30 minutes of each other (to minimize configuration drift between the Tanium Server and the Tanium Module Server).

Steps

  1. Sign in to the TanOS console as a user with the tanadmin role. Make sure the appliance is booted to the primary partition in normal mounting mode.
  2. Initiate a partition swap (B-X-3). TanOS automatically reboots.
  3. Boot into the TanOS Active partition. You will now be in inverted mounting mode.
  4. Perform a partition sync (B-1-1). This syncs your secondary partition (which is active) to your primary partition (which is inactive). Any changes on the primary partition are overwritten.
  5. Initiate a partition swap (B-X-3). TanOS automatically reboots.
  6. Repeat these steps for all Tanium Servers and Tanium Module Servers in the cluster.
    • Start with all Tanium Servers followed by the Tanium Module Servers.

Option 2: Restore TanOS using a VM image or snapshot (virtual)

These steps are only available for virtual appliances because they require a VM image (or snapshot) to be available for restoration.

Prerequisites

  • A known good image/snapshot for appliance restore.
  • The restore point must include Tanium Module Server and Tanium Server images from the same point in time.

Steps

  1. Turn off all appliances (Tanium Server, Tanium Module Server).
  2. Restore the virtual appliances from the image or snapshot.
  3. Start the Tanium Servers first, followed by the Tanium Module Server.
  4. Perform your standard checkout steps (including the following):
    1. Import/reimport/upgrade a module (upgrade is okay in this state instead of a reimport).
    2. Make sure modules can load without errors.
    3. Make sure module service accounts are present.
    4. Make sure plugin schedules are set to running and not disabled.

Option 3: File-level restoration (physical or virtual)

You can restore individual files on your TanOS appliance deployment. This is an option when you know which files you need to restore and have a good version in your backup.

Prerequisites

  • A good comprehensive recovery bundle. Tanium Server downloads and module data can only be found in comprehensive recovery bundles.
  • Root shell key to restore the file on the appliance

Notes

  • You may need to stop services before performing file restoration. You can do this in the Backup menu (B-1) or by using systemctl {stop|start|status} <service name> on the command line, as shown in the example after these notes.
  • You can restore some files from TanOS menus such as:
    • Tanium Server SOAP certificates
    • Tanium public and private keys (Tanium Core Platform 7.3 and earlier)
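
For example, from a root shell you can locate and manage the relevant units with systemctl. The unit name pattern below is an assumption; use the names that list-units actually returns on your appliance.

    # List Tanium-related service units, then stop one before restoring files.
    # Unit names vary; <service name> is a placeholder for a name from the list.
    systemctl list-units 'tanium*' --all
    systemctl status <service name>
    systemctl stop <service name>
    # ...restore the files, then restart the service...
    systemctl start <service name>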

Steps

  1. Extract the files from the recovery bundle and upload to the /incoming folder of the appliance.
  2. Get a root shell to the box (menu B-5-3). You will need to request additional access from Tanium Support. For more information, see Contact Tanium Support.
  3. Obtain the needed files from /opt/home/tancopy/incoming and copy to a staging location such as /home/tanadmin/.
  4. Run ls -alh on the files you are about to replace and take a screenshot of the file permissions and ownership.
  5. Back up the existing files to a staging location.
  6. Copy the restore files to the destination.
  7. Make sure the permissions remain the same after you copy the file to the destination.
    • Check the previous files for permissions and match those permissions for the newly restored files.
    • You may need to run the chown and chmod commands to fix permissions and ownership, as shown in the sketch after these steps. Refer to the Linux command help to understand how to use these commands.
  8. After you finish, clean up the staging folder that you created.
  9. Sign out of the root shell and revoke your root shell access.
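
The following is a minimal sketch of steps 3 through 7 for a single file. The certificate file name, destination path, owner, and mode are hypothetical placeholders; substitute the file you are restoring and the ownership and permissions that you recorded in step 4.

    # Minimal sketch; file name, destination path, owner, and mode are placeholders.
    cp /opt/home/tancopy/incoming/SOAPServer.crt /home/tanadmin/        # step 3: stage the restore copy
    ls -alh /opt/Tanium/TaniumServer/SOAPServer.crt                     # step 4: record the current owner and mode
    cp -p /opt/Tanium/TaniumServer/SOAPServer.crt /home/tanadmin/SOAPServer.crt.bak   # step 5: back up the existing file
    cp /home/tanadmin/SOAPServer.crt /opt/Tanium/TaniumServer/SOAPServer.crt          # step 6: copy the restore file into place
    chown tanium:tanium /opt/Tanium/TaniumServer/SOAPServer.crt         # step 7: match the recorded ownership
    chmod 600 /opt/Tanium/TaniumServer/SOAPServer.crt                   # step 7: match the recorded permissions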

Option 4: Recover failed member of a redundant TS cluster (physical or virtual)

If you have a Tanium cluster set up with multiple Tanium Servers, you can perform the following steps to recover the cluster.

Prerequisites

  • One member of the cluster must be in good working order.
  • Replication and communication must be healthy within the cluster prior to failure.

Restore secondary Tanium Server steps

If the secondary Tanium Server is down, and your environment is operational (but in a reduced redundancy state):

  1. Set up the new appliance.
  2. Conduct the initial configuration of the appliance.
  3. Install the role for this appliance.
    • Make sure that you match the platform version of the primary server.
  4. Register the remote Tanium Module Server to this new secondary Tanium Server.
  5. (HA) Reconfigure clustering.
    1. Set up IPSEC.
    2. Set up clustering with the cluster pair. Make sure the new appliance joins the cluster (do not initialize the cluster on the new appliance).
    3. After the new appliance joins the cluster, it will re-sync the database.
  6. In a web browser, sign in to the Tanium Console and reimport all modules to ensure they match the same version as the primary Tanium Server. Be careful not to accidentally upgrade modules on the secondary Tanium Server.
    • After each module import, verify that the module loads and service accounts and other settings are restored (connections, detect alerts, and so on).
  7. Manually restore the following:
    • SSH auth keys
    • Reconfigure any TanOS settings
    • Backup schedules (if any)
    • (Optional) If you have Tanium web console users (not AD synced), recreate them in TanOS. This step is not needed if you are joining a cluster.
  8. Ensure plugin schedules are in place and perform standard checkout steps:
    • Import/reimport/upgrade a module. It is okay to upgrade modules in this state.
    • Make sure modules can load without errors.
    • Make sure package files are cached.
    • Make sure packages, sensors, users, computer groups, and other objects are restored.

Restore primary Tanium Server steps

If the primary Tanium Server is down, your Tanium deployment is not functional until you perform the following steps:

  1. Promote the secondary Tanium Server to primary by following the steps at Initiate database server failover.
  2. Stand up a new server by following the instructions in Restore secondary Tanium Server steps.

Option 5: Recover failed TMS with standby TMS (physical or virtual)

You can perform the following procedure to recover from a Tanium Module Server (TMS) failure when you have a standby (inactive) TMS.

Prerequisites

  • The most recent TMS sync must be successful and meet your RPO.

Steps

  1. Register your standby module server with your primary (and secondary) Tanium Servers. See Configure the Tanium Server to use the remote Module Server.
  2. Make sure plugin schedules are in place and perform standard checkout steps:
    1. Import/reimport/upgrade a module. It is okay to upgrade modules in this state.
    2. Make sure modules can load without errors.
    3. Make sure package files are cached.
    4. Make sure packages, sensors, users, computer groups, and other objects are restored.
  3. Your environment is now fully operational. Follow the steps at Deploying a standby Module Server to deploy a new standby Tanium Module Server.

Option 6: Fully restore to a new appliance (physical or virtual)

Use this procedure for a full restore of TanOS to a new appliance. You can use these steps for a complete failure or to migrate to a new appliance.

Prerequisites

  • Minimum: core recovery bundle
  • Recommended: comprehensive recovery bundle
  • Destination appliance is connected and has an IP address (provisioned)

The steps in this section assume you are only restoring a single Tanium Appliance. If you need assistance with a coordinated restore with multiple appliances (such as a Tanium Server and a Tanium Module Server), contact Tanium Support. For more information, see Contact Tanium Support.

General steps

  1. Set up the new appliance.
  2. Conduct the initial configuration of the appliance.
  3. Install the role needed for this backup.
    • Make sure that you match the platform version of the backup.
  4. Shut down all Tanium services.
  5. Follow the steps to decrypt the recovery bundle provided within the tgz file.
  6. Get a root shell to the box (B-5-3). You will need to request additional access from Tanium Support. For more information, see Contact Tanium Support.
  7. Restore the content of the recovery bundle to the appliance (do this as root; see the consolidated sketch after these steps).
    1. Use SFTP to transfer the <backup>_internal.tgz file (the file that contains the actual recovery bundle) to the /incoming folder of the appliance.
    2. Copy <backup>_internal.tgz file to the root directory (/):
      cp /opt/home/tancopy/incoming/<backup>_internal.tgz /
    3. Extract the payload:
      tar -xvf <backup>_internal.tgz

      This restores the files from the previous appliance to the new appliance in the correct file locations.

    4. Remove the tgz file after you extract the payload:
      rm -vf <backup>_internal.tgz
  8. Run Re-apply ACL from the TanOS menus (B-X-3).
  9. Start all Tanium services.

Tanium Server steps

  1. Re-register the Tanium Module Server to the new Tanium Server.
  2. Reconfigure clustering, if applicable.
    1. Set up IPSEC.
    2. Set up clustering with the cluster pair. Make sure the new appliance joins the cluster (do not initialize the cluster on the new appliance).
    3. After the new appliance joins the cluster, it will re-sync the database.
  3. In a web browser, sign in to the Tanium Console and reimport all of the previous modules.
    • After each module imports, verify the module loads and service accounts and other settings are restored (connections, detect alerts, and so on).
  4. Manually restore the following:
    • SSH auth keys
    • Reconfigure any TanOS settings
    • Backup schedules (if any)
    • (Optional) If you have Tanium web console users (not AD synced), recreate them in TanOS. This step is not needed if you are joining a cluster.
  5. Make sure plugin schedules are in place and perform standard checkout steps:
    • Import/reimport/upgrade a module. It is okay to upgrade modules in this state.
    • Make sure modules can load without errors.
    • Make sure package files are cached.
    • Make sure packages, sensors, users, computer groups, and other objects are restored.

Tanium Module Server steps

  1. Re-register the Tanium Module Server to the Tanium Server.
    1. For Tanium clusters, register the Tanium Module Server to each Tanium Server appliance.
    2. For Tanium clusters, recluster the Tanium Server appliances.
  2. If a secondary Tanium Module Server is present:
    1. Set up IPSEC with the secondary Tanium Module Server.
    2. Reconfigure Tanium Module Server sync as previously configured.
  3. In a web browser, sign in to the Tanium Console and reimport all the modules again.
    • After each module imports, verify the module loads and service accounts and other settings are restored (connections, detect alerts, and so on).
  4. Manually restore the following:
    • SSH auth keys (for example, tanadmin)
    • Reconfigure any TanOS-specific settings (such as password policies)
    • Backup schedules (if any)
  5. Make sure plugin schedules are in place and perform standard checkout steps:
    1. Import/reimport/upgrade a module. It is okay to upgrade modules in this state.
    2. Make sure modules can load without errors.
    3. Make sure module service accounts are present.
    4. Make sure module configurations are correctly set.
    5. Make sure plugin schedules are running and not disabled.
    6. Run a TanOS Health Check (menu 3-5) and resolve any issues.

Tanium Zone Server steps

  1. Reconfigure the zone server to match the settings on the previous zone server. See Install the Tanium Zone Server.
  2. Set up communications with the associated Tanium Zone Server Hub.
  3. Manually restore the following:
    • SSH auth keys (for example, tanadmin)
    • Reconfigure any TanOS-specific settings (such as password policies)
    • Backup schedules (if any)
  4. Make sure you see clients successfully registering through the zone servers.
    • Test asking questions to clients connected through the zone server.
    • Test issuing packages to the clients behind the zone server.