Reference: TanOS backup and recovery

Use this guide to learn about available backup and recovery options in TanOS.

Concepts and terminology

There are several concepts and terms that you need to know to plan for backups and recovery on a Tanium Appliance.

Redundant clusters

You can configure Tanium Appliances into a redundant cluster, where the Tanium Server component is active-active and the database component is active-passive. The Tanium Servers read from and write to the database co-located on the first appliance.

  • The primary server is the server that hosts the read/write database.
  • The secondary server contains a read-only database that replicates from the database on the primary server.

For more information on how to use Tanium Appliances in a redundant cluster, see Installing an Appliance Array.

Partitions

Each Tanium Appliance contains two partitions, a master partition and a backup partition.

Boot options

When the Tanium Appliance boots, you are prompted to select a partition. Your choices are TanOS Active and TanOS Inactive.

  • The TanOS Active option boots into the active partition. By default, the master partition is the active partition.
  • The TanOS Inactive option boots into the inactive partition. By default, the backup partition is the inactive partition.

Partition swap

You can swap partitions through TanOS menu B-X-3. For more information, see Change the active partition.

  • When you change the active partition, the two partitions switch roles: the inactive partition becomes the active partition, and the active partition becomes the inactive partition.
  • For example, in the default configuration, the master partition is active and the backup partition is inactive. After you change the active partition, the backup partition is active and the master partition is inactive.

Mounting states

  • In a normal mount, the active partition is set to the master partition (root is mounted on /).
  • In an inverted mount, the active partition is set to the backup partition (root is mounted on /altroot).

TanOS backup options

TanOS offers multiple options for backup. You can find the available options in TanOS menu B-1. The following sections describe the backup options in detail.

As of TanOS 1.5.6, you can also schedule backups to run on a regular basis. See TanOS menu B-1 for these options, or refer to Configure and run automatic backups.

Partition sync

A partition sync backs up the current active partition to the inactive partition. The TanOS menu displays information about the state of the two partitions and when the last partition sync was performed. A partition sync is the easiest way to back up and restore your Tanium Appliance.

Minimal off-box backup

A minimal off-box backup is ideal when you need to back up key material from TanOS. This option is very fast and captures the Tanium Server public and private keys, the Tanium Server certificate and key, and the content signing keys. These files are useful for migrating specific keys from one appliance to another (such as moving from proof-of-concept (POC) appliances to production appliances).

You have the option to transfer the backup file to a remote location using SCP, or to drop the backup into the /outgoing directory for manual collection.

$ ls -la
total 304
drwxr-xr-x  11 john.doe  staff    352 May  5 11:21 .
drwxr-xr-x   4 john.doe  staff    128 May  5 11:21 ..
-r--------   1 john.doe  staff   1147 Apr 15 14:13 SOAPServer.crt
-r--------   1 john.doe  staff   1704 Apr 15 14:13 SOAPServer.key
drwxr-xr-x   3 john.doe  staff     96 May  5 11:21 content_public_keys
-rw-r-----   1 john.doe  staff  86016 May  5 11:16 pki.db
-rw-r-----   1 john.doe  staff  20480 Apr 15 14:13 server.db
-rw-r-----   1 john.doe  staff   1037 Apr 15 14:13 tanium-init.dat
-rw-r-----   1 john.doe  staff   9904 Apr 16 11:44 tanium.license
-rw-r-----   1 john.doe  staff    158 Apr 15 14:13 tanium.pub
-rw-r-----   1 john.doe  staff  20480 Apr 15 14:13 tdownloader.db
Figure 1: Sample contents of a minimal backup

Full off-box backup

A full off-box backup is a complete backup of your Tanium installation that is stored off the appliance. This backup can take some time to perform and stops all Tanium services for the duration of the backup.

You have the option to transfer the backup file to a remote location using SCP, or to drop the backup into the /outgoing directory for manual collection.

$ ls -la
total 172280
drwxr-x---   46 john.doe  staff      1472 May  5 11:59 .
drwxr-x---   14 john.doe  staff       448 Apr 15 14:13 ..
drwxr-x---    3 john.doe  staff        96 May  5 11:16 Backup
drwxr-x---  505 john.doe  staff     16160 May  5 11:47 Downloads
drwxr-x---    9 john.doe  staff       288 Apr 26 00:15 Logs
-rw-r-----    1 john.doe  staff        57 May  5 11:59 ResultCache3.txt
-r--------    1 john.doe  staff      1147 Apr 15 14:13 SOAPServer.crt
-r--------    1 john.doe  staff      1704 Apr 15 14:13 SOAPServer.key
drwxr-x---    2 john.doe  staff        64 Apr 16 11:57 SOAPUpload
-rw-r-----    1 john.doe  staff     10856 May  5 11:59 SensorDiffHistory2.sdh
-rw-r-----    1 john.doe  staff     10856 May  5 11:59 SensorDiffHistoryNoTemp2.sdh
drwxr-x---    2 john.doe  staff        64 Apr 15 14:13 Strings
-rw-r-----    1 john.doe  staff       203 May  5 11:56 SystemStatus.txt
drwxr-x---    3 john.doe  staff        96 Apr 15 14:19 TDL_Logs
-r-xr-xr-x    1 john.doe  staff   7419800 Feb 11 21:32 TaniumKeyUtility
-r-xr-x---    1 john.doe  staff   7166072 Feb 11 21:32 TaniumPythonAuthPlugin
-r-xr-xr-x    1 john.doe  staff  27353360 Feb 11 21:32 TaniumServer
-r-xr-xr-x    1 john.doe  staff      2499 Feb 11 21:32 TaniumServerPostgresInstall.sh
-r-xr-xr-x    1 john.doe  staff     10528 Feb 11 21:32 TaniumSpawnHelper
-r-xr-xr-x    1 john.doe  staff   8703152 Feb 11 21:32 TaniumTDownloader
drwxr-x---    2 john.doe  staff        64 Apr 15 14:13 VB
drwxr-x---    5 john.doe  staff       160 Apr 15 14:12 content_public_keys
drwxr-x---    2 john.doe  staff        64 Apr 15 14:13 export
drwxr-x---   13 john.doe  staff       416 Apr 16 11:50 http
drwxr-x---    6 john.doe  staff       192 May  5 11:16 info
drwxr-x---    7 john.doe  staff       224 Apr 15 14:12 init_postgres
-r-xr-x---    1 john.doe  staff   2948144 Feb 11 21:32 libcrypto.so.1.0.0
-r-xr-x---    1 john.doe  staff  16330232 Feb 11 21:32 libpython3.8.so
-r-xr-x---    1 john.doe  staff  16330232 Feb 11 21:32 libpython3.8.so.1.0
-r-xr-x---    1 john.doe  staff    473544 Feb 11 21:32 libssl.so.1.0.0
-r-xr-x---    1 john.doe  staff        26 Feb 11 21:32 libtbbmalloc.so
-r-xr-x---    1 john.doe  staff    863840 Feb 11 21:32 libtbbmalloc.so.2
-r--r-----    1 john.doe  staff     39001 Feb 11 21:32 mime_types.json
-r-xr-x---    1 john.doe  staff    352048 Feb 11 21:32 pkcs11.so
-rw-r-----    1 john.doe  staff     86016 May  5 11:59 pki.db
drwxr-x---    4 john.doe  staff       128 Apr 15 14:13 plugins
drwxr-xr-x    3 john.doe  staff        96 Apr 15 14:13 python27
drwxr-xr-x    3 john.doe  staff        96 Apr 15 14:13 python37
drwxr-x---    4 john.doe  staff       128 Apr 15 14:12 python38
-rw-r-----    1 john.doe  staff     20480 Apr 15 14:13 server.db
drwxr-x---    3 john.doe  staff        96 Apr 15 14:12 swidtag
-rw-r-----    1 john.doe  staff      1037 Apr 15 14:13 tanium-init.dat
-rw-r-----    1 john.doe  staff      9904 Apr 16 11:44 tanium.license
-rw-r-----    1 john.doe  staff       158 Apr 15 14:13 tanium.pub
-rw-r-----    1 john.doe  staff     20480 Apr 15 14:13 tdownloader.db
-r--------    1 john.doe  staff      1147 Apr 15 14:13 trusted-module-servers.crt
Figure 2: Sample contents of a full backup

Tanium database backup

Use this option to back up the Tanium database. By default, Tanium performs an automated backup of the database daily and copies it to the /outgoing directory for retrieval. This database backup file is crucial if a database restore is necessary. The database backup is quick and does not require the Tanium service to be stopped.

The Tanium Appliance also keeps up to 7 days of database backups on disk on a rolling basis.
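
For example, you can manually retrieve the database backup files over SFTP with the tancopy file-transfer account used elsewhere in this guide (a minimal sketch; the local destination path is illustrative):

# Fetch all database backup files from the appliance's outgoing folder
sftp tancopy@<appliance IP>:outgoing/*.pgdump /local/backups/tanium/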

Backup exclusions

Backups do not include the following items:

  • Generally, any TanOS configuration that is specific to that appliance. This includes:
    • Auth key file (SSH keys)
    • FQDN, IP, routes
  • TanOS local user accounts and their account policies (such as password policy)
  • Scheduled jobs (such as backups)

TanOS backup recommendations

The following sections describe standard guidance for backups. Use these as guidelines; your backup strategy may differ based on the specific needs of your organization.

Define a disaster recovery plan

Tanium recommends that you define a disaster recovery plan early during deployment. The restoration and backup options that fit your disaster recovery (DR) plan vary depending on your specific needs and requirements.

To properly define a DR plan, define specific recovery time objectives (RTO) and recovery point objectives (RPO).

  • RPO: The maximum acceptable amount of data loss, measured in time. For example, an RPO of 1 month means that you must be able to restore to a point no more than 30 days before the time of failure.
  • RTO: The maximum amount of downtime that is acceptable while you recover the system.

Your backup frequency and recovery procedures may vary depending on your RPO and RTO. For example, weekly partition syncs support a worst-case RPO of 7 days.

General recommendations

Tanium recommends performing a partition sync (or VM snapshot where applicable) prior to any major changes, including:

  • TanOS upgrades
  • Tanium Server upgrades
  • Tanium Modules upgrades

Tanium recommends that you use the automated backup features available in TanOS 1.5.6 and later to automate many of these backup tasks. For detailed steps, see Configure and run automatic backups.

Tanium Appliance - virtual

For virtual deployments, Tanium recommends that you back up the Tanium Appliance virtual appliance image file. You can use these image files to restore the Tanium Appliance to the specific time the backup was taken; this is the preferred method to quickly restore the entire appliance during major failures. Tanium recommends performing this backup once a day and retaining on disk as many copies as you deem necessary, in accordance with your company's RPO.

A number of third-party tools on the market accomplish this task. Contact your virtual infrastructure team to determine whether your organization already uses one of these solutions. Tanium does not allow third-party backup agents to be installed on the appliance; however, open-vm-tools is installed on the virtual appliance so that these applications can quiesce the file system before taking the snapshots that are used to save the image to persistent storage.

If you are not able to perform image-based backups and you do not use clustered Tanium Servers, Tanium recommends partition syncs every two weeks and monthly full off-box backups for all Tanium Servers. Note that only one recovery partition is maintained.

For environments with Tanium deployed in a redundant cluster, Tanium recommends monthly partition syncs and quarterly full off-box backups. You have the ability to restore the Tanium Server from a secondary Tanium Server in a redundant cluster with minimal downtime, so the full off-box backup is less critical in this scenario. It is only necessary to perform the full off-box backup on the secondary Tanium Server of the cluster to reduce Tanium down time.

Tanium Appliance - physical

For physical appliances that do not use clustered Tanium Servers, Tanium recommends weekly partition syncs and monthly full off-box backups for all Tanium Servers.

For environments with Tanium deployed in a redundant cluster, Tanium recommends monthly partition syncs and quarterly full off-box backups. You have the ability to restore Tanium Server from a secondary Tanium Server in a redundant cluster with minimal downtime, so the full off-box backup is less critical in this scenario. It is only necessary to perform the full off-box backup on the secondary Tanium Server of the cluster to reduce Tanium down time.

Backup automation

Automated backups with TanOS 1.5.6 and later

Use the automated backup options that are available out of the box. For detailed steps, see Configure and run automatic backups.

To fully automate the transfer of backup files to an off-box location, you must set up an SCP destination that the appliance can access.
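
For example, you can confirm that a destination accepts key-based transfers before configuring it on the appliance (a minimal sketch; the user, host, key path, and destination directory are illustrative assumptions):

# Confirm that the SCP destination accepts key-based authentication without a password prompt
touch /tmp/scp-test.txt
scp -i ~/.ssh/backup_key /tmp/scp-test.txt backupuser@backup-host.example.com:/backups/tanium/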

Automated backups with TanOS 1.5.5 and earlier

You can optionally set up automation to retrieve Tanium backups from the outgoing folder with a cron job that runs daily from a remote system. The following is a sample script that can be scheduled with a cron job to retrieve the backup files:

#!/bin/bash

# Usage: ./DownloadBackups.sh <Server IP address>
# Once configured, you can call this script from a cron job on a daily basis to retain backups.

TanOSIP=$1
bu_dest=<backup storage location>

# Get database backup
echo "Copying database backup"
echo "get outgoing/*.pgdump $bu_dest" | sftp tancopy@$TanOSIP
echo "Done!"

# Get Tanium Server backup
echo "Copying Server backup"
echo "get outgoing/bu_*zip $bu_dest" | sftp tancopy@$TanOSIP
echo "Done!"
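
For example, a crontab entry on the remote system that runs the script daily at 2:00 AM might look like the following (the script path, appliance IP address, and log file are illustrative):

# m h dom mon dow command
0 2 * * * /opt/scripts/DownloadBackups.sh 192.0.2.10 >> /var/log/tanium-backup.log 2>&1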

If you need further assistance with this script, contact your Technical Account Manager (TAM).

TanOS restore options

The following sections detail specific options for Tanium Appliance restoration, organized from least impactful to most impactful. Review each option in order to determine the correct procedures to follow for your restoration.

These steps are not designed to be run without the assistance of a TAM. Consult with your TAM before you run any steps in this section.

Option 1: Restore TanOS to a partition sync (physical or virtual)

The quickest way to restore to a known good restore point is to use the partition sync feature on TanOS.

Prerequisites

  • A previous partition sync that is within your RPO
  • The password that was in use at the time of the previous partition sync (if the TanOS password was changed after the sync)
  • Access to a tanadmin privileged account

Notes

  • Your partition syncs for each Tanium Server and the Module Server must be initiated within 30 minutes of each other (to minimize configuration drift between the Tanium Server and the Tanium Module Server).

Steps

  1. Log in to the TanOS console as a user with the tanadmin role. Make sure the appliance is booted to the master partition in normal mounting mode.
  2. Initiate a partition swap (B-X-3). TanOS automatically reboots.
  3. Boot into the TanOS Active partition. You are now in inverted mounting mode.
  4. Perform a partition sync (B-1-1). This syncs your backup partition (which is active) to your master partition (inactive). Any changes on the master partition are overwritten.
  5. Initiate a partition swap (B-X-3). TanOS automatically reboots.
  6. Repeat these steps for all Tanium Servers and Tanium Module Servers in the cluster.
    • Start with all Tanium Servers, followed by the Tanium Module Servers.

Option 2: Restore TanOS using a VM image or snapshot (virtual)

This option is only available for virtual appliances because these steps require a VM image (or snapshot) to be available for restoration.

Prerequisites

  • A known good image/snapshot for appliance restore.
  • The restore point must include Tanium Module Server and Tanium Server images that were captured at the same time.

Steps

  1. Turn off all appliances (Tanium Server, Tanium Module Server).
  2. Restore the virtual appliances from the image or snapshot.
  3. Start the Tanium Servers first, followed by the Tanium Module Server.
  4. Perform your standard checkout steps (including the following):
    1. Import/reimport/upgrade a module (upgrade is okay in this state instead of a reimport).
    2. Make sure modules can load without errors.
    3. Make sure module service accounts are present.
    4. Make sure plugin schedules are set to running and not disabled.

Option 3: File-level restoration (physical or virtual)

You can restore individual files on your TanOS appliance deployment. This is an option when you know which files you need to restore and have a good version in your backup.

Prerequisites

  • A known good backup of the files
    • Key material can be found in minimal backups
    • Tanium folders can be found in full backups
  • Root shell key to restore the file on the appliance

Notes

  • You may need to stop services before performing file restoration. You can do this in the TanOS menu (2-1) or by using systemctl [stop, start, status] <service name> on the command line, as shown in the example after this list.
  • You can restore some files from TanOS menus such as:
    • Tanium Server SOAP certificates
    • Tanium public and private keys (Tanium Core Platform 7.3 and earlier)
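
For example, to stop a service before restoring its files and start it again afterward (a minimal sketch; the taniumserver service name is an illustrative assumption):

# Stop the service before replacing its files
systemctl stop taniumserver

# Confirm that the service stopped
systemctl status taniumserver

# Start the service again after the restoration is complete
systemctl start taniumserver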

Steps

  1. Extract the files from the backup and upload them to the /incoming folder of the appliance.
  2. Get a root shell to the box (menu B-5-3). You will need to request additional access from your TAM.
  3. Obtain the needed files from /opt/home/tancopy/incoming and copy them to a staging location such as /home/tanadmin/.
  4. Run ls -alh on the files you are about to replace and take a screenshot of the file permissions and ownership.
  5. Back up the existing files to a staging location.
  6. Copy the restore files to the destination.
  7. Make sure the permissions remain the same after you copy the file to the destination.
    • Check the previous files for permissions and match those permissions for the newly restored files.
    • You may need to run the chown and chmod commands to fix permissions and ownership (see the example after these steps). Refer to the Linux command help to understand how to use these commands.
  8. After you finish, clean up the staging folder that you created.
  9. Log out of the root shell and revoke your root shell access.
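
For example, the following sketch restores a single file while preserving ownership and permissions (the file name, destination path, owner, and mode are illustrative assumptions):

# Record the current permissions and ownership of the file being replaced
ls -alh /opt/Tanium/TaniumServer/tanium.license

# Back up the existing file to a staging location, preserving its attributes
cp -p /opt/Tanium/TaniumServer/tanium.license /home/tanadmin/staging/

# Copy the restored file from the incoming folder to the destination
cp /opt/home/tancopy/incoming/tanium.license /opt/Tanium/TaniumServer/

# Match the ownership and mode recorded earlier (values shown are examples)
chown tanium:tanium /opt/Tanium/TaniumServer/tanium.license
chmod 640 /opt/Tanium/TaniumServer/tanium.license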

Option 4: Recover failed member of a redundant TS cluster (physical or virtual)

If you have a redundant cluster set up for Tanium Server, you can perform the following steps to recover the cluster.

Prerequisites

  • One member of the redundant cluster must be in good working order.
  • Replication and communication must be healthy within the cluster prior to failure.

Restore secondary Tanium Server steps

If the secondary Tanium Server is down and your environment is operational (but in a reduced redundancy state), follow these steps:

  1. Set up the new appliance.
  2. Conduct the initial configuration.
  3. Install the role for this appliance.
    • Make sure that you match the platform version of the primary server.
  4. Register the remote Tanium Module Server to this new secondary Tanium Server.
  5. (HA) Reconfigure clustering.
    1. Set up IPSEC.
    2. Set up clustering with the cluster pair. Make sure the new appliance joins the cluster (do not initialize the cluster on the new appliance).
    3. After the new appliance joins the cluster, it will re-sync the database.
  6. In a web browser, log in to the Tanium Console and reimport all modules, making sure that they match the versions on the primary Tanium Server. Be careful not to accidentally upgrade modules on the secondary Tanium Server.
    • After each module import, verify that the module loads and service accounts and other settings are restored (connections, detect alerts, and so on).
  7. Manually restore the following:
    • SSH auth keys
    • TanOS settings (reconfigure these as needed)
    • Backup schedules (if any)
    • (Optional) If you have Tanium web console users (not AD synced), recreate them in TanOS. This step is not needed if you are joining a cluster.
  8. Ensure plugin schedules are in place and perform standard checkout steps:
    • Import/reimport/upgrade a module. It is okay to upgrade modules in this state.
    • Make sure modules can load without errors.
    • Make sure package files are cached.
    • Make sure packages, sensors, users, computer groups, and other objects are restored.

Restore primary Tanium Server steps

If the primary Tanium Server is down, your Tanium deployment is not functional until you perform the following steps:

  1. Promote the secondary Tanium Server to primary by following the steps at Initiate database server failover.
  2. Stand up a new server by following the instructions in Restore secondary Tanium Server steps.

Option 5: Recover failed TMS with standby TMS (physical or virtual)

Use the following procedure to recover from a Tanium Module Server (TMS) failure when a standby (inactive) TMS is available.

Prerequisites

  • The most recent TMS sync must be successful and meet your RPO.

Steps

  1. Register your standby Tanium Module Server with your primary (and secondary) Tanium Servers. See Configure the Tanium Server to use the remote Module Server.
  2. Make sure plugin schedules are in place and perform standard checkout steps:
    1. Import/reimport/upgrade a module. It is okay to upgrade modules in this state.
    2. Make sure modules can load without errors.
    3. Make sure package files are cached.
    4. Make sure packages, sensors, users, computer groups, and other objects are restored.
  3. Your environment is now fully operational. Follow the steps at Deploying a standby Module Server to deploy a new standby Tanium Module Server.

Option 6: Fully restore to a new appliance (physical or virtual)

Use this procedure for a full restore of TanOS to a new appliance. You can use these steps for a complete failure or to migrate to a new appliance.

Prerequisites

  • Minimum: minimal backup and database backup
  • Recommended: full backup and database backup
  • Destination appliance is connected and has an IP address (provisioned)

The steps in this section assume you are only restoring a single Tanium Appliance. If you need assistance with a coordinated restore with multiple appliances (such as a Tanium Server and a Tanium Module Server), consult your TAM.

General Steps

  1. Set up the new appliance.
  2. Conduct the initial configuration.
  3. Install the role needed for this backup.
    • Make sure that you match the platform version of the backup.
  4. Shut down all Tanium services.
  5. Get a root shell to the box (B-5-3). You will need to request additional access from your TAM.
  6. Restore the content of the backup to the TaniumServer directory (do this as root).
    1. Follow the steps to decrypt the backup provided within the tgz file.
    2. Use SFTP to transfer the <backup>_internal.tgz file (the final file that contains the actual backup) to the incoming folder of the appliance.
    3. Copy the <backup>_internal.tgz file to the root directory (/):
      cp /opt/home/tancopy/incoming/<backup>_internal.tgz /
    4. Extract the payload from the root directory so that the files land in their original locations:
      cd / && tar -xvf <backup>_internal.tgz

      This restores the files from the previous appliance to the new appliance in the correct file locations.

  7. Re-apply the ACLs.
  8. Start all Tanium services.

Tanium Server steps

  1. Re-register the Tanium Module Server to the new Tanium Server.
  2. Reconfigure clustering, if applicable.
    1. Set up IPSEC.
    2. Set up clustering with the cluster pair. Make sure the new appliance joins the cluster (do not initialize the cluster on the new appliance).
    3. After the new appliance joins the cluster, it will re-sync the database.
  3. In a web browser, log in to the Tanium Console and reimport all of the previous modules.
    • After each module imports, verify the module loads and service accounts and other settings are restored (connections, detect alerts, and so on).
  4. Manually restore the following:
    • SSH auth keys
    • TanOS settings (reconfigure these as needed)
    • Backup schedules (if any)
    • (Optional) If you have Tanium web console users (not AD synced), recreate them in TanOS. This step is not needed if you are joining a cluster.
  5. Make sure plugin schedules are in place and perform standard checkout steps:
    • Import/reimport/upgrade a module. It is okay to upgrade modules in this state.
    • Make sure modules can load without errors.
    • Make sure package files are cached.
    • Make sure packages, sensors, users, computer groups, and other objects are restored.

Tanium Module Server steps

  1. Re-register the Tanium Module Server to the Tanium Server.
    1. For redundant clusters, register the Tanium Module Server to each Tanium Server appliance.
    2. For redundant clusters, recluster the Tanium Server appliances.
  2. If a secondary Tanium Module Server is present:
    1. Set up IPSEC with the secondary Tanium Module Server.
    2. Reconfigure Tanium Module Server sync as previously configured.
  3. In a web browser, log in to the Tanium Console and reimport all the modules again.
    • After each module imports, verify the module loads and service accounts and other settings are restored (connections, detect alerts, and so on).
  4. Manually restore the following:
    • SSH auth keys (for example, tanadmin)
    • TanOS-specific settings, such as password policies (reconfigure these as needed)
    • Backup schedules (if any)
  5. Make sure plugin schedules are in place and perform standard checkout steps:
    1. Import/reimport/upgrade a module. It is okay to upgrade modules in this state.
    2. Make sure modules can load without errors.
    3. Make sure module service accounts are present.
    4. Make sure module configurations are correctly set.
    5. Make sure plugin schedules are running and not disabled.
    6. Run a TanOS Health Check (menu 3-5) and resolve any issues.

Tanium Zone Server steps

  1. Reconfigure the zone server to match the settings on the previous zone server. See Install the Tanium Zone Server.
  2. Set up communications with the associated Tanium Zone Server Hub.
  3. Manually restore the following:
    • SSH auth keys (for example, tanadmin)
    • TanOS-specific settings, such as password policies (reconfigure these as needed)
    • Backup schedules (if any)
  4. Make sure you see clients successfully registering through the zone servers.
    • Test asking questions to clients connected through the zone server.
    • Test issuing packages to the clients behind the zone server.