HashiCorp Vault Integration
CheckTick uses HashiCorp Vault for secure encryption key management and ethical data recovery. This guide covers deployment, configuration, and developer integration.
Overview
Vault provides:
- Key Escrow: Secure storage for user encryption keys (enables recovery when users forget credentials)
- Split-Knowledge Security: Platform master key split between Vault and offline custodian
- Audit Logging: Immutable audit trail for compliance
- AppRole Authentication: Secure application access
Why Vault?
Users are custodians of patient data, not owners. It's unethical to allow permanent data loss when users forget their password AND recovery phrase. Vault enables platform-assisted recovery while maintaining security through:
- Dual-approval workflow (two admins required)
- Mandatory time delays (users can cancel suspicious requests)
- Identity verification requirements
- Complete audit trail
Quick Start (Docker Hub)
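The quickest way to try Vault locally is a dev-mode server from the official Docker Hub image:
# Pull and run Vault in dev mode (in-memory storage, auto-initialized and unsealed)
docker run -d --name vault \
-p 8200:8200 \
-e VAULT_ADDR=http://127.0.0.1:8200 \
hashicorp/vault:1.21.1 \
vault server -dev -dev-listen-address=0.0.0.0:8200
# Dev mode prints an unseal key and a root token to the container logs
docker logs vault
Dev mode keeps all data in memory and loses it when the container stops, so never use it for real keys. A production server is started with a configuration file and initialized once with `vault operator init -key-shares=4 -key-threshold=3`. For production, see Deployment Options and Initialization below.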
Architecture
Platform Master Key (split-knowledge: Vault + Custodian)
├── Organisation Keys (derived from platform key + org owner passphrase)
│   ├── Team Keys (derived from org key)
│   │   └── Survey KEKs (encrypted with team key)
│   └── Direct Survey KEKs (for org-level surveys)
└── Individual User Keys (ALL stored in Vault for recovery)
    ├── Username/Password Users → Survey KEK encrypted in Vault
    ├── SSO (OIDC) Users → Survey KEK encrypted with identity key
    └── Recovery Phrase → Always available as fallback
Split-Knowledge Security
The platform master key uses a split-knowledge design:
Platform Master Key = Vault Component ⊕ Custodian Component
- Vault Component: Stored in HashiCorp Vault (accessible to CheckTick)
- Custodian Component: Stored offline by platform administrators
- Neither component alone can decrypt data
- Both required to reconstruct the full platform key
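A minimal sketch of how such a split can work, assuming the two components are combined with byte-wise XOR (the usual split-knowledge construction). The function names are hypothetical and are not CheckTick's API. Each component on its own is uniformly random, which is why neither can decrypt anything alone.

```python
import secrets


def split_master_key(master_key: bytes) -> tuple[bytes, bytes]:
    """Split a key into a Vault component and a custodian component (hypothetical)."""
    # The Vault component is uniformly random, so alone it reveals nothing about the key.
    vault_component = secrets.token_bytes(len(master_key))
    # The custodian component is the key masked with that randomness (one-time pad).
    custodian_component = bytes(k ^ v for k, v in zip(master_key, vault_component))
    return vault_component, custodian_component


def combine_components(vault_component: bytes, custodian_component: bytes) -> bytes:
    """Reconstruct the master key; both components are required."""
    if len(vault_component) != len(custodian_component):
        raise ValueError("components must have the same length")
    return bytes(v ^ c for v, c in zip(vault_component, custodian_component))


if __name__ == "__main__":
    master_key = secrets.token_bytes(64)  # 64-byte platform master key (Key Hierarchy table)
    vault_part, custodian_part = split_master_key(master_key)
    assert combine_components(vault_part, custodian_part) == master_key
    print("master key reconstructed from both components")
```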
Key Hierarchy
| Level | Key Size | Storage | Purpose |
|---|---|---|---|
| Platform Master | 64 bytes | Split (Vault + Offline) | Root of all derivation |
| Organisation | 32 bytes | Derived on-demand | Org-level encryption |
| Team | 32 bytes | Derived on-demand | Team-level encryption |
| Survey KEK | 32 bytes | Vault (encrypted) | Encrypts survey data |
| User Recovery | 32 bytes | Vault (escrowed) | Emergency recovery |
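The derivation chain in this table can be sketched with the standard library. This sketch assumes PBKDF2-HMAC-SHA256 with the 200,000 iterations quoted under Performance for the organisation key, plus a single HMAC-SHA256 step for the team key. The salt construction, the `org:`/`team:` labels and the function names are hypothetical, not CheckTick's actual scheme.

```python
import hashlib
import hmac

PBKDF2_ITERATIONS = 200_000  # iteration count quoted in the Performance section


def derive_organisation_key(platform_key: bytes, org_id: int, passphrase: str) -> bytes:
    """Hypothetical: stretch the owner passphrase, salted by platform key + org ID."""
    # The salt binds the result to this platform key and this organisation.
    salt = hmac.new(platform_key, f"org:{org_id}".encode(), hashlib.sha256).digest()
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, PBKDF2_ITERATIONS, dklen=32)


def derive_team_key(org_key: bytes, team_id: int) -> bytes:
    """Hypothetical: derive a 32-byte team key from the org key with HMAC-SHA256."""
    return hmac.new(org_key, f"team:{team_id}".encode(), hashlib.sha256).digest()
```

Here only the organisation step pays the PBKDF2 cost. The Performance table reports about 200 ms for team derivation as well, so the real implementation may stretch that step too.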
Vault Paths
secret/platform/master-key # Vault component of platform key
secret/organisations/{org_id}/master-key # Org key metadata
secret/teams/{team_id}/team-key # Team key metadata
secret/surveys/{survey_id}/kek # Org/team survey KEKs
secret/users/{user_id}/surveys/{survey_id}/recovery-kek # Individual user recovery keys
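The paths above are written the way `vault kv get` accepts them. With the KV v2 engine, the HTTP API inserts `data/` (secret values) or `metadata/` (version history) directly after the mount name, for example `GET /v1/secret/data/platform/master-key`. Below is a small sketch of that mapping. It assumes the engine is mounted at `secret/`, and the helper names are hypothetical.

```python
MOUNT = "secret"  # KV v2 mount (assumed to be "secret", as in the paths above)


def user_recovery_kek_path(user_id: int, survey_id: int) -> str:
    """Logical path as used with `vault kv get` (hypothetical helper)."""
    return f"{MOUNT}/users/{user_id}/surveys/{survey_id}/recovery-kek"


def kv2_api_url(vault_addr: str, logical_path: str, kind: str = "data") -> str:
    """Map a logical KV v2 path to its HTTP API URL.

    KV v2 puts "data/" (secret values) or "metadata/" (versions, deletion)
    straight after the mount name.
    """
    mount, _, rest = logical_path.partition("/")
    return f"{vault_addr.rstrip('/')}/v1/{mount}/{kind}/{rest}"
```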
Deployment Options
Option 1: Northflank (Recommended for Production)
Northflank provides managed container hosting with persistent storage.
Step 1: Create Volume
- Navigate to your project → Add new → Addon → Volume
- Configure:
  - Name: vault-data
  - Size: 10GB
  - Type: NVMe
Step 2: Create Service
- Add new → Service → External Image
- Configure:
  - Service name: vault
  - Image: hashicorp/vault:1.21.1
  - Port: 8200 (HTTP, private only; do NOT make public)
Security: Keep Vault private. Only your CheckTick webapp should access it via Northflank's internal networking.
Step 3: Mount Volume
- Volume: vault-data
- Mount path: /vault/file
Step 4: Command Override
/bin/sh -c 'printf "ui = true\nlistener \"tcp\" {\n address = \"0.0.0.0:8200\"\n tls_disable = true\n}\nstorage \"file\" {\n path = \"/vault/file\"\n}\napi_addr = \"http://127.0.0.1:8200\"\n" > /vault/config/vault.hcl && vault server -config=/vault/config/vault.hcl'
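Note: `tls_disable = true` is acceptable here because Vault is reachable only over Northflank's private network. Any TLS termination happens at Northflank's load balancer, not inside Vault.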
Step 5: Environment Variables
| Variable | Value |
|---|---|
| VAULT_ADDR | http://127.0.0.1:8200 |
| SKIP_CHOWN | true |
| SKIP_SETCAP | true |
Step 6: Network Configuration
Critical Security Step: Ensure Vault is only accessible internally.
- In Vault service settings → Networking:
  - ✅ Private Port: 8200 enabled
  - ❌ Public Port: Disabled (or remove public exposure)
- Note the internal URL: vault.PROJECT_ID.svc.cluster.local:8200
- In CheckTick webapp service → Environment Variables:
```bash
# If you have TLS configured on Vault:
VAULT_ADDR=https://vault.PROJECT_ID.svc.cluster.local:8200
VAULT_TLS_VERIFY=true  # Verify Vault's TLS certificate

# If using Northflank's load balancer TLS only:
VAULT_ADDR=http://vault.PROJECT_ID.svc.cluster.local:8200
VAULT_TLS_VERIFY=false  # Load balancer handles TLS
```
- Replace PROJECT_ID with your actual Northflank project ID (found in the project URL)
Step 7: Deploy and Initialize
See Initialization below.
Option 2: Docker Compose (Development)
Create docker-compose.vault.yml:
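version: '3.8'
services:
  vault:
    image: hashicorp/vault:1.21.1
    container_name: vault
    ports:
      - "8200:8200"
    environment:
      VAULT_ADDR: http://127.0.0.1:8200
    cap_add:
      - IPC_LOCK
    volumes:
      - vault-data:/vault/file
    # Dev mode: in-memory storage, auto-unsealed, fixed root token. Never use for real data.
    # Listen on 0.0.0.0 so the published port is reachable from the host.
    command: >
      vault server -dev -dev-root-token-id="dev-token" -dev-listen-address="0.0.0.0:8200"
    restart: unless-stopped
volumes:
  vault-data: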
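For production Docker Compose, replace dev mode with a configuration file that uses persistent file storage. The listener and storage "file" settings from the Northflank command override above are one example. Mount that file at /vault/config/vault.hcl and change the command to:
command: >
  sh -c "vault server -config=/vault/config/vault.hcl"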
Option 3: Kubernetes (Enterprise)
For large-scale deployments, use the official Helm chart:
helm repo add hashicorp https://helm.releases.hashicorp.com
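# Raft integrated storage; without it the chart's HA mode expects a Consul backend
helm install vault hashicorp/vault \
--set server.ha.enabled=true \
--set server.ha.replicas=3 \
--set server.ha.raft.enabled=true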
Initialization
After deployment, initialize Vault (one-time only):
Step 1: Connect to Vault
# Docker
docker exec -it vault /bin/sh
# Northflank: Use Shell button in dashboard
# Kubernetes: kubectl exec -it vault-0 -- /bin/sh
Step 2: Initialize
export VAULT_ADDR=http://127.0.0.1:8200
# Initialize with Shamir's Secret Sharing
vault operator init -key-shares=4 -key-threshold=3
Output (save immediately!):
Unseal Key 1: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Unseal Key 2: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Unseal Key 3: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Unseal Key 4: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Initial Root Token: hvs.xxxxxxxxxxxxxxxxxxxxxxxxxxxx
Step 3: Store Keys Securely
Vault Unseal Keys (for Vault infrastructure):
| Key | Storage Location |
|---|---|
| Unseal Key 1 | Admin 1's password manager |
| Unseal Key 2 | Admin 2's password manager |
| Unseal Key 3 | Physical safe (printed, sealed) |
| Unseal Key 4 | Encrypted cloud backup |
| Root Token | Both admins' password managers (temporary, will be revoked) |
Note: Custodian component shares will use the same distribution pattern (see Step 5b).
Step 4: Unseal
Vault starts sealed and requires 3 of 4 keys to unseal:
vault operator unseal <key1>
vault operator unseal <key2>
vault operator unseal <key3>
# Verify
vault status
# Should show: Sealed: false
Step 5: Run Setup Script
From your CheckTick environment:
# Set environment
export VAULT_ADDR=https://your-vault-url:8200
export VAULT_TOKEN=<root-token>
# Run setup
cd vault/
python setup_vault.py
This creates:
- KV v2 secrets engine
- CheckTick policies
- AppRole authentication
- Platform master key (Vault component stored in Vault; custodian component printed in the script output)
Save the AppRole credentials to your .env file:
VAULT_ADDR=https://your-vault-url:8200
VAULT_ROLE_ID=<from-setup-output>
VAULT_SECRET_ID=<from-setup-output>
# DO NOT add PLATFORM_CUSTODIAN_COMPONENT (see next step)
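To confirm the AppRole credentials work before wiring them into CheckTick, you can reproduce the login by hand. Below is a minimal stdlib sketch of Vault's AppRole login endpoint (`POST /v1/auth/approle/login`, which returns `auth.client_token` and `auth.lease_duration`). It assumes AppRole is mounted at the default `auth/approle` path. The function names are hypothetical and are not CheckTick's VaultClient API.

```python
import json
import os
import urllib.request


def build_approle_login_request(vault_addr: str, role_id: str, secret_id: str) -> urllib.request.Request:
    """Build the AppRole login call (default auth/approle mount assumed)."""
    body = json.dumps({"role_id": role_id, "secret_id": secret_id}).encode()
    return urllib.request.Request(
        f"{vault_addr.rstrip('/')}/v1/auth/approle/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_login_response(payload: bytes) -> tuple[str, int]:
    """Return (client_token, lease_duration_seconds) from Vault's login response."""
    auth = json.loads(payload)["auth"]
    return auth["client_token"], auth["lease_duration"]


def login_from_env() -> tuple[str, int]:
    """Log in with VAULT_ADDR / VAULT_ROLE_ID / VAULT_SECRET_ID (needs a live Vault)."""
    request = build_approle_login_request(
        os.environ["VAULT_ADDR"], os.environ["VAULT_ROLE_ID"], os.environ["VAULT_SECRET_ID"]
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_login_response(response.read())
```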
Step 5b: Split Custodian Component
The setup script outputs a custodian component hex string. Split it into Shamir shares for security:
# Copy the custodian component from setup_vault.py output
python manage.py split_custodian_component \
--custodian-component=<paste-hex-here>
# Output:
# Share 1: 801-abc123def456...
# Share 2: 802-xyz789ghi012...
# Share 3: 803-jkl345mno678...
# Share 4: 804-pqr901stu234...
Distribute shares (aligned with Vault unseal keys):
| Share | Storage Location |
|---|---|
| Share 1 | Admin 1's password manager (same person who has Unseal Key 1) |
| Share 2 | Admin 2's password manager (same person who has Unseal Key 2) |
| Share 3 | Physical safe (with Unseal Key 3) |
| Share 4 | Encrypted cloud backup (with Unseal Key 4) |
Threshold: Need any 3 of 4 shares to perform platform recovery.
Security: Never store custodian shares in application environment variables or database.
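To see why any 3 of the 4 shares suffice, here is a minimal Shamir's Secret Sharing sketch over a prime field. It is illustrative only. Its (x, y) shares are not compatible with the `801-...` shares printed by `split_custodian_component`, which likely uses a different field and encoding. Always use the management command for real shares.

```python
import secrets

PRIME = 2**521 - 1  # Mersenne prime, larger than any 64-byte secret


def _eval_poly(coeffs: list[int], x: int) -> int:
    """Evaluate the polynomial at x with Horner's rule, mod PRIME."""
    result = 0
    for coeff in reversed(coeffs):
        result = (result * x + coeff) % PRIME
    return result


def split_secret(secret: bytes, shares: int = 4, threshold: int = 3) -> list[tuple[int, int]]:
    """Split a secret into (x, y) shares; any `threshold` of them recover it."""
    value = int.from_bytes(secret, "big")
    if value >= PRIME:
        raise ValueError("secret too large for this field")
    # The constant term is the secret; the other coefficients are random.
    coeffs = [value] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    return [(x, _eval_poly(coeffs, x)) for x in range(1, shares + 1)]


def recover_secret(shares: list[tuple[int, int]], length: int) -> bytes:
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total.to_bytes(length, "big")
```

With only two shares, the interpolated value is unrelated to the secret. A quadratic needs three points to be pinned down.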
Step 5c: Platform Key Versioning
CheckTick supports platform key versioning, enabling key rotation without breaking access to existing encrypted surveys.
How Versioning Works
Each platform key version consists of:
- Vault Component: Stored in PostgreSQL database (PlatformKeyVersion model)
- Custodian Component: Stored offline (YubiKeys/paper/USB) via Shamir shares
- Version ID: String identifier (e.g., "v1", "v2", "v3")
When a survey KEK is escrowed, the platform key version is recorded in the database. During recovery, the system uses the correct versioned components to decrypt that specific KEK, even after platform key rotation.
Initial Version Setup
The initial setup script creates version "v1". To create subsequent versions:
# To generate a NEW platform key (rotation Option B):
python manage.py create_platform_key_version --version v2 --activate
# To rotate ONLY the Shamir shares (rotation Option A):
# see "Platform Key Rotation" below
Check Active Version
python manage.py shell
>>> from checktick_app.surveys.models import PlatformKeyVersion
>>> active = PlatformKeyVersion.get_active_version()
>>> print(f"Active: {active.version}")
Active: v1
Version Lifecycle
- Created: Version exists but not yet active
- Active: Used for new KEK escrows (only one active at a time)
- Retired: No longer used for new escrows, but still needed for decrypting old surveys
Important: Never delete retired versions while escrows exist referencing them.
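The lifecycle rules above can be modelled in a few lines. This is a hypothetical in-memory sketch, not the PlatformKeyVersion model. Activating a version retires the previous one, each escrow records the version that was active, and a version still referenced by escrows cannot be deleted.

```python
from dataclasses import dataclass, field


@dataclass
class KeyVersion:
    version: str
    status: str = "created"  # created -> active -> retired


@dataclass
class KeyVersionRegistry:
    """Hypothetical in-memory model of the version lifecycle."""

    versions: dict[str, KeyVersion] = field(default_factory=dict)
    escrow_versions: dict[int, str] = field(default_factory=dict)  # survey_id -> version

    def create(self, version: str) -> None:
        self.versions[version] = KeyVersion(version)

    def activate(self, version: str) -> None:
        # Only one active version: the current one is retired, never deleted.
        for kv in self.versions.values():
            if kv.status == "active":
                kv.status = "retired"
        self.versions[version].status = "active"

    def active(self) -> KeyVersion:
        return next(kv for kv in self.versions.values() if kv.status == "active")

    def escrow(self, survey_id: int) -> str:
        # Record which version protected this KEK so recovery can find it later.
        self.escrow_versions[survey_id] = self.active().version
        return self.escrow_versions[survey_id]

    def version_for_recovery(self, survey_id: int) -> KeyVersion:
        return self.versions[self.escrow_versions[survey_id]]

    def delete(self, version: str) -> None:
        if version in self.escrow_versions.values():
            raise ValueError(f"{version} still protects escrowed KEKs")
        del self.versions[version]
```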
Step 6: Revoke Root Token
vault token revoke <root-token>
Step 7: Test Connection
python manage.py test_vault_connection
Developer Integration
Get Vault Client
from checktick_app.surveys.vault_client import get_vault_client
vault = get_vault_client()
Escrow Survey KEK (During Survey Creation)
import os

from checktick_app.surveys.vault_client import get_vault_client


def create_survey_with_escrow(survey, user, user_password):
    """Create survey with triple-path encryption."""
    vault = get_vault_client()
    # Generate a random 256-bit survey KEK
    survey_kek = os.urandom(32)
    # encrypt_with_password / encrypt_with_recovery are application helpers not shown in this guide
    # Path 1: Password-encrypted (database)
    survey.encrypted_kek = encrypt_with_password(survey_kek, user_password)
    # Path 2: Recovery phrase (database)
    survey.recovery_encrypted_kek = encrypt_with_recovery(survey_kek, user.recovery_phrase)
    # Path 3: Platform escrow (Vault)
    vault_path = vault.escrow_user_survey_kek(
        user_id=user.id,
        survey_id=survey.id,
        survey_kek=survey_kek,
    )
    survey.vault_recovery_path = vault_path
    survey.save()
    return survey_kek
Recover Escrowed KEK
from checktick_app.surveys.vault_client import get_vault_client


def execute_platform_recovery(recovery_request, admin_user, custodian_component: bytes):
    """Execute recovery after dual authorization + time delay.

    custodian_component is reconstructed from 3 of 4 custodian shares at
    execution time. Never load it from settings or environment variables.
    """
    vault = get_vault_client()
    # Recover KEK
    survey_kek = vault.recover_user_survey_kek(
        user_id=recovery_request.user_id,
        survey_id=recovery_request.survey_id,
        admin_id=admin_user.id,
        verification_notes="Recovery request approved",
        platform_custodian_component=custodian_component,
    )
    return survey_kek
Organisation/Team Key Derivation
# Derive organisation key
org_key = vault.derive_organisation_key(
org_id=org.id,
org_owner_passphrase=passphrase,
platform_custodian_component=custodian_component
)
# Derive team key from org key
team_key = vault.derive_team_key(
team_id=team.id,
org_key=org_key
)
# Encrypt survey KEK with hierarchy key
vault.encrypt_survey_kek(
survey_kek=survey_kek,
hierarchy_key=team_key,
vault_path=f'surveys/{survey.id}/kek'
)
VaultClient Methods
| Method | Purpose |
|---|---|
| get_platform_master_key() | Reconstruct platform key from split components |
| derive_organisation_key() | Derive org key from platform key + passphrase |
| derive_team_key() | Derive team key from org key |
| escrow_user_survey_kek() | Store user's KEK for recovery |
| recover_user_survey_kek() | Recover KEK (requires custodian component) |
| encrypt_survey_kek() | Encrypt KEK with hierarchy key |
| decrypt_survey_kek() | Decrypt KEK from Vault |
| health_check() | Check Vault connectivity and status |
Recovery Workflows
Individual User Recovery
When a user forgets both password AND recovery phrase:
- User submits recovery request via UI
- Admin reviews and verifies identity (email, video call, etc.)
- Primary admin approves the request
- Secondary admin approves (dual authorization)
- Time delay starts (e.g., 24 hours)
- User notified and can cancel if suspicious
- After delay, admin executes recovery with new password
- KEK re-encrypted with user's new password
- Audit trail recorded for compliance
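The approval and delay rules in this workflow can be sketched as a small state machine. This is a hypothetical model, not CheckTick's RecoveryRequest implementation. The 24-hour default mirrors the example above. Two distinct admins must approve, the second approval opens the cancellation window, and execution is blocked until the window has passed or if the user cancels.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RecoveryRequest:
    """Hypothetical model of dual approval followed by a cancellable delay."""

    user_id: int
    delay: timedelta = timedelta(hours=24)
    approvals: list = field(default_factory=list)  # admin IDs, in approval order
    delay_started_at: Optional[datetime] = None
    cancelled: bool = False
    audit_log: list = field(default_factory=list)

    def approve(self, admin_id: int, now: datetime) -> None:
        if self.cancelled:
            raise PermissionError("request was cancelled")
        if admin_id in self.approvals:
            raise PermissionError("the same admin cannot give both approvals")
        if len(self.approvals) >= 2:
            raise PermissionError("request already has two approvals")
        self.approvals.append(admin_id)
        self.audit_log.append(f"{now.isoformat()} approved by admin {admin_id}")
        if len(self.approvals) == 2:
            self.delay_started_at = now  # second approval opens the cancellation window

    def cancel(self, now: datetime) -> None:
        self.cancelled = True
        self.audit_log.append(f"{now.isoformat()} cancelled by user {self.user_id}")

    def can_execute(self, now: datetime) -> bool:
        return (
            not self.cancelled
            and self.delay_started_at is not None
            and now >= self.delay_started_at + self.delay
        )
```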
Organisation Member Recovery
If org member loses access:
- Organisation owner provides passphrase
- System derives org key
- System decrypts survey KEK
- User sets new password
- KEK re-encrypted with new password
Catastrophic Recovery
If organisation owner forgets passphrase:
- Platform admins retrieve custodian component
- Business verification (legal documentation)
- Platform admins derive org key via emergency process
- Owner sets new passphrase
- Keys re-derived
Platform Key Rotation
CheckTick supports two platform key rotation strategies:
Option A: Rotate Shamir Shares Only (Recommended)
Use when:
- Custodian employee leaves company
- Suspected compromise of physical shares
- Routine annual/biennial rotation policy
- YubiKey hardware replacement
Benefits:
- Platform master key remains unchanged
- Old surveys remain decryptable automatically
- Only one set of custodian shares needed at a time
- Simpler operational model
Workflow:
# Step 1: Reconstruct custodian component from existing shares
python manage.py reconstruct_custodian_component
# Step 2: Rotate shares (generates NEW vault + custodian components)
python manage.py rotate_platform_key_shares \
--version v1 \
--existing-custodian-component platform_custodian_component_v1.bin
# Step 3: Split new custodian component
python manage.py split_custodian_component \
platform_custodian_component_v1_rotated_20260219_143000.bin
# Step 4: Distribute NEW shares to custodians
# Step 5: DESTROY old physical shares (wipe YubiKeys, shred paper)
# Step 6: DELETE temporary files (shred -vfz -n 10 *.bin)
Database Updates:
- Vault component updated in PlatformKeyVersion table
- shares_last_rotated timestamp updated
- Version ID remains the same (e.g., "v1")
Security Notes:
- The platform master key itself never changes; only its split into Vault and custodian components does
- Old surveys decrypt with existing database records
- No need to maintain multiple custodian share sets
Option B: Generate New Platform Key (Maximum Security)
Use when:
- Cryptographic algorithm weakness discovered
- Regulatory requirement for key material refresh
- Complete cryptographic "clean slate" after security incident
Benefits:
- Key separation: a compromise of the old key does not expose surveys escrowed under the new key
- Fresh cryptographic material
- Clear security boundaries between time periods
Trade-offs:
- Must maintain multiple custodian share sets (one per version)
- More complex custodian management over time
- Requires careful documentation of which shares work with which surveys
Workflow:
# Step 1: Create new platform key version with fresh random material
python manage.py create_platform_key_version --version v2
# Step 2: Split new custodian component
python manage.py split_custodian_component \
platform_custodian_component_v2.bin
# Step 3: Distribute NEW shares to custodians
# Important: KEEP old v1 shares for decrypting old surveys
# Step 4: Activate new version for new escrows
python manage.py activate_platform_key_version --version v2
# Step 5: DELETE temporary files (shred -vfz -n 10 *.bin)
Database Updates:
- New row created in PlatformKeyVersion table
- Version "v2" becomes active
- Version "v1" automatically retired (but kept for old surveys)
Custodian Share Management:
| Version | Status | Custodian Shares | Used For |
|---|---|---|---|
| v1 | Retired | Keep in safe | Decrypting surveys created before rotation |
| v2 | Active | Distribute now | New KEK escrows |
Important: Label custodian share storage clearly:
- "Platform Key v1 - Surveys before 2026-02-19"
- "Platform Key v2 - Surveys after 2026-02-19"
Choosing a Rotation Strategy
| Factor | Option A (Shamir Shares) | Option B (New Key) |
|---|---|---|
| Operational Complexity | ✅ Simple | ❌ Complex |
| Custodian Share Sets | One at a time | Accumulate over time |
| Key Separation | ❌ Same key | ✅ Fresh key |
| Old Survey Decryption | ✅ Automatic | ❌ Requires correct version shares |
| Recommended For | Routine rotation | Security incidents |
| Rotation Frequency | Every 2 years | Only when necessary |
Recommendation: Use Option A for routine rotation, Option B only for security incidents or compliance requirements.
Rotation Policies
Recommended Schedule:
- Shamir Shares (Option A): Every 2 years
- Platform Key Material (Option B): Only when required
Triggers for Immediate Rotation:
- Custodian employee termination
- Suspected physical share compromise
- YubiKey reported lost/stolen
- Regulatory audit requirement
- Security incident involving platform access
Monitoring Share Health:
python manage.py shell
>>> from checktick_app.surveys.models import PlatformKeyVersion
>>> active = PlatformKeyVersion.get_active_version()
>>> if active.needs_share_rotation(rotation_policy_days=730):
...     print(f"⚠️ Shares for {active.version} need rotation")
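This guide does not show how `needs_share_rotation` decides. One plausible sketch, assuming it compares `shares_last_rotated` (the timestamp that Option A rotation updates) against the policy window and falls back to the version's creation time when shares were never rotated. The standalone function and the `created_at` argument are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_share_rotation(
    shares_last_rotated: Optional[datetime],
    created_at: datetime,
    rotation_policy_days: int = 730,
    now: Optional[datetime] = None,
) -> bool:
    """Hypothetical re-implementation of the rotation check used above."""
    now = now or datetime.now(timezone.utc)
    # Shares that were never rotated are as old as the version itself.
    reference = shares_last_rotated or created_at
    return now - reference >= timedelta(days=rotation_policy_days)
```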
Rollback Procedure
If a rotation goes wrong:
# Reactivate old version
python manage.py activate_platform_key_version --version v1
# New escrows will use v1 again
# No data loss - both versions coexist in database
Note: This requires reconstructing the old platform key from old custodian shares.
Monitoring
Health Check Endpoint
from django.http import JsonResponse
from checktick_app.surveys.vault_client import get_vault_client
def vault_health(request):
    vault = get_vault_client()
    health = vault.health_check()
    # 200 only when Vault is initialized and unsealed; 503 otherwise
    status = 200 if not health.get('sealed') and health.get('initialized') else 503
    return JsonResponse(health, status=status)
CLI Health Check
curl -s https://your-vault-url/v1/sys/health | jq
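Besides the JSON body, `/v1/sys/health` also reports state through its HTTP status code: by default 200 (active), 429 (standby), 472 (DR secondary), 473 (performance standby), 501 (not initialized) and 503 (sealed). Uptime probes usually check that code. The stdlib sketch below uses hypothetical helper names. Because `urllib` raises `HTTPError` for non-2xx codes, the sketch catches it and still reads the body.

```python
import json
import urllib.error
import urllib.request

# Default status codes returned by GET /v1/sys/health
HEALTH_STATUS = {
    200: "initialized, unsealed, active",
    429: "unsealed, standby",
    472: "disaster recovery secondary",
    473: "performance standby",
    501: "not initialized",
    503: "sealed",
}


def describe_health(status_code: int) -> str:
    return HEALTH_STATUS.get(status_code, f"unexpected status {status_code}")


def check_vault_health(vault_addr: str, timeout: float = 5.0) -> tuple[int, dict]:
    """Return (HTTP status, JSON body); non-200 responses still carry a JSON body."""
    url = f"{vault_addr.rstrip('/')}/v1/sys/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, json.loads(response.read())
    except urllib.error.HTTPError as err:
        # Vault uses 4xx/5xx codes to report state, so treat them as data.
        return err.code, json.loads(err.read() or b"{}")
```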
Key Metrics
- vault_core_unsealed - Should be 1 (unsealed)
- vault_token_count - Active tokens
- vault_audit_log_request_total - Audit events
Alerts
Configure alerts for:
- Vault sealed (vault_core_unsealed == 0)
- High request latency (> 1s at p99)
- Authentication failures
- Recovery requests > 5/day (suspicious activity)
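The "more than 5 recovery requests per day" alert is a sliding-window count. Below is a minimal sketch, assuming requests are recorded in chronological order. The class name is hypothetical. In practice, this logic would more likely live in your monitoring stack than in the application.

```python
from collections import deque
from datetime import datetime, timedelta


class RecoveryRateAlert:
    """Hypothetical sliding-window counter for the recovery-request alert."""

    def __init__(self, limit: int = 5, window: timedelta = timedelta(days=1)):
        self.limit = limit
        self.window = window
        self.events = deque()  # request timestamps, oldest first

    def record(self, at: datetime) -> bool:
        """Record a request at time `at`; return True if the alert should fire."""
        self.events.append(at)
        # Drop requests that have fallen out of the window.
        while self.events and at - self.events[0] > self.window:
            self.events.popleft()
        return len(self.events) > self.limit
```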
Backup & Recovery
Automated Backup
#!/bin/bash
# vault-backup.sh
BACKUP_DIR=/backups/vault
DATE=$(date +%Y%m%d_%H%M%S)
# Raft snapshot (for HA deployments)
vault operator raft snapshot save ${BACKUP_DIR}/vault-${DATE}.snap
# Or file storage backup
tar -czf ${BACKUP_DIR}/vault-${DATE}.tar.gz /vault/file
# Upload to cloud storage
aws s3 cp ${BACKUP_DIR}/vault-${DATE}.* s3://your-bucket/vault-backups/
# Cleanup old backups (keep 30 days)
find ${BACKUP_DIR} -mtime +30 -delete
Backup Security
Critical: Vault backups contain sensitive key material and must be secured properly.
Encryption at Rest
#!/bin/bash
# vault-backup-encrypted.sh
BACKUP_DIR=/backups/vault
DATE=$(date +%Y%m%d_%H%M%S)
ENCRYPTION_KEY=/secure/vault-backup-key.gpg
# Create snapshot
vault operator raft snapshot save ${BACKUP_DIR}/vault-${DATE}.snap
# Encrypt with GPG
gpg --encrypt --recipient vault-backup@checktick.uk \
--output ${BACKUP_DIR}/vault-${DATE}.snap.gpg \
${BACKUP_DIR}/vault-${DATE}.snap
# Remove unencrypted backup
rm ${BACKUP_DIR}/vault-${DATE}.snap
# Upload encrypted backup
aws s3 cp ${BACKUP_DIR}/vault-${DATE}.snap.gpg \
s3://your-bucket/vault-backups/ \
--server-side-encryption AES256
# Cleanup old backups (keep 30 days)
find ${BACKUP_DIR} -name "*.gpg" -mtime +30 -delete
aws s3 ls s3://your-bucket/vault-backups/ | \
awk '{if ($1 < "'$(date -d '30 days ago' +%Y-%m-%d)'") print $4}' | \
xargs -I {} aws s3 rm s3://your-bucket/vault-backups/{}
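The cleanup pipeline above relies on GNU `date -d` (not available on macOS/BSD) and on string comparison of ISO dates. Here is the same selection logic in Python as a hypothetical helper. It takes the text printed by `aws s3 ls` and returns the object names to delete.

```python
from datetime import date, timedelta


def expired_backups(ls_output: str, today: date, keep_days: int = 30) -> list[str]:
    """Return object names from `aws s3 ls` output older than keep_days.

    Object lines look like: "2026-02-19 14:30:00   1048576 vault-....snap.gpg".
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for line in ls_output.splitlines():
        parts = line.split(maxsplit=3)
        if len(parts) != 4:  # skip blank lines and "PRE prefix/" entries
            continue
        day, _time, _size, key = parts
        if date.fromisoformat(day) < cutoff:
            expired.append(key)
    return expired
```

An S3 lifecycle rule that expires objects under the `vault-backups/` prefix after 30 days performs the same cleanup without a script.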
S3 Bucket Security
Required S3 bucket policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "DenyUnencryptedObjectUploads",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:PutObject",
"Resource": "arn:aws:s3:::your-bucket/vault-backups/*",
"Condition": {
"StringNotEquals": {
"s3:x-amz-server-side-encryption": "AES256"
}
}
},
{
"Sid": "DenyInsecureTransport",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::your-bucket/vault-backups/*",
"arn:aws:s3:::your-bucket"
],
"Condition": {
"Bool": {
"aws:SecureTransport": "false"
}
}
}
]
}
IAM Permissions
Create dedicated IAM user for backups (least privilege):
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:DeleteObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::your-bucket/vault-backups/*",
"arn:aws:s3:::your-bucket"
]
}
]
}
Backup Verification
Test restoration quarterly:
# Download encrypted backup
aws s3 cp s3://your-bucket/vault-backups/vault-LATEST.snap.gpg /tmp/
# Decrypt
gpg --decrypt /tmp/vault-LATEST.snap.gpg > /tmp/vault-restore.snap
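# Spin up a throwaway Vault with Raft storage (dev mode keeps data in memory and
# cannot restore Raft snapshots). vault-test.hcl needs a storage "raft" stanza,
# a TCP listener on 0.0.0.0:8200 with tls_disable = true, and api_addr/cluster_addr.
docker run -d --name vault-test -p 8201:8200 \
-v "$PWD/vault-test.hcl:/vault/config/vault-test.hcl:ro" \
hashicorp/vault:1.21.1 vault server -config=/vault/config/vault-test.hcl
# Point the CLI at the TEST instance so nothing touches production
export VAULT_ADDR=http://127.0.0.1:8201
# Initialize and restore
vault operator init -key-shares=1 -key-threshold=1
vault operator unseal <test-key>
VAULT_TOKEN=<test-root-token> vault operator raft snapshot restore -force /tmp/vault-restore.snap
# Restored data is protected by the ORIGINAL cluster's keys: unseal with 3 of the 4 production keys
vault operator unseal <key1>
vault operator unseal <key2>
vault operator unseal <key3>
# Verify keys are accessible (needs a token that is valid in the original cluster)
VAULT_TOKEN=<production-token> vault kv get secret/platform/master-key
# Cleanup
docker stop vault-test && docker rm vault-test
rm /tmp/vault-*.snap*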
Restore Procedure
# Stop Vault
docker stop vault
# Restore data
tar -xzf vault-backup.tar.gz -C /
# Start Vault
docker start vault
# Unseal (required after restart)
vault operator unseal <key1>
vault operator unseal <key2>
vault operator unseal <key3>
Network Security
Firewall Rules
Vault should only be accessible to CheckTick webapp. Configure firewall rules:
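# Docker network isolation: containers on an --internal network have no outside access
docker network create --internal vault-network
docker run -d --name vault --network vault-network hashicorp/vault:1.21.1
docker run -d --name checktick-web --network vault-network checktick-web
# Attach the webapp to a second, non-internal network so it can still serve users
docker network connect <public-network> checktick-web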
IP Whitelisting
For production deployments, restrict Vault access by IP:
Northflank: Use Private Networking
- Enable Private Network in project settings
- Deploy Vault with private service only (no public port)
- CheckTick webapp accesses via internal DNS: vault.checktick-private.svc.cluster.local
AWS/Cloud: Security Group Rules
# Only allow CheckTick webapp security group
aws ec2 authorize-security-group-ingress \
--group-id sg-vault \
--protocol tcp \
--port 8200 \
--source-group sg-checktick-webapp
Self-Hosted: iptables
# Allow only from webapp server IP
iptables -A INPUT -p tcp --dport 8200 -s <webapp-ip> -j ACCEPT
iptables -A INPUT -p tcp --dport 8200 -j DROP
TLS Configuration
Production must use TLS. Two deployment patterns:
Option 1: TLS Termination at Load Balancer (Recommended)
- Load balancer handles TLS (simpler)
- Vault configured with `tls_disable = true`
- Internal traffic HTTP only (acceptable if network is isolated)
- Set `VAULT_TLS_VERIFY=false` in CheckTick .env
Option 2: End-to-End TLS (Maximum Security)
- Vault configured with TLS certificates
- CheckTick validates Vault certificate
- Set `VAULT_TLS_VERIFY=true` in CheckTick .env
Generate self-signed cert for testing:
cd vault/
./generate-tls.sh
Vault config with TLS:
listener "tcp" {
address = "0.0.0.0:8200"
tls_disable = false
tls_cert_file = "/vault/config/vault-cert.pem"
tls_key_file = "/vault/config/vault-key.pem"
}
CheckTick environment variables:
VAULT_ADDR=https://vault.internal:8200
VAULT_TLS_VERIFY=true # Enable certificate verification
# For self-signed certs, provide CA:
# VAULT_CACERT=/path/to/ca.pem
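This guide does not show how a client turns these variables into TLS behaviour. One plausible sketch uses Python's `ssl` module. Verification stays on unless `VAULT_TLS_VERIFY` is explicitly false, and `VAULT_CACERT` adds a CA bundle for self-signed certificates. The function name and the accepted "false" spellings are assumptions.

```python
import os
import ssl
from typing import Optional


def vault_ssl_context(env: Optional[dict] = None) -> Optional[ssl.SSLContext]:
    """Hypothetical: build the TLS context a Vault client would use.

    Returns None for plain-HTTP addresses.
    """
    env = os.environ if env is None else env
    if not env.get("VAULT_ADDR", "").startswith("https://"):
        return None
    verify = env.get("VAULT_TLS_VERIFY", "true").strip().lower() not in ("false", "0", "no")
    # VAULT_CACERT (if set) is loaded as the trusted CA bundle.
    context = ssl.create_default_context(cafile=env.get("VAULT_CACERT") or None)
    if not verify:
        context.check_hostname = False  # must be cleared before CERT_NONE
        context.verify_mode = ssl.CERT_NONE
    return context
```

You can pass the returned context to `urllib.request.urlopen(url, context=ctx)`.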
Mutual TLS (mTLS) - Enterprise
For highest security, require client certificates:
listener "tcp" {
address = "0.0.0.0:8200"
tls_disable = false
tls_cert_file = "/vault/config/vault-cert.pem"
tls_key_file = "/vault/config/vault-key.pem"
tls_client_ca_file = "/vault/config/client-ca.pem"
tls_require_and_verify_client_cert = true
}
Security Best Practices
✅ Do
- Network: Restrict Vault to private network / IP whitelist
- TLS: Enable TLS verification (`VAULT_TLS_VERIFY=true`) in production
- Credentials: Store `VAULT_ROLE_ID` and `VAULT_SECRET_ID` in a secure secret manager (not .env files)
- Rotation: Rotate `VAULT_SECRET_ID` every 90 days (automated via CI/CD)
- Custodian Shares: Store in multiple offline locations (password managers + physical safe)
- Passphrases: Use 20+ character org owner passphrases
- Audit Logs: Review Vault audit logs weekly, integrate with SIEM
- Unseal Keys: Keep in separate physical locations
- Root Token: Revoke immediately after setup and verify revocation
- Backups: Encrypt backups at rest with GPG before cloud storage
- Testing: Test backup restoration quarterly
❌ Don't
- Commit the .env file to version control
- Share custodian component via email/chat
- Use weak passphrases
- Log decrypted keys in application logs
- Store all unseal keys in one location
- Use root token for application access
- Expose Vault on public internet without IP whitelist
- Disable TLS verification in production
- Store `VAULT_SECRET_ID` in CI/CD logs or container environment variables (visible via `docker inspect`)
- Skip backup encryption
Troubleshooting
Connection Refused
- Check Vault is running: `docker ps | grep vault`
- Check port: `curl -v http://localhost:8200/v1/sys/health`
- For Northflank: make sure `VAULT_ADDR` uses the internal service URL from Step 6 (Vault has no public port)
Vault Sealed
Vault seals on restart (security feature). Unseal with 3 of 4 keys:
vault operator unseal <key1>
vault operator unseal <key2>
vault operator unseal <key3>
Authentication Failed
- Verify `VAULT_ROLE_ID` and `VAULT_SECRET_ID`
- Check AppRole exists: `vault read auth/approle/role/checktick-app`
- Generate new secret_id if expired:
vault write -f auth/approle/role/checktick-app/secret-id
Key Not Found
Run setup script:
python vault/setup_vault.py
Production Checklist
Before going live:
Infrastructure
- Vault deployed and unsealed
- TLS enabled (or behind TLS load balancer with `VAULT_TLS_VERIFY=false`)
- TLS verification enabled if end-to-end TLS (`VAULT_TLS_VERIFY=true`)
- Network access restricted (private network or IP whitelist)
- Unseal keys stored in 4 separate physical/digital locations
- Custodian shares split and stored offline (4 locations, need 3 to recover)
Authentication & Authorization
- Root token revoked and verified with `vault token lookup <token>` (should fail)
- AppRole credentials secured (not in version control)
- `VAULT_ROLE_ID` and `VAULT_SECRET_ID` in secure secret manager
- `VAULT_SECRET_ID` rotation policy documented (90-day cycle)
- Token TTL configured: 1h access, 8h max
Audit & Monitoring
- Audit logging enabled and verified (`/vault/logs/audit.log`)
- Log rotation configured (logrotate or equivalent)
- SIEM integration configured (Elasticsearch/Splunk)
- Alerts configured for:
- Vault sealed events
- Failed authentication (>5/min)
- Recovery requests (>5/day)
- Excessive token generation
- P99 latency >1s
- Monitoring dashboard deployed (Grafana/Prometheus)
Backup & Recovery
- Automated backup procedure tested
- Backups encrypted with GPG before cloud storage
- S3 bucket security policy enforces encryption
- IAM permissions follow least privilege
- Backup restoration tested successfully
- Quarterly backup test scheduled
Application Integration
- `test_vault_connection` passes
- Platform recovery workflow tested with test shares
- Rate limiting enabled (recovery requests, authentication)
- Environment variables documented (not in code)
Documentation
- Disaster recovery runbook completed
- Custodian share distribution logged (who has which shares)
- Unseal key distribution logged
- On-call procedures documented
- Incident response playbook reviewed
Performance
Expected Latency
| Operation | Latency |
|---|---|
| Platform key reconstruction | ~5ms |
| Organisation key derivation | ~200ms (PBKDF2 200k iterations) |
| Team key derivation | ~200ms |
| Survey KEK encrypt/decrypt | ~10ms |
| Total unlock (team survey) | ~420ms |
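The total is roughly the sum of the steps (5 + 200 + 200 + 10 = 415 ms). PBKDF2 dominates, so the figures depend on hardware. The sketch below measures the per-derivation cost on your own servers. It assumes PBKDF2-HMAC-SHA256, since the table gives the iteration count but not the hash.

```python
import hashlib
import os
import time


def time_pbkdf2(iterations: int = 200_000, rounds: int = 3) -> float:
    """Return the median seconds for one PBKDF2-HMAC-SHA256 derivation (32-byte key)."""
    salt = os.urandom(16)
    timings = []
    for _ in range(rounds):
        start = time.perf_counter()
        hashlib.pbkdf2_hmac("sha256", b"example passphrase", salt, iterations, dklen=32)
        timings.append(time.perf_counter() - start)
    return sorted(timings)[len(timings) // 2]


if __name__ == "__main__":
    print(f"PBKDF2 200k iterations: {time_pbkdf2() * 1000:.0f} ms")
```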
Caching
The VaultClient caches authenticated connections. Token TTL: 1 hour (configurable).
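The caching behaviour can be pictured as a token reused until shortly before its TTL runs out. Below is a minimal sketch with an injectable clock. The class, the 60-second `renew_margin` and the `login` callable are hypothetical, and CheckTick's VaultClient may renew tokens instead of logging in again.

```python
import time
from typing import Callable, Optional


class CachedToken:
    """Hypothetical token cache: log in again shortly before the TTL expires."""

    def __init__(
        self,
        login: Callable[[], tuple[str, int]],
        clock: Callable[[], float] = time.monotonic,
        renew_margin: float = 60.0,
    ):
        self._login = login  # returns (token, ttl_seconds), e.g. an AppRole login
        self._clock = clock
        self._margin = renew_margin
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token, ttl = self._login()
            self._expires_at = now + ttl
        return self._token
```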
Related Documentation
- Encryption Overview - How CheckTick encrypts data
- Self-Hosting Configuration - General self-hosting
- Data Governance - Compliance and retention
Getting Help
- Integration questions: support@checktick.uk
- Security reviews: security@checktick.uk
- HashiCorp Vault docs: https://developer.hashicorp.com/vault/docs