
Security Architecture

Comprehensive security implementation for enterprise AI applications with PII protection, secrets detection, and compliance logging.


Overview

The Empathy Framework implements a defense-in-depth security model with multiple layers of protection:

  1. Input Sanitization - PII scrubbing before LLM processing
  2. Secrets Detection - Automatic detection of API keys, passwords, tokens
  3. Audit Logging - JSONL audit trail for compliance (HIPAA, GDPR, SOC2)
  4. Encryption at Rest - AES-256-GCM for sensitive data
  5. Access Controls - Role-based access control (RBAC) for wizards

Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                      User Input                              │
└─────────────────────┬───────────────────────────────────────┘
                      │ (Raw Text)
┌─────────────────────────────────────────────────────────────┐
│              1. PII Scrubber                                 │
│  ┌─────────────────────────────────────────────────────┐    │
│  │ • SSN, Credit Cards, Phone Numbers                  │    │
│  │ • Healthcare: MRN, Patient ID, DOB, Insurance       │    │
│  │ • Financial: Account Numbers, Routing Numbers       │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────┬───────────────────────────────────────┘
                      │ (Scrubbed Text)
┌─────────────────────────────────────────────────────────────┐
│              2. Secrets Detector                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │ • API Keys (AWS, Stripe, GitHub, OpenAI)            │    │
│  │ • OAuth Tokens, JWT                                 │    │
│  │ • Private Keys (RSA, SSH)                           │    │
│  │ • Database Connection Strings                       │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────┬───────────────────────────────────────┘
                      │ (Validated Text)
┌─────────────────────────────────────────────────────────────┐
│              3. Audit Logger                                 │
│  ┌─────────────────────────────────────────────────────┐    │
│  │ • User ID, Timestamp, Action                        │    │
│  │ • PII Items Removed, Secrets Detected               │    │
│  │ • JSONL Format for SIEM Integration                 │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────┬───────────────────────────────────────┘
                      │ (Logged)
┌─────────────────────────────────────────────────────────────┐
│              4. LLM Processing                               │
│  ┌─────────────────────────────────────────────────────┐    │
│  │ • OpenAI, Anthropic, Google, etc.                   │    │
│  │ • Receives ONLY scrubbed, validated text            │    │
│  │ • No PII or secrets sent to external APIs           │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────┬───────────────────────────────────────┘
                      │ (Response)
┌─────────────────────────────────────────────────────────────┐
│                   User Response                              │
└─────────────────────────────────────────────────────────────┘
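
With enable_security=True, EmpathyLLM applies these layers automatically. As a rough sketch of how the stages in the diagram compose, the snippet below wires up the classes documented further down this page by hand; the internal wiring of EmpathyLLM may differ, and the error handling shown is illustrative:

from empathy_llm_toolkit import EmpathyLLM
from empathy_llm_toolkit.security import AuditLogger, PIIScrubber, SecretsDetector

# Conceptual sketch of the pipeline above. EmpathyLLM(enable_security=True)
# performs these steps for you; this only shows the order of operations.
scrubber = PIIScrubber()
detector = SecretsDetector()
audit = AuditLogger(log_file="/var/log/empathy/audit.jsonl", retention_days=90)

async def secure_interact(llm: EmpathyLLM, user_id: str, user_input: str):
    # 1. Scrub PII before anything leaves the process
    scrubbed = scrubber.scrub(user_input)
    pii_items = scrubber.get_scrubbed_items(user_input)

    # 2. Refuse to forward text that still contains credentials
    detections = detector.detect(scrubbed)
    if detections:
        raise ValueError(f"Blocked: {len(detections)} secret(s) detected in input")

    # 3. Record the interaction for compliance
    audit.log_interaction(
        user_id=user_id,
        action="llm_interaction",
        classification="SENSITIVE",
        pii_scrubbed=len(pii_items),
        secrets_detected=len(detections),
    )

    # 4. Only scrubbed, validated text reaches the external provider
    return await llm.interact(user_id=user_id, user_input=scrubbed)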

PII Scrubbing

Standard PII Patterns

Automatically detected and removed:

Type         Pattern                  Example
SSN          \b\d{3}-\d{2}-\d{4}\b    123-45-6789
Credit Card  Luhn algorithm           4111-1111-1111-1111
Phone (US)   \b\d{3}-\d{3}-\d{4}\b    555-123-4567
Email        RFC 5322                 user@example.com
IP Address   IPv4/IPv6                192.168.1.1
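
As a rough illustration of what the scrubber does with these patterns, the snippet below applies two of the regexes from the table using Python's re module. The [SSN] and [PHONE] placeholders are illustrative only and may not match the framework's actual replacement format:

import re

# Patterns taken from the table above; the real PIIScrubber additionally
# handles Luhn-validated credit cards, emails, IP addresses, and more.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

text = "Patient SSN 123-45-6789 called from 555-123-4567"
scrubbed = PHONE_PATTERN.sub("[PHONE]", SSN_PATTERN.sub("[SSN]", text))

print(scrubbed)  # Patient SSN [SSN] called from [PHONE]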

Healthcare-Specific PHI

For Healthcare Wizards (HIPAA compliance):

Type          Pattern                     Example
MRN           \bMRN:?\s*\d{6,10}\b        MRN: 123456
Patient ID    \bPT\d{6,10}\b              PT123456
DOB           \b\d{1,2}/\d{1,2}/\d{4}\b   01/15/1980
Insurance ID  \bINS\d{8,12}\b             INS12345678
Provider NPI  \b\d{10}\b (validated)      1234567890

Implementation Example

import os

from empathy_llm_toolkit import EmpathyLLM
from empathy_llm_toolkit.security import PIIScrubber

# Initialize with security enabled
llm = EmpathyLLM(
    provider="anthropic",
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    enable_security=True  # Enables PII scrubbing
)

# Example with PHI
user_input = """
Patient John Doe (SSN: 123-45-6789, MRN: 987654)
called from 555-123-4567 about diabetes medication.
"""

# Process with automatic PII scrubbing
response = await llm.interact(
    user_id="doctor@hospital.com",
    user_input=user_input,
    context={"classification": "SENSITIVE"}
)

# PHI is automatically removed before sending to LLM
# Audit log records: ['ssn', 'mrn', 'phone', 'name']

Secrets Detection

Supported Secret Types

Type            Detection Method               Example
AWS Access Key  AKIA[0-9A-Z]{16}               AKIAIOSFODNN7EXAMPLE
Stripe API Key  sk_live_[0-9a-zA-Z]{24}        sk_live_...
GitHub Token    ghp_[0-9a-zA-Z]{36}            ghp_...
OpenAI API Key  sk-[0-9a-zA-Z]{48}             sk-...
JWT             Base64 + signature validation  eyJ...
Private Keys    -----BEGIN PRIVATE KEY-----    RSA/SSH keys

Implementation Example

from empathy_llm_toolkit.security import SecretsDetector

detector = SecretsDetector()

code_snippet = """
import openai
openai.api_key = "sk-XXXXXXXXXXXXXXXXXXXXXXXXXXXX"
"""

# Detect secrets
detections = detector.detect(code_snippet)

for secret in detections:
    print(f"⚠️ {secret.secret_type}: Line {secret.line}")
    print(f"   Severity: {secret.severity}")
    print(f"   Recommendation: {secret.remediation}")

# Output:
# ⚠️ OPENAI_API_KEY: Line 2
#    Severity: HIGH
#    Recommendation: Remove from code, use environment variables
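
In a real pipeline you normally act on the detections rather than just print them. A minimal sketch, assuming the policy is to block any outbound text that still contains a secret:

from empathy_llm_toolkit.security import SecretsDetector

detector = SecretsDetector()

def validate_outbound(text: str) -> str:
    """Block text containing secrets before it reaches an external provider."""
    detections = detector.detect(text)
    if detections:
        found = ", ".join(str(d.secret_type) for d in detections)
        raise ValueError(f"Outbound text blocked, secrets detected: {found}")
    return text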

Audit Logging

Log Format (JSONL)

{
  "timestamp": "2025-11-25T10:30:00Z",
  "event_id": "evt_abc123",
  "user_id": "doctor@hospital.com",
  "action": "llm_interaction",
  "classification": "SENSITIVE",
  "security": {
    "pii_scrubbed": 4,
    "pii_types": ["ssn", "mrn", "phone", "name"],
    "secrets_detected": 0,
    "encryption_used": true
  },
  "performance": {
    "duration_ms": 1234,
    "tokens_used": 500
  },
  "compliance": {
    "hipaa_compliant": true,
    "retention_days": 90
  }
}
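
Entries are written one JSON object per line (pretty-printed above for readability), so the log can be tailed, shipped to a SIEM, or summarized with the standard library alone. A minimal sketch:

import json

# Total PII items scrubbed per user, read straight from the JSONL audit trail
totals: dict[str, int] = {}
with open("/var/log/empathy/audit.jsonl") as f:
    for line in f:
        event = json.loads(line)
        user = event["user_id"]
        totals[user] = totals.get(user, 0) + event["security"]["pii_scrubbed"]

for user, count in sorted(totals.items()):
    print(f"{user}: {count} PII items scrubbed")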

Compliance Requirements

Regulation  Retention             Encryption            Audit Trail
HIPAA       90 days minimum       AES-256-GCM required  All PHI access
GDPR        Data subject request  At rest + in transit  All processing
SOC2        180 days              Recommended           All access

Implementation Example

from empathy_llm_toolkit.security import AuditLogger

logger = AuditLogger(
    log_file="/var/log/empathy/audit.jsonl",
    retention_days=90  # HIPAA minimum
)

# Automatically logs all interactions when security is enabled
logger.log_interaction(
    user_id="doctor@hospital.com",
    action="view_patient_record",
    classification="SENSITIVE",
    pii_scrubbed=4,
    secrets_detected=0
)

# Query audit logs
logs = logger.query(
    user_id="doctor@hospital.com",
    start_date="2025-11-01",
    end_date="2025-11-30"
)

print(f"Total interactions: {len(logs)}")
print(f"Total PII scrubbed: {sum(log['security']['pii_scrubbed'] for log in logs)}")

Encryption

Data at Rest

AES-256-GCM encryption for sensitive data:

import os

from empathy_llm_toolkit.security import encrypt_sensitive_data, decrypt_sensitive_data

# Encrypt PHI before storing
encrypted_data = encrypt_sensitive_data(
    data={"patient_id": "PT123456", "diagnosis": "Diabetes Type 2"},
    encryption_key=os.getenv("ENCRYPTION_KEY"),  # 32-byte key
    classification="SENSITIVE"
)

# Store encrypted data
database.store(encrypted_data)

# Decrypt when needed (with authorization)
decrypted = decrypt_sensitive_data(
    encrypted_data,
    encryption_key=os.getenv("ENCRYPTION_KEY")
)

Data in Transit

All API communications use TLS 1.2+:

llm = EmpathyLLM(
    provider="anthropic",
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    enable_security=True,
    tls_verify=True  # Enforce TLS certificate validation
)

Access Controls

Role-Based Access Control (RBAC)

from empathy_llm_toolkit.wizards import HealthcareWizard
from empathy_llm_toolkit.security import AccessControl

# Define roles
access_control = AccessControl()
access_control.add_role("physician", permissions=["read_phi", "write_phi"])
access_control.add_role("nurse", permissions=["read_phi"])
access_control.add_role("admin", permissions=["read_phi", "write_phi", "view_audit_logs"])

# Check permissions before granting access
if access_control.has_permission(user_role="nurse", permission="read_phi"):
    wizard = HealthcareWizard(llm)
    result = await wizard.process(
        user_input="Patient handoff for bed 312",
        user_id="nurse@hospital.com"
    )

Best Practices

✅ Do

  1. Always enable security for production: enable_security=True
  2. Use environment variables for API keys and encryption keys (see the sketch after this list)
  3. Review audit logs daily for suspicious activity
  4. Implement access controls for sensitive operations
  5. Encrypt data at rest for SENSITIVE classification
  6. Test PII scrubbing before production deployment
  7. Sign BAA agreements with LLM providers (for HIPAA)
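
For item 2, a minimal sketch of loading keys from the environment and failing fast at startup if one is missing (the variable names match the examples on this page):

import os

def require_env(name: str) -> str:
    """Fail fast at startup rather than at the first request."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

ANTHROPIC_API_KEY = require_env("ANTHROPIC_API_KEY")
ENCRYPTION_KEY = require_env("ENCRYPTION_KEY")  # 32-byte key for AES-256-GCM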

❌ Don't

  1. Never disable security in production
  2. Never commit secrets to version control
  3. Never skip encryption for healthcare data
  4. Never ignore audit log alerts
  5. Never share encryption keys across environments
  6. Never bypass access controls for convenience

Security Testing

PII Scrubbing Test

import pytest
from empathy_llm_toolkit.security import PIIScrubber

def test_pii_scrubbing():
    scrubber = PIIScrubber()

    text = "Patient SSN 123-45-6789 called from 555-123-4567"
    scrubbed = scrubber.scrub(text)

    # Verify PII removed
    assert "123-45-6789" not in scrubbed
    assert "555-123-4567" not in scrubbed

    # Verify scrubbed items tracked
    items = scrubber.get_scrubbed_items(text)
    assert len(items) == 2
    assert any(item['type'] == 'ssn' for item in items)

Secrets Detection Test

from empathy_llm_toolkit.security import SecretsDetector, SecretType

def test_secrets_detection():
    detector = SecretsDetector()

    code = 'api_key = "sk_live_XXXXXXXXXXXXXXXXXXXXXXXXXXXX"'
    detections = detector.detect(code)

    assert len(detections) > 0
    assert detections[0].secret_type == SecretType.STRIPE_KEY

See Also