Step 1: Confirm scope and success criteria
- List what’s in scope: Scheduler, Web UI, DB, EEM, agents, integrations (file watchers, alarms, custom scripts).
- Define success: “All critical jobs run successfully for N cycles, monitoring works, notifications work, no missed SLAs.”
- Define downtime tolerance and the cutover window.
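Writing the success definition down as a check makes "N cycles" unambiguous at sign-off. This is a minimal sketch assuming you can export per-run results (job name, business-cycle number, final status) from your own reporting; `RunRecord` and `evaluate_success` are hypothetical helpers, not part of AutoSys. A critical job passes only if every run in its last N cycles succeeded. Jobs with fewer than N recorded cycles count as failed because there is not yet enough evidence.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RunRecord:
    job: str     # AutoSys job name
    cycle: int   # business-cycle index, e.g. 1 = first business day on v24
    status: str  # final status of one run, e.g. "SUCCESS" or "FAILURE"


def evaluate_success(records: Iterable[RunRecord], critical_jobs: set, n_cycles: int):
    """Split critical jobs into (passed, failed) sets against the N-cycle criterion."""
    if n_cycles < 1:
        raise ValueError("n_cycles must be at least 1")
    # job -> cycle -> every status recorded for that job in that cycle
    history = {}
    for r in records:
        if r.job in critical_jobs:
            history.setdefault(r.job, {}).setdefault(r.cycle, []).append(r.status)

    passed, failed = set(), set()
    for job in critical_jobs:
        cycles = history.get(job, {})
        recent = sorted(cycles)[-n_cycles:]  # the last N cycles in which the job ran
        # A cycle is clean only if every run in it succeeded (a failed run plus a rerun is not clean).
        clean = all(s == "SUCCESS" for c in recent for s in cycles[c])
        # Fewer than N recorded cycles means there is not enough evidence yet.
        if len(recent) == n_cycles and clean:
            passed.add(job)
        else:
            failed.add(job)
    return passed, failed


if __name__ == "__main__":
    runs = [
        RunRecord("GL_POST", 1, "SUCCESS"),  # hypothetical job names
        RunRecord("GL_POST", 2, "SUCCESS"),
        RunRecord("PAYROLL", 1, "FAILURE"),
        RunRecord("PAYROLL", 2, "SUCCESS"),
    ]
    print(evaluate_success(runs, {"GL_POST", "PAYROLL"}, n_cycles=2))  # ({'GL_POST'}, {'PAYROLL'})
```

Treating a failed-then-rerun cycle as not clean is a deliberately strict choice; relax it if your success definition allows reruns.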
Step 2: Inventory your current v12.x environment
- Capture versions: AutoSys (Scheduler, agents), OS, DB, EEM, Java/Tomcat (if used), plugins.
- Export current configuration:
  - Instance configs, environment variables, service configs
  - Job definitions, calendars, machine definitions, global variables
- Identify custom dependencies:
  - Scripts, wrappers, external libraries
  - Integrations (SMTP, SNMP, ticketing, REST calls)
- Identify critical workloads:
  - SLA jobs, month-end jobs, revenue-impacting flows
  - Upstream/downstream dependencies
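Job definitions are commonly exported as JIL with `autorep -J ALL -q`. Parsing that text makes the inventory queryable, for example how many jobs each machine hosts and which scripts the commands call. This is a minimal sketch assuming the usual export layout (an `insert_job:` line with an optional `job_type:` on the same line, then one `attribute: value` per line, and single-line `/* ... */` banners); `parse_jil` and `inventory` are hypothetical helpers, and the paths and host names in the example are made up.

```python
import re
from collections import Counter

# Matches "insert_job: NAME   job_type: CMD"; job_type may also appear on its own line.
_INSERT = re.compile(r"^insert_job:\s*(\S+)(?:\s+job_type:\s*(\S+))?")


def parse_jil(text: str) -> dict:
    """Parse exported JIL into {job_name: {attribute: value}}."""
    jobs = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("/*"):
            continue  # blank lines and single-line /* ... */ banners
        m = _INSERT.match(line)
        if m:
            current = {}
            if m.group(2):
                current["job_type"] = m.group(2)
            jobs[m.group(1)] = current
        elif current is not None and ":" in line:
            # Split on the first colon only: values such as start_times contain colons.
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return jobs


def inventory(jobs: dict):
    """Return (jobs per machine, sorted distinct executables referenced by commands)."""
    per_machine = Counter(a["machine"] for a in jobs.values() if "machine" in a)
    executables = sorted({a["command"].split()[0] for a in jobs.values() if a.get("command")})
    return dict(per_machine), executables


if __name__ == "__main__":
    sample = """
/* ----------------- LOAD_SALES ----------------- */
insert_job: LOAD_SALES   job_type: CMD
command: /opt/batch/load_sales.sh --full
machine: etlhost01
start_times: "02:00"
"""
    parsed = parse_jil(sample)
    print(parsed)
    print(inventory(parsed))
```

The executables list is a starting point for the "custom dependencies" inventory; commands that call interpreters (for example `python script.py`) still need a manual look.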
Step 3: Review compatibility & prerequisites
- Validate OS support for v24.x across the Scheduler, Web UI, and agents.
- Validate DB version support (upgrade the DB if needed).
- Validate EEM version compatibility (upgrade EEM if needed).
- Decide your security target:
  - Keep the current mode initially, or
  - Implement TLS/mTLS and SAML2 as part of the upgrade (recommended, but can be phased).
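Compatibility reviews are easier to audit when the version inventory is compared mechanically against the support matrix. The minimum versions in `MINIMUMS` below are placeholders, not Broadcom's published requirements; copy the real values from the v24.x compatibility documentation. OS support is usually a list of specific releases rather than a simple minimum, so treat a pass here as "worth confirming", not as proof of support. All names are hypothetical.

```python
import re


def parse_version(text: str) -> tuple:
    """Keep the leading digits of each dot-separated part: '12.1-SP1' -> (12, 1)."""
    parts = []
    for piece in text.split("."):
        m = re.match(r"\d+", piece)
        if not m:
            break  # stop at the first part with no leading digits
        parts.append(int(m.group()))
    return tuple(parts)


def version_at_least(actual: str, minimum: str) -> bool:
    a, m = parse_version(actual), parse_version(minimum)
    width = max(len(a), len(m))
    # Pad with zeros so that "19.3" compares equal to "19.3.0".
    return a + (0,) * (width - len(a)) >= m + (0,) * (width - len(m))


def find_gaps(inventory: dict, minimums: dict) -> list:
    """inventory: host -> {component: version}; minimums: component -> minimum version."""
    gaps = []
    for host, components in sorted(inventory.items()):
        for component, version in sorted(components.items()):
            required = minimums.get(component)
            if required is None:
                gaps.append(f"{host}: {component} {version} not in support matrix, check manually")
            elif not version_at_least(version, required):
                gaps.append(f"{host}: {component} {version} < required {required}")
    return gaps


# PLACEHOLDER minimums for illustration only; replace with the real v24.x matrix.
MINIMUMS = {"os": "8.0", "db_client": "19.0"}

if __name__ == "__main__":
    hosts = {"sched01": {"os": "8.6", "db_client": "19.3.0"}, "agent07": {"os": "7.9"}}
    for gap in find_gaps(hosts, MINIMUMS):
        print(gap)
```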
Step 4: Choose migration method (recommended: parallel)
Option A: Parallel build (best practice)
- Build a new v24.x environment side by side (new servers/VMs).
- Migrate and test in isolation.
- Cut over production workloads at the end.
Option B: In-place upgrade (only if constraints rule out a parallel build)
- Upgrade components on the same servers.
- Higher risk; the rollback plan must be rehearsed and proven before you start.
Step 5: Prepare non-prod (DEV/UAT) and run a rehearsal
- Clone or restore a copy of the v12 DB into non-prod.
- Install AutoSys v24.x in non-prod.
- Run the DB upgrade scripts in non-prod (per the vendor procedure).
- Import/validate job definitions, calendars, machines, and permissions.
- Run sample critical workflows and validate the results.
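A useful rehearsal check is to export job definitions before and after the non-prod DB upgrade and diff them. This sketch compares two `{job: {attribute: value}}` maps, for example two parsed `autorep -J ALL -q` exports. The `ignore` set exists because a newer release may emit attributes the old export did not contain; which ones, if any, depends on your environment, so the function name and parameters here are illustrative only.

```python
def diff_definitions(before: dict, after: dict, ignore: frozenset = frozenset()):
    """Compare {job: {attribute: value}} maps taken before and after the upgrade.

    Returns (missing, added, changed): jobs lost by the upgrade, jobs that appeared,
    and {job: {attribute: (old, new)}} for attributes whose values differ.
    """
    missing = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    changed = {}
    for job in sorted(set(before) & set(after)):
        old, new = before[job], after[job]
        keys = (set(old) | set(new)) - ignore  # skip attributes known to differ harmlessly
        deltas = {k: (old.get(k), new.get(k)) for k in sorted(keys) if old.get(k) != new.get(k)}
        if deltas:
            changed[job] = deltas
    return missing, added, changed


if __name__ == "__main__":
    v12 = {"EOD_BOX": {"job_type": "BOX"}, "EOD_LOAD": {"machine": "etl01"}}  # hypothetical
    v24 = {"EOD_BOX": {"job_type": "BOX"}, "EOD_LOAD": {"machine": "etl02"}}
    print(diff_definitions(v12, v24))
```

An empty `missing` list and a `changed` map you can explain line by line is a reasonable exit criterion for the rehearsal.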
Step 6: Prepare production foundations
- Provision infrastructure for v24.x (Scheduler, Web UI, and DB connectivity).
- Verify DNS, firewall ports, routing, and NTP time synchronization.
- Prepare certificates if enabling TLS/mTLS.
- Prepare the SSO setup if enabling SAML2 (IdP metadata, callback URLs, test users).
- Create a detailed cutover runbook and a rollback runbook.
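Connectivity can be smoke-tested from each server before any component is installed. This sketch attempts a plain TCP connection to each host/port pair, which exercises DNS resolution, routing, and firewall rules in one step; it does not check NTP or anything above the TCP layer. `check_endpoints` is a hypothetical helper, and the endpoints you pass it should come from your own v24.x design rather than from any default port list.

```python
import socket


def check_endpoints(endpoints, timeout: float = 3.0) -> dict:
    """TCP-connect to each (host, port); map it to None on success or an error string."""
    results = {}
    for host, port in endpoints:
        try:
            # Resolves the name, opens a TCP session, and closes it again immediately.
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = None
        except OSError as exc:  # DNS failure (gaierror), refused, unreachable, or timed out
            results[(host, port)] = f"{type(exc).__name__}: {exc}"
    return results
```

Run it from every server that must initiate a connection (scheduler to agents, agents back to the scheduler, Web UI to DB), since firewall rules are directional.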
Step 7: Take backups and freeze changes (production)
- Announce a change freeze window (no job edits during cutover prep).
- Take full backups:
  - AutoSys DB backup (with a validated test restore)
  - $AUTOSYS directory (and config files)
  - EEM config backup
  - Custom script repositories, plus server snapshots if possible
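A validated restore is easier to prove when checksums are recorded at backup time and re-checked after the test restore. This sketch builds a SHA-256 manifest for every file under a backup directory and later reports missing or changed files. It only shows that files were copied intact; it does not prove the DB dump itself restores cleanly, which still needs a real restore test. The function names and file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so multi-GB DB dumps do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file under root (as a POSIX-style relative path) to its SHA-256 digest."""
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify_manifest(root: Path, manifest: dict) -> list:
    """Return problems found under root; an empty list means every recorded file matches."""
    problems = []
    for rel, expected in sorted(manifest.items()):
        path = root / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(path) != expected:
            problems.append(f"changed: {rel}")
    return problems


if __name__ == "__main__":
    # Self-check on a throwaway directory standing in for a backup set.
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "autosys.env").write_text("example config\n")
        manifest = build_manifest(root)
        print(verify_manifest(root, manifest))  # [] while the file is intact
        (root / "autosys.env").write_text("edited\n")
        print(verify_manifest(root, manifest))  # ['changed: autosys.env']
```

Store the manifest (for example as JSON) alongside, but not inside, the backup set so it survives if the backup location is damaged.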
Step 8: Install AutoSys v24.x components (production)
- Install the Scheduler components.
- Install/configure the Web UI (Tomcat/JRE per v24 requirements).
- Connect to the DB and run the DB upgrade steps (per the release documentation).
- Validate base services:
  - Scheduler starts cleanly
  - CLI commands work
  - Web UI login works (local auth, EEM, or SSO, depending on your plan)
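The base-service checks can be scripted so every runbook execution records the same evidence. This sketch runs a named list of commands and treats exit status 0 as a pass; whether each AutoSys command follows that convention is an assumption to confirm for your release. You might add `chk_auto_up` to the list, but check how it reports its result before relying on its exit status. The `autorep -J` entry uses a hypothetical smoke-test job name, and the Web UI login still needs a manual or browser-based check.

```python
import subprocess
import sys


def run_checks(checks: dict, timeout: float = 60.0) -> dict:
    """Run each named command; a check passes when it exits with status 0 (assumed convention)."""
    results = {}
    for name, argv in checks.items():
        try:
            proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
            ok = proc.returncode == 0
            detail = (proc.stdout if ok else (proc.stderr or proc.stdout)).strip()
        except (OSError, subprocess.TimeoutExpired) as exc:  # not installed, not on PATH, or hung
            ok, detail = False, f"{type(exc).__name__}: {exc}"
        results[name] = (ok, detail)
    return results


# Hypothetical check list: V24_SMOKE_TEST stands in for a smoke-test job you define.
BASE_CHECKS = {
    "cli_reports_smoke_job": ["autorep", "-J", "V24_SMOKE_TEST"],
}

if __name__ == "__main__":
    demo = {"python_available": [sys.executable, "-c", "print('ok')"], **BASE_CHECKS}
    for name, (ok, detail) in run_checks(demo).items():
        print(f"{'PASS' if ok else 'FAIL'} {name}: {detail[:200]}")
```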
Step 9: Migrate/upgrade agents (phased)
- Start with a pilot set of agents (non-critical hosts).
- Validate:
  - Heartbeat/communication
  - Job execution
  - Exit codes and stdout/stderr capture
- Expand the rollout to critical tiers in waves.
- Keep a clear backout plan per wave.
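Waves are easier to agree on, and to back out, when they are generated from an agreed agent inventory. This sketch assumes each agent host has been tagged with a hypothetical tier label (`low`, `medium`, `critical`). It fills the pilot from low-tier hosts only, then upgrades each tier in fixed-size waves without mixing tiers, so every wave has a single backout scope.

```python
TIERS = ("low", "medium", "critical")  # hypothetical tier labels, least critical first


def plan_waves(agents: dict, pilot_size: int, wave_size: int) -> list:
    """agents: host -> tier. Returns an ordered list of waves (lists of hosts)."""
    if wave_size < 1 or pilot_size < 0:
        raise ValueError("wave_size must be >= 1 and pilot_size >= 0")
    unknown = set(agents.values()) - set(TIERS)
    if unknown:
        raise ValueError(f"unknown tiers: {sorted(unknown)}")
    by_tier = {t: sorted(h for h, tier in agents.items() if tier == t) for t in TIERS}

    # The pilot is drawn only from the lowest tier ("non-critical hosts").
    pilot = by_tier["low"][:pilot_size]
    by_tier["low"] = by_tier["low"][pilot_size:]
    waves = [pilot] if pilot else []
    for tier in TIERS:
        hosts = by_tier[tier]
        # Tiers are never mixed within a wave.
        waves.extend(hosts[i:i + wave_size] for i in range(0, len(hosts), wave_size))
    return waves


if __name__ == "__main__":
    fleet = {"dev1": "low", "dev2": "low", "app1": "medium", "pay1": "critical"}  # hypothetical
    for n, wave in enumerate(plan_waves(fleet, pilot_size=1, wave_size=2)):
        print(f"wave {n}: {wave}")
```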
Step 10: Enable/transition security features (optional phased approach)
Recommended phasing
- Go live first with minimal change (if needed).
- Then enable:
  - TLS/mTLS between the Scheduler and agents
  - SAML2 SSO for the UI/Web Services
- Validate again after the security changes (authentication and job execution).
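After enabling TLS, a quick client-side probe confirms that an endpoint presents a certificate your clients trust, which protocol version is negotiated, and how long the certificate remains valid. This sketch uses Python's standard `ssl` module and assumes the target speaks ordinary TLS, such as the Web UI's HTTPS listener; the host, port, and certificate paths you pass in are your own. Supplying a client certificate exercises the mTLS path. It does not test SAML2 login, and the scheduler-to-agent channel may not be probeable this way.

```python
import socket
import ssl
import time


def make_client_context(cafile=None, certfile=None, keyfile=None) -> ssl.SSLContext:
    """Client TLS context that verifies the server; a client cert is added for mTLS."""
    ctx = ssl.create_default_context(cafile=cafile)  # CERT_REQUIRED plus hostname checking
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)  # present our certificate to the server
    return ctx


def probe_tls(host: str, port: int, ctx: ssl.SSLContext, timeout: float = 5.0) -> dict:
    """Handshake with host:port; report protocol, subject CN and days until expiry."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
            subject = dict(rdn[0] for rdn in cert.get("subject", ()))
            expires = ssl.cert_time_to_seconds(cert["notAfter"])
            return {
                "protocol": tls.version(),
                "subject_cn": subject.get("commonName"),
                "days_left": round((expires - time.time()) / 86400, 1),
            }
```

A handshake failure raises `ssl.SSLError` (or `ssl.SSLCertVerificationError`), which is itself useful evidence: it usually points to a missing CA in the trust store or a hostname that does not match the certificate.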
Step 11: Full validation (pre-cutover + post-cutover)
- Run a “golden batch” of representative workloads covering:
  - Critical boxes and downstream jobs
  - Calendars and date conditions
  - Event-based triggers
  - Alerts/notifications
- Validate monitoring:
  - New Monitor views
  - Alarm visibility
  - Logs (search, and comparison if used)
- Validate integrations:
  - Email/notifications
  - Tickets/incidents
  - Any API consumers
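Golden-batch results are easier to sign off when the expected outcomes are written down in advance and compared automatically. This sketch assumes you have collected, per job, the final status and start/end times from the v24 run (for example from `autorep` reports). It flags jobs that did not run, ended with an unexpected status, or started before an upstream dependency finished. `check_golden_batch`, the job names, and the dependency pairs are hypothetical.

```python
from datetime import datetime


def check_golden_batch(expected: dict, observed: dict, dependencies: list) -> list:
    """
    expected:     job -> expected final status ("SUCCESS", or "FAILURE" for negative tests)
    observed:     job -> (status, start: datetime, end: datetime) collected from the v24 run
    dependencies: (upstream, downstream) pairs that must run in that order
    """
    problems = []
    for job, want in sorted(expected.items()):
        if job not in observed:
            problems.append(f"{job}: no run observed")
        elif observed[job][0] != want:
            problems.append(f"{job}: expected {want}, got {observed[job][0]}")
    for upstream, downstream in dependencies:
        if upstream in observed and downstream in observed:
            _, down_start, _ = observed[downstream]
            _, _, up_end = observed[upstream]
            if down_start < up_end:
                problems.append(f"{downstream}: started before upstream {upstream} finished")
    return problems


if __name__ == "__main__":
    day = datetime(2024, 1, 15)  # hypothetical business date and job names
    observed = {
        "EXTRACT_GL": ("SUCCESS", day.replace(hour=1), day.replace(hour=2)),
        "LOAD_GL": ("SUCCESS", day.replace(hour=2, minute=5), day.replace(hour=3)),
    }
    print(check_golden_batch({"EXTRACT_GL": "SUCCESS", "LOAD_GL": "SUCCESS"},
                             observed, [("EXTRACT_GL", "LOAD_GL")]))  # []
```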
Step 12: Cutover execution
- Stop new scheduling on v12 (quiesce) at the agreed time.
- Let running jobs finish, or perform a controlled stop.
- Switch control to v24:
  - Point clients/users to the new UI
  - Ensure scheduling starts from v24
- Monitor closely for at least 1–2 full business cycles.
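Letting running jobs finish works best with an explicit deadline, so the decision to force a controlled stop is made at a known time rather than ad hoc. This sketch polls a caller-supplied `get_running` function (hypothetical; in practice it might wrap an `autorep` query filtered to running jobs) until nothing is running or the timeout expires, and returns whatever is still running. The clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Set


def wait_for_drain(
    get_running: Callable[[], Set[str]],
    timeout_s: float,
    poll_s: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Set[str]:
    """Poll until no jobs are running or timeout_s elapses; return the jobs still running."""
    deadline = clock() + timeout_s
    while True:
        running = get_running()
        if not running or clock() >= deadline:
            return running
        # Never sleep past the deadline, so the final check happens on time.
        sleep(max(0.0, min(poll_s, deadline - clock())))
```

If the returned set is non-empty, that list is the input to the go/no-go call on a controlled stop.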
Step 13: Hypercare and stabilization (first 1–2 weeks)
- Daily health checks: service uptime, job success rates, SLA misses.
- Fix issues such as agent edge cases, permissions, missing libraries, and path problems.
- Capture lessons learned and finalize documentation.
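The daily health check can be reduced to a few numbers that are comparable from day to day. This sketch assumes per-run records with a final status, an end time, and an optional SLA deadline taken from your own SLA catalogue; the `Run` fields and `daily_health` function are hypothetical. It reports the success rate over finished runs and lists jobs that finished late or are still running past their deadline.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Run:
    job: str
    status: str              # e.g. "SUCCESS", "FAILURE", "RUNNING"
    end: Optional[datetime]  # None while the run has not finished
    sla: Optional[datetime]  # hypothetical deadline from your SLA catalogue; None = no SLA


def daily_health(runs: list, as_of: datetime) -> dict:
    """Summarize one day's runs: success rate over finished runs and SLA misses as of `as_of`."""
    finished = [r for r in runs if r.end is not None]
    succeeded = sum(1 for r in finished if r.status == "SUCCESS")
    late = sorted(
        r.job for r in runs
        if r.sla is not None and (r.end if r.end is not None else as_of) > r.sla
    )
    return {
        "finished": len(finished),
        "success_rate": succeeded / len(finished) if finished else None,
        "sla_misses": late,  # finished late, or still running past the deadline
    }
```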
Step 14: Decommission old v12 (after sign-off)
- Keep v12 in read-only/standby mode for an agreed period.
- Archive configs, DB backups, and audit evidence.
- Decommission the servers and revoke old credentials/certificates.