Retail SaaS platform; multi-region, multi-tenant cloud infrastructure (OCI)
• Provisioned high-availability, multi-region cloud infrastructure across 3 data centres (VPC/CIDR, Kubernetes, load
balancers, DNS/TLS, IAM) using OCI-CLI and automated pipelines — ensuring zero manual intervention during
rollouts.
• Executed a zero-downtime SSO/identity-provider endpoint migration across all regions — ensuring seamless
authentication cutover with zero service interruption.
• Automated credential lifecycle management for 200+ customer environments using OCI Vault, Bash, and Chef —
covering identity and service account password rotation, and secrets sync — achieving 100% compliance with zero
expiry incidents (Rockstar Award 2023).
• Reduced MTTR for Log4j vulnerability remediation from days to minutes by embedding automated security fixes
directly into the credential rotation pipeline.
• Built and shipped RSMM (Retail Scheduled Maintenance Monitor) — an APEX-based outage monitoring platform
deployed to every customer environment — replacing manual host-by-host execution with automated scheduling,
cuting engineer effort per maintenance activity by ~90%.
• Engineered an internal SRE operations platform in Oracle APEX (REST APIs, Shell, SQL/PL/SQL, Object Storage) —
automating DR reboots for 1, 000+ VMs monthly, alert suppression, inventory management, and database cloning
(Retail Innovator Award 2024).
• Integrated OCI Monitoring and Prometheus dashboards into the operations platform for real-time environment
level observability — eliminating manual console access during production outages.
• Automated disaster recovery VM patching, reboots, and switchover/switchback workflows using OCI APIs and
Rundeck runbooks — validating SLO readiness through integration tests and auto-patching verification.
• Migrated self-managed databases to Autonomous DB and managed DBaaS with zero data loss; engineered cross
region migration tooling for data warehouses and automated database cloning via SQL and Object Storage.
• Automated OS upgrades (OL7 → OL8) across all customer environments using Chef, Bash, and Python — with full
CPU regression testing — and built containerised patch pipelines (Docker/Podman) covering 100% of quarterly
patch releases.
• Executed VM instance migrations with automated right-sizing and pre/post validation; built a cloud-CLI tool to
detect underutilised instances and drove infrastructure cost optimisation through CPU baselining and capacity
tuning.
• Automated TLS/SSL certificate rotation for all load balancers via OCI APIs — achieving zero manual renewals and
100% uptime compliance across all endpoints.
• Provisioned and patched 13+ SaaS platform versions with zero regression; owned end-to-end deployment
readiness across regions — covering DB provisioning, monitoring, backups, and proxies.
• Delivered on-call production support — performing root cause analysis (RCA) and resolving automation failures
within live outage windows — while owning the defect backlog and maintaining SLA compliance across all
environments.
- Company industry:
- IT Services