Brett Michaelis

brett.michaelis@gmail.com | 801-310-2818 | Orem, UT 84057 | linkedin.com/in/brettmichaelis
Summary
Senior Site Reliability Engineer with 10+ years architecting and operating hybrid cloud and bare-metal infrastructure at scale. Deep experience building Kubernetes platforms across AWS, GCP, and private data centers, with hands-on GPU/CUDA workload management, including bare-metal server provisioning via PXE boot and SaltStack. Track record of direct collaboration with ML engineers and data scientists to build the infrastructure foundation that accelerates AI development. Strong observability background (Prometheus, Grafana, Mimir), IaC discipline (Terraform), and an automation-first approach to platform reliability and self-service.
Core Skills
Experience
Operations Engineer
Smarty.com | Orem, UT
  • Lead migration of a legacy Grafana observability platform to a GitOps-managed deployment, auditing and rationalizing all alerting across production services as part of the initiative.
  • Operate observability stack using Prometheus, Grafana, Mimir, and Alloy for metrics collection, long-term storage, and dashboarding.
  • Implement canary deployments via Nomad for progressive production rollouts, enabling confident releases with automated rollback.
  • Drive company-wide migration from Bitbucket to GitHub, including full re-implementation of all CI/CD workflows in GitHub Actions.
  • Manage multi-cloud deployments (Tier.Net, UpCloud, GCP, AWS, Hetzner) with Terraform, Nomad, and Bitbucket Pipelines, improving uptime and deployment velocity.
  • Automate repetitive workflows with Bash and Go, reducing manual toil across operations.
Senior DevOps Engineer
Five9.com
  • Orchestrated Kubernetes deployments on GCP using Helm and Terraform to support high-availability SaaS workloads at scale.
  • Built self-service deployment tooling and automation including ArgoCD-based GitOps workflows, enabling engineering teams to provision and release independently.
  • Partnered with product and engineering teams to define and track SLOs/SLIs, supporting customer-facing uptime goals.
  • Streamlined incident response using Five Whys root-cause analysis and blameless postmortems, improving on-call processes and reliability.
Software Engineer / DevOps Engineer
Vivint SmartHome
  • Provisioned and configured bare-metal servers at scale using PXE boot and SaltStack, building GPU/CUDA compute capacity for ML model training workloads.
  • Managed Ceph storage clusters to support high-throughput data access across a 1.5 PB ML data lake on GCP.
  • Collaborated directly with ML engineers and data scientists to translate research requirements into scalable, production-ready infrastructure.
  • Developed and deployed Go-based microservices, optimizing performance and reducing latency.
  • Implemented the TICK stack (Telegraf, InfluxDB, Chronograf, Kapacitor) for observability and system health monitoring across ML and application workloads.
Director, IT & Software Development
Unicity International
  • Led global infrastructure modernization, migrating legacy apps to containerized, cloud-native environments.
  • Standardized cloud deployments on AWS (EC2, S3) to improve scalability and global availability.
  • Introduced reliability practices, including error budgeting and deployment automation.
Assistant Director, Web Development
Utah Valley University
  • Directed university-wide web development projects, improving service reliability and scalability for mission-critical systems.
Counterintelligence Agent
U.S. Army – Utah National Guard
  • Conducted secure intelligence operations, applying structured incident response and after-action review (AAR) methods.
Education
Bachelor of Science: Information Systems
Utah Valley University | Orem, UT