22:40 Nia: Okay Miles, let's get really practical here. Our listeners are senior SREs who are probably thinking, "This all sounds fascinating, but what do I actually do on Monday morning?" How does someone start making this transition?
2:11 Miles: Great question. The first thing I'd recommend is what I call the "AI literacy audit." You need to honestly assess your current understanding of machine learning concepts and identify the biggest gaps. You don't need to become a data scientist, but you do need to understand how models work well enough to reason about their failure modes.
23:11 Nia: What would that look like in practice? Are we talking about taking online courses, or is there a more hands-on approach?
23:17 Miles: I'd recommend starting with your own organization's AI systems, if you have any. Shadow the ML engineers, understand the deployment pipelines, learn about the monitoring that's already in place. There's no substitute for getting your hands dirty with real systems.
11:27 Nia: That makes sense. Learn by doing, rather than learning in isolation. What if your organization doesn't have AI systems yet?
23:37 Miles: Then you have a huge opportunity to get ahead of the curve. Start experimenting with simple AI tools and services. Set up a small language model, play with some computer vision APIs, understand how these systems behave when they work and when they don't.
23:51 Nia: So build your intuition through experimentation. What about the technical skills? What should people be focusing on?
23:57 Miles: Python is essential—most AI tooling is Python-based. You'll also want to get comfortable with containerization and orchestration, because AI workloads often have complex dependencies and resource requirements. Kubernetes knowledge becomes even more valuable in an AI world.
24:12 Nia: What about monitoring and observability tools? Are there specific platforms that SREs should be learning?
24:17 Miles: The landscape is still evolving rapidly, but I'd focus on understanding the concepts rather than specific tools. Learn about model performance metrics, data drift detection, and bias monitoring. The tools will change, but the underlying principles are becoming standardized.
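Miles doesn't name a specific drift-detection method. As one illustration, here is a minimal Python sketch of the population stability index (PSI), a common way to compare a feature's live distribution against its training baseline. The function name, the equal-width binning, the default bin count, and the epsilon used for empty bins are all assumptions made for this sketch. They are not the defaults of any particular monitoring tool.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], bins: int = 10
) -> float:
    """Hypothetical helper: PSI between a baseline sample and a live sample.

    Uses equal-width bins over the combined range of both samples (an
    assumption; quantile bins derived from the baseline are another option).
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    # Fall back to a width of 1.0 when every value is identical (hi == lo).
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp so the maximum value lands in the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # Floor each share at a tiny epsilon so log() stays defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    p_expected = proportions(expected)
    p_actual = proportions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(p_expected, p_actual))


# Usage: a uniform baseline versus a live sample shifted toward higher values.
baseline = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]
print(population_stability_index(baseline, baseline))  # prints 0.0
print(population_stability_index(baseline, shifted))   # noticeably larger
```

Identical samples score zero, and the score grows as the live distribution moves away from the baseline. Where to set an alerting threshold is a judgment call that depends on the feature and the model.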
24:32 Nia: That's smart advice. Focus on principles over tools. What about team dynamics? How should senior SREs be thinking about collaboration with data scientists and ML engineers?
10:59 Miles: This is crucial. You need to build bridges between the traditional ops world and the ML world. Start by learning their language—understand terms like "features," "training," and "inference." But also help them understand operational concepts like SLOs, error budgets, and incident response.
24:58 Nia: So it's a two-way education process. You're learning their domain while helping them understand yours.
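Miles lists SLOs and error budgets as operational concepts worth sharing with ML teams. Here is a minimal sketch of the underlying arithmetic. It assumes a simple request-based SLI in which each inference request counts as either good or bad. The helper name, the 99.9% target, and the request counts are illustrative assumptions, not figures from the conversation.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudgetReport:
    allowed_failures: float
    budget_consumed: float  # fraction of the budget used (1.0 means exhausted)
    budget_remaining: float


def error_budget(slo_target: float, total_events: int, bad_events: int) -> ErrorBudgetReport:
    """Hypothetical helper: how much of a request-based error budget is spent.

    slo_target is the fraction of events that must be good, e.g. 0.999.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed = (1.0 - slo_target) * total_events
    consumed = bad_events / allowed if allowed > 0 else float("inf")
    return ErrorBudgetReport(allowed, consumed, 1.0 - consumed)


# Usage: a 99.9% SLO over one million inference requests, 400 of which were bad
# (for example, timed out or returned a malformed prediction).
report = error_budget(0.999, 1_000_000, 400)
print(f"allowed bad events: {report.allowed_failures:.0f}")  # 1000
print(f"budget consumed: {report.budget_consumed:.0%}")      # 40%
```

What counts as a "bad" event for a model, such as a wrong prediction versus merely a slow one, is exactly the kind of definition SREs and ML engineers would need to agree on together.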
1:09 Miles: Exactly. And don't underestimate the value of your existing SRE experience. Your understanding of distributed systems, failure modes, and operational best practices is incredibly valuable to ML teams who might not have that background.
25:16 Nia: What about career positioning? How should senior SREs be thinking about their career trajectory in this AI-driven world?
25:23 Miles: I see several paths emerging. Some SREs are becoming AI platform engineers, building the infrastructure that supports ML workloads. Others are becoming AI safety engineers, focusing specifically on the reliability and safety of AI systems. And some are becoming hybrid SRE/ML engineers who can work across both domains.
25:42 Nia: Those all sound like valuable and well-compensated roles. What skills would be most important for each path?
25:47 Miles: For AI platform engineering, focus on infrastructure automation, containerization, and resource management. For AI safety engineering, learn about bias detection, model validation, and ethical AI principles. For the hybrid role, you need a bit of everything—enough ML knowledge to debug model issues and enough ops knowledge to keep systems running.
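Miles includes bias detection in the AI safety path but doesn't name a metric. One simple check is the demographic parity gap, which is the difference in positive-prediction rates between groups. This sketch assumes binary predictions and a single group attribute, and the function names are hypothetical. Demographic parity is only one of several fairness metrics, so treat this as a starting point rather than a complete bias review.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def positive_rate_by_group(
    predictions: Sequence[int], groups: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Share of positive (1) predictions within each group."""
    totals: Dict[Hashable, int] = defaultdict(int)
    positives: Dict[Hashable, int] = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(predictions: Sequence[int], groups: Sequence[Hashable]) -> float:
    """Largest minus smallest positive rate across groups (0.0 means parity)."""
    rates = positive_rate_by_group(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Usage: toy predictions for two groups of four examples each.
preds = [1, 1, 0, 1, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(positive_rate_by_group(preds, groups))  # {'a': 0.75, 'b': 0.25}
print(demographic_parity_gap(preds, groups))  # 0.5
```

From an SRE perspective, a metric like this can be tracked over time and alerted on like any other service indicator.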
26:06 Nia: What about staying current? This field seems to be moving incredibly fast.
26:10 Miles: It is, but that's actually an advantage for experienced SREs. You're already used to continuous learning and adapting to new technologies. The key is to focus on fundamentals rather than chasing every new trend.
26:21 Nia: Can you give some specific recommendations for staying current?
26:23 Miles: Follow the research, but focus on papers that have operational implications. Join communities where SREs and ML engineers intersect—there are some great Slack groups and forums emerging. And most importantly, experiment. Set up your own AI systems and break them in interesting ways.
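Miles's suggestion to break your own AI systems "in interesting ways" can start very small. The sketch below wraps a stand-in model function so that a configurable fraction of calls fail. It then checks that the caller falls back to a default value instead of crashing. The toy model, the error rate, the fallback value, and the function names are all hypothetical. A seeded random generator keeps the experiment reproducible.

```python
import random
from typing import Callable, List, Tuple

Predict = Callable[[List[float]], float]


def with_fault_injection(predict: Predict, error_rate: float, seed: int = 0) -> Predict:
    """Wrap a predict function so that roughly `error_rate` of calls raise."""
    rng = random.Random(seed)  # seeded so experiments are repeatable

    def wrapped(features: List[float]) -> float:
        if rng.random() < error_rate:
            raise RuntimeError("injected fault: model backend unavailable")
        return predict(features)

    return wrapped


def predict_with_fallback(
    predict: Predict, features: List[float], default: float
) -> Tuple[float, bool]:
    """Return (prediction, used_fallback), degrading to `default` on failure."""
    try:
        return predict(features), False
    except RuntimeError:
        return default, True


def toy_model(features: List[float]) -> float:
    # Stand-in for a real model: it just averages its inputs.
    return sum(features) / len(features)


# Usage: inject failures into about 20% of calls and count how often the fallback fires.
flaky = with_fault_injection(toy_model, error_rate=0.2, seed=42)
results = [predict_with_fallback(flaky, [1.0, 2.0, 3.0], default=0.0) for _ in range(1000)]
fallbacks = sum(used for _, used in results)
print(f"fallback used on {fallbacks} of {len(results)} calls")
```

The same wrapper pattern could be extended to inject latency or malformed outputs instead of exceptions.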
26:39 Nia: That last point is really important. You learn more from breaking things than from reading about them.
7:53 Miles: Absolutely. And don't be afraid to fail. The AI field is so new that everyone is figuring it out as they go. Your SRE experience in dealing with complex, distributed systems actually gives you a huge advantage in understanding how AI systems can fail.
26:58 Nia: What about organizations that are hesitant to invest in AI reliability? How should SREs make the business case?
27:03 Miles: Focus on risk and compliance. AI systems that fail can create legal liability, regulatory issues, and massive reputational damage. The cost of proper AI reliability engineering is tiny compared to the potential cost of AI system failures.
27:17 Nia: So it's about framing it as risk mitigation rather than just operational efficiency.
1:09 Miles: Exactly. And use concrete examples from other industries. Point to cases where AI system failures have caused real business damage. The business case for AI reliability is becoming clearer every day.
27:32 Nia: This has been incredibly helpful. Any final thoughts for senior SREs who are feeling overwhelmed by all of this?
27:38 Miles: Remember that you're not starting from zero. Your existing skills in systems thinking, troubleshooting, and operational excellence are incredibly valuable in an AI world. You're not learning a completely new field—you're extending your existing expertise into a new domain.
27:52 Nia: And the demand for people who can bridge these worlds is only going to grow.
7:53 Miles: Absolutely. Organizations desperately need people who can make AI systems reliable and safe. If you start building these skills now, you'll be incredibly well-positioned for the future.