BeFreed

    Build Production AI Workflows with TypeScript and Temporal

    25 min | Apr 8, 2026
    Technology · AI · Business

    Learn to build production AI workflows using TypeScript and Temporal. Master prompt versioning, LLM-as-judge evaluators, and secure, scalable orchestration.

    Best quote from Build Production AI Workflows with TypeScript and Temporal

    “The number one mistake that causes systems to fail in production is mixing non-deterministic I/O, like LLM calls, directly into your main logic. You must treat orchestration like a distributed systems problem, keeping your pure orchestration logic in Workflows while pushing all messy I/O into a dedicated Step layer.”

    This audio lesson was created by a member of the BeFreed community

    Input prompt

    Build a TypeScript framework for production AI workflows. Separate Workflows (pure orchestration) from Steps (the I/O layer — LLM calls, HTTP, DB queries — each result cached for replay). Store prompts as version-controlled .prompt files with YAML and Liquid templates; swap providers in one line. Add LLM-as-judge Evaluators for quality scoring. Scale via Temporal for retries, replay, and parallel execution. Encrypt secrets with AES-256-GCM. Stack: TypeScript, Temporal, Vercel AI SDK, Zod.

    Presenter voices
    Lena
    Miles
    Learning style
    In-depth
    Knowledge sources
    Hands-On Machine Learning with Scikit-Learn and TensorFlow
    Artificial Intelligence and Generative AI for Beginners
    Make Your Own Neural Network
    Artificial Intelligence and Machine Learning for Business
    Developing Backbone.js Applications
    ChatGPT for Dummies

    Frequently asked questions

    Learn more

    LEARNING PLAN

    Boost Productivity with AI

    In an era of rapid automation, mastering AI is essential for staying competitive and efficient. This plan is designed for professionals and business leaders who want to move beyond basic tools to build autonomous agents and scalable digital workflows.

    3 h 45 m • 4 sections
    LEARNING PLAN

    Deep Dive: AI Architecture & Model Training

    This comprehensive path is essential for engineers and data scientists looking to move beyond basic scripts into architectural design. It provides the technical depth needed to build, optimize, and scale robust AI systems in professional environments.

    2 h 43 m • 4 sections
    LEARNING PLAN

    Build AI Team with OpenClaw and AI

    As organizations pivot toward automation, the ability to integrate agentic workflows with human leadership is becoming a critical competitive advantage. This plan is designed for technical leaders and managers who need to master OpenClaw implementation and modern team scaling strategies.

    4 h 8 m • 4 sections
    LEARNING PLAN

    AI Agent for Software Development

    As software engineering shifts toward automation, mastering AI agents is becoming a critical skill for modern developers. This plan is ideal for programmers looking to transition from traditional development to building autonomous, intelligent systems using Python and neural networks.

    3 h 9 m • 4 sections
    LEARNING PLAN

    AI Profit, Use Cases & Success Psychology

    This plan bridges the gap between technical AI implementation and the mental resilience required to lead in a digital economy. It is designed for entrepreneurs and leaders who want to turn automation into a sustainable profit engine.

    3 h 29 m • 4 sections
    LEARNING PLAN

    Master AI for Work Efficiency

    As AI rapidly transforms every industry, professionals who can effectively leverage these tools gain a significant competitive advantage in productivity and innovation. This learning plan is ideal for knowledge workers, managers, and anyone looking to work smarter by harnessing AI's capabilities while future-proofing their career in an increasingly automated workplace.

    3 h 5 m • 4 sections
    BLOG

    The AI Tools Shaping How We Work in 2025

    Discover how AI is quietly transforming work in 2025—powering smarter learning, faster creation, and real-world productivity through tools like BeFreed, Runway, and Tenspect.

    BeFreed Team

    BLOG

    How to Use AI in Your Work in 2025: Practical, Not Hype

    Discover practical, proven ways to use AI in your daily work in 2025—from learning faster and automating tasks to building smarter products and collaborating more effectively.

    BeFreed Team

    Created by Columbia University alumni in San Francisco

    BeFreed Brings Together a Global Community of 1,000,000 Curious Minds
    Learn more about how BeFreed is talked about around the web

    "Instead of endless scrolling, I just hit play on BeFreed. It saves me so much time."
    @Moemenn

    "I never knew where to start with nonfiction—BeFreed’s book lists turned into podcasts gave me a clear path."
    @Chloe, Solo founder, LA

    "Perfect balance between learning and entertainment. Finished ‘Thinking, Fast and Slow’ on my commute this week."
    @Raaaaaachelw

    "Crazy how much I learned while walking the dog. BeFreed = small habits → big gains."
    @Matt, YC alum

    "Reading used to feel like a chore. Now it’s just part of my lifestyle."
    @Erin, Investment Banking Associate, NYC

    "Feels effortless compared to reading. I’ve finished 6 books this month already."
    @djmikemoore

    "BeFreed turned my guilty doomscrolling into something that feels productive and inspiring."
    @Pitiful

    "BeFreed turned my commute into learning time. 20-min podcasts are perfect for finishing books I never had time for."
    @SofiaP

    "BeFreed replaced my podcast queue. Imagine Spotify for books — that’s it. 🙌"
    @Jaded_Falcon

    "It is great for me to learn something from the book without reading it."
    @OojasSalunke

    "The themed book list podcasts help me connect ideas across authors—like a guided audio journey."
    @Leo, Law Student, UPenn

    "Makes me feel smarter every time before going to work"
    @Cashflowbubu

    4.7 · 1.5K ratings
    Start your learning journey now
    BeFreed App
    BeFreed

    Learn anything, personalized

    Discord | LinkedIn
    Featured book summaries
    Crucial Conversations | The Perfect Marriage | Into the Wild | Never Split the Difference | Attached | Good to Great | Say Nothing
    Trending categories
    Self Help | Communication Skill | Relationship | Mindfulness | Philosophy | Inspiration | Productivity
    Celebrity reading lists
    Elon Musk | Charlie Kirk | Bill Gates | Steve Jobs | Andrew Huberman | Joe Rogan | Jordan Peterson
    Award-winning collections
    Pulitzer Prize | National Book Award | Goodreads Choice Awards | Nobel Prize in Literature | New York Times | Caldecott Medal | Nebula Award
    Featured topics
    Management | American History | War | Trading | Stoicism | Anxiety | Sex
    Best books by year
    2025 Best Non Fiction Books | 2024 Best Non Fiction Books | 2023 Best Non Fiction Books
    Featured authors
    Chimamanda Ngozi Adichie | George Orwell | O. J. Simpson | Barbara O'Neill | Winston Churchill | Charlie Kirk
    BeFreed vs. other apps
    BeFreed vs. Other Book Summary Apps | BeFreed vs. ElevenReader | BeFreed vs. Readwise | BeFreed vs. Anki
    Learning tools
    Knowledge Visualizer | AI Podcast Generator
    About
    About us
    Pricing
    FAQ
    Blog
    Careers
    Partnerships
    Ambassador Program
    Directory
    BeFreed
    Try now
    © 2026 BeFreed
    Terms of Use | Privacy Policy

    Key takeaways

    1

    The Production-Ready AI Blueprint

    0:00

    Lena: You know, Miles, I was looking at some AI agent demos recently, and they all seem to work perfectly... until they don't. The moment a process crashes or an API times out, the whole thing just vanishes into thin air.

    0:14

    Miles: Exactly! That’s the "production problem." Most people treat orchestration like a simple LLM loop, but as of April 2026, we have to treat it like a distributed systems problem. If you don't have a way to resume from a mid-run checkpoint, your incident response is basically just guesswork.

    0:31

    Lena: Right, and it's fascinating that the fix isn't just "more AI"—it's actually about strict architectural separation. We’re talking about keeping your pure orchestration logic in Workflows while pushing all that messy I/O, like LLM calls and DB queries, into a dedicated Step layer.

    0:49

    Miles: That’s the key. By using a stack with TypeScript, Temporal, and the Vercel AI SDK, you can ensure every step is cached for replay-safety. So let's dive into how we actually build this durable framework.

    2

    The Architecture of Durable Intent

    1:03

    Lena: So, if we’re moving away from those simple "if-then" loops and into this more robust territory, where do we actually start? I mean, when I think about a production environment, the first thing that comes to mind is stability. How do we translate that "distributed systems" mindset into actual TypeScript code?

    1:22

    Miles: It starts with a fundamental shift in how you view your code’s execution. In a standard setup, if your server restarts, your function dies, and whatever progress you made is gone. But with a framework like Temporal, we use what’s called "durable execution." Imagine your code is running in a stateful virtual machine that just doesn't care if the physical hardware fails. It’s like having an "undo" button for your entire system’s state, except it’s actually a "resume exactly where you left off" button.

    1:50

    Lena: That sounds like magic, but I know it’s grounded in some pretty heavy-duty engineering. You mentioned separating "Workflows" from "Steps." Why is that distinction so critical? Why can’t I just put my LLM calls right inside the main logic?

    2:04

    Miles: That is the number one mistake that causes systems to fail in production. Workflows in a durable system like this must be deterministic. That means if I run the same code ten times with the same inputs, it must take the exact same path every single time. But LLM calls—and really any I/O, like a database query or an HTTP request—are inherently non-deterministic. The API might be down, the response might change, or the network might lag.

    2:31

    Lena: Oh, I see. So if the workflow engine tries to "replay" the logic to recover from a crash, and it hits a new LLM call that returns a slightly different answer, the whole execution path diverges?

    2:44

    Miles: Precisely. The engine gets confused because the "history" it recorded doesn't match the code it’s running. That’s why we push all that non-deterministic stuff into "Steps" or "Activities." When a Workflow calls a Step, Temporal records the result of that step in its event history. If the system crashes and needs to recover, it doesn't actually re-run the Step. It just looks at the history, sees "Oh, last time we ran this LLM call, we got this JSON," and it just injects that result back into the workflow.

    3:14

    Lena: That’s brilliant. It’s essentially a transparent caching layer for every single interaction with the outside world. It makes the "Workflow" a pure coordinator of logic, while the "Steps" handle the heavy lifting of talking to the internet.
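The record-and-replay idea Miles describes can be sketched in plain TypeScript. This is not the Temporal SDK; it is a minimal stand-in showing the mechanism: a step runner records each step's result in an event history, and on replay it injects the recorded value instead of re-executing the I/O.

```typescript
// Minimal sketch (NOT the Temporal SDK) of durable record-and-replay.
type EventHistory = Map<string, unknown>;

class StepRunner {
  constructor(private history: EventHistory, private replaying: boolean) {}

  async step<T>(id: string, fn: () => Promise<T>): Promise<T> {
    if (this.replaying && this.history.has(id)) {
      // Recovery path: inject the recorded result, never re-run the I/O.
      return this.history.get(id) as T;
    }
    const result = await fn();
    this.history.set(id, result); // record for future replays
    return result;
  }
}

// A deterministic "workflow": same inputs + same history => same path.
// The step bodies here are stubs standing in for real I/O.
async function summarizeWorkflow(runner: StepRunner): Promise<string> {
  const doc = await runner.step("fetch-doc", async () => "raw document text");
  return runner.step("llm-summarize", async () => `summary of: ${doc}`);
}
```

Because the workflow only ever sees step results, a crash-and-replay takes the identical path even though the underlying LLM call would not return the same bytes twice.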

    3:27

    Miles: Exactly. And because we’re using TypeScript, we can use Zod to define strict contracts between these two layers. Every Step has a Zod schema for its input and its output. So, even before the code runs, we have compile-time and runtime guarantees that the data flowing through our AI pipeline is exactly what we expect. No more "undefined is not a function" halfway through a twenty-minute AI reasoning task.

    3:50

    Lena: It’s like building a high-speed railway where the Workflows are the tracks—solid, predictable, and unchanging—and the Steps are the specialized trains that carry the data. If a train breaks down, the tracks are still there, and the system knows exactly where the next train needs to start from.

    4:08

    Miles: That’s a great way to put it. And because we’re building this in 2026, we’re seeing that this isn't just a "nice to have." As multi-agent systems grow, the communication protocol layer becomes the absolute bottleneck for reliability. One recent study—ProtocolBench—actually found that the choice of protocol can change overall completion time by over 36 percent. If your orchestration isn't durable, you’re not just risking a crash; you’re leaving a massive amount of efficiency on the table.

    4:37

    Lena: So we’ve got our tracks and our trains. But how do we actually tell the trains what to do? In an AI context, that means managing prompts. And I've heard that just hard-coding strings into your TypeScript files is a recipe for disaster.

    4:51

    Miles: Oh, it’s a total time bomb. Treating prompts like "sticky notes" instead of production code is how you end up with a mess. We need to treat them like the versioned, mission-critical assets they are.

    3

    The Prompt as a Versioned Asset

    5:02

    Lena: I love that phrase—"prompts are production code." It feels like we're finally giving the "instructions" part of AI the respect it deserves. If we aren't hard-coding them, where are they living? And how do we keep track of them when things inevitably change?

    5:16

    Miles: In a professional-grade framework, you want to store your prompts as standalone `.prompt` files. Think of these as a hybrid between a config file and a template. We use YAML for the metadata at the top—things like the model version, the temperature, and the author—and then Liquid templates for the actual prompt text.

    5:34

    Lena: Liquid templates? Like what’s used in Jekyll or Shopify? That seems really smart because it allows for conditional logic right inside the prompt.


    Miles: Exactly! You can have a single prompt file that says, "If the user is a premium subscriber, use this tone, otherwise use that tone." But the real magic is in the versioning. You don't just "edit" a prompt. You create a new version, like `v1.1.0`. And your code doesn't just ask for "the prompt"; it asks for a specific version.
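The episode does not show a concrete file, so here is an illustrative sketch of such a `.prompt` file: YAML metadata above a divider, a Liquid template below. The field names (`model`, `version`, `temperature`, `author`) and the filename are assumptions for illustration, not a fixed standard.

```yaml
# summarize.prompt (hypothetical layout): YAML metadata, then a Liquid body
model: claude-sonnet-4
version: 1.1.0
temperature: 0.2
author: platform-team
---
You are a careful technical summarizer.
{% if premium %}Use a warm, detailed tone.{% else %}Be brief and neutral.{% endif %}
Summarize the following document:
{{ document }}
```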

    6:02

    Lena: That solves the "who changed what" problem. I’ve seen so many teams where someone tweaks a prompt to be "friendlier," and suddenly the agent stops outputting valid JSON, and nobody can figure out why because the old prompt is just... gone.

    6:17

    Miles: Right. It’s a process failure. By keeping these in version control—right next to your TypeScript code—you get a full audit trail. And since we’re using that Step/Workflow split we talked about, the Step can load the specific `.prompt` file it needs, populate the Liquid variables, and send it off to the LLM. If the LLM call fails, Temporal handles the retry. If the prompt version was wrong, you can roll back the config in seconds.

    6:42

    Lena: It makes the whole thing feel so much more... engineering-led. It reminds me of the "Prompt Deck" approach I read about, where you manage these as disk-stored templates. It makes A/B testing so much easier, doesn't it? You could run one version of a prompt through a Step and compare it against another.

    6:58

    Miles: Absolutely. And when you combine that with the Vercel AI SDK, you can swap providers—going from Claude to Gemini or GPT-4o—in literally one line of code. The prompt stays the same, the metadata might change, but the business logic in your Workflow remains untouched. You’re effectively decoupling your "Intent" from the "Provider."
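The Vercel AI SDK gets this effect by accepting interchangeable model objects, but the principle can be sketched without any SDK: the workflow depends only on a narrow interface, and the concrete provider is a one-line configuration choice. The provider names and canned responses below are stand-ins, not real clients.

```typescript
// Sketch of provider independence: business logic depends only on this interface.
interface TextProvider {
  generate(prompt: string): Promise<string>;
}

// Stand-in providers; in a real app each entry would wrap an SDK client.
const providers: Record<string, TextProvider> = {
  anthropic: { generate: async (p) => `[anthropic] ${p}` },
  openai: { generate: async (p) => `[openai] ${p}` },
};

// Swapping providers is a one-line change; the workflow code never moves.
const activeProvider: TextProvider = providers["anthropic"];

async function answer(prompt: string): Promise<string> {
  return activeProvider.generate(prompt);
}
```

This is the Dependency Inversion Principle Miles mentions: the high-level workflow names only `TextProvider`, never a vendor.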

    7:18

    Lena: That "Provider Independence" is a huge deal. With models evolving so fast—I mean, we’re seeing new releases every other month in 2026—you don't want to be locked into an SDK that only works with one company. You want your architecture to be future-proof.

    7:33

    Miles: It’s about high-level policy not depending on low-level details. That’s the Dependency Inversion Principle from the SOLID world, applied to AI. Your Workflow shouldn't care if it’s talking to Anthropic or OpenAI. It should only care that it’s getting a valid response that matches its Zod schema.

    7:50

    Lena: So, we’ve got this durable execution, we’ve got versioned prompts, and we’re using Zod to make sure the data is clean. But how do we know if the AI is actually doing a good job? In a production workflow, you can't just "vibe check" every response.

    8:04

    Miles: You can't. And that’s where the next layer of our framework comes in—the "Evaluator" layer. This is where we use the LLM itself as a judge to score the quality of our own production outputs.

    4

    LLM-as-Judge and the Quality Gate

    8:16

    Lena: "LLM-as-Judge"—it sounds a bit like having the fox guard the henhouse, doesn't it? If the model is making the mistakes, can we really trust it to catch them?

    8:25

    Miles: It sounds counterintuitive, but it’s actually one of the most effective patterns we have. The key is that the "Judge" model is often a more capable, slower model—or it's given a much more specific, narrow rubric than the "Worker" model. Think of it like a senior editor reviewing a junior writer's work. The senior editor has a checklist of exactly what to look for.

    8:46

    Lena: So in our framework, we’d have a Step dedicated specifically to evaluation?


    Miles: Exactly. After your main "Reasoning Step" finishes, the Workflow passes that output to an "Evaluator Step." This step doesn't just ask, "Is this good?" It uses a Zod-defined rubric. It might check for factual accuracy against a source document, verify the tone, or ensure no sensitive information was leaked.

    9:09

    Lena: And since this is happening inside a Temporal workflow, if the Evaluator gives a low score, the Workflow can actually loop back and tell the first agent to try again, right?

    9:18

    Miles: You’ve hit the nail on the head. That’s a "self-correcting" workflow. You can define a retry policy that says, "If the quality score is below 0.8, re-run the reasoning step with the evaluator’s feedback as a new input." This is how you move from "stochastic AI" to "reliable software."
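The self-correcting loop can be sketched as a pure function: a worker produces output, a judge scores it against a rubric, and on a low score the judge's feedback is fed back into the next attempt. The 0.8 threshold comes from the episode; the function names and the three-attempt cap are assumptions.

```typescript
// Sketch of an LLM-as-judge quality gate with a self-correction loop.
interface Verdict { score: number; feedback: string; }
type WorkerFn = (task: string, feedback?: string) => Promise<string>;
type JudgeFn = (output: string) => Promise<Verdict>;

async function runWithQualityGate(
  task: string,
  work: WorkerFn,
  judge: JudgeFn,
  threshold = 0.8,
  maxAttempts = 3,
): Promise<string> {
  let feedback: string | undefined;
  let output = "";
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    output = await work(task, feedback);      // re-run with prior critique, if any
    const verdict = await judge(output);      // rubric-based scoring step
    if (verdict.score >= threshold) return output;
    feedback = verdict.feedback;              // loop the critique back in
  }
  return output; // best effort after exhausting retries
}
```

Inside a durable workflow, each `work` and `judge` call would be a Step, so the loop survives crashes mid-iteration.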

    9:35

    Lena: I love that. It’s like the "Cognitive Operating System" concept—COCO—where you have continuous oversight. It’s not just a one-and-done prompt; it’s a conversation with a supervisor built directly into the code.

    9:46

    Miles: And it’s not just about the "Judge" model’s opinion. We can also use "Safety Tech" probes. In medical or financial apps, we might have specific evaluators that look for TLS downgrades, session hijacking tokens, or metadata leakage. The ProtocolBench research showed that different protocols like ANP or Agora handle these security probes differently. By building this into our framework, we make security an active part of the workflow, not just an afterthought.

    10:11

    Lena: It’s fascinating how this all connects. The durability of Temporal ensures the loop doesn't break, the versioned prompts ensure we can tune the "Judge," and the Zod schemas ensure the feedback is structured. It’s a total safety net.

    10:25

    Miles: It really is. But there’s a missing piece here. To run all these agents, evaluators, and loops at scale—especially if we’re talking about thousands of concurrent users—we need to talk about secrets. We’re passing API keys, customer data, and JWTs through these workflows. How do we keep that side of things from becoming a security nightmare?

    10:46

    Lena: Right, because if Temporal is recording the "history" of every step, isn't it also recording all those sensitive payloads in its database? That seems like a huge liability if someone gets access to the Temporal server.

    10:58

    Miles: It’s a massive liability if you don't handle it right. That’s why the next pillar of our framework has to be selective payload encryption.

    5

    Securing the Pulse of the System

    11:06

    Lena: Okay, so if the Temporal server is essentially a "black box" that stores our execution history, we need to make sure that history is encrypted before it ever leaves our worker, right?


    Miles: Exactly. We use AES-256-GCM for this. The trick is to implement what’s called a "Data Converter" in the Temporal SDK. This acts as a middleman. Before any data—like a prompt or a result—is sent to the Temporal server, the Data Converter encrypts it. And when the worker pulls data back from the server, the Data Converter decrypts it.

    11:39

    Lena: So the Temporal server itself only ever sees scrambled gibberish? It knows there’s a workflow running, it knows which steps are complete, but it has no idea what the agents are actually saying?


    Miles: Precisely. This is "Zero-Knowledge" orchestration. Even if your Temporal cluster was compromised, the attacker wouldn't be able to read your customer’s PII or your proprietary prompts. But there’s a nuance here—"selective" encryption. You don't necessarily want to encrypt everything.

    12:06

    Lena: Why not? Wouldn't it be safer to just scramble the whole thing?

    12:10

    Miles: Well, think about debugging. If everything is encrypted, the Temporal UI—which is normally this amazing tool for seeing exactly what happened in a workflow—becomes useless. You can't see the state of your system. So, we use a pattern where we only wrap "Sensitive" values.

    12:27

    Lena: Oh, like a TypeScript wrapper? I wrap my `apiKey` or `userSocialSecurityNumber` in a special `Sensitive` class, and the Data Converter only targets those specific objects for encryption?

    12:39

    Miles: That’s it. It’s the "secure payload" pattern. It gives you the best of both worlds: you get the observability for your logic and non-sensitive data, but your secrets are protected by industry-standard AES-256. It’s a crucial step for any production AI app, especially in 2026 when data privacy regulations are getting even stricter.
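A minimal sketch of the selective-encryption pattern, using Node's built-in `crypto` module for AES-256-GCM. The `Sensitive` wrapper and `convert` helper are illustrative names, not Temporal SDK APIs; a real Temporal Data Converter would apply the same idea to payloads on their way to the server.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";
import { Buffer } from "node:buffer";

// Only values wrapped in Sensitive get encrypted; everything else stays readable.
class Sensitive<T> {
  constructor(public readonly value: T) {}
}

const key = randomBytes(32); // in production, load from a KMS or secret store

function encryptPayload(plain: string): { iv: Buffer; tag: Buffer; data: Buffer } {
  const iv = randomBytes(12); // 96-bit nonce, the standard size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plain, "utf8"), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), data }; // tag authenticates the ciphertext
}

function decryptPayload(p: { iv: Buffer; tag: Buffer; data: Buffer }): string {
  const decipher = createDecipheriv("aes-256-gcm", key, p.iv);
  decipher.setAuthTag(p.tag); // GCM verifies integrity as well as decrypting
  return Buffer.concat([decipher.update(p.data), decipher.final()]).toString("utf8");
}

// Hypothetical converter hook: encrypt Sensitive fields, pass the rest through.
function convert(field: unknown): unknown {
  return field instanceof Sensitive ? encryptPayload(String(field.value)) : field;
}
```

Because GCM is authenticated, a tampered payload fails at `decipher.final()` rather than decrypting to garbage silently.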

    12:58

    Lena: And since we’re using TypeScript, we can probably even use some type-level magic to ensure that we *can't* accidentally pass a sensitive value without it being wrapped.

    13:07

    Miles: You definitely can. You can define your Zod schemas to expect that `EncryptedPayload` type for certain fields. It’s all about creating "pit of success" architecture—making it easier to do the right thing than the wrong thing.

    13:19

    Lena: This whole stack—TypeScript, Temporal, Zod, AES encryption—it feels very "traditional software engineering," which is exactly what’s been missing from the AI world. We’re applying 20 years of distributed systems wisdom to these new, unpredictable models.

    13:36

    Miles: That’s the only way to ship at scale. I mean, look at what companies like Snap or Coinbase have done. They moved away from homegrown "SAGA" patterns and into durable execution because they realized they couldn't spend all their time writing retry logic and state-tracking code. They needed the infrastructure to handle the "how" so they could focus on the "what."

    13:54

    Lena: It’s about velocity, really. If I know my system is fault-tolerant and secure, I can iterate on my agent’s reasoning much faster. I’m not afraid to deploy a new prompt because I have a versioned rollback and a durable execution engine backing me up.

    Miles: Exactly. But there’s one more challenge when you move from one agent to multiple agents. They have to talk to each other. And just like we need a protocol for the internet, we need a protocol for agents. This is where the world of AGENTS.md and CLAUDE.md comes in.

    6

    The Language of Agent Collaboration

    14:27

    Lena: I’ve been seeing these `.md` files in the root of almost every major AI repo lately. It’s like a `README.md`, but specifically for the AI agent that’s browsing the code, right?

    14:39

    Miles: That’s exactly what it is. It’s "AI-readable documentation." In 2026, we’ve moved past just hoping the agent "gets" our project structure. We’re being explicit. And there are two main competing standards right now—AGENTS.md and CLAUDE.md.

    14:55

    Lena: Okay, so if I’m building our production framework, which one should I use? Or do I need both?

    15:00

    Miles: It depends on your team’s tools. AGENTS.md is the open, tool-agnostic standard. It was built by a huge coalition—Sourcegraph, OpenAI, Google—and it’s now part of the Agentic AI Foundation. It’s great if your team uses a mix of tools, like Cursor, GitHub Copilot, and custom scripts. It provides a universal baseline for code style, testing rules, and project boundaries.

    15:23

    Lena: And CLAUDE.md is the Anthropic-specific one?

    Miles: Right. It’s purpose-built for Claude Code. It’s a bit more "opinionated" but also much more powerful. It supports hierarchical rules—so you can have different instructions for different subdirectories in a monorepo—and it has this amazing "import" system where you can compose instructions from multiple files.

    15:45

    Lena: That seems perfect for a complex AI workflow. You could have a `CLAUDE.md` that imports your "Security Standards" and your "Zod Validation Rules," and then a specific one for your "Payment Agent" that only activates when the agent is working in the `services/billing` folder.

    Miles: Precisely. It’s "contextual documentation." Instead of flooding the agent with a 10,000-word prompt, you use "progressive disclosure." The agent only reads the rules that are relevant to the files it’s actually touching. It saves tokens, reduces "context rot," and drastically improves accuracy.

    16:17

    Lena: I love that term—"context rot." It’s so true; if you give an LLM too much irrelevant info, it starts losing track of the important stuff. By using these structured `.md` files, we’re essentially giving the agent a map of our framework’s "rules of the road."

    16:33

    Miles: And the best part is, they can coexist. A common pattern now is to put your universal rules in `AGENTS.md` and then have your `CLAUDE.md` just import that file and add a few Claude-specific tweaks. It’s about creating a "handbook" for your agents, so they know exactly how to interact with your Workflows and Steps without you having to re-explain it in every prompt.
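    The coexistence pattern Miles describes might look like the following. The `@AGENTS.md` import line uses Claude Code's file-import syntax; the specific rules in both files are invented for illustration:

    ```markdown
    <!-- AGENTS.md: the universal, tool-agnostic baseline -->
    # Repository Rules
    - Workflows are pure orchestration; all I/O lives in Steps (Temporal Activities).
    - Every Step output is validated against a Zod schema.
    - Prompts live in `prompts/` as versioned `.prompt` files, never inline strings.

    <!-- CLAUDE.md: imports the baseline, then adds Claude-specific guidance -->
    @AGENTS.md

    - When editing `workflows/`, never introduce nondeterministic calls
      (Date.now, Math.random, fetch); use the workflow-safe APIs instead.
    ```

    Tools that only understand `AGENTS.md` get the baseline; Claude Code gets the baseline plus its tweaks, with nothing duplicated.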

    16:55

    Lena: It’s another layer of the "Prompts as Code" philosophy. We’re moving the instructions out of the "runtime" and into the "filesystem." It’s more maintainable, more auditable, and just... cleaner.

    Miles: It really is. But even with the best documentation and the most durable execution, you’re eventually going to hit a scale where you need to coordinate dozens of specialized agents—maybe an "Analytics Agent," a "Translation Agent," and a "Compliance Agent"—all working on the same task. How do you manage that without it becoming a chaotic mess?

    17:27

    Lena: That’s the "Orchestration vs. Choreography" debate, right? Do we have one central brain, or do we let them all talk to each other?

    7

    Scaling the Multi-Agent Mesh

    17:35

    Lena: When I think about a "mesh" of agents, I imagine something like a Slack channel where everyone is shouting at once. That sounds like a nightmare for a production system. How do we keep the coordination "durable" when we have so many moving parts?

    17:49

    Miles: You’re right to be wary. In distributed systems, there are two main ways to handle this. "Choreography" is that Slack channel idea—each agent reacts to events and talks to others directly. It’s very flexible, but it’s almost impossible to debug because there’s no single source of truth.

    18:06

    Lena: And "Orchestration" is more like a conductor in an orchestra?

    Miles: Exactly. One central "Workflow" (our Temporal workflow) owns the state machine. It knows that Agent A must finish before Agent B starts. In 2026, for production AI, orchestration is almost always the better choice. It gives you that "centralized observability" the Akka team talks about. You can look at one execution log and see exactly how the "Compliance Agent" handed off data to the "Writing Agent."

    18:37

    Lena: But does a central orchestrator become a bottleneck? What if I need to run 50 agents in parallel on a massive data-processing task?

    18:45

    Miles: That’s where Temporal shines. You can use its "Child Workflows" or "Parallel Activities." Your main workflow can spawn 50 "Worker Steps," and Temporal handles the scheduling, the retries for each one, and the final "Join" where it aggregates all the results. It’s like having a project manager who can perfectly coordinate 50 people without ever losing a single status update.

    19:08

    Lena: And because each of those workers is a "Step," their results are cached. So if the 49th worker fails, you don't have to re-run the first 48!
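    That caching behavior is the heart of durable execution. Here is a toy model of it with an in-memory map standing in for Temporal's event history — Temporal's real mechanism is the persisted event log, not a cache, but the replay semantics are the same:

    ```typescript
    // Toy replay: step results are recorded by ID, so re-running the workflow
    // after a failure skips every step that already completed.
    const history = new Map<string, unknown>();
    let executions = 0;

    async function step<T>(id: string, fn: () => Promise<T>): Promise<T> {
      if (history.has(id)) return history.get(id) as T; // replayed, not re-run
      const result = await fn();
      executions++; // counts only steps that actually ran to completion
      history.set(id, result);
      return result;
    }

    async function workflow(failAt49: boolean): Promise<number[]> {
      const results: number[] = [];
      for (let i = 1; i <= 50; i++) {
        results.push(
          await step(`worker-${i}`, async () => {
            if (failAt49 && i === 49) throw new Error("worker 49 crashed");
            return i * 2;
          }),
        );
      }
      return results;
    }

    // First run fails at worker 49; the retry replays workers 1-48 from
    // history and only executes workers 49 and 50 for real.
    await workflow(true).catch(() => {});
    await workflow(false);
    console.log(executions); // 50: workers 1-48 ran once, 49 and 50 ran on the retry
    ```

    In real Temporal code the loop body would be an Activity call and the "history" would survive process restarts, which is exactly why the 49th worker's failure never costs you the first 48 results.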

    19:19

    Miles: Spot on. That’s the "Efficiency" play. The ProtocolBench research found that in a "Streaming Queue" scenario—where a coordinator is feeding tasks to workers—the choice of communication protocol can significantly impact throughput. Using something like the "Agent Communication Protocol" or ACP can minimize latency while keeping the load balanced.

    19:40

    Lena: It’s interesting that we’re seeing "Protocol Routers" now, too. The idea that a system can look at a task and say, "For this high-security medical task, I’ll use the ANP protocol with its DID-based identity, but for this high-speed data task, I’ll swap to A2A for the throughput."

    19:58

    Miles: It’s the next level of maturity. We’re not just building "an AI app" anymore; we’re building "AI networks." And our framework needs to be flexible enough to handle those different "interaction modes." Maybe one agent needs to stream its thoughts token-by-token, while another just needs to submit a final report.

    20:17

    Lena: It’s a lot to take in, but it all points to one thing: the era of the "unreliable AI demo" is over. If you want to build something that actually survives the real world, you have to build it on these durable, structured foundations.

    20:33

    Miles: You really do. And it’s not as daunting as it sounds once you have the right mental model. Workflow for logic, Step for I/O, Zod for contracts, and Temporal for the "durable pulse" that keeps it all alive.

    Lena: So, we’ve covered the architecture, the prompts, the quality gates, the security, and the scaling. I think it’s time to put this all into a concrete "Playbook" that our listeners can actually use.

    8

    The Production AI Playbook

    20:46

    Lena: Okay, Miles, let’s get practical. If someone is sitting down tomorrow to start building this, what does the "Day One" checklist look like?

    20:55

    Miles: Step one: Get your environment right. Install the Temporal CLI and the Vercel AI SDK. Don't even think about writing business logic until you have your "Workflow" and "Activity" folders separated. That’s your structural foundation.
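    One possible layout for that structural foundation — the folder names are illustrative conventions, not a Temporal requirement:

    ```text
    src/
    ├── workflows/     # pure, deterministic orchestration (no I/O)
    │   └── review.workflow.ts
    ├── activities/    # the Step layer: LLM calls, HTTP, DB queries
    │   └── llm.steps.ts
    ├── prompts/       # version-controlled .prompt files
    │   └── summarize.v2.prompt
    └── worker.ts      # registers workflows and activities with Temporal
    ```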

    Lena: Right. And for the prompts?

    21:11

    Miles: Step two: Create a `prompts/` directory. Every prompt gets its own `.prompt` file with YAML frontmatter. Use Liquid for your templates. And here’s the rule: no prompt is allowed to be used in code unless it has a version tag. If you’re just "editing the string," you’re doing it wrong.
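    A `.prompt` file under this convention might look like the following — the frontmatter field names are this framework's own convention rather than a published standard, and the `{{ }}` placeholders are Liquid variables:

    ```text
    ---
    name: summarize-ticket
    version: 2.1.0
    model: openai/gpt-4o        # swap providers by editing this one line
    temperature: 0.2
    ---
    You are a support analyst. Summarize the ticket below in three bullets.

    Ticket from {{ customer_name }}:
    {{ ticket_body }}
    ```

    Because the version lives in the file, a prompt change is a reviewable diff with a rollback point, not an invisible string edit.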

    21:28

    Lena: I love that rule. It forces discipline from the start. What about the "I/O Layer"?

    21:33

    Miles: Step three: Every LLM call, every database fetch, every external API request must be wrapped in a Temporal Activity—our "Step." And every one of those steps must have a Zod schema for its output. If the LLM returns a hallucinated field, Zod catches it at the boundary, and the Step fails gracefully instead of crashing your whole system.
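    A sketch of that boundary check, assuming the `zod` package; the schema fields and the `parseTriage` helper are invented for illustration:

    ```typescript
    import { z } from "zod";

    // The contract for what the LLM is allowed to return from this Step.
    const TriageResult = z.object({
      sentiment: z.enum(["positive", "neutral", "negative"]),
      confidence: z.number().min(0).max(1),
    });
    type TriageResult = z.infer<typeof TriageResult>;

    // Inside a Temporal Activity, the raw model output is parsed at the
    // boundary; a hallucinated or missing field fails the Step cleanly.
    function parseTriage(raw: unknown): TriageResult {
      const result = TriageResult.safeParse(raw);
      if (!result.success) {
        // Throwing here hands control to Temporal's retry policy.
        throw new Error(`LLM output failed validation: ${result.error.message}`);
      }
      return result.data;
    }

    const good = parseTriage({ sentiment: "negative", confidence: 0.92 });
    console.log(good.sentiment);
    ```

    Everything downstream of `parseTriage` can trust its types; the unpredictability of the model is contained at one line.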

    21:56

    Lena: And Step four has to be security, right?

    21:59

    Miles: Definitely. Configure your Temporal Data Converter with AES-256-GCM encryption. Wrap your sensitive data in that `Sensitive` type we talked about. This ensures your prompts and customer data stay private, even in your history logs.
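    Under the hood, the encryption itself is standard Node crypto. The `encrypt`/`decrypt` helper names and the payload layout are illustrative; Temporal's actual hook point is a custom `PayloadCodec` on the data converter:

    ```typescript
    import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

    const KEY = randomBytes(32); // 256-bit key; in production, load from a KMS

    // AES-256-GCM: a fresh IV per message, with an auth tag that detects tampering.
    function encrypt(plaintext: string): Buffer {
      const iv = randomBytes(12); // 96-bit IV, the recommended size for GCM
      const cipher = createCipheriv("aes-256-gcm", KEY, iv);
      const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
      return Buffer.concat([iv, cipher.getAuthTag(), body]); // iv | tag | ciphertext
    }

    function decrypt(payload: Buffer): string {
      const iv = payload.subarray(0, 12);
      const tag = payload.subarray(12, 28);
      const body = payload.subarray(28);
      const decipher = createDecipheriv("aes-256-gcm", KEY, iv);
      decipher.setAuthTag(tag);
      return Buffer.concat([decipher.update(body), decipher.final()]).toString("utf8");
    }

    const sealed = encrypt("prompt: summarize the patient record");
    console.log(decrypt(sealed)); // round-trips to the original plaintext
    ```

    GCM's auth tag is why this beats plain AES-CBC here: a corrupted or tampered history payload fails decryption loudly instead of yielding silent garbage.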

    22:14

    Lena: And Step five? How do we handle the "Vibe Check" problem?

    22:18

    Miles: Build an "Evaluator Step" into your main Workflow loop. Don't just trust the output. Use a "Judge" model with a specific Zod rubric to score every reasoning task. If the score is low, use Temporal’s "Retry" logic to give it another go. This is how you guarantee quality.
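    The quality gate itself is just a loop. In this sketch the judge is a stubbed function so the control flow stays visible; in the real system both `generate` and `judge` would be Steps, with the judge calling a model against a Zod-validated rubric:

    ```typescript
    interface Judgment {
      score: number; // 0-10 against the rubric
      critique: string;
    }

    // Retry a generation step until the judge approves or attempts run out.
    async function withQualityGate(
      generate: (critique?: string) => Promise<string>,
      judge: (draft: string) => Promise<Judgment>,
      minScore = 7,
      maxAttempts = 3,
    ): Promise<string> {
      let critique: string | undefined;
      for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        const draft = await generate(critique);
        const verdict = await judge(draft);
        if (verdict.score >= minScore) return draft;
        critique = verdict.critique; // feed the critique into the next attempt
      }
      throw new Error(`quality gate failed after ${maxAttempts} attempts`);
    }

    // Stubbed demo: the first draft scores low, the revised draft passes.
    let judgeCalls = 0;
    const result = await withQualityGate(
      async (critique) => (critique ? "revised draft" : "first draft"),
      async (draft) => {
        judgeCalls++;
        return draft === "first draft"
          ? { score: 4, critique: "too vague" }
          : { score: 9, critique: "" };
      },
    );
    console.log(result, judgeCalls); // "revised draft" after 2 judge calls
    ```

    Feeding the critique back into the next generation attempt is what turns a blind retry into an actual revision loop.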

    22:34

    Lena: This feels like a complete "System of Record" for AI. It’s not just code; it’s a way of operating. I can imagine this being a huge relief for developers who are tired of their agents being "flaky."

    22:46

    Miles: It’s about peace of mind. When you have this stack, an "incident" isn't a disaster. It’s just a workflow that’s "waiting for retry." You can see exactly where it is, you can inspect the (encrypted) state, and you can tune your prompts to fix the underlying issue without ever losing the work that’s already been done.

    23:02

    Lena: It’s a complete paradigm shift. We’re moving from "hope-based engineering" to "durable engineering." It’s the difference between a toy and a tool.

    Miles: Exactly. And the best part is, this stack scales. Whether you’re running one agent or a mesh of 100, these principles remain the same. The architecture protects you from the complexity.

    23:23

    Lena: Well, Miles, this has been an absolute deep dive. I feel like I have a whole new perspective on what "production-ready" actually means for AI.

    23:31

    Miles: I’m glad. It’s a fascinating time to be building. We’re finally seeing the "software engineering" side of AI catch up to the "research" side.

    9

    Reflections on the Durable Future

    23:40

    Lena: You know, Miles, as we bring this to a close, what sticks with me most is how much of this "AI future" is actually built on very solid, traditional engineering values. Separation of concerns, version control, strict typing, encryption—it’s like we’re taking the "wild west" of LLMs and putting a proper infrastructure around it.

    24:01

    Miles: That’s the real takeaway. The models are powerful, but they’re just engines. To build a car that actually goes somewhere reliably, you need the chassis, the brakes, the dashboard, and the seatbelts. This framework—this "durable" stack—is the rest of the car.

    24:18

    Lena: It’s about building trust. If we want AI to handle our medical records, our financial trades, or our business operations, we need to know that the system is resilient. We need to know that it doesn't just "forget" what it was doing because a server blinked.

    Miles: Right. And as we move toward 2027 and beyond, the teams that succeed won't just be the ones with the "best" prompts. They’ll be the ones with the best *systems*. The ones who can iterate safely, scale effortlessly, and recover from failures without anyone even noticing.

    24:48

    Lena: It’s an exciting challenge. For everyone listening, I hope this gives you a concrete path forward. Whether you’re starting a new project or trying to fix a "flaky" agent, maybe try applying just one of these patterns this week. Start with the Workflow/Step split, or move your prompts into version-controlled files.

    25:06

    Miles: Even small steps make a huge difference in the long run. Building with durability in mind changes how you think about every line of code you write.

    Lena: Absolutely. Well, thank you so much for joining us for this deep dive. It’s been a blast exploring the "production problem" and its solutions with you.

    25:22

    Miles: Same here, Lena. It’s always a pleasure.

    25:24

    Lena: To our listeners, take a moment to reflect on your own AI workflows. Where is the "fragility" in your system? And how could a bit of durable execution or a structured quality gate help you build something that truly lasts? Thanks for listening, and happy building.
