
Podcast
KubeFM
By KubeFM
Discover all the great things happening in the world of Kubernetes, learn (controversial) opinions from the experts and explore the successes (and failures) of running Kubernetes at scale.
A Journey Through Kafkian SplitDNS in a Multitenant Kubernetes, with Fabián Sellés Rosa
Fabián Sellés Rosa, Tech Lead of the Runtime team at Adevinta, walks through a real engineering investigation that started with a simple request: allowing tenants to use third-party Kafka services. What seemed straightforward turned into a complex DNS resolution problem that required testing seven different approaches before a working solution was found.
You will learn:
Why Kafka's multi-step DNS resolution creates unique challenges in multi-tenant environments, where bootstrap servers and dynamic broker lists complicate standard DNS approaches
The iterative debugging process from Route 53 split DNS through Kubernetes native pod DNS config, custom DNS servers, Kafka proxies, and CoreDNS solutions
How to implement the final solution using node-local DNS and CoreDNS templating with practical details including ndots configuration and Kyverno automation
Platform engineering evaluation criteria for assessing solutions based on maintainability, self-service capability, and evolvability in multi-tenant environments
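As a rough illustration of why the ndots setting matters for external Kafka broker names, the sketch below lists the order in which a glibc-style resolver would try names. The function name, the broker hostname, and the `team-a` namespace are hypothetical; the search domains and ndots value of 5 follow the usual Kubernetes pod defaults, and other resolvers (musl, for example) behave differently.

```python
def candidate_queries(name, search_domains, ndots):
    """Return the names a glibc-style resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search list.
        return [name]
    expanded = [f"{name}.{domain}." for domain in search_domains]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then fall back to the search list.
        return [absolute] + expanded
    # Too few dots: walk the search list first, then try the name as-is.
    return expanded + [absolute]


if __name__ == "__main__":
    # Hypothetical pod in namespace "team-a" resolving a hypothetical broker name.
    search = ["team-a.svc.cluster.local", "svc.cluster.local", "cluster.local"]
    for query in candidate_queries("b-1.kafka.example.com", search, ndots=5):
        print(query)
```

With ndots set to 5 the broker name (three dots) is first tried against every cluster search domain; lowering ndots to 2 makes the resolver try the name as-is first.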
Sponsor
This episode is sponsored by LearnKube — get started on your Kubernetes journey through comprehensive online, in-person or remote training.
More info
Find all the links and info for this episode here: https://ku.bz/NsBZ-FwcJ
Interested in sponsoring an episode? Learn more.
31:12
More Kubernetes Than I Bargained For, with Amos Wenger
Amos Wenger walks through his production incident where adding a home computer as a Kubernetes node caused TLS certificate renewals to fail. The discussion covers debugging techniques using tools like netshoot and K9s, and explores the unexpected interactions between Kubernetes overlay networks and consumer routers.
You will learn:
How Kubernetes networking assumptions break when mixing cloud VMs with nodes behind consumer routers, and why cert-manager challenges fail in NAT environments
The differences between CNI plugins like Flannel and Calico, particularly how they handle IPv6 translation
Debugging techniques for network issues using tools like netshoot, K9s, and iproute2
Best practices for mixed infrastructure including proper node labeling, taints, and scheduling controls
Sponsor
This episode is sponsored by LearnKube — get started on your Kubernetes journey through comprehensive online, in-person or remote training.
More info
Find all the links and info for this episode here: https://ku.bz/6Ll_7slr9
25:14
The Karpenter Effect: Redefining Kubernetes Operations, with Tanat Lokejaroenlarb
Tanat Lokejaroenlarb shares the complete journey of replacing EKS Managed Node Groups and Cluster Autoscaler with AWS Karpenter. He explains how this migration transformed their Kubernetes operations, from eliminating brittle upgrade processes to achieving significant cost savings of €30,000 per month through automated instance selection and AMD adoption.
You will learn:
How to decouple control plane and data plane upgrades using Karpenter's asynchronous node rollout capabilities
Cost optimization strategies including flexible instance selection, automated AMD migration, and the trade-offs between cheapest-first selection versus performance considerations
Scaling and performance tuning techniques such as implementing over-provisioning with low-priority placeholder pods
Policy automation and operational practices, including using Kyverno to simplify the user experience and implementing proper Pod Disruption Budgets
Sponsor
This episode is sponsored by StormForge by CloudBolt — automatically rightsize your Kubernetes workloads with ML-powered optimization
More info
Find all the links and info for this episode here: https://ku.bz/T6hDSWYhb
26:25
Building Kubernetes (a lite version) from scratch in Go, with Owumi Festus
Festus Owumi walks through his project of building a lightweight version of Kubernetes in Go. He removed etcd (replacing it with in-memory storage), skipped containers entirely, dropped authentication, and focused purely on the control plane mechanics. Through this process, he demonstrates how the reconciliation loop, API server concurrency handling, and scheduling logic actually work at their most basic level.
You will learn:
How the reconciliation loop works - The core concept of desired state vs current state that drives all Kubernetes operations
Why the API server is the gateway to etcd - How Kubernetes prevents race conditions using optimistic concurrency control and why centralized validation matters
What the scheduler actually does - Beyond simple round-robin assignment, understanding node affinity, resource requirements, and the complex scoring algorithms that determine pod placement
The complete pod lifecycle - Step-by-step walkthrough from kubectl command to running pod, showing how independent components work together like an orchestra
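The two ideas in the list above, optimistic concurrency control and the reconciliation loop, can be sketched in a few lines. This is a minimal illustration, not the project discussed in the episode (which is written in Go): the `Store` class, the `Conflict` exception, and the `reconcile` function are hypothetical names, and the in-memory dictionary only stands in for etcd.

```python
class Conflict(Exception):
    """Raised when a write is based on a stale resource version."""


class Store:
    """In-memory stand-in for etcd: every object carries a resource version."""

    def __init__(self):
        self._objects = {}

    def get(self, key):
        # Missing keys are reported as (None, 0).
        return self._objects.get(key, (None, 0))

    def update(self, key, value, expected_version):
        _, current = self._objects.get(key, (None, 0))
        if current != expected_version:
            # Someone else wrote first; the caller must re-read and retry.
            raise Conflict(f"{key}: expected v{expected_version}, found v{current}")
        self._objects[key] = (value, current + 1)
        return current + 1


def reconcile(desired, running):
    """Compare desired state with current state and return the actions needed."""
    if len(running) < desired:
        return [("create", None)] * (desired - len(running))
    # Surplus pods (if any) are deleted; an exact match yields no actions.
    return [("delete", name) for name in running[desired:]]


if __name__ == "__main__":
    store = Store()
    _, version = store.get("deploy/web")
    store.update("deploy/web", {"replicas": 3}, version)
    try:
        # A second writer still holding the old version is rejected.
        store.update("deploy/web", {"replicas": 5}, version)
    except Conflict as err:
        print("conflict:", err)
    print(reconcile(3, ["web-1"]))
```

A real controller would run `reconcile` repeatedly, re-reading state after every conflict rather than assuming its last view is still current.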
Sponsor
This episode is sponsored by StormForge by CloudBolt — automatically rightsize your Kubernetes workloads with ML-powered optimization
More info
Find all the links and info for this episode here: https://ku.bz/pf5kK9lQF
33:11
Graphs in your head, or how to assess a Kubernetes workload, with Oleksii Kolodiazhnyi
Understanding what's actually happening inside a complex Kubernetes system is one of the biggest challenges architects face.
Oleksii Kolodiazhnyi, Senior Architect at Mirantis, shares his structured approach to Kubernetes workload assessment. He breaks down how to move from high-level business understanding to detailed technical analysis, using visualization tools and systematic documentation.
You will learn:
A top-down assessment methodology that starts with business cases and use cases before diving into technical details
Practical visualization techniques using tools like KubeView, K9s, and Helm dashboard to quickly understand resource interactions
Systematic resource discovery approaches for different scenarios, from well-documented Helm-based deployments to legacy applications with hard-coded configurations buried in containers
Documentation strategies for creating consumable artifacts that serve different audiences, from business stakeholders to new team members joining the project
Sponsor
This episode is sponsored by StormForge by CloudBolt — automatically rightsize your Kubernetes workloads with ML-powered optimization
More info
Find all the links and info for this episode here: https://ku.bz/zDThxGQsP
42:26
Our Journey to GitOps: Migrating to ArgoCD with Zero Downtime, with Andrew Jeffree
Andrew Jeffree from SafetyCulture walks through their complete migration of 250+ microservices from a fragile Helm-based setup to GitOps with ArgoCD, all without any downtime. He explains how they replaced YAML configurations with a domain-specific language built in CUE, creating a better developer experience while adding stronger validation and reducing operational pain points.
You will learn:
Zero-downtime migration techniques using temporary deployments with prune-last sync options to ensure healthy services before removing legacy ones
How CUE lang improves on YAML by providing schema validation, early error detection, and a cleaner interface for developers
Human-centric platform engineering approaches that prioritize developer experience and reduce on-call burden through empathy-driven design decisions
Sponsor
This episode is brought to you by Testkube—where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io
More info
Find all the links and info for this episode here: https://ku.bz/Xvyp1_Qcv
34:35
The Double-Edged Sword of AI-Assisted Kubernetes Operations, with Mai Nishitani
Mai Nishitani, Director of Enterprise Architecture at NTT Data and AWS Community Builder, demonstrates how Model Context Protocol (MCP) enables Claude to directly interact with Kubernetes clusters through natural language commands.
You will learn:
How MCP servers work and why they're significant for standardizing AI integration with DevOps tools, moving beyond custom integrations to a universal protocol
The practical capabilities and critical limitations of AI in Kubernetes operations
Why fundamental troubleshooting skills matter more than ever as AI abstractions can fail in unexpected ways, especially during crisis scenarios and complex system failures
How DevOps roles are evolving from manual administration toward strategic architecture and orchestration
Sponsor
This episode is brought to you by Testkube—where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io
More info
Find all the links and info for this episode here: https://ku.bz/3hWvQjXxp
50:28
The Making of Flux: The Future, a KubeFM Original Series
In this closing episode, Bryan Ross (Field CTO at GitLab), Jane Yan (Principal Program Manager at Microsoft), Sean O’Meara (CTO at Mirantis) and William Rizzo (Strategy Lead, CTO Office at Mirantis) discuss how GitOps evolves in practice.
How enterprises are embedding Flux into developer platforms and managed cloud services.
Why bridging CI/CD and infrastructure remains a core challenge—and how GitOps addresses it.
What leading platform teams (GitLab, Microsoft, Mirantis) see as the next frontier for GitOps.
Sponsor
Join the Flux maintainers and community at FluxCon, November 11th in Atlanta—register here
More info
Find all the links and info for this episode here: https://ku.bz/tVqKwNYQH
26:52
The Data Engineer's guide to optimizing Kubernetes, with Niels Claeys
Niels Claeys shares how his team at DataMinded built Conveyor, a data platform processing up to 1.5 million core hours monthly. He explains the specific optimizations they discovered through production experience, from scheduler changes that immediately reduce costs by 10-15% to achieving 97% spot instance usage without reliability issues.
You will learn:
Why the default Kubernetes scheduler wastes money on batch workloads and how switching from "least allocated" to "most allocated" scheduling enables faster scale-down and better resource utilization
How to achieve 97% spot instance adoption through strategic instance type diversification, region selection, and Spark-specific techniques
Node pool design principles that balance Kubernetes overhead with workload efficiency
Platform-specific gotchas like AWS cross-AZ data transfer costs that can spike bills unexpectedly
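The difference between "least allocated" and "most allocated" scoring can be shown with a simplified, single-resource sketch. The node names and CPU figures are made up, and the real scheduler scores several resources with weights; this only illustrates the direction of the two strategies.

```python
MAX_SCORE = 100


def least_allocated(capacity, requested):
    """Higher score for emptier nodes: spreads pods across the cluster."""
    return (capacity - requested) * MAX_SCORE // capacity


def most_allocated(capacity, requested):
    """Higher score for fuller nodes: packs pods so idle nodes can scale down."""
    return requested * MAX_SCORE // capacity


def pick_node(nodes, pod_cpu, score):
    """Pick the best-scoring node that still fits the pod (CPU in millicores).

    Raises ValueError if no node has room for the pod.
    """
    feasible = {
        name: (capacity, used + pod_cpu)
        for name, (capacity, used) in nodes.items()
        if used + pod_cpu <= capacity
    }
    return max(feasible, key=lambda name: score(*feasible[name]))


if __name__ == "__main__":
    # Hypothetical cluster: node-a is busy, node-b is nearly idle.
    nodes = {"node-a": (8000, 6000), "node-b": (8000, 1000)}
    print(pick_node(nodes, 1000, least_allocated))  # node-b
    print(pick_node(nodes, 1000, most_allocated))   # node-a
```

With "most allocated" scoring the batch pod lands on the busy node, leaving the nearly idle node free to drain and be removed sooner.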
Sponsor
This episode is brought to you by Testkube—where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io
More info
Find all the links and info for this episode here: https://ku.bz/hGRfkzDJW
43:01
The Making of Flux: The Scale, a KubeFM Original Series
In this episode, Philippe Ensarguet, VP of Software Engineering at Orange, and Arnab Chatterjee, Global Head of Container & AI Platforms at Nomura, share how large enterprises are adopting Flux to drive reliable, compliant, and scalable platforms.
How Orange uses Flux to manage bare-metal Kubernetes through its SYLVR project.
Why Nomura relies on GitOps to balance agility with governance in financial services.
How Flux helps enterprises achieve resilience, compliance, and repeatability at scale.
Sponsor
Join the Flux maintainers and community at FluxCon, November 11th in Atlanta—register here
More info
Find all the links and info for this episode here: https://ku.bz/tWcHlJm7M
23:09
How We Integrated Native macOS Workloads with Kubernetes, with Vitalii Horbachov
Vitalii Horbachov explains how Agoda built macOS VZ Kubelet, a custom solution that registers macOS hosts as Kubernetes nodes and spins up macOS VMs using Apple's native virtualization framework. He details their journey from managing 200 Mac minis with bash scripts to a Kubernetes-native approach that handles 20,000 iOS tests at scale.
You will learn:
How to build hybrid runtime pods that combine macOS VMs with Docker sidecar containers for complex CI/CD workflows
Custom OCI image format implementation for managing 55-60GB macOS VM images with layered copy-on-write disks and digest validation
Networking and security challenges including Apple entitlements, direct NIC access, and implementing kubectl exec over SSH
Real-world adoption considerations including MDM-based host lifecycle management and the build vs. buy decision for Apple infrastructure at scale
Sponsor
This episode is brought to you by Testkube—where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io
More info
Find all the links and info for this episode here: https://ku.bz/q_JS76SvM
24:29
The Making of Flux: The Rewrite, a KubeFM Original Series
In this episode, Michael Bridgen (the engineer who wrote Flux's first lines) and Stefan Prodan (the maintainer who led the V2 rewrite) share how Flux grew from a fragile hack-day script into a production-grade GitOps toolkit.
How early Flux addressed the risks of manual, unsafe Kubernetes upgrades
Why the complete V2 rewrite was critical for stability, scalability, and adoption
What the maintainers learned about building a sustainable, community-driven open-source project
Sponsor
Join the Flux maintainers and community at FluxCon, November 11th in Salt Lake City—register here
More info
Find all the links and info for this episode here: https://ku.bz/bgkgn227-
44:58
Scaling CI horizontally with Buildkite, Kubernetes, and multiple pipelines, with Ben Poland
Ben Poland walks through Faire's complete CI transformation, from a single Jenkins instance struggling with thousands of lines of Groovy to a distributed Buildkite system running across multiple Kubernetes clusters.
He details the technical challenges of running CI workloads at scale, including API rate limiting, etcd pressure points, and the trade-offs of splitting monolithic pipelines into service-scoped ones.
You will learn:
How to architect CI systems that match team ownership and eliminate shared failure points across services
Kubernetes scaling patterns for CI workloads, including multi-cluster strategies, predictive node provisioning, and handling API throttling
Performance optimization techniques like Git mirroring, node-level caching, and spot instance management for variable CI demands
Migration strategies and lessons learned from moving away from monolithic CI, including proof-of-concept approaches and avoiding the sunk cost fallacy
Sponsor
This episode is brought to you by Testkube—where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io
More info
Find all the links and info for this episode here: https://ku.bz/klBmzMY5-
48:02
Not Every Problem Needs Kubernetes, with Danyl Novhorodov
Danyl Novhorodov, a veteran .NET engineer and architect at Eneco, presents his controversial thesis that 90% of teams don't actually need Kubernetes. He walks through practical decision-making frameworks, explores powerful alternatives like BEAM runtimes and Actor models, and explains why starting with modular monoliths often beats premature microservices adoption.
You will learn:
The COST decision framework - How to evaluate infrastructure choices based on Complexity, Ownership, Skills, and Time rather than industry hype
Platform engineering vs. managed services - How to honestly assess whether your team can compete with AWS, Azure, and Google's managed container platforms
Evolutionary architecture approach - Why modular monoliths with clear boundaries often provide better foundations than distributed systems from day one
Sponsor
This episode is brought to you by Testkube—where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io
More info
Find all the links and info for this episode here: https://ku.bz/BYhFw8RwW
52:40
VerticalPodAutoscaler Went Rogue: It Took Down Our Cluster, with Thibault Jamet
Running 30 Kubernetes clusters serving 300,000 requests per second sounds impressive until your Vertical Pod Autoscaler goes rogue and starts evicting critical system pods in an endless loop.
Thibault Jamet shares the technical details of debugging a complex VPA failure at Adevinta, where webhook timeouts triggered continuous pod evictions across their multi-tenant Kubernetes platform.
You will learn:
VPA architecture deep dive - How the recommender, updater, and mutating webhook components interact and what happens when the webhook fails
Hidden Kubernetes limits - How default QPS and burst rate limits in the Kubernetes Go client can cause widespread failures, and why these aren't well documented in Helm charts
Monitoring strategies for autoscaling - What metrics to track for webhook latency and pod eviction rates to catch similar issues before they become critical
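To see how a client-side QPS and burst limit can turn a spike of requests into seconds of added latency, here is a small token-bucket sketch. It is an illustration only, not the Kubernetes Go client's implementation; the class name is hypothetical, and the values of 5 QPS and burst 10 are assumed here as the commonly cited client defaults.

```python
class TokenBucket:
    """Reservation-style token bucket: callers are told how long to wait."""

    def __init__(self, qps, burst):
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0

    def wait_time(self, now):
        """Reserve one token at time `now` and return the delay in seconds."""
        # Refill for the elapsed time, never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.qps)
        self.last = now
        self.tokens -= 1
        if self.tokens >= 0:
            return 0.0
        # Negative balance: wait until the refill rate has paid it back.
        return -self.tokens / self.qps


if __name__ == "__main__":
    bucket = TokenBucket(qps=5, burst=10)  # assumed defaults
    # 30 requests arriving at the same instant, e.g. a wave of pod evictions.
    waits = [bucket.wait_time(0.0) for _ in range(30)]
    print(waits[9], waits[10], waits[29])
```

In this sketch the first 10 requests pass immediately, the 11th waits 0.2 seconds, and the 30th waits 4 seconds, which is the kind of delay that can push a webhook call past its timeout.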
Sponsor
This episode is brought to you by Testkube—where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io
More info
Find all the links and info for this episode here: https://ku.bz/rf1pbWXdN
37:34
The Making of Flux: The Origin, a KubeFM Original Series
This episode unpacks the technical and governance milestones that secured Flux's place in the cloud-native ecosystem, from a 45-minute production outage that led to the birth of GitOps to the CNCF process that defines project maturity and the handover of stewardship after Weaveworks' closure.
You will learn:
How a single incident pushed Weaveworks to adopt Git as the source of truth, creating the foundation of GitOps.
How Flux sustained continuity after Weaveworks shut down through community governance.
Where Flux is heading next with security guidance, Flux v2, and an enterprise-ready roadmap.
Sponsor
Join the Flux maintainers and community at FluxCon, November 11th in Salt Lake City—register here
More info
Find all the links and info for this episode here: https://ku.bz/5Sf5wpd8y
22:28
Predictive vs Reactive: A Journey to Smarter Kubernetes Scaling, with Jorrick Stempher
Jorrick Stempher shares how his team of eight students built a complete predictive scaling system for Kubernetes clusters using machine learning.
Rather than waiting for nodes to become overloaded, their system uses the Prophet forecasting model to proactively anticipate load patterns and scale infrastructure, giving them the 8-9 minutes needed to provision new nodes on Vultr.
You will learn:
How to implement predictive scaling using the Prophet ML model, Prometheus metrics, and custom APIs to forecast Kubernetes workload patterns
The Node Ranking Index (NRI) - a unified metric that combines CPU, RAM, and request data into a single comparable number for efficient scaling decisions
Real-world implementation challenges, including data validation, node startup timing constraints, load testing strategies, and the importance of proper research before building complex scaling solutions
Sponsor
This episode is brought to you by Testkube—the ultimate Continuous Testing Platform for Cloud Native applications. Scale fast, test continuously, and ship confidently. Check it out at testkube.io
More info
Find all the links and info for this episode here: https://ku.bz/clbDWqPYp
26:00
Solving Cold Starts: Using Istio to Warm Up Java Pods, with Frédéric Gaudet
If you're running Java applications in Kubernetes, you've likely experienced the pain of slow pod startups affecting user experience during deployments and scaling events.
Frédéric Gaudet, Senior SRE at BlaBlaCar, shares how his team solved the cold start problem for their 1,500 Java microservices using Istio's warm-up capabilities.
You will learn:
Why Java applications struggle with cold starts and how JIT compilation affects initial request latency in Kubernetes environments
How Istio's warm-up feature works to gradually ramp up traffic to new pods
Why other common solutions fail, including resource over-provisioning, init containers, and tools like GraalVM
Real production impact from implementing this solution, including dramatic improvements in message moderation SLOs at BlaBlaCar's scale of 4,000 pods
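The idea of gradually ramping up traffic to a new pod can be sketched as a weight that grows over a warm-up window. This is a simplified linear ramp under stated assumptions, not Istio's or Envoy's exact algorithm; the function name, the 60-second window, and the 10% floor are hypothetical.

```python
def warmup_weight(seconds_since_ready, warmup_seconds, min_fraction=0.1):
    """Fraction of its normal traffic share that a new pod should receive."""
    if seconds_since_ready >= warmup_seconds:
        # Warm-up is over: the pod takes its full share.
        return 1.0
    # Linear ramp, with a floor so the pod gets some traffic to trigger JIT compilation.
    return max(min_fraction, seconds_since_ready / warmup_seconds)


if __name__ == "__main__":
    for t in (0, 15, 30, 60, 90):
        print(t, warmup_weight(t, warmup_seconds=60))
```

The load balancer multiplies the pod's normal weight by this fraction, so a freshly started JVM sees a trickle of requests while its hot paths are still being compiled.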
Sponsor
This episode is brought to you by Testkube—the ultimate Continuous Testing Platform for Cloud Native applications. Scale fast, test continuously, and ship confidently. Check it out at testkube.io
More info
Find all the links and info for this episode here: https://ku.bz/grxcypt9j
35:22
Teaching Kubernetes to Scale with a MacBook Screen Lock, with Brian Donelan
Brian Donelan, VP Cloud Platform Engineering at JPMorgan Chase, shares his ingenious side project that automatically scales Kubernetes workloads based on whether his MacBook is open or closed.
By connecting macOS screen lock events to CloudWatch, KEDA, and Karpenter, he built a system that achieves 80% cost savings by scaling pods and nodes to zero when he's away from his laptop.
You will learn:
How KEDA differs from traditional Kubernetes HPA - including its scale-to-zero capabilities, event-driven scaling, and extensive ecosystem of 60+ built-in scalers
The technical architecture connecting macOS notifications through CloudWatch to trigger Kubernetes autoscaling using Swift, AWS SDKs, and custom metrics
Cost optimization strategies including how to calculate actual savings, account for API costs, and identify leading indicators of compute demand
Creative approaches to autoscaling signals beyond CPU and memory, including examples from financial services and e-commerce that could revolutionize workload management
Sponsor
This episode is brought to you by Testkube—the ultimate Continuous Testing Platform for Cloud Native applications. Scale fast, test continuously, and ship confidently. Check it out at testkube.io
More info
Find all the links and info for this episode here: https://ku.bz/sFd8TL1cS
27:44
Building a Carbon and Price-Aware Kubernetes Scheduler, with Dave Masselink
Data centers consume over 4% of global electricity and this number is projected to triple in the next few years due to AI workloads.
Dave Masselink, founder of Compute Gardener, discusses how he built a Kubernetes scheduler that makes scheduling decisions based on real-time carbon intensity data from power grids.
You will learn:
How carbon-aware scheduling works - Using real-time grid data to shift workloads to periods when electricity generation has lower carbon intensity, without changing energy consumption
Technical implementation details - Building custom Kubernetes schedulers using the scheduler plugin framework, including pre-filter and filter stages for carbon and time-of-use pricing optimization
Energy measurement strategies - Approaches for tracking power consumption across CPUs, memory, and GPUs
Sponsor
This episode is brought to you by Testkube—the ultimate Continuous Testing Platform for Cloud Native applications. Scale fast, test continuously, and ship confidently. Check it out at testkube.io
More info
Find all the links and info for this episode here: https://ku.bz/zk2xM1lfW
41:28