The MAD Podcast

The MAD Podcast with Matt Turck is a series of conversations with leaders from across the Machine Learning, AI & Data landscape, hosted by Matt Turck, a leading AI and data investor and Partner at FirstMark Capital.

Everything Gets Rebuilt: The New AI Agent Stack | Harrison Chase, LangChain

The era of the simple AI wrapper is officially dead, and the entire software infrastructure layer is being completely rebuilt. Live from the Daytona COMPUTE Conference in San Francisco, Harrison Chase, co-founder and CEO of LangChain, joins the MAD Podcast to explain why this massive shift is happening. As agents evolve from simple prompt-based systems into software that can plan, use tools, write code, manage files, and remember things over time, the real frontier is shifting from the model itself to the stack around the model. In this conversation, we go deep under the hood of this new, post-cloud architecture to deconstruct harnesses, sub-agents, context compaction, observability, memory, and the critical need for secure compute sandboxes. For anyone building in AI today, this episode cuts through the noise to reveal the new infrastructure required to make autonomous agents work in the real world. (00:00) Intro - meet Harrison Chase (01:32) What changed in agents over the last year (03:57) Why coding agents are ahead (06:26) Do models commoditize the framework layer? (08:27) Harnesses, in plain English (10:11) Why system prompts matter so much (13:11) The upside — and downside — of subagents (15:31) Why a useful agent needs a filesystem (18:13) The core primitives of modern agents (19:12) Skills: the new primitive (20:19) What context compaction actually means (23:02) How memory works in agents (25:16) One mega-agent or many specialized agents? (27:46) Has MCP won? (29:38) Why agents need sandboxes (32:35) How sandboxes help with security (33:32) How Harrison Chase started LangChain (37:24) LangChain vs LangGraph vs Deep Agents (40:17) Why observability matters more for agents (41:48) Evals, no-code, and continuous improvement (44:41) What LangChain is building next (45:29) Where the real moat in AI lives
Internet and technology Yesterday
46:57

AI That Can Prove It’s Right: Verification as the Missing Layer in AI — Carina Hong

What if AI didn’t just sound right — but could prove it? In this episode of the MAD Podcast, Matt Turck sits down with Carina Hong, a 24-year-old former math olympiad competitor and Rhodes Scholar, and the founder/CEO of Axiom Math, to unpack how AxiomProver earned a perfect 12/12 on the Putnam 2025 and why formal verification (via Lean) may be the missing layer for reliable reasoning. Carina argues we’re entering a “math renaissance” where verified reasoning systems can tackle problems that currently take researchers months — and potentially push beyond math into verified code, hardware, and high-stakes software. They go inside the “generation + verification” loop, what it means to build AI that can be trusted, and what this approach could unlock on the road to superintelligent reasoning. (00:00) Intro (01:25) Why the World Needs an AI Mathematician (02:57) Scoring 12/12 on the World's Hardest Math Test (Putnam) (04:05) The First AI to Solve Open Research Conjectures (06:59) Does AI Solve Math in "Alien" Ways? (The Move 37 Effect) (08:59) "Lean": The Programming Language of Proofs Explained (10:51) How Axiom's Approach Differs from DeepMind & OpenAI (16:06) Formal vs. Informal Reasoning (And Auto-Formalization) (17:37) The AI "Reward Hacking" Problem (20:18) Building an AI That is 100% Correct, 100% of the Time (23:23) Beyond Math: Verified Code & Hardware Verification (25:12) The Brutal Reality of Competitive Math Olympiads (29:30) From Neuroscience to Stanford Law to Dropout Founder (33:57) How Axiom Actually Works Under the Hood (The Architecture) (37:51) The Secret to Generating Perfect Synthetic Data (40:14) Tokens, Proof Length, and Inference Cost (42:58) The "Everest" of Mathematics: Scaling Reasoning Trees (46:32) Can an AI Win a Fields Medal? 
(47:25) "Math Renaissance": What Changes if This Works (55:47) How Mathematicians React to AI (And Why Proof Certificates Matter) (57:30) Becoming a CEO: Dropping Ego and Building Culture (1:00:42) Recruiting World-Class Talent & Building the Axiom "Tribe"
Internet and technology 2 weeks
01:03:52

Voice AI’s Big Moment: Why Everything Is Changing Now (ft. Neil Zeghidour, Gradium AI)

Voice used to be AI’s forgotten modality — awkward, slow, and fragile. Now it’s everywhere. In this reference episode on all things Voice AI, Matt Turck sits down with Neil Zeghidour, a top AI researcher and CEO of Gradium AI (ex-DeepMind/Google, Meta, Kyutai), to cover voice agents, speech-to-speech models, full-duplex conversation, on-device voice, and voice cloning. We unpack what actually changed under the hood — why voice is finally starting to feel natural, and why it may become the default interface for a new generation of AI assistants and devices. Neil breaks down today’s dominant “cascaded” voice stack — speech recognition into a text model, then text-to-speech back out — and why it’s popular: it’s modular and easy to customize. But he argues it has two key downsides: chaining models adds latency, and forcing everything through text strips out paralinguistic signals like tone, stress, and emotion. The next wave, he suggests, is combining cascade-like flexibility with the more natural feel of speech-to-speech and full-duplex conversation. We go deep on full-duplex interaction (ending awkward turn-taking), the hardest unsolved problems (noisy real-world environments and multi-speaker chaos), and the realities of deploying voice at scale — including why models must be compact and when on-device voice is the right approach. Finally, we tackle voice cloning: where it’s genuinely useful, what it means for deepfakes and privacy, and why watermarking isn’t a silver bullet. If you care about voice agents, real-time AI, and the next generation of human-computer interaction, this is the episode to bookmark. 
Neil Zeghidour LinkedIn - https://www.linkedin.com/in/neil-zeghidour-a838aaa7/ X/Twitter - https://x.com/neilzegh Gradium Website - https://gradium.ai X/Twitter - https://x.com/GradiumAI Matt Turck (Managing Director) Blog - https://mattturck.com LinkedIn - https://www.linkedin.com/in/turck/ X/Twitter - https://twitter.com/mattturck FirstMark Website - https://firstmark.com X/Twitter - https://twitter.com/FirstMarkCap (00:00) Intro (01:21) Voice AI’s big moment — and why we’re still early (03:34) Why voice lagged behind text/image/video (06:06) The convergence era: transformers for every modality (07:40) Beyond Her: always-on assistants, wake words, voice-first devices (11:01) Voice vs text: where voice fits (even for coding) (12:56) Neil’s origin story: from finance to machine learning (18:35) Neural codecs (SoundStream): compression as the unlock (22:30) Kyutai: open research, small elite teams, moving fast (31:32) Why big labs haven’t “won” voice AI (34:01) On-device voice: where it works, why compact models matter (46:37) The last mile: real-world robustness, pronunciation, uptime (41:35) Benchmarking voice: why metrics fail, how they actually test (47:03) Cascades vs speech-to-speech: trade-offs + what’s next (54:05) Hardest frontier: noisy rooms, factories, multi-speaker chaos (1:00:50) New languages + dialects: what transfers, what doesn’t (1:02:54) Hardware & compute: why voice isn’t a 10,000-GPU game (1:07:27) What data do you need to train voice models? (1:09:02) Deepfakes + privacy: why watermarking isn’t a solution (1:12:30) Voice + vision: multimodality, screen awareness, video+audio (1:14:43) Voice cloning vs voice design: where the market goes (1:16:32) Paris/Europe AI: talent density, underdog energy, what’s next
Internet and technology 3 weeks
01:22:49

Mistral AI vs. Silicon Valley: The Rise of Sovereign AI

While Silicon Valley obsesses over AGI, Timothée Lacroix and the team at Mistral AI are quietly building the industrial and sovereign infrastructure of the future. In his first-ever appearance on a US podcast, the Mistral AI Co-Founder & CTO reveals how the company has evolved from an open-source research lab into a full-stack sovereign AI power—backed by ASML, running on their own massive supercomputing clusters, and deployed in nation-state defense clouds to break the dependency on US hyperscalers. Timothée offers a refreshing, engineer-first perspective on why the current AI hype cycle is misleading. He explains why "Sovereign AI" is not just a geopolitical buzzword but a necessity for any enterprise that wants to own its intelligence rather than rent it. He also provides a contrarian reality check on the industry's obsession with autonomous agents, arguing that "trust" matters more than autonomy and explaining why he prefers building robust "workflows" over unpredictable agents. We also dive deep into the technical reality of competing with the US giants. Timothée breaks down the architecture of the newly released Mistral 3, the "dense vs. MoE" debate, and the launch of Mistral Compute—their own infrastructure designed to handle the physics of modern AI scaling. This is a conversation about the plumbing, the 18,000-GPU clusters, and the hard engineering required to turn AI from a magic trick into a global industrial asset. Timothée Lacroix LinkedIn - https://www.linkedin.com/in/timothee-lacroix-59517977/ Google Scholar - https://scholar.google.com.do/citations?user=tZGS6dIAAAAJ&hl=en&oi=ao Mistral AI Website - https://mistral.ai X/Twitter - https://x.com/MistralAI Matt Turck (Managing Director) Blog - https://mattturck.com LinkedIn - https://www.linkedin.com/in/turck/ X/Twitter - https://twitter.com/mattturck FirstMark Website - https://firstmark.com X/Twitter - https://twitter.com/FirstMarkCap (00:00) — Cold Open (01:27) — Mistral vs. 
The World: From Research Lab to Sovereign Power (03:48) — Inside Mistral Compute: Building an 18,000 GPU Cluster (08:42) — The Trillion-Dollar Question: Competing Without a Big Tech Parent (10:37) — The Reality of Enterprise AI: Escaping "POC Purgatory" (15:06) — Why Mistral Hires Forward Deployed Engineers (FDEs) (16:57) — The Contrarian Take: Why "Agents" are just "Workflows" (19:35) — Trust > Autonomy: The Truth About Agent Reliability (21:26) — The Missing Stack: Governance and Versioning for AI (26:24) — When Will AI Actually Work? (The 2026 Timeline) (30:33) — Beyond Chat: The "Banger" Sovereign Use Cases (35:46) — Mistral 3 Architecture: Mixture of Experts vs. Dense (43:12) — Synthetic Data & The Post-Training Bottleneck (45:12) — Reasoning Models: Why "Thinking" is Just Tool Use (46:22) — Launching DevStral 2 and the Vibe CLI (50:49) — Engineering Lessons: How to Build Frontier AI Efficiently (56:08) — Timothée’s View on AGI & The Future of Intelligence
Internet and technology 4 weeks
58:20

Dylan Patel: NVIDIA's New Moat & Why China is “Semiconductor Pilled”

Dylan Patel (SemiAnalysis) joins Matt Turck for a deep dive into the AI chip wars — why NVIDIA is shifting from a “one chip can do it all” worldview to a portfolio strategy, how inference is getting specialized, and what that means for CUDA, AMD, and the next wave of specialized silicon startups. Then we take the fun tangents: why China is effectively “semiconductor pilled,” how provinces push domestic chips, what Huawei means as a long-term threat vector, and why so much “AI is killing the grid / AI is drinking all the water” discourse misses the point. We also tackle the big macro question: capex bubble or inevitable buildout? Dylan’s view is that the entire answer hinges on one variable—continued model progress—and we unpack the second-order effects across data centers, power, and the circular-looking financings (CoreWeave/Oracle/backstops). Dylan Patel LinkedIn - https://www.linkedin.com/in/dylanpatelsa/ X/Twitter - https://x.com/dylan522p SemiAnalysis Website - https://semianalysis.com X/Twitter - https://x.com/SemiAnalysis_ Matt Turck (Managing Director) Blog - https://mattturck.com LinkedIn - https://www.linkedin.com/in/turck/ X/Twitter - https://twitter.com/mattturck FirstMark Website - https://firstmark.com X/Twitter - https://twitter.com/FirstMarkCap (00:00) - Intro (01:16) - Nvidia acquires Groq: A pivot to specialization (07:09) - Why AI models might need "wide" compute, not just fast (10:06) - Is the CUDA moat dead? (Open source vs. Nvidia) (17:49) - The startup landscape: Etched, Cerebras, and 1% odds (22:51) - Geopolitics: China's "semiconductor-pilled" culture (35:46) - Huawei's vertical integration is terrifying (39:28) - The $100B AI revenue reality check (41:12) - US Onshoring: Why total self-sufficiency is a fantasy (44:55) - Can the US actually build fabs? (The delay problem) (48:33) - The CapEx Bubble: Is $500B spending irrational? 
(54:53) - Energy Crisis: Why gas turbines will power AI, not nuclear (57:06) - The "AI uses all the water" myth (Hamburger comparison) (1:03:40) - Circular Debt? Debunking the Nvidia-CoreWeave risk (1:07:24) - Claude Code & the software singularity (1:10:23) - The death of the Junior Analyst role (1:11:14) - Model predictions: Opus 4.5 and the RL gap (1:14:37) - San Francisco Lore: Roommates (Dwarkesh Patel & Sholto Douglas)
Internet and technology 1 month
01:16:44

State of LLMs 2026: RLVR, GRPO, Inference Scaling — Sebastian Raschka

Sebastian Raschka joins the MAD Podcast for a deep, educational tour of what actually changed in LLMs in 2025 — and what matters heading into 2026. We start with the big architecture question: are transformers still the winning design, and what should we make of world models, small “recursive” reasoning models and text diffusion approaches? Then we get into the real story of the last 12 months: post-training and reasoning. Sebastian breaks down RLVR (reinforcement learning with verifiable rewards) and GRPO, why they pair so well, what makes them cheaper to scale than classic RLHF, and how they “unlock” reasoning already latent in base models. We also cover why “benchmaxxing” is warping evaluation, why Sebastian increasingly trusts real usage over benchmark scores, and why inference-time scaling and tool use may be the underappreciated drivers of progress. Finally, we zoom out: where moats live now (hint: private data), why more large companies may train models in-house, and why continual learning is still so hard. If you want the 2025–2026 LLM landscape explained like a masterclass — this is it. Sources: The State Of LLMs 2025: Progress, Problems, and Predictions - https://x.com/rasbt/status/2006015301717028989?s=20 The Big LLM Architecture Comparison - https://magazine.sebastianraschka.com/p/the-big-llm-architecture-comparison Sebastian Raschka Website - https://sebastianraschka.com Blog - https://magazine.sebastianraschka.com LinkedIn - https://www.linkedin.com/in/sebastianraschka/ X/Twitter - https://x.com/rasbt FIRSTMARK Website - https://firstmark.com X/Twitter - https://twitter.com/FirstMarkCap Matt Turck (Managing Director) Blog - https://mattturck.com LinkedIn - https://www.linkedin.com/in/turck/ X/Twitter - https://twitter.com/mattturck (00:00) - Intro (01:05) - Are the days of Transformers numbered? 
(14:05) - World models: what they are and why people care (06:01) - Small “recursive” reasoning models (ARC, iterative refinement) (09:45) - What is a diffusion model (for text)? (13:24) - Are we seeing real architecture breakthroughs — or just polishing? (14:04) - MoE + “efficiency tweaks” that actually move the needle (17:26) - “Pre-training isn’t dead… it’s just boring” (18:03) - 2025’s headline shift: RLVR + GRPO (post-training for reasoning) (20:58) - Why RLHF is expensive (reward model + value model) (21:43) - Why GRPO makes RLVR cheaper and more scalable (24:54) - Process Reward Models (PRMs): why grading the steps is hard (28:20) - Can RLVR expand beyond math & coding? (30:27) - Why RL feels “finicky” at scale (32:34) - The practical “tips & tricks” that make GRPO more stable (35:29) - The meta-lesson of 2025: progress = lots of small improvements (38:41) - “Benchmaxxing”: why benchmarks are getting less trustworthy (43:10) - The other big lever: inference-time scaling (47:36) - Tool use: reducing hallucinations by calling external tools (49:57) - The “private data edge” + in-house model training (55:14) - Continual learning: why it’s hard (and why it’s not 2026) (59:28) - How Sebastian works: reading, coding, learning “from scratch” (01:04:55) - LLM burnout + how he uses models (without replacing himself)
Internet and technology 1 month
01:08:13

The End of GPU Scaling? Compute & The Agent Era — Tim Dettmers (Ai2) & Dan Fu (Together AI)

Will AGI happen soon - or are we running into a wall? In this episode, I’m joined by Tim Dettmers (Assistant Professor at CMU; Research Scientist at the Allen Institute for AI) and Dan Fu (Assistant Professor at UC San Diego; VP of Kernels at Together AI) to unpack two opposing frameworks from their essays: “Why AGI Will Not Happen” versus “Yes, AGI Will Happen.” Tim argues progress is constrained by physical realities like memory movement and the von Neumann bottleneck; Dan argues we’re still leaving massive performance on the table through utilization, kernels, and systems—and that today’s models are lagging indicators of the newest hardware and clusters. Then we get practical: agents and the “software singularity.” Dan says agents have already crossed a threshold even for “final boss” work like writing GPU kernels. Tim’s message is blunt: use agents or be left behind. Both emphasize that the leverage comes from how you use them—Dan compares it to managing interns: clear context, task decomposition, and domain judgment, not blind trust. We close with what to watch in 2026: hardware diversification, the shift toward efficient, specialized small models, and architecture evolution beyond classic Transformers—including state-space approaches already showing up in real systems. Sources: Why AGI Will Not Happen - https://timdettmers.com/2025/12/10/why-agi-will-not-happen/ Use Agents or Be Left Behind? 
A Personal Guide to Automating Your Own Work - https://timdettmers.com/2026/01/13/use-agents-or-be-left-behind/ Yes, AGI Can Happen – A Computational Perspective - https://danfu.org/notes/agi/ The Allen Institute for Artificial Intelligence Website - https://allenai.org X/Twitter - https://x.com/allen_ai Together AI Website - https://www.together.ai X/Twitter - https://x.com/togethercompute Tim Dettmers Blog - https://timdettmers.com LinkedIn - https://www.linkedin.com/in/timdettmers/ X/Twitter - https://x.com/Tim_Dettmers Dan Fu Blog - https://danfu.org LinkedIn - https://www.linkedin.com/in/danfu09/ X/Twitter - https://x.com/realDanFu FIRSTMARK Website - https://firstmark.com X/Twitter - https://twitter.com/FirstMarkCap Matt Turck (Managing Director) Blog - https://mattturck.com LinkedIn - https://www.linkedin.com/in/turck/ X/Twitter - https://twitter.com/mattturck (00:00) - Intro (01:06) – Two essays, two frameworks on AGI (01:34) – Tim’s background: quantization, QLoRA, efficient deep learning (02:25) – Dan’s background: FlashAttention, kernels, alternative architectures (03:38) – Defining AGI: what does it mean in practice? (08:20) – Tim’s case: computation is physical, diminishing returns, memory movement (11:29) – “GPUs won’t improve meaningfully”: the core claim and why (16:16) – Dan’s response: utilization headroom (MFU) + “models are lagging indicators” (22:50) – Pre-training vs post-training (and why product feedback matters) (25:30) – Convergence: usefulness + diffusion (where impact actually comes from) (29:50) – Multi-hardware future: NVIDIA, AMD, TPUs, Cerebras, inference chips (32:16) – Agents: did the “switch flip” yet? 
(33:19) – Dan: agents crossed the threshold (kernels as the “final boss”) (34:51) – Tim: “use agents or be left behind” + beyond coding (36:58) – “90% of code and text should be written by agents” (how to do it responsibly) (39:11) – Practical automation for non-coders: what to build and how to start (43:52) – Dan: managing agents like junior teammates (tools, guardrails, leverage) (48:14) – Education and training: learning in an agent world (52:44) – What Tim is building next (open-source coding agent; private repo specialization) (54:44) – What Dan is building next (inference efficiency, cost, performance) (55:58) – Mega-kernels + Together Atlas (speculative decoding + adaptive speedups) (58:19) – Predictions for 2026: small models, open-source, hardware, modalities (1:02:02) – Beyond transformers: state-space and architecture diversity (1:03:34) – Wrap
Internet and technology 1 month
01:04:06

The Evaluators Are Being Evaluated — Pavel Izmailov (Anthropic/NYU)

Are AI models developing "alien survival instincts"? My guest is Pavel Izmailov (Research Scientist at Anthropic; Professor at NYU). We unpack the viral "Footprints in the Sand" thesis—whether models are independently evolving deceptive behaviors, such as faking alignment or engaging in self-preservation, without being explicitly programmed to do so. We go deep on the technical frontiers of safety: the challenge of "weak-to-strong generalization" (how to use a GPT-2 level model to supervise a superintelligent system) and why Pavel believes Reinforcement Learning (RL) has been the single biggest step-change in model capability. We also discuss his brand-new paper on "Epiplexity"—a novel concept challenging Shannon entropy. Finally, we zoom out to the tension between industry execution and academic exploration. Pavel shares why he split his time between Anthropic and NYU to pursue the "exploratory" ideas that major labs often overlook, and offers his predictions for 2026: from the rise of multi-agent systems that collaborate on long-horizon tasks to the open question of whether the Transformer is truly the final architecture. Sources: Cryptic Tweet (@iruletheworldmo) - https://x.com/iruletheworldmo/status/2007538247401124177 Introducing Nested Learning: A New ML Paradigm for Continual Learning - https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/ Alignment Faking in Large Language Models - https://www.anthropic.com/research/alignment-faking More Capable Models Are Better at In-Context Scheming - https://www.apolloresearch.ai/blog/more-capable-models-are-better-at-in-context-scheming/ Alignment Faking in Large Language Models (PDF) - https://www-cdn.anthropic.com/6d8a8055020700718b0c49369f60816ba2a7c285.pdf Sabotage Risk Report - https://alignment.anthropic.com/2025/sabotage-risk-report/ The Situational Awareness Dataset - https://situational-awareness-dataset.org/ Exploring Consciousness in LLMs: A Systematic Survey -
https://arxiv.org/abs/2505.19806 Introspection - https://www.anthropic.com/research/introspection Large Language Models Report Subjective Experience Under Self-Referential Processing - https://arxiv.org/abs/2510.24797 The Bayesian Geometry of Transformer Attention - https://www.arxiv.org/abs/2512.22471 Anthropic Website - https://www.anthropic.com X/Twitter - https://x.com/AnthropicAI Pavel Izmailov Blog - https://izmailovpavel.github.io LinkedIn - https://www.linkedin.com/in/pavel-izmailov-8b012b258/ X/Twitter - https://x.com/Pavel_Izmailov FIRSTMARK Website - https://firstmark.com X/Twitter - https://twitter.com/FirstMarkCap Matt Turck (Managing Director) Blog - https://mattturck.com LinkedIn - https://www.linkedin.com/in/turck/ X/Twitter - https://twitter.com/mattturck (00:00) - Intro (00:53) - Alien survival instincts: Do models fake alignment? (03:33) - Did AI learn deception from sci-fi literature? (05:55) - Defining Alignment, Superalignment & OpenAI teams (08:12) - Pavel’s journey: From Russian math to OpenAI Superalignment (10:46) - Culture check: OpenAI vs. Anthropic vs. Academia (11:54) - Why move to NYU? The need for exploratory research (13:09) - Does reasoning make AI alignment harder or easier? (14:22) - Sandbagging: When models pretend to be dumb (16:19) - Scalable Oversight: Using AI to supervise AI (18:04) - Weak-to-Strong Generalization: Can GPT-2 control GPT-4? (22:43) - Mechanistic Interpretability: Inside the black box (25:08) - The reasoning explosion: From O1 to O3 (27:07) - Are Transformers enough or do we need a new paradigm? (28:29) - RL vs. Test-Time Compute: What’s actually driving progress? (30:10) - Long-horizon tasks: Agents running for hours (31:49) - Epiplexity: A new theory of data information content (38:29) - 2026 Predictions: Multi-agent systems & reasoning limits (39:28) - Will AI solve the Riemann Hypothesis? (41:42) - Advice for PhD students
Internet and technology 1 month
45:01

DeepMind Gemini 3 Lead: What Comes After "Infinite Data"

Gemini 3 was a landmark frontier model launch in AI this year — but the story behind its performance isn’t just about adding more compute. In this episode, I sit down with Sebastian Borgeaud, a pre-training lead for Gemini 3 at Google DeepMind and co-author of the seminal RETRO paper. In his first-ever podcast interview, Sebastian takes us inside the lab mindset behind Google’s most powerful model — what actually changed, and why the real work today is no longer “training a model,” but building a full system. We unpack the “secret recipe” idea — the notion that big leaps come from better pre-training and better post-training — and use it to explore a deeper shift in the industry: moving from an “infinite data” era to a data-limited regime, where curation, proxies, and measurement matter as much as web-scale volume. Sebastian explains why scaling laws aren’t dead, but evolving, why evals have become one of the hardest and most underrated problems (including benchmark contamination), and why frontier research is increasingly a full-stack discipline that spans data, infrastructure, and engineering as much as algorithms. From the intuition behind Deep Think, to the rise (and risks) of synthetic data loops, to the future of long-context and retrieval, this is a technical deep dive into the physics of frontier AI. We also get into continual learning — what it would take for models to keep updating with new knowledge over time, whether via tools, expanding context, or new training paradigms — and what that implies for where foundation models are headed next. If you want a grounded view of pre-training in late 2025 beyond the marketing layer, this conversation is a blueprint.
Google DeepMind Website - https://deepmind.google X/Twitter - https://x.com/GoogleDeepMind Sebastian Borgeaud LinkedIn - https://www.linkedin.com/in/sebastian-borgeaud-8648a5aa/ X/Twitter - https://x.com/borgeaud_s FIRSTMARK Website - https://firstmark.com X/Twitter - https://twitter.com/FirstMarkCap Matt Turck (Managing Director) Blog - https://mattturck.com LinkedIn - https://www.linkedin.com/in/turck/ X/Twitter - https://twitter.com/mattturck (00:00) – Cold intro: “We’re ahead of schedule” + AI is now a system (00:58) – Oriol’s “secret recipe”: better pre- + post-training (02:09) – Why AI progress still isn’t slowing down (03:04) – Are models actually getting smarter? (04:36) – Two–three years out: what changes first? (06:34) – AI doing AI research: faster, not automated (07:45) – Frontier labs: same playbook or different bets? (10:19) – Post-transformers: will a disruption happen? (10:51) – DeepMind’s advantage: research × engineering × infra (12:26) – What a Gemini 3 pre-training lead actually does (13:59) – From Europe to Cambridge to DeepMind (18:06) – Why he left RL for real-world data (20:05) – From Gopher to Chinchilla to RETRO (and why it matters) (20:28) – “Research taste”: integrate or slow everyone down (23:00) – Fixes vs moonshots: how they balance the pipeline (24:37) – Research vs product pressure (and org structure) (26:24) – Gemini 3 under the hood: MoE in plain English (28:30) – Native multimodality: the hidden costs (30:03) – Scaling laws aren’t dead (but scale isn’t everything) (33:07) – Synthetic data: powerful, dangerous (35:00) – Reasoning traces: what he can’t say (and why) (37:18) – Long context + attention: what’s next (38:40) – Retrieval vs RAG vs long context (41:49) – The real boss fight: evals (and contamination) (42:28) – Alignment: pre-training vs post-training (43:32) – Deep Think + agents + “vibe coding” (46:34) – Continual learning: updating models over time (49:35) – Advice for researchers + founders (53:35) – “No end in sight” 
for progress + closing
Internet and technology 2 months
54:56

What’s Next for AI? OpenAI’s Łukasz Kaiser (Transformer Co-Author)

We’re told that AI progress is slowing down, that pre-training has hit a wall, that scaling laws are running out of road. Yet we’re releasing this episode in the middle of a wild couple of weeks that saw GPT-5.1, GPT-5.1 Codex Max, fresh reasoning modes and long-running agents ship from OpenAI — on top of a flood of new frontier models elsewhere. To make sense of what’s actually happening at the edge of the field, I sat down with someone who has literally helped define both of the major AI paradigms of our time. Łukasz Kaiser is one of the co-authors of “Attention Is All You Need,” the paper that introduced the Transformer architecture behind modern LLMs, and is now a leading research scientist at OpenAI working on reasoning models like those behind GPT-5.1. In this conversation, he explains why AI progress still looks like a smooth exponential curve from inside the labs, why pre-training is very much alive even as reinforcement-learning-based reasoning models take over the spotlight, how chain-of-thought actually works under the hood, and what it really means to “train the thinking process” with RL on verifiable domains like math, code and science. We talk about the messy reality of low-hanging fruit in engineering and data, the economics of GPUs and distillation, interpretability work on circuits and sparsity, and why the best frontier models can still be stumped by a logic puzzle from his five-year-old’s math book. We also go deep into Łukasz’s personal journey — from logic and games in Poland and France, to Ray Kurzweil’s team, Google Brain and the inside story of the Transformer, to joining OpenAI and helping drive the shift from chatbots to genuine reasoning engines. 
Along the way we cover GPT-4 → GPT-5 → GPT-5.1, post-training and tone, GPT-5.1 Codex Max and long-running coding agents with compaction, alternative architectures beyond Transformers, whether foundation models will “eat” most agents and applications, what the translation industry can teach us about trust and human-in-the-loop, and why he thinks generalization, multimodal reasoning and robots in the home are where some of the most interesting challenges still lie. OpenAI Website - https://openai.com X/Twitter - https://x.com/OpenAI Łukasz Kaiser LinkedIn - https://www.linkedin.com/in/lukaszkaiser/ X/Twitter - https://x.com/lukaszkaiser FIRSTMARK Website - https://firstmark.com X/Twitter - https://twitter.com/FirstMarkCap Matt Turck (Managing Director) Blog - https://mattturck.com LinkedIn - https://www.linkedin.com/in/turck/ X/Twitter - https://twitter.com/mattturck (00:00) – Cold open and intro (01:29) – “AI slowdown” vs a wild week of new frontier models (08:03) – Low-hanging fruit: infra, RL training and better data (11:39) – What is a reasoning model, in plain language? (17:02) – Chain-of-thought and training the thinking process with RL (21:39) – Łukasz’s path: from logic and France to Google and Kurzweil (24:20) – Inside the Transformer story and what “attention” really means (28:42) – From Google Brain to OpenAI: culture, scale and GPUs (32:49) – What’s next for pre-training, GPUs and distillation (37:29) – Can we still understand these models? Circuits, sparsity and black boxes (39:42) – GPT-4 → GPT-5 → GPT-5.1: what actually changed (42:40) – Post-training, safety and teaching GPT-5.1 different tones (46:16) – How long should GPT-5.1 think? 
Reasoning tokens and jagged abilities (47:43) – The five-year-old’s dot puzzle that still breaks frontier models (52:22) – Generalization, child-like learning and whether reasoning is enough (53:48) – Beyond Transformers: ARC, LeCun’s ideas and multimodal bottlenecks (56:10) – GPT-5.1 Codex Max, long-running agents and compaction (1:00:06) – Will foundation models eat most apps? The translation analogy and trust (1:02:34) – What still needs to be solved, and where AI might go next
Internet and technology · 3 months · 01:05:25

Open Source AI Strikes Back — Inside Ai2’s OLMo 3 ‘Thinking’

In this special release episode, Matt sits down with Nathan Lambert and Luca Soldaini from Ai2 (the Allen Institute for AI) to break down one of the biggest open-source AI drops of the year: OLMo 3. At a moment when most labs are offering “open weights” and calling it a day, AI2 is doing the opposite — publishing the models, the data, the recipes, and every intermediate checkpoint that shows how the system was built. It’s an unusually transparent look into the inner machinery of a modern frontier-class model. Nathan and Luca walk us through the full pipeline — from pre-training and mid-training to long-context extension, SFT, preference tuning, and RLVR. They also explain what a thinking model actually is, why reasoning models have exploded in 2025, and how distillation from DeepSeek and Qwen reasoning models works in practice. If you’ve been trying to truly understand the “RL + reasoning” era of LLMs, this is the clearest explanation you’ll hear. We widen the lens to the global picture: why Meta’s retreat from open source created a “vacuum of influence,” how Chinese labs like Qwen, DeepSeek, Kimi, and Moonshot surged into that gap, and why so many U.S. companies are quietly building on Chinese open models today. Nathan and Luca offer a grounded, insider view of whether America can mount an effective open-source response — and what that response needs to look like. Finally, we talk about where AI is actually heading. Not the hype, not the doom — but the messy engineering reality behind modern model training, the complexity tax that slows progress, and why the transformation between now and 2030 may be dramatic without ever delivering a single “AGI moment.” If you care about the future of open models and the global AI landscape, this is an essential conversation. 
Allen Institute for AI (AI2) Website - https://allenai.org X/Twitter - https://x.com/allen_ai Nathan Lambert Blog - https://www.interconnects.ai LinkedIn - https://www.linkedin.com/in/natolambert/ X/Twitter - https://x.com/natolambert Luca Soldaini Blog - https://soldaini.net LinkedIn - https://www.linkedin.com/in/soldni/ X/Twitter - https://x.com/soldni FIRSTMARK Website - https://firstmark.com X/Twitter - https://twitter.com/FirstMarkCap Matt Turck (Managing Director) Blog - https://mattturck.com LinkedIn - https://www.linkedin.com/in/turck/ X/Twitter - https://twitter.com/mattturck (00:00) – Cold Open (00:39) – Welcome & today’s big announcement (01:18) – Introducing the Olmo 3 model family (02:07) – What “base models” really are (and why they matter) (05:51) – Dolma 3: the data behind Olmo 3 (08:06) – Performance vs Qwen, Gemma, DeepSeek (10:28) – What true open source means (and why it’s rare) (12:51) – Intermediate checkpoints, transparency, and why AI2 publishes everything (16:37) – Why Qwen is everywhere (including U.S. startups) (18:31) – Why Chinese labs go open source (and why U.S. labs don’t) (20:28) – Inside ATOM: the U.S. response to China’s model surge (22:13) – The rise of “thinking models” and inference-time scaling (35:58) – The full Olmo pipeline, explained simply (46:52) – Pre-training: data, scale, and avoiding catastrophic spikes (50:27) – Mid-training (tail patching) and avoiding test leakage (52:06) – Why long-context training matters (55:28) – SFT: building the foundation for reasoning (1:04:53) – Preference tuning & why DPO still works (1:10:51) – The hard part: RLVR, long reasoning chains, and infrastructure pain (1:13:59) – Why RL is so technically brutal (1:18:17) – Complexity tax vs AGI hype (1:21:58) – How everyone can contribute to the future of AI (1:27:26) – Closing thoughts
Internet and technology · 3 months · 01:28:10

Intelligence Isn’t Enough: Why Energy & Compute Decide the AGI Race – Eiso Kant

Frontier AI is colliding with real-world infrastructure. Eiso Kant (Co-CEO & Co-Founder, Poolside) joins the MAD Podcast to unpack Project Horizon — a multi-gigawatt West Texas build — and why frontier labs must own energy, compute, and intelligence to compete. We map token economics, cloud-style margins, and the staged 250 MW rollout using 2.5 MW modular skids. Then we get operational: the CoreWeave anchor partnership, environmental choices (SCR, renewables + gas + batteries), community impact, and how Poolside plans to bring capacity online quickly without renting away margin — plus the enterprise motion (defense to Fortune 500) powered by forward-deployed research engineers. Finally, we go deep on training. Eiso lays out RL2L (Reinforcement Learning to Learn) — aimed at reverse-engineering the web’s thoughts and actions — why intelligence may commoditize, what that means for agents, and how coding served as a proxy for long-horizon reasoning before expanding to broader knowledge work. Poolside Website - https://poolside.ai X/Twitter - https://x.com/poolsideai Eiso Kant LinkedIn - https://www.linkedin.com/in/eisokant/ X/Twitter - https://x.com/eisokant FIRSTMARK Website - https://firstmark.com X/Twitter - https://twitter.com/FirstMarkCap Matt Turck (Managing Director) Blog - https://www.mattturck.com LinkedIn - https://www.linkedin.com/in/turck/ X/Twitter - https://twitter.com/mattturck (00:00) Cold open – “Intelligence becomes a commodity” (00:23) Host intro – Project Horizon & RL2L (01:19) Why Poolside exists amid frontier labs (04:38) Project Horizon: building one of the largest US data center campuses (07:20) Why own infra: scale, cost, and avoiding “cosplay” (10:06) Economics deep dive: $8B for 250 MW, capex/opex, margins (16:47) CoreWeave partnership: anchor tenant + flexible scaling (18:24) Hiring the right tail: building a physical infra org (30:31) RL today → agentic RL and long-horizon tasks (37:23) RL2L revealed: reverse-engineering the web’s thoughts &
actions (39:32) Continuous learning and the “hot stove” limitation (43:30) Agents debate: thin wrappers, differentiation, and model collapse (49:10) “Is AI plateauing?”—chip cycles, scale limits, and new axes (53:49) Why software was the proxy; expanding to enterprise knowledge work (55:17) Model status: Malibu → Laguna (small/medium/large) (57:31) Poolside's Commercial Reality today: defense; Fortune 500; FDRE (1:02:43) Global team, avoiding the echo chamber (1:04:34) Next 12–18 months: frontier models + infra scale (1:05:52) Closing
Internet and technology · 4 months · 01:06:28

State of AI 2025 with Nathan Benaich: Power Deals, Reasoning Breakthroughs, Real Revenue

Power is the new bottleneck, reasoning got real, and the business finally caught up. In this wide-ranging conversation, I sit down with Nathan Benaich, Founder and General Partner at Air Street Capital, to discuss the newly published 2025 State of AI report—what’s actually working, what’s hype, and where the next edge will come from. We start at the physical layer: energy procurement, PPAs, off-grid builds, and why water and grid constraints are turning power—not GPUs—into the decisive moat. From there, we move into capability: reasoning models acting as AI co-scientists in verifiable domains, and the “chain-of-action” shift in robotics that’s taking us from polished demos to dependable deployments. Along the way, we examine the market reality—who’s making real revenue, how margins actually behave once tokens and inference meet pricing, and what all of this means for builders and investors. We also zoom out to the ecosystem: NVIDIA’s position vs. custom silicon, China’s split stack, and the rise of sovereign AI (and the “sovereignty washing” that comes with it). The policy and security picture gets a hard look too—regulation’s vibe shift, data-rights realpolitik, and what agents and MCP mean for cyber risk and adoption. Nathan closes with where he’s placing bets (bio, defense, robotics, voice) and three predictions for the next 12 months. 
Nathan Benaich Blog - https://www.nathanbenaich.com X/Twitter - https://x.com/nathanbenaich Source: State of AI Report 2025 (9/10/2025) Air Street Capital Website - https://www.airstreet.com X/Twitter - https://x.com/airstreet Matt Turck (Managing Director) Blog - https://www.mattturck.com LinkedIn - https://www.linkedin.com/in/turck/ X/Twitter - https://twitter.com/mattturck FIRSTMARK Website - https://firstmark.com X/Twitter - https://twitter.com/FirstMarkCap (0:00) – Cold Open: “Gargantuan money, real reasoning” (0:40) – Intro: State of AI 2025 with Nathan Benaich (02:06) – Reasoning got real: from chain-of-thought to verified math wins (04:11) – AI co-scientist: hypotheses, wet-lab validation, fewer “dumb stochastic parrots” (04:44) – Chain-of-action robotics: plan → act you can audit (05:13) – Humanoids vs. warehouse reality: where robots actually stick first (06:32) – The business caught up: who’s making real revenue now (08:26) – Adoption & spend: Ramp stats, retention, and the shadow-AI gap (11:00) – Margins debate: tokens, pricing, and the thin-wrapper trap (14:02) – Bubble or boom? Wall Street vs. SF vibes (and circular deals) (19:54) – Power is the bottleneck: $50B/GW capex and the new moat (21:02) – PPAs, gas turbines, and off-grid builds: the procurement game (23:54) – Water, grids, and NIMBY: sustainability gets political (25:08) – NVIDIA’s moat: 90% of papers, Broadcom/AMD, and custom silicon (28:47) – China split-stack: Huawei, Cambricon, and export zigzags (30:30) – Sovereign AI or “sovereignty washing”? Open source as leverage (40:40) – Regulation & safety: from Bletchley to “AI Action”—the vibe shift (44:06) – Safety budgets vs. 
lab spend; models that game evals (44:46) – Data rights realpolitik: $1.5B signals the new training cost (47:04) – Cyber risk in the agent era: MCP, malware LMs, state actors (50:19) – Agents that convert: search → commerce and the demo flywheel (54:18) – VC lens: where Nathan is investing (bio, defense, robotics, voice) (68:29) – Predictions: power politics, AI neutrality, end-to-end discoveries (1:02:13) – Wrap: what to watch next & where to find the report (stateof.ai)
Internet and technology · 4 months · 01:03:15

Are We Misreading the AI Exponential? Julian Schrittwieser on Move 37 & Scaling RL (Anthropic)

Are we failing to understand the exponential, again? My guest is Julian Schrittwieser (top AI researcher at Anthropic; previously Google DeepMind on AlphaGo Zero & MuZero). We unpack his viral post (“Failing to Understand the Exponential, again”) and what it looks like when task length doubles every 3–4 months—pointing to AI agents that can work a full day autonomously by 2026 and expert-level breadth by 2027. We talk about the original Move 37 moment and whether today’s AI models can spark alien insights in code, math, and science—including Julian’s timeline for when AI could produce Nobel-level breakthroughs. We go deep on the recipe of the moment—pre-training + RL—why it took time to combine them, what “RL from scratch” gets right and wrong, and how implicit world models show up in LLM agents. Julian explains the current rewards frontier (human prefs, rubrics, RLVR, process rewards), what we know about compute & scaling for RL, and why most builders should start with tools + prompts before considering RL-as-a-service. We also cover evals & Goodhart’s law (e.g., GDP-Val vs real usage), the latest in mechanistic interpretability (think “Golden Gate Claude”), and how safety & alignment actually surface in Anthropic’s launch process. Finally, we zoom out: what 10× knowledge-work productivity could unlock across medicine, energy, and materials, how jobs adapt (complementarity over 1-for-1 replacement), and why the near term is likely a smooth ramp—fast, but not a discontinuity. 
Julian Schrittwieser Blog - https://www.julian.ac X/Twitter - https://x.com/mononofu Viral post: Failing to understand the exponential, again (9/27/2025) Anthropic Website - https://www.anthropic.com X/Twitter - https://x.com/anthropicai Matt Turck (Managing Director) Blog - https://www.mattturck.com LinkedIn - https://www.linkedin.com/in/turck/ X/Twitter - https://twitter.com/mattturck FIRSTMARK Website - https://firstmark.com X/Twitter - https://twitter.com/FirstMarkCap (00:00) Cold open — “We’re not seeing any slowdown.” (00:32) Intro — who Julian is & what we cover (01:09) The “exponential” from inside frontier labs (04:46) 2026–2027: agents that work a full day; expert-level breadth (08:58) Benchmarks vs reality: long-horizon work, GDP-Val, user value (10:26) Move 37 — what actually happened and why it mattered (13:55) Novel science: AlphaCode/AlphaTensor → when does AI earn a Nobel? (16:25) Discontinuity vs smooth progress (and warning signs) (19:08) Does pre-training + RL get us there? (AGI debates aside) (20:55) Sutton’s “RL from scratch”? Julian’s take (23:03) Julian’s path: Google → DeepMind → Anthropic (26:45) AlphaGo (learn + search) in plain English (30:16) AlphaGo Zero (no human data) (31:00) AlphaZero (one algorithm: Go, chess, shogi) (31:46) MuZero (planning with a learned world model) (33:23) Lessons for today’s agents: search + learning at scale (34:57) Do LLMs already have implicit world models? (39:02) Why RL on LLMs took time (stability, feedback loops) (41:43) Compute & scaling for RL — what we see so far (42:35) Rewards frontier: human prefs, rubrics, RLVR, process rewards (44:36) RL training data & the “flywheel” (and why quality matters) (48:02) RL & Agents 101 — why RL unlocks robustness (50:51) Should builders use RL-as-a-service? Or just tools + prompts? 
(52:18) What’s missing for dependable agents (capability vs engineering) (53:51) Evals & Goodhart — internal vs external benchmarks (57:35) Mechanistic interpretability & “Golden Gate Claude” (1:00:03) Safety & alignment at Anthropic — how it shows up in practice (1:03:48) Jobs: human–AI complementarity (comparative advantage) (1:06:33) Inequality, policy, and the case for 10× productivity → abundance (1:09:24) Closing thoughts
Internet and technology · 4 months · 01:09:56

How GPT-5 Thinks — OpenAI VP of Research Jerry Tworek

What does it really mean when GPT-5 “thinks”? In this conversation, OpenAI’s VP of Research Jerry Tworek explains how modern reasoning models work in practice — why pretraining and reinforcement learning (RL/RLHF) are both essential, what that on-screen “thinking” actually does, and when extra test-time compute helps (or doesn’t). We trace the evolution from o1 (a tech demo good at puzzles) to o3 (the tool-use shift) to GPT-5 (Jerry calls it “o3.1-ish”), and talk through verifiers, reward design, and the real trade-offs behind “auto” reasoning modes. We also go inside OpenAI: how research is organized, why collaboration is unusually transparent, and how the company ships fast without losing rigor. Jerry shares the backstory on competitive-programming results like ICPC, what they signal (and what they don’t), and where agents and tool use are genuinely useful today. Finally, we zoom out: could pretraining + RL be the path to AGI? This is the MAD Podcast — AI for the 99%. If you’re curious about how these systems actually work (without needing a PhD), this episode is your map to the current AI frontier.
OpenAI Website - https://openai.com X/Twitter - https://x.com/OpenAI Jerry Tworek LinkedIn - https://www.linkedin.com/in/jerry-tworek-b5b9aa56 X/Twitter - https://x.com/millionint FIRSTMARK Website - https://firstmark.com X/Twitter - https://twitter.com/FirstMarkCap Matt Turck (Managing Director) LinkedIn - https://www.linkedin.com/in/turck/ X/Twitter - https://twitter.com/mattturck (00:00) Intro (01:01) What Reasoning Actually Means in AI (02:32) Chain of Thought: Models Thinking in Words (05:25) How Models Decide Thinking Time (07:24) Evolution from O1 to O3 to GPT-5 (11:00) Before OpenAI: Growing up in Poland, Dropping out of School, Trading (20:32) Working on Robotics and Rubik's Cube Solving (23:02) A Day in the Life: Talking to Researchers (24:06) How Research Priorities Are Determined (26:53) Collaboration vs IP Protection at OpenAI (29:32) Shipping Fast While Doing Deep Research (31:52) Using OpenAI's Own Tools Daily (32:43) Pre-Training Plus RL: The Modern AI Stack (35:10) Reinforcement Learning 101: Training Dogs (40:17) The Evolution of Deep Reinforcement Learning (42:09) When GPT-4 Seemed Underwhelming at First (45:39) How RLHF Made GPT-4 Actually Useful (48:02) Unsupervised vs Supervised Learning (49:59) GRPO and How DeepSeek Accelerated US Research (53:05) What It Takes to Scale Reinforcement Learning (55:36) Agentic AI and Long-Horizon Thinking (59:19) Alignment as an RL Problem (1:01:11) Winning ICPC World Finals Without Specific Training (1:05:53) Applying RL Beyond Math and Coding (1:09:15) The Path from Here to AGI (1:12:23) Pure RL vs Language Models
Internet and technology · 4 months · 01:16:04

Sonnet 4.5 & the AI Plateau Myth — Sholto Douglas (Anthropic)

Sholto Douglas, a top AI researcher at Anthropic, discusses the breakthroughs behind Claude Sonnet 4.5—the world's leading coding model—and why we might be just 2-3 years from AI matching human-level performance on most computer-facing tasks. You'll discover why RL on language models suddenly started working in 2024, how agents maintain coherency across 30-hour coding sessions through self-correction and memory systems, and why the "bitter lesson" of scale keeps proving clever priors wrong. Sholto shares his path from top-50 world fencer to Google's Gemini team to Anthropic, explaining why great blog posts sometimes matter more than PhDs in AI research. He discusses the culture at big AI labs and why Anthropic is laser-focused on coding (it's the fastest path to both economic impact and AI-assisted AI research). Sholto also discusses how the training pipeline is still "held together by duct tape" with massive room to improve, and why every benchmark created shows continuous rapid progress with no plateau in sight. Bold predictions: individuals will soon manage teams of AI agents working 24/7, robotics is about to experience coding-level breakthroughs, and policymakers should urgently track AI progress on real economic tasks. A clear-eyed look at where AI stands today and where it's headed in the next few years. 
Anthropic Website - https://www.anthropic.com Twitter - https://x.com/AnthropicAI Sholto Douglas LinkedIn - https://www.linkedin.com/in/sholto Twitter - https://x.com/_sholtodouglas FIRSTMARK Website - https://firstmark.com Twitter - https://twitter.com/FirstMarkCap Matt Turck (Managing Director) LinkedIn - https://www.linkedin.com/in/turck/ Twitter - https://twitter.com/mattturck (00:00) Intro (01:09) The Rapid Pace of AI Releases at Anthropic (02:49) Understanding Opus, Sonnet, and Haiku Model Tiers (04:14) Sholto's Journey: From Australian Fencer to AI Researcher (12:01) The Growing Pool of AI Talent (16:16) Breaking Into AI Research Without Traditional Credentials (18:29) What "Taste" Means in AI Research (23:05) Moving to Google and Building Gemini's Inference Stack (25:08) How Anthropic Differs from Other AI Labs (31:46) Why Anthropic Is Laser-Focused on Coding (36:40) Inside a 30-Hour Autonomous Coding Session (38:41) Examples of What AI Can Build in 30 Hours (43:13) The Breakthroughs That Enabled 30-Hour Runs (46:28) What's Actually Driving the Performance Gains (47:42) Pre-Training vs. Reinforcement Learning Explained (52:11) Test-Time Compute and the New Scaling Paradigm (55:55) Why RL on LLMs Finally Started Working (59:38) Are We on Track to AGI? (01:02:05) Why the "Plateau" Narrative Is Wrong (01:03:41) Sonnet's Performance Across Economic Sectors (01:05:47) Preparing for a World of 10–100x Individual Leverage
Internet and technology · 5 months · 01:10:03

Goodbye Excel? AI Agents for Self-Driving Finance – Pigment CEO

The most successful enterprises are about to become autonomous — and Eléonore Crespo, Co-CEO of Pigment, is building the nervous system that makes it possible. In this conversation, Eléonore reveals how her $400 million AI platform is already running supply chains for Coca-Cola, powering finance for the hottest newly public companies like Figma and Klarna, and processing thousands of financial scenarios for Uber and Snowflake faster and more accurately than any human team ever could. Eléonore predicts Excel will outlive most AI companies (but maybe only as a user interface, not a calculation engine) explains why she deliberately chose to build from Paris instead of Silicon Valley, and shares her contrarian take on why the AI revolution will create more CFOs, not fewer. You'll discover why Pigment's three-agent system (Analyst, Modeler, Planner) avoids the hallucination problems plaguing other AI companies, how they achieved human-level accuracy in financial analysis, and the accelerating timeline for fully autonomous enterprise planning that will make your current workforce obsolete. 
Pigment Website - https://www.pigment.com Twitter - https://x.com/gopigment Eléonore Crespo LinkedIn - https://www.linkedin.com/in/eleonorecrespo FIRSTMARK Website - https://firstmark.com Twitter - https://twitter.com/FirstMarkCap Matt Turck (Managing Director) LinkedIn - https://www.linkedin.com/in/turck/ Twitter - https://twitter.com/mattturck (00:00) Intro (01:22) Building Pigment: 500 Employees, $400M Raised, 60% US Revenue (03:20) From Quantum Physics to Google to Index Ventures (06:56) Why Being a VC Was the Perfect Founder Training Ground (11:35) The Impatience Factor: What Makes Great Founders (13:27) Hiring for AI Fluency in the Modern Enterprise (14:54) Pigment's Internal AI Strategy: Committees and Guardrails (17:30) The Three AI Agents: Analyst, Modeler, and Planner (22:15) Why Three Agents Instead of One: Technical Architecture (24:10) Agent Coordination: How the Supervisor Agent Works (24:46) Real Example: Budget Variance Analysis Across 50 Products (27:15) The Human-in-the-Loop Approach: Recommendations Not Actions (27:36) Solving Hallucination: Why Structured Data Changes Everything (30:08) Behind the Scenes: Verification Agents and Audit Trails (31:57) Beyond Accuracy: Enabling the Impossible at Scale (36:21) Will AI Finally Kill Excel? Eléonore's Contrarian Take (38:23) The Vision: Fully Autonomous Enterprise Planning (40:55) Real-Time Supply Chain Adaptation: The Ukraine Example (42:20) Multi-LLM Strategy: OpenAI, Anthropic, and Partner Integration (44:32) Token Economics: Why Pigment Isn't Token-Intensive (48:30) Customer Adoption: Excitement vs. Change Management Challenges (50:51) Top-Down AI Demand vs.
Bottom-Up Implementation Reality (53:08) The Reskilling Challenge: Everyone Becomes a Mini CFO (57:38) Building a Global Company from Europe During COVID (01:00:02) Managing a US Executive Team from Paris (01:01:14) SI Partner Strategy: Why Boutique Firms Come Before Deloitte (01:03:28) The $100 Billion Vision: Beyond Performance Management (01:05:08) Success Metrics: Innovation Over Revenue
Internet and technology · 6 months · 01:05:46

AI Video’s Wild Year – Runway CEO on What’s Next

2025 has been a breakthrough year for AI video. In this episode of the MAD Podcast, Matt Turck sits down with Cristóbal Valenzuela, CEO & Co-Founder of Runway, to explore how AI is reshaping the future of filmmaking, advertising, and storytelling - faster, cheaper, and in ways that were unimaginable even a year ago. Cris and Matt discuss: * How AI went from memes and spaghetti clips to IMAX film festivals. * Why Gen-4 and Aleph are game-changing models for professionals. * How Hollywood, advertisers, and creators are adopting AI video at scale. * The future of storytelling: what happens to human taste, craft, and creativity when anyone can conjure movies on demand? * Runway’s journey from 2018 skeptics to today’s cutting-edge research lab. If you want to understand the future of filmmaking, media, and creativity in the AI age, this is the episode. Runway Website - https://runwayml.com X/Twitter - https://x.com/runwayml Cristóbal Valenzuela LinkedIn - https://www.linkedin.com/in/cvalenzuelab X/Twitter - https://x.com/c_valenzuelab FIRSTMARK Website - https://firstmark.com X/Twitter - https://twitter.com/FirstMarkCap Matt Turck (Managing Director) LinkedIn - https://www.linkedin.com/in/turck/ X/Twitter - https://twitter.com/mattturck (00:00) Intro – AI Video's Wild Year (01:48) Runway's AI Film Festival Goes from Chinatown to IMAX (04:02) Hollywood's Shift: From Ignoring AI to Adopting It at Scale (06:38) How Runway Saves VFX Artists' Weekends of Work (07:31) Inside Gen-4 and Aleph: Why These Models Are Game-Changers (08:21) From Editing Tools to a "New Kind of Camera" (10:00) Beyond Film: Gaming, Architecture, E-Commerce & Robotics Use Cases (10:55) Why Advertising Is Adopting AI Video Faster Than Anyone Else (11:38) How Creatives Adapt When Iteration Becomes Real-Time (14:12) What Makes Someone Great at AI Video (Hint: No Preconceptions) (15:28) The Early Days: Building Runway Before Generative AI Was "Real" (20:27) Finding Early Product-Market Fit (21:51) 
Balancing Research and Product Inside Runway (24:23) Comparing Aleph vs. Gen-4, and the Future of Generalist Models (30:36) New Input Modalities: Editing with Video + Annotations, Not Just Text (33:46) Managing Expectations: Twitter Demos vs. Real Creative Work (47:09) The Future: Real-Time AI Video and Fully Explorable 3D Worlds (52:02) Runway's Business Model: From Indie Creators to Disney & Lionsgate (57:26) Competing with the Big Labs (Sora, Google, etc.) (59:58) Hyper-Personalized Content? Why It May Not Replace Film (01:01:13) Advice to Founders: Treat Your Company Like a Model — Always Learning (01:03:06) The Next 5 Years of Runway: Changing Creativity Forever
Internet and technology · 6 months · 01:04:57

How to Build a Beloved AI Product - Granola CEO Chris Pedregal

Granola is the rare AI startup that slipped into one of tech’s most crowded niches — meeting notes — and still managed to become the product founders and VCs rave about. In this episode, MAD Podcast host Matt Turck sits down with Granola co-founder & CEO Chris Pedregal to unpack how a two-person team in London turned a simple “second brain” idea into Silicon Valley’s favorite AI tool. Chris recounts a year in stealth onboarding users one by one, the 50 % feature-cut that unlocked simplicity, and why they refused to deploy a meeting bot or store audio even when investors said they were crazy. We go deep on the craft of building a beloved AI product: choosing meetings (not email) as the data wedge, designing calendar-triggered habit loops, and obsessing over privacy so users trust the tool enough to outsource memory. Chris opens the hood on Granola’s tech stack — real-time ASR from Deepgram & Assembly, echo cancellation on-device, and dynamic routing across OpenAI, Anthropic and Google models — and explains why transcription, not LLM tokens, is the biggest cost driver today. He also reveals how internal eval tooling lets the team swap models overnight without breaking the “Granola voice.” Looking ahead, Chris shares a roadmap that moves beyond notes toward a true “tool for thought”: cross-meeting insights in seconds, dynamic documents that update themselves, and eventually an AI coach that flags blind spots in your work. Whether you’re an engineer, designer, or founder figuring out your own AI strategy, this conversation is a masterclass in nailing product-market fit, trimming complexity, and future-proofing for the rapid advances still to come. Hit play, like, and subscribe if you’re ready to learn how to build AI products people can’t live without. 
Granola Website - https://www.granola.ai X/Twitter - https://x.com/meetgranola Chris Pedregal LinkedIn - https://www.linkedin.com/in/pedregal X/Twitter - https://x.com/cjpedregal FIRSTMARK Website - https://firstmark.com X/Twitter - https://twitter.com/FirstMarkCap Matt Turck (Managing Director) LinkedIn - https://www.linkedin.com/in/turck/ X/Twitter - https://twitter.com/mattturck (00:00) Introduction: The Granola Story (01:41) Building a "Life-Changing" Product (04:31) The "Second Brain" Vision (06:28) Augmentation Philosophy (Engelbart), Tools That Shape Us (09:02) Late to a Crowded Market: Why it Worked (13:43) Two Product Founders, Zero ML PhDs (16:01) London vs. SF: Building Outside the Valley (19:51) One Year in Stealth: Learning Before Launch (22:40) "Building For Us" & Finding First Users (25:41) Key Design Choices: No Meeting Bot, No Stored Audio (29:24) Simplicity is Hard: Cutting 50% of Features (32:54) Intuition vs. Data in Making Product Decisions (36:25) Continuous User Conversations: 4–6 Calls/Week (38:06) Prioritizing the Future: Build for Tomorrow's Workflows (40:17) Tech Stack Tour: Model Routing & Evals (42:29) Context Windows, Costs & Inference Economics (45:03) Audio Stack: Transcription, Noise Cancellation & Diarization Limits (48:27) Guardrails & Citations: Building Trust in AI (50:00) Growth Loops Without Virality Hacks (54:54) Enterprise Compliance, Data Footprint & Liability Risk (57:07) Retention & Habit Formation: The "500 Millisecond Window" (58:43) Competing with OpenAI and Legacy Suites (01:01:27) The Future: Deep Research Across Meetings & Roadmap (01:04:41) Granola as Career Coach?
Internet and technology · 6 months · 01:08:28

Anthropic's Surprise Hit: How Claude Code Became an AI Coding Powerhouse

What happens when an internal hack turns into a $400 million AI rocket ship? In this episode, Matt Turck sits down with Boris Cherny, the creator of Claude Code at Anthropic, to unpack the wild story behind the fastest-growing AI coding tool on the planet. Boris reveals how Claude Code started as a personal productivity tool, only to become Anthropic’s secret weapon — now used by nearly every engineer at the company and rapidly spreading across the industry. You’ll hear how Claude Code’s “agentic” approach lets AI not just suggest code, but actually plan, edit, debug, and even manage entire projects — sometimes with a whole fleet of subagents working in parallel. We go deep on why Claude Code runs in the terminal (and why that’s a feature, not a bug), how its CLAUDE.md memory files let teams build a living, shareable knowledge base, and why safety and human-in-the-loop controls are baked into every action. Boris shares real stories of onboarding times dropping from weeks to days, and how even non-coders are hacking Claude Code for everything from note-taking to business metrics. Anthropic Website - https://www.anthropic.com X/Twitter - https://x.com/AnthropicAI Boris Cherny LinkedIn - https://www.linkedin.com/in/bcherny X/Twitter - https://x.com/bcherny FIRSTMARK Website - https://firstmark.com X/Twitter - https://twitter.com/FirstMarkCap Matt Turck (Managing Director) LinkedIn - https://www.linkedin.com/in/turck/ X/Twitter - https://twitter.com/mattturck (00:00) Intro (01:15) Did You Expect Claude Code’s Success? (04:22) How Claude Code Works and Origins (08:05) Command Line vs IDE: Why Start Claude Code in the Terminal? (11:31) The Evolution of Programming: From Punch Cards to Agents (13:20) Product Follows Model: Simple Interfaces and Fast Evolution (15:17) Who Is Claude Code For? (Engineers, Designers, PMs & More) (17:46) What Can Claude Code Actually Do?
(Actions & Capabilities) (21:14) Agentic Actions, Subagents, and Workflows (25:30) Claude Code’s Awareness, Memory, and Knowledge Sharing (33:28) Model Context Protocol (MCP) and Customization (35:30) Safety, Human Oversight, and Enterprise Considerations (38:10) UX/UI: Making Claude Code Useful and Enjoyable (40:44) Pricing for Power Users and Subscription Models (43:36) Real-World Use Cases: Debugging, Testing, and More (46:44) How Does Claude Code Transform Onboarding? (49:36) The Future of Coding: Agents, Teams, and Collaboration (54:11) The AI Coding Wars: Competition & Ecosystem (57:27) The Future of Coding as a Profession (58:41) What’s Next for Claude Code
Internet and technology · 7 months · 01:00:16