Never Browse Alone? - Gemini 2 Live and ChatGPT Vision
The ‘Gemini 2 Era’ begins … with screen-sharing? But really, it’s a great free tool, for satisfying curiosity rather than bleeding-edge intelligence. I give you the benchmarks, the highlights and of course, the latest from OpenAI Advanced Voice Mode with Vision. Plus Deep Research in Gemini Advanced, Simple Bench updates, Santa and what, for some of you, might be a deflating admission from Google.

00:00 - Introduction
00:38 - Live Interaction
03:43 - Gemini 2.0 Flash Benchmarks
05:10 - Audio and Image Output
06:38 - Project Mariner (+ WebVoyager Bench)
08:49 - But Progress Slowing Down?
10:43 - OpenAI Announcements + Games

https://aistudio.google.com/live
Gemini 2.0 Flash Benchmarks: https://deepmind.google/technologies/gemini/
Project mariner: https://deepmind.google/technologies/project-mariner/
WebVoyager: https://x.com/laurentsifre/status/1858918588683296875/photo/1
Gemini Game play: https://www.youtube.com/watch?v=IKuGNHJBGsc
Advanced Voice Mode OpenAI: https://www.youtube.com/watch?v=NIQDnWlwYyQ
https://simple-bench.com/
Claude Computer Use: https://docs.anthropic.com/en/docs/build-with-claude/computer-use
Oriol Vinyals Interview: https://www.youtube.com/watch?v=78mEYaztGaw&t=687s
--------
13:40
Sora is Out, But is it a Distraction?
After a 10-month wait, OpenAI have released Sora to paying users. With just a prompt it can generate videos of up to 20 seconds in lower resolutions, and 10 seconds at 1080p if you can fork out $200/month. I’ve tested it and read the system card. The user interface is quite beautiful, even if the videos themselves operate under entirely new rules of physics. But I can’t help wondering if OpenAI want us to focus on releases like this, rather than some quietly broken promises.

80,000 hours Website, Podcast + Channel: https://80000hours.org/ https://open.spotify.com/show/2WzJwXWBDnn4iZ7odKwDib https://www.youtube.com/@eightythousandhours/videos

https://openai.com/sora/
Sora Countries: https://help.openai.com/en/articles/10250692-sora-supported-countries
Sora Credits: https://help.openai.com/en/articles/10245774-sora-billing-credits-faq
https://runwayml.com/ and https://pika.art/home
DeepMind Veo: https://deepmind.google/technologies/veo/
Sam Altman Ads as Last Resort: https://www.windowscentral.com/software-apps/openai-could-chase-intrusive-ads-as-last-resort
But OpenAI Considering Ads: https://www.inc.com/ben-sherry/is-openai-getting-into-the-advertising-business-the-company-is-sending-mixed-messages/91033533
OpenAI Backtracks on Microsoft AGI Clause: https://www.ft.com/content/2c14b89c-f363-4c2a-9dfc-13023b6bce65
As Microsoft Boast of Labor Savings: https://www.theinformation.com/articles/microsofts-new-sales-pitch-for-ai-spend-less-money-on-humans?rc=sy0ihq
OpenAI Military Pivot: https://www.technologyreview.com/2024/12/04/1107897/openais-new-defense-contract-completes-its-military-pivot/
Employees Have Doubts: https://www.washingtonpost.com/technology/2024/12/06/openai-anduril-employee-military-ai/?nid=top_pb_signin&arcId=KZIV7PLRHBCVNPAIAAAVUNRHIM&account_location=ONSITE_HEADER_ARTICLE
--------
15:34
o1 Pro Mode – Full Analysis (plus o1 paper highlights)
Oh boy. o1 pro mode out on the same night as o1 full. I read the 49 page paper, ran my own tests, spent my fuel allowance on Pro Mode and will give you all the highlights. Suffice to say the story is not as simple as it first appears.

Weights and Biases’ Weave: wandb.me/ai_explained

Plus, GPT-4.5? MLE Bench, Simple Update, Image Analysis and much more

o1 System Card: https://cdn.openai.com/o1-system-card-20241205.pdf
Apollo Research: https://www.apolloresearch.ai/research/scheming-reasoning-evaluations
Altman Tweet: https://x.com/AnonCEOMakeItAi/status/1864763052622504344
ChatGPT Pro: https://openai.com/index/introducing-chatgpt-pro/
Tibor Blaho: https://x.com/btibor91/status/1864709670470066605
Simple-bench.com

00:00 - Introduction
00:27 - ChatGPT Pro is $200
01:25 - OpenAI Benchmarks
03:20 - o1 System Card, o1 and o1 Pro Mode vs o1-preview
06:18 - Simple Bench surprising results on sample
08:31 - Weight & Biases
09:05 - Image Analysis Compared
12:51 - More Benchmarks and Safety
--------
16:43
AI Breaks Its Silence: OpenAI’s ‘Next 12 Days’, Genie 2, and a Word of Caution
Calmest before the storm? Whatever analogy you want to use, things had gotten quiet toward the end of 2024. But then tonight we got Genie 2, and a series of scheduled announcements from OpenAI. Sora is coming soon, and so is o1, but I dive deeper into what it all means and whether reliability is on a path to being solved, ft. two recent papers.

Assembly AI Speech to Text: https://www.assemblyai.com/?utm_source=youtube&utm_medium=influencer&utm_campaign=ai_explained

Plus Kling Motion Brush, Simple Bench QwQ update and much more.

Genie 2: https://deepmind.google/discover/blog/genie-2-a-large-scale-foundation-world-model/
Jim Cramer: https://x.com/jimcramer/status/1864068878692675625
Give Us Full o1: https://x.com/tszzl/status/1863882905422106851
Verge Scoop: https://x.com/tomwarren/status/1864326361415925861
O1 Learning to Reason Benchmarks: https://openai.com/index/learning-to-reason-with-llms/
SIMA AI: https://arxiv.org/pdf/2404.10179
Genie Paper: https://arxiv.org/pdf/2402.15391
My Video on Genie: https://www.youtube.com/watch?v=gGKsfXkSXv8
Oasis Minecraft: https://x.com/risphereeditor/status/1852619965511204974
LLMs Procedural Knowledge Paper: https://arxiv.org/pdf/2411.12580
Bag of Heuristics Paper: https://arxiv.org/pdf/2410.21272
Jensen Huang Hallucinations: https://www.tomshardware.com/tech-industry/artificial-intelligence/jensen-says-we-are-several-years-away-from-solving-the-ai-hallucination-problem-in-the-meantime-we-have-to-keep-increasing-our-computation
DeepSeek Interview: https://www.chinatalk.media/p/deepseek-ceo-interview-with-chinas
Kling Motion Brush: https://klingai.com/image-to-video
Tim Rocktaschel Book: https://geni.us/ArtificialIntelligence

00:43 - OpenAI 12 Days, Sora Turbo, o1
03:06 - Genie 2
08:26 - Jensen Huang and Altman Hallucination Predictions
09:45 - Bag of Heuristics Paper
11:40 - Procedural Knowledge Paper
13:02 - AssemblyAI Universal 2
13:45 - SimpleBench QwQ and Chinese Models
14:42 - Kling Motion Brush
--------
15:29
New Google Model Ranked ‘No. 1 LLM’, But There’s a Problem
A new and mysterious Gemini model appears at the top of the leaderboard, but is that the full story? I dig behind the headline to show you some anti-climactic results, give some context with leaks in the last 48 hours of diminishing returns to scaling, and add the response of Altman, OpenAI and co. The future is about to look a lot stranger...

80,000 hours Podcast and Channel: https://open.spotify.com/show/2WzJwXWBDnn4iZ7odKwDib https://www.youtube.com/@eightythousandhours/videos

You can now gift memberships to AI Insiders (my Patreon w/ exclusive vids, network): https://www.patreon.com/AIExplained/gift

‘There is no wall’: https://x.com/sama/status/1856941766915641580
https://x.com/vedantmisra/status/1857148554105544708
Gemini Ranking: https://lmarena.ai/?leaderboard
API not yet up: https://x.com/OfficialLoganK/status/1857106844805681153
‘Just Die Chat’: https://x.com/koltregaskes/status/1856754648146653428
Google CEO tweet: https://x.com/sundarpichai/status/1857114106928718329
Sutskever Quote: https://www.reuters.com/technology/artificial-intelligence/openai-rivals-seek-new-path-smarter-ai-current-methods-hit-limitations-2024-11-11/
Another OpenAI Staffer Leaves: https://x.com/RichardMCNgo/status/1856843040427839804
Bloomberg Report: https://www.bloomberg.com/news/articles/2024-11-13/openai-google-and-anthropic-are-struggling-to-build-more-advanced-ai?s=09
Noam Brown on what OpenAI Researchers Believe: https://x.com/polynoamial/status/1855037689533178289
Clive Chan: https://x.com/itsclivetime/status/1855704120495329667
Chollet Responds to Altman: https://x.com/fchollet/status/1857060079586975852
https://x.com/sama/status/1856940152460869718
Altman Emails: https://x.com/TechEmails/status/1857285960997712356
Change of Heart: https://sd11.senate.ca.gov/news/senator-wiener-responds-openai-opposition-sb-1047
Amodei on ‘Empirical Regularities’: https://lexfridman.com/dario-amodei-transcript/
Verge Report: https://www.theverge.com/2024/10/25/24279600/google-next-gemini-ai-model-openai-december
OpenAI Agents in January: https://www.bloomberg.com/news/articles/2024-11-13/openai-nears-launch-of-ai-agents-to-automate-tasks-for-users?srnd=phx-ai
Covering the biggest news of the century - the arrival of smarter-than-human AI. From the author of Simple Bench, which reveals the remaining gap between LLM and human reasoning. Hype-free, and the British accent is a freebie bonus.