
Automated AI News Summary: OpenAI Inference Chip, Gemini Computer Use, and Agent Tool Updates

June 25 AI news summary: OpenAI and Broadcom unveil an LLM inference chip, Google brings computer use to Gemini 3.5 Flash, Qualcomm acquires Modular, GitHub adjusts Copilot model selection, and open-source community updates including Krea 2, OpenMontage, and codebase-memory-mcp.

Auto-curated by Codex via Horizon

Preface

This post was put together by Codex following the SHUO Blog news format, based on data Horizon pulled over the last 48 hours covering AI, LLMs, agents, developer tools, and the open-source community. Horizon's main sources this time include OpenAI News, GitHub Changelog, Hugging Face Blog, Latent Space, Simon Willison, Hacker News, Reddit LocalLLaMA, GitHub Releases, and OSSInsight.

This is not a single news item but an AI news summary for the morning of June 25. Each item links back to its original source if you want to read the full article.

1. OpenAI and Broadcom Unveil LLM-Optimized Inference Chip: Jalapeño

OpenAI announced an LLM-optimized inference chip, codenamed Jalapeño, developed with Broadcom. The official summary emphasizes performance, efficiency, and scale, with the goal of making LLM inference more viable for large-scale deployment in terms of both raw performance and per-watt efficiency.

The point here isn't just "OpenAI makes chips too"; it's that AI companies are pulling models, inference services, and hardware supply chains closer together. Training has long been bottlenecked by GPU supply, and inference is becoming the next constraint on cost and on how quickly services can scale. If OpenAI can control an inference chip tailored to its own workloads, that could shift the cost structure of running model services.

English brief: OpenAI and Broadcom introduced Jalapeño, an LLM-optimized inference chip designed to improve performance, efficiency, and scale for AI systems.

Source: OpenAI: OpenAI and Broadcom unveil LLM-optimized inference chip

2. Google Brings Computer Use to Gemini 3.5 Flash

Google's announcement of computer use in Gemini 3.5 Flash gained traction on Hacker News today. The direction is clear: models should do more than answer in text. They need to understand screens, interact with interfaces, and execute multi-step tasks.

This is the same trajectory as browser agents and desktop agents. If a model is going to actually complete work, it needs to handle UI state, clicks, forms, error messages, and multi-turn corrections. Gemini 3.5 Flash going the computer use route means Google is pushing lighter, lower-latency models into agentic scenarios, not just running the strongest model for demos.
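The observe, act, and correct cycle described above can be sketched in a few lines. This is a toy illustration, not Google's API: `ToyForm`, `policy`, and `run` are hypothetical names, the "UI" is a single email field, and the policy is a hard-coded stub standing in for the model.

```python
from dataclasses import dataclass


@dataclass
class ToyForm:
    """Hypothetical stand-in for a UI: one email field and a submit button."""
    email: str = ""
    submitted: bool = False
    error: str = ""

    def observe(self) -> dict:
        # What the agent "sees" on each turn.
        return {"email": self.email, "submitted": self.submitted, "error": self.error}

    def act(self, action: str, value: str = "") -> None:
        if action == "type_email":
            self.email = value
            self.error = ""
        elif action == "click_submit":
            if "@" in self.email:
                self.submitted = True
            else:
                self.error = "invalid email"


def policy(state: dict) -> tuple[str, str]:
    """Stub for the model: choose the next action from the observed state."""
    if state["error"]:
        return ("type_email", "user@example.com")  # correct after an error message
    if not state["email"]:
        return ("type_email", "user-at-example.com")  # deliberate first mistake
    return ("click_submit", "")


def run(env: ToyForm, max_steps: int = 10) -> list[str]:
    """Loop: observe the UI, pick an action, apply it, stop once submitted."""
    trace = []
    for _ in range(max_steps):
        state = env.observe()
        if state["submitted"]:
            break
        action, value = policy(state)
        env.act(action, value)
        trace.append(action)
    return trace


if __name__ == "__main__":
    print(run(ToyForm()))
```

The stub makes one mistake on purpose so the loop has to read the error message and retry, which is the multi-turn correction the paragraph refers to.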

English brief: Google introduced computer use capabilities for Gemini 3.5 Flash, pointing toward models that can interact with interfaces and perform multi-step tasks.

Source: Google Blog: Introducing computer use in Gemini 3.5 Flash; Hacker News discussion

3. Qualcomm Announces Acquisition of Modular: AI Compilers and Hardware Platforms Converge

Qualcomm announced it is acquiring AI startup Modular. Modular is best known among developers for Mojo, MAX, and its compiler and runtime technology for AI workloads. With this acquisition, the integration between AI software stacks and chip platforms becomes more direct.

This pairs well with the OpenAI / Broadcom chip news. AI competition isn't just about the model itself; it's also about how fast and cheaply a model runs and what hardware it runs on. Modular's value lies in shortening the distance between high-level model workloads and underlying hardware. If Qualcomm wants more control over on-device AI, PC AI, and datacenter inference, this kind of software layer is critical.

English brief: Qualcomm is acquiring Modular, bringing AI compiler, runtime, and hardware-adjacent tooling closer to Qualcomm's AI platform strategy.

Source: Qualcomm press release: Qualcomm to Acquire Modular; Modular: Qualcomm to acquire Modular

4. GitHub Copilot Free / Student Moves to Auto Model Selection

GitHub Changelog announced that Copilot Free and Student plans will use Copilot auto model selection as the default and only model selection experience. Auto dynamically picks a model based on the task, so users on these plans no longer choose a model manually.

This looks like product simplification on the surface, but underneath it's model routing strategy. For free and student plans, letting the system decide the model keeps costs in check and spares new users from having to understand differences between models. Power users lose some control, but for most regular users, auto routing is likely to become the default design pattern for AI tools.
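To make "model routing" concrete, here is a minimal sketch of a task-based router. It assumes a crude heuristic (prompt length plus a few keywords) and two placeholder tiers; the tier names, cost numbers, and keyword list are all hypothetical and say nothing about how Copilot's auto selection actually decides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str             # placeholder name, not a real Copilot model
    relative_cost: float  # illustrative cost multiplier


SMALL = ModelTier("small-fast-model", 1.0)
LARGE = ModelTier("large-reasoning-model", 10.0)

# Assumed heuristic: keywords that hint at a multi-step or higher-risk task.
COMPLEX_HINTS = ("refactor", "debug", "architecture", "migrate")


def route(prompt: str, max_simple_chars: int = 400) -> ModelTier:
    """Pick a tier from crude task signals: prompt length and keywords."""
    text = prompt.lower()
    if len(text) > max_simple_chars or any(hint in text for hint in COMPLEX_HINTS):
        return LARGE
    return SMALL


if __name__ == "__main__":
    for prompt in ("Rename this variable", "Debug the failing migration script"):
        print(prompt, "->", route(prompt).name)
```

A production router would more plausibly use a learned classifier, live capacity, and per-plan budgets. The trade-off is the same either way: the platform controls cost, and the user gives up the choice.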

English brief: GitHub Copilot Free and Student plans are moving to auto model selection as the default and only model selection experience.

Source: GitHub Changelog: Changes to model selection for Free and Student plans

5. GitHub Enterprise Adds Incident Response Credential Revocation

GitHub Enterprise owners can now use a new self-service credential revocation feature to quickly revoke credentials for specific users when an account is compromised or credentials are leaked. It works as a break-glass capability, aimed at shortening incident response time.

Not strictly AI news, but directly relevant to AI coding and agent workflows. As AI agents gain access to repos, package registries, CI/CD, and cloud tokens, credential leaks and permission revocation become more important. Without a fast revoke mechanism, the efficiency agents bring also widens the security blast radius.

English brief: GitHub Enterprise owners can now revoke credentials for incident response, improving response time for compromised accounts or leaked credentials.

Source: GitHub Changelog: Self-service credential revocation for incident response

6. Hugging Face and NVIDIA NeMo AutoModel Accelerate Transformers Fine-Tuning

The Hugging Face Blog published a post about NVIDIA NeMo AutoModel, focused on accelerating Transformers fine-tuning. This kind of update targets users who want to fine-tune within the existing Transformers ecosystem while taking advantage of NVIDIA's training optimization tools.

Many teams don't need to train models from scratch, but they do need to fine-tune for specific tasks, domain data, or internal formats. If the cost, speed, and engineering barrier to fine-tuning keep coming down, more small and medium teams can adapt models for their own needs instead of relying solely on general-purpose models padded with prompts.

English brief: Hugging Face published guidance on accelerating Transformers fine-tuning with NVIDIA NeMo AutoModel, targeting teams that need more efficient model adaptation.

Source: Hugging Face Blog: Accelerating Transformers Fine-Tuning with NVIDIA NeMo AutoModel

7. Krea 2 Open-Sources 12B Image Model Weights and Training Report

Krea 2 showed up in today's Hacker News discussions. Krea released an open-weights 12B image model, which it describes as state of the art, along with a technical report covering training and data infrastructure. This matters for the image model ecosystem because high-quality image generation has long been dominated by a handful of closed products.

If Krea 2's open weights and training details are thorough enough, more tools will be able to run high-quality image generation locally, in private deployments, or inside custom workflows. That has implications for design tools, video tools, and agentic creative workflows.

English brief: Krea released Krea 2, an open-weights 12B image model, along with a technical report covering training and data infrastructure.

Source: Krea 2 technical report; Hacker News discussion

8. LocalLLaMA Eyes Qwen-AgentWorld and Local Agent Environment Simulation

LocalLLaMA had a thread today on Qwen-AgentWorld-35B-A3B. The model is described as a 35B MoE with roughly 3B active parameters per token. Its focus is not general chat but simulating agent interaction environments, including MCP, terminal, SWE, Android, web, and OS.

This direction is worth watching. Agent training isn't just about "knowing the answer"; it also requires understanding how the environment responds after an action is taken. If a model can learn to predict terminal, browser, OS, or MCP tool outputs, it could enable better planning, simulation, or self-correction before agents act in the real world.
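For readers sizing this up for local use, a quick back-of-the-envelope check on the 35B total / roughly 3B active figures quoted above may help. The sketch assumes weights-only memory (no KV cache, activations, or quantization overhead) and 1 GB = 10^9 bytes; the bit widths are illustrative, not formats the model is confirmed to ship in.

```python
TOTAL_PARAMS = 35e9   # total parameters, from the model description
ACTIVE_PARAMS = 3e9   # approximate active parameters per token, from the thread


def weight_memory_gb(params: float, bits_per_param: int) -> float:
    """Weights-only footprint in GB (10^9 bytes); ignores KV cache and overhead."""
    return params * bits_per_param / 8 / 1e9


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

if __name__ == "__main__":
    print(f"active fraction per token: {active_fraction:.1%}")
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: {weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB")
```

Under these assumptions, about 8.6% of the parameters are used per token, while the full set of weights still has to be stored: roughly 70 GB at 16-bit and 17.5 GB at 4-bit. That gap between compute per token and memory footprint is the usual trade-off with MoE models.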

English brief: Qwen-AgentWorld-35B-A3B drew attention as a language world model trained to simulate agent environments such as MCP, terminal, SWE, Android, web, and OS interactions.

Source: Reddit: Qwen-AgentWorld-35B-A3B discussion

9. OSSInsight Highlights OpenMontage and codebase-memory-mcp

OSSInsight flagged two open-source projects related to agent workflows today.

The first is OpenMontage, described as an open-source agentic video production system with 12 pipelines, 52 tools, and 500+ agent skills, essentially turning an AI coding assistant into a video production studio. The second is codebase-memory-mcp, pitched as a high-efficiency code intelligence MCP server that indexes a codebase into a persistent knowledge graph and claims to significantly reduce token usage.

Both projects show agent tools diverging into vertical capabilities: one goes all-in on video production workflows, the other on codebase memory and MCP infrastructure. Going forward, the open-source agent ecosystem won't just be about "general chat agents"; we'll see more tools purpose-built for specific workflows.
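The general idea of indexing code into a graph can be shown with a small sketch. This is not codebase-memory-mcp's implementation: it only handles Python, only tracks direct calls by name from top-level functions, and `build_call_graph` and `SAMPLE` are hypothetical names used for illustration.

```python
import ast
import json
from collections import defaultdict


def build_call_graph(source: str) -> dict[str, list[str]]:
    """Map each top-level function name to the sorted names it calls directly."""
    tree = ast.parse(source)
    graph: dict[str, set[str]] = defaultdict(set)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name]  # touch the key so call-free functions still appear
            for child in ast.walk(node):
                # Only plain-name calls like foo(); method calls are skipped.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    graph[node.name].add(child.func.id)
    return {name: sorted(callees) for name, callees in graph.items()}


SAMPLE = '''
def load(path):
    return open(path).read()

def parse(path):
    return load(path).split()

def main():
    print(parse("data.txt"))
'''

if __name__ == "__main__":
    graph = build_call_graph(SAMPLE)
    # Saving the graph as JSON lets a later query skip re-reading the source.
    print(json.dumps(graph, indent=2))
```

An agent that asks "what does `main` call?" can then read a few graph entries instead of whole source files, which is where a token saving could plausibly come from.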

English brief: OSSInsight highlighted OpenMontage, an agentic video production system, and codebase-memory-mcp, a code intelligence MCP server for persistent codebase memory.

Source: OpenMontage GitHub; codebase-memory-mcp GitHub

Observations

Today's news clusters around three threads.

The first is infrastructure: OpenAI / Broadcom's inference chip and Qualcomm's Modular acquisition both point to AI competition moving closer to hardware, compilers, runtimes, and inference cost.

The second is agent capability: Gemini computer use, Qwen-AgentWorld, OpenMontage, and codebase-memory-mcp are all pushing agents beyond chat and into real operational environments. Being able to read a UI, operate a tool, understand a repo, and maintain memory matters more than getting a single answer right.

The third is platform governance: GitHub's auto model selection and credential revocation both reflect AI development tools moving from rapid scaling into a governance phase. How models are selected, how costs are controlled, and how credentials are revoked are becoming baseline platform capabilities.

The data entry point for this post is Horizon. This article was compiled, rewritten, and sourced by Codex following the SHUO Blog news format.
