Automated AI News Brief: Agent Sessions, MCP Cloud, and Local Long-Context Models
July 3 AI news brief: the Anthropic Python SDK adds an agent memory beta header; GitHub Copilot expands agent session streaming, AI credit pools, GitHub Actions authentication, and usage metrics; Manufact launches as an MCP Cloud on Hacker News; Simon Willison releases llm-coding-agent and uses DSPy to improve Datasette Agent prompts; and local model communities discuss DeepSeek V4 Flash at 1M context, Gemma 4 voice, and WebGPU acceleration.
Introduction
Today's post was built from AI, LLM, agent, developer tooling, and open source community data that Horizon fetched over the past 48 hours and Codex then organized in the SHUO Blog news format. The main sources Horizon collected this time are GitHub Releases, Hacker News, GitHub Changelog, Google AI Blog, Simon Willison, Latent Space, OSS Insight, and Reddit LocalLLaMA. The Reddit MachineLearning RSS feed is still returning 429 rate-limit errors, so community items mainly come from LocalLLaMA.
This is not a single story, but a morning AI brief for July 3. Each item includes the original source so you can read the full context.
1. Anthropic Python SDK adds an agent memory beta header
The Anthropic Python SDK released v0.116.0, which adds the agent-memory-2026-07-22 beta header. The release note is small, but it looks like another API-level step toward memory and long-running state, in line with the Managed Agents direction.
Agent memory matters because tool calls alone only solve one task at a time. Agents that do useful work need project context, user preferences, past decisions, common tools, and a record of things they should avoid. Memory also introduces permission, privacy, deletion, and false-memory risks. Seeing this appear as a beta header suggests the capability is becoming an explicit API surface, not just a hidden feature inside chat products.
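To make "beta header" concrete, here is a minimal sketch of how a client might opt a request into a beta feature at the HTTP level. The anthropic-beta, anthropic-version, and x-api-key header names are the ones Anthropic's API uses; the helper name build_headers and the placeholder key are assumptions for illustration. The release note does not say which endpoints honor this flag, so the sketch only builds headers and sends nothing.

```python
# Sketch only: builds request headers, does not call any API.
AGENT_MEMORY_BETA = "agent-memory-2026-07-22"  # value taken from the v0.116.0 release note


def build_headers(api_key: str, betas: list[str]) -> dict[str, str]:
    """Return HTTP headers that opt a request into the given beta features.

    build_headers is a hypothetical helper, not part of the Anthropic SDK.
    """
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    if betas:
        # Several beta flags travel as one comma-separated header value.
        headers["anthropic-beta"] = ",".join(betas)
    return headers


headers = build_headers("sk-placeholder", [AGENT_MEMORY_BETA])
print(headers["anthropic-beta"])
```

Keeping the flag in a named constant makes it easy to find and remove once the feature leaves beta or the header value changes.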
Source: Anthropic SDK Python v0.116.0
2. GitHub Copilot expands session streaming, cost pools, and Actions authentication
The GitHub Changelog carried several Copilot updates today. Copilot agent session streaming is now in public preview, giving enterprises access to agent session data across Copilot clients. Cost centers now support AI credit pools, so organizations can cap how much of their included monthly AI credits a cost center can use. Copilot CLI can now run in GitHub Actions using the built-in GITHUB_TOKEN instead of a separate PAT. Reports from the Copilot usage metrics API are also more complete and accurate.
These updates share one direction: once agents enter companies, observability, permissions, and spending controls become central. The question is no longer just "can the model write code?" It becomes: what did the agent session do, how many credits did it spend, which cost center paid, who can inspect it, and does CI need a long-lived token? GitHub is turning Copilot from a personal assistant into an engineering platform component.
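The "how many credits, which cost center" question can be sketched as a small aggregation. This is an illustration only: the record shape, the cost center names, and the cap values are all hypothetical, and nothing here reflects GitHub's actual session data or billing API.

```python
from collections import defaultdict

# Hypothetical session records: (cost_center, credits_spent).
sessions = [("platform", 120.0), ("mobile", 40.0), ("platform", 95.5)]
# Hypothetical monthly credit caps per cost center.
caps = {"platform": 200.0, "mobile": 100.0}


def over_cap(sessions, caps):
    """Sum credits per cost center and return the ones that exceed their cap."""
    spent = defaultdict(float)
    for cost_center, credits in sessions:
        spent[cost_center] += credits
    # Cost centers without a configured cap are treated as uncapped.
    return {
        cc: total
        for cc, total in spent.items()
        if total > caps.get(cc, float("inf"))
    }


print(over_cap(sessions, caps))
```

With the sample data, only the platform cost center is reported, because its two sessions together exceed its cap.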
Sources: Copilot agent session streaming is now in public preview; Cost centers now support AI credit pools; Copilot CLI no longer needs a personal access token in GitHub Actions; Improved accuracy and coverage in Copilot usage metrics reports
3. GitHub Copilot will deprecate Gemini 2.5 Pro and Gemini 3 Flash
GitHub announced that Gemini 2.5 Pro and Gemini 3 Flash will be deprecated across all GitHub Copilot experiences on July 31, 2026. The change covers Copilot Chat, inline edits, ask and agent modes, and code completions.
This is another example of how quickly model supply changes inside multi-model platforms. The benefit is fast access to newer models. The tradeoff is that specific models can also disappear quickly. Companies that bind workflows to a specific model need to track retirement dates, replacements, cost changes, and quality differences. The model picker may look like a UI detail, but at enterprise scale it becomes change management.
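Tracking retirement dates can start as a very small check. The sketch below assumes a team keeps its own table of pinned models; the model identifier strings, the retiring_soon helper, and the 30-day warning window are hypothetical, while the July 31, 2026 date comes from the GitHub announcement.

```python
from datetime import date

# Retirement date from the GitHub announcement; the identifiers are informal labels.
RETIREMENTS = {
    "gemini-2.5-pro": date(2026, 7, 31),
    "gemini-3-flash": date(2026, 7, 31),
}


def retiring_soon(pinned, today, warn_days=30):
    """Return {model: days_left} for pinned models retiring within warn_days."""
    result = {}
    for model in pinned:
        retire_on = RETIREMENTS.get(model)
        if retire_on is None:
            continue  # no known retirement date for this model
        days_left = (retire_on - today).days
        if days_left <= warn_days:
            result[model] = days_left
    return result


# Hypothetical workflow pins, checked as of this brief's date.
print(retiring_soon(["gemini-2.5-pro", "some-other-model"], date(2026, 7, 3)))
```

A check like this could run in CI so that a pinned model nearing retirement fails loudly instead of disappearing from a workflow.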
Source: Upcoming deprecation of Gemini 2.5 Pro and Gemini 3 Flash
4. Manufact launches on Hacker News as an MCP Cloud
Hacker News featured Launch HN: Manufact (YC S25) – MCP Cloud. Manufact describes itself as a cloud for MCP apps and servers. The team used to be called mcp-use and still maintains open source MCP SDKs under that name.
MCP is moving quickly. At first, many developers treated it as a protocol for connecting models to tools. Now cloud deployment, server management, app platforms, SDKs, and enterprise needs are appearing around it. That suggests MCP is moving from a developer experiment into an infrastructure market. The hard parts will be security, authorization, observability, secret management, and cross-tool compatibility, because MCP servers become part of the permission boundary once agents can use many external capabilities.
Sources: Manufact; Launch HN discussion; mcp-use GitHub
5. Simon Willison: llm-coding-agent, DSPy, and "understand to participate"
Simon Willison released llm-coding-agent 0.1a0, an experiment in building a small coding agent on top of his LLM library as it evolves into more of an agent framework. He also published a note on using DSPy to evaluate and improve Datasette Agent's SQL system prompts, and wrote about Geoffrey Litt's AIE framing, "Understand to participate."
These three items fit together neatly. A coding agent is not just a model wired to a shell. Prompts, evals, tool interfaces, and human understanding all have to be designed. Using DSPy to improve a SQL system prompt makes prompt work feel more like an engineering experiment than intuition-only tuning. "Understand to participate" is the human side: as agents produce larger changes, people need to understand the intermediate work if they are going to review and co-design it.
Sources: llm-coding-agent 0.1a0; Using DSPy to evaluate and improve Datasette Agent's SQL system prompts; Understand to participate
6. Agentic web and software factories still need skills, sandboxes, and human control
Latent Space had several AI Engineer World's Fair pieces around agents. Vercel's Andrew Qu discussed agents as a new kind of software, emphasizing skills, sandboxes, and agent-readable websites. Another interview discussed Adobe's experiments with "agentic sites" that assemble pages around individual user intent. A third piece covered skill engineering and the argument against one-shot AI design.
This shows agent products moving from "do this one task for me" toward a world where websites, tools, sandboxes, and skills are redesigned for agents. But these pieces also keep returning to human judgment and control. The more agents enter design, development, and website generation, the less convincing one-shot generation becomes. Loops, review, understandable intermediate states, and product structures that let people intervene become more important.
Sources: Vercel's Andrew Qu on why agents are a new kind of software; The website of the future may assemble itself for every visitor; Skill engineering and the case against one-shot AI design
7. Local models: DeepSeek V4 Flash at 1M context, Gemma 4 voice, and WebGPU acceleration
LocalLLaMA had several practical local-model discussions today. One post shared a llama.cpp patch for running DeepSeek V4 Flash locally with the full 1M-token context on an RTX 5090. A Hugging Face member shared a Gemma 4 31B voice demo combining Parakeet, Gemma 4 31B, Qwen3TTS, and web search. The community also discussed Gemma 4 WebGPU kernels reaching 255 tok/s, plus local RTX 3090 benchmarks comparing Qwen3.6 27B and Ornith.
These all point in the same direction: local AI is bottlenecked by more than the model itself. Runtime support, kernels, context memory, voice pipelines, benchmarks, and hardware adaptation matter just as much. As the toolchain improves, local models start to look less like single-prompt toys and more like private assistants that can work for longer sessions.
Sources: Reddit: DeepSeek V4 Flash running with full 1M token context locally; Reddit: Talking with Gemma 4 31B; Reddit: Gemma 4 WebGPU Kernels 255 tok/s; Reddit: Local benchmarks with a RTX 3090
8. OSS Insight: agent tools are appearing across security, video, routing, and multi-agent desktops
OSS Insight caught several AI and agent-related trending repositories today. usestrix/strix describes itself as open-source AI hackers for finding and fixing app vulnerabilities. calesthio/OpenMontage is an agentic video production system. diegosouzapw/OmniRoute is an AI gateway with multi-provider routing, compression, and fallback. stablyai/orca is positioned as an ADE for managing a fleet of parallel agents. DeusData/codebase-memory-mcp is a high-performance code intelligence MCP server.
Not all of these projects will become large products, but the direction is clear: the agent ecosystem is splitting into infrastructure pieces. Security testing, video production, model routing, multi-agent management, and codebase memory are all supporting parts needed to turn an AI assistant into a usable system.
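The fallback idea behind a multi-provider gateway can be sketched in a few lines. This is not OmniRoute's implementation; the route_with_fallback function and the two stand-in providers are hypothetical, and a real gateway would add timeouts, retries, and narrower error handling.

```python
from typing import Callable

Provider = Callable[[str], str]


def route_with_fallback(
    prompt: str, providers: list[tuple[str, Provider]]
) -> tuple[str, str]:
    """Try providers in order; return (name, reply) from the first that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real gateway would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Hypothetical stand-ins for real provider clients.
def flaky(prompt: str) -> str:
    raise TimeoutError("upstream timed out")


def stable(prompt: str) -> str:
    return f"echo: {prompt}"


print(route_with_fallback("hello", [("primary", flaky), ("backup", stable)]))
```

Returning the provider name alongside the reply keeps the routing decision visible, which matters once cost and quality differ between providers.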
Sources: usestrix/strix; calesthio/OpenMontage; diegosouzapw/OmniRoute; stablyai/orca; DeusData/codebase-memory-mcp
Today's Notes
Today's AI news falls into three threads.
First, agents are filling in platform capabilities: memory, session streaming, usage metrics, cost pools, and CI token handling are all things agents need when they move from demos into team tools.
Second, MCP and coding agents are becoming infrastructure markets: Manufact, llm-coding-agent, Datasette Agent prompt evals, and codebase-memory-mcp all push "model plus tools" toward something more deployable, measurable, and manageable.
Third, local model progress is increasingly engineering-driven: 1M context, WebGPU kernels, voice pipelines, and RTX 3090 benchmarks matter more for daily usability than model names alone.
Horizon supplied the data for this post. Codex organized it, rewrote it, and added sources according to the SHUO Blog news format.
Sources
- Anthropic SDK Python v0.116.0
- GitHub Changelog: Copilot agent session streaming is now in public preview
- GitHub Changelog: Cost centers now support AI credit pools
- GitHub Changelog: Copilot CLI no longer needs a personal access token in GitHub Actions
- GitHub Changelog: Improved accuracy and coverage in Copilot usage metrics reports
- GitHub Changelog: Upcoming deprecation of Gemini 2.5 Pro and Gemini 3 Flash
- Manufact
- Launch HN discussion: Manufact
- mcp-use GitHub
- Simon Willison: llm-coding-agent 0.1a0
- Simon Willison: Using DSPy to evaluate and improve Datasette Agent's SQL system prompts
- Simon Willison: Understand to participate
- Latent Space: Vercel's Andrew Qu on why agents are a new kind of software
- Latent Space: The website of the future may assemble itself for every visitor
- Latent Space: Skill engineering and the case against one-shot AI design
- Reddit: DeepSeek V4 Flash running with full 1M token context locally
- Reddit: Talking with Gemma 4 31B
- Reddit: Gemma 4 WebGPU Kernels 255 tok/s
- Reddit: Local benchmarks with a RTX 3090
- usestrix/strix
- calesthio/OpenMontage
- diegosouzapw/OmniRoute
- stablyai/orca
- DeusData/codebase-memory-mcp

