Models & Labs

AWS Launches OpenSearch Serverless for AI Agents

TechCrunch AI · May 28, 2026 · high confidence

Why it matters

→AWS's new infrastructure is built to absorb the bursty, unpredictable traffic that AI agents generate, which could reduce costs for users.
→Decoupling compute from storage lets compute scale up and down independently of stored data, which matters for spiky AI-driven workloads.
→The launch reflects a broader industry shift toward accommodating machine-generated traffic, which could make AI agents more viable at scale.


AWS has introduced a new version of OpenSearch Serverless tailored to the demands of AI agents. The update scales compute resources dynamically to match the unpredictable workloads that agents generate. Because compute is separated from storage, users pay only for the compute they actively use, which could lower costs. The move is part of a broader industry trend as cloud providers adapt to rising machine-generated traffic. Integrations with AI development platforms such as Vercel and Kiro further simplify deployment for developers.
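To make the pay-for-active-usage point concrete, here is a back-of-the-envelope sketch comparing capacity provisioned for peak load with usage-based billing on a bursty, agent-like traffic pattern. The demand numbers, the unit rate, and the `min_units` floor are all hypothetical, and the sketch does not model AWS's actual OpenSearch Serverless pricing.

```python
from typing import List

# Hypothetical hourly compute demand from a bursty agent workload (arbitrary units).
HOURLY_DEMAND: List[int] = [0, 0, 1, 12, 2, 0, 0, 30, 3, 0, 0, 0]

def provisioned_cost(demand: List[int], rate: float) -> float:
    """Fixed capacity sized for the peak hour and billed for every hour."""
    return max(demand) * rate * len(demand)

def usage_based_cost(demand: List[int], rate: float, min_units: int = 0) -> float:
    """Billed only for the units consumed each hour, above an optional (hypothetical) minimum."""
    return sum(max(d, min_units) for d in demand) * rate

if __name__ == "__main__":
    print(provisioned_cost(HOURLY_DEMAND, 1.0))  # 360.0
    print(usage_based_cost(HOURLY_DEMAND, 1.0))  # 48.0
```

The gap comes entirely from idle hours: provisioned capacity pays for the 30-unit peak all day, while usage-based billing pays only for the 48 units actually consumed. A nonzero minimum narrows that gap, so any floor in real pricing matters a lot for spiky workloads.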


More from TechCrunch AI

Market & Regulation · business

Microsoft Challenges OpenAI, Anthropic with Own AI Models

Microsoft is positioning itself as a formidable competitor to AI giants OpenAI and Anthropic by promoting its own AI models and infrastructure. CEO Satya Nadella stresses that enterprises should keep control over their AI systems and advocates a multi-model approach to avoid dependence on any single provider. The strategy is backed by Microsoft's MAI family of models and its Maia AI chips, which promise cost-effective, efficient performance. By offering a broad catalog of models, Microsoft aims to give enterprises flexible and secure AI options, challenging the dominance of the established AI labs.

TechCrunch AI · Jul 30, 2026

Market & Regulation · agents

Zuckerberg Predicts Billions Will Have AI Agents

Mark Zuckerberg envisions a future where billions of people have personal AI agents within five years, capable of managing tasks like finances and health. This ambitious vision aligns with Meta's ongoing investments in AI infrastructure, despite significant financial losses in its Reality Labs division. While Meta's stock has taken a hit, the company is doubling down on AI, partnering with BlackRock to build a $14 billion data center. The success of Meta's business agents on platforms like WhatsApp suggests a potential path forward, but scaling to billions of consumer agents remains a formidable challenge.

TechCrunch AI · Jul 29, 2026

Market & Regulation · business

Microsoft's Anthropic Investment Yields $3.2B Gain

Microsoft's investment in Anthropic has proven highly lucrative, with a $3.2 billion gain reported for the quarter, significantly boosting its earnings per share. This contrasts with its investment in OpenAI, which saw a $600 million write-down for the same period. Despite this quarterly dip, Microsoft's annual gain from OpenAI still reached $5 billion, highlighting the long-term value of its AI investments. The contrasting fortunes of these investments underscore the dynamic nature of the AI sector and Microsoft's strategic positioning within it.

TechCrunch AI · Jul 29, 2026

More in Models & Labs

Models & Labs · models

llama.cpp adds GLM-5.2 speculative decoding support

llama.cpp's latest update adds speculative decoding support for GLM-5.2, built on its NextN/MTP (multi-token prediction) layers. The change also improves tensor loading and context management, particularly for models using the GLM_DSA architecture. Developers can export models with or without the MTP layers, which adds flexibility. The release is a step forward for performance and adaptability for anyone running GLM-5.2 models.

llama.cpp Releases · Jul 30, 2026
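Speculative decoding, which the GLM-5.2 item above refers to, pairs a cheap draft with the full model: the draft proposes several tokens, and the full model verifies them and keeps the longest agreeing prefix. The sketch below shows the generic greedy draft-then-verify loop under simplifying assumptions: each "model" is a hypothetical function mapping a token context to its greedy next token, and `toy_target`/`toy_draft` are made-up stand-ins. It is not llama.cpp's implementation; with MTP, the draft tokens would typically come from the model's own extra prediction layers rather than from a separate model.

```python
from typing import Callable, List

Token = int
Model = Callable[[List[Token]], Token]  # hypothetical: context -> greedy next token

def greedy_decode(model: Model, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one full-model step per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out

def speculative_decode(target: Model, draft: Model, prompt: List[Token],
                       max_new: int, k: int = 4) -> List[Token]:
    """Greedy draft-then-verify loop; output matches greedy_decode(target, ...)."""
    out = list(prompt)
    limit = len(prompt) + max_new
    while len(out) < limit:
        # 1. The cheap draft proposes k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks each proposed position. A real runtime scores
        #    all k positions in one batched forward pass; here we loop for clarity.
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if expected != tok:
                # Keep the agreeing prefix, then take the target's own token.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            # All k drafts accepted: the same verify pass yields one bonus token.
            out.extend(proposal)
            out.append(target(out))
    return out[:limit]

# Hypothetical toy models: the draft agrees with the target on two of every three positions.
def toy_target(ctx: List[Token]) -> Token:
    return (sum(ctx) * 31 + len(ctx)) % 50

def toy_draft(ctx: List[Token]) -> Token:
    return toy_target(ctx) if len(ctx) % 3 else (toy_target(ctx) + 1) % 50

if __name__ == "__main__":
    fast = speculative_decode(toy_target, toy_draft, [1, 2, 3], max_new=20)
    print(fast == greedy_decode(toy_target, [1, 2, 3], 20))  # True
```

Every emitted token is either a draft the target agreed with or the target's own correction, so the output is identical to plain greedy decoding with the target. The speedup in a real runtime comes from verifying all k drafts in one batched pass instead of k sequential ones.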

Models & Labs · models

llama.cpp b10178 Release Adds Trace Logging

The b10178 release of llama.cpp adds trace logging to the server's slot similarity check, giving developers detailed insight into how the server selects a prompt-cache slot. The logs include skip reasons and similarity calculations, which can help with performance tuning. The release introduces no new model architectures and continues to ship builds for a wide range of platforms, including macOS with KleidiAI, Ubuntu with ROCm 7.2, and Windows with CUDA 12 and 13. That breadth keeps llama.cpp versatile across systems and reinforces its position as a comprehensive inference runtime.

llama.cpp Releases · Jul 30, 2026
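The slot similarity check that b10178 now traces decides which server slot, and therefore which cached prompt, handles a new request. The sketch below illustrates the general idea with a hypothetical policy: similarity is the length of the common token prefix divided by the prompt length, slots below a `min_similarity` threshold are skipped, and every skip reason is logged at debug level. The `Slot` type, the threshold, and the fallback rule are assumptions for illustration; llama.cpp's actual scoring may differ.

```python
import logging
from dataclasses import dataclass, field
from typing import List, Optional

log = logging.getLogger("slot_select")

@dataclass
class Slot:
    id: int
    busy: bool = False
    cached: List[int] = field(default_factory=list)  # tokens already held in this slot's cache

def common_prefix_len(a: List[int], b: List[int]) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def select_slot(slots: List[Slot], prompt: List[int],
                min_similarity: float = 0.5) -> Optional[Slot]:
    """Hypothetical policy: pick the idle slot whose cache best matches the prompt prefix."""
    best: Optional[Slot] = None
    best_sim = -1.0
    for s in slots:
        if s.busy:
            log.debug("slot %d skipped: busy", s.id)
            continue
        if not s.cached or not prompt:
            log.debug("slot %d skipped: nothing to compare", s.id)
            continue
        sim = common_prefix_len(s.cached, prompt) / len(prompt)
        log.debug("slot %d: similarity %.3f", s.id, sim)
        if sim < min_similarity:
            log.debug("slot %d skipped: %.3f < %.3f", s.id, sim, min_similarity)
            continue
        if sim > best_sim:
            best, best_sim = s, sim
    if best is None:
        # Hypothetical fallback: no cache is similar enough, so take the first idle slot.
        best = next((s for s in slots if not s.busy), None)
        log.debug("no slot above threshold; falling back to %s",
                  best.id if best else "none")
    return best

if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    slots = [Slot(0, busy=True, cached=[1, 2, 3, 4]), Slot(1, cached=[1, 2, 9]),
             Slot(2, cached=[1, 2, 3, 7]), Slot(3)]
    print(select_slot(slots, [1, 2, 3, 4, 5, 6]).id)  # 2
```

Logging each skip reason alongside the computed similarity is what makes a surprising choice (for example, a cache miss on a long shared system prompt) easy to diagnose.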

Models & Labs · models

llama.cpp b10180 Release Enhances SYCL Performance

The b10180 release of llama.cpp improves SYCL backend performance for unary elementwise operations. It adds a fast path for contiguous tensors and switches to 32-bit index math, and it uses fastdiv (division by precomputed multiply-and-shift constants) for elementwise index calculations to speed processing further. The release adds no new models, and builds continue to ship for macOS, Linux, and Windows. Developers using the SYCL backend can expect smoother, faster elementwise operations, reinforcing llama.cpp's adaptability across computing environments.

llama.cpp Releases · Jul 30, 2026
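Fastdiv-style index math, as described in the b10180 item above, replaces the integer division and modulo needed to turn a flat element index into per-dimension coordinates with a multiply-high, an add, and a shift, using constants precomputed once per divisor. The sketch below emulates the classic 32-bit version of this trick and uses it to unravel indices for a possibly non-contiguous 3-D tensor, with a contiguous fast path that skips the index math entirely. The names `fastdiv_init`, `make_unraveler`, and `element_offset` are hypothetical, strides are counted in elements rather than bytes, and this is not the SYCL backend's code.

```python
from typing import Callable, Sequence, Tuple

def fastdiv_init(d: int) -> Tuple[int, int]:
    """Precompute (mp, L) so that n // d == (mulhi32(n, mp) + n) >> L."""
    assert 1 <= d < 2**31
    L = (d - 1).bit_length()                    # ceil(log2(d)); 0 when d == 1
    mp = ((1 << 32) * ((1 << L) - d)) // d + 1  # "magic" multiplier, fits in 32 bits
    return mp, L

def fastdiv(n: int, mp: int, L: int) -> int:
    """Divide by a fixed d without a divide instruction (emulating uint32 math)."""
    assert 0 <= n < 2**31                       # keeps hi + n below 2**32: no uint32 overflow
    hi = (n * mp) >> 32                         # mulhi: upper 32 bits of the 64-bit product
    return (hi + n) >> L

def make_unraveler(ne0: int, ne1: int) -> Callable[[int], Tuple[int, int, int]]:
    """Map a flat index to (i0, i1, i2) for shape (ne0, ne1, ne2), innermost first."""
    div0, div1 = fastdiv_init(ne0), fastdiv_init(ne1)

    def unravel(i: int) -> Tuple[int, int, int]:
        q = fastdiv(i, *div0)                   # i // ne0
        i0 = i - q * ne0                        # i % ne0, without a modulo
        i2 = fastdiv(q, *div1)                  # q // ne1
        i1 = q - i2 * ne1                       # q % ne1
        return i0, i1, i2

    return unravel

def element_offset(i: int, nb: Sequence[int], contiguous: bool,
                   unravel: Callable[[int], Tuple[int, int, int]]) -> int:
    """Offset (in elements) of the i-th logical element, given strides nb."""
    if contiguous:
        return i                                # fast path: flat index is already the offset
    i0, i1, i2 = unravel(i)
    return i0 * nb[0] + i1 * nb[1] + i2 * nb[2]

if __name__ == "__main__":
    unravel = make_unraveler(5, 3)
    print(unravel(37))  # (2, 1, 2)
```

Restricting indices to 32 bits is what lets the multiply-high fit in a single 64-bit product, and the contiguous fast path avoids even that cost when the tensor layout already matches the flat index.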