Category: Inside AI

  • The math behind the OpenAI Jalapeño chip

    OpenAI’s financial trajectory hinges heavily on infrastructure costs, a reality that drove the development of the new custom OpenAI Jalapeño chip. Developed in collaboration with Broadcom, the application-specific integrated circuit (ASIC) represents a direct attempt to mitigate the heavy capital expenditure associated with third-party hardware. 

    While Nvidia currently commands an estimated 75% profit margin on its high-end processors, OpenAI operates on tighter margins, keeping roughly 33 cents of profit on each dollar generated after accounting for its massive operational expenses. The financial burden of running large language models at scale is severe. 

    Last year, keeping ChatGPT servers responsive cost OpenAI a staggering US$8.4 billion. With the platform now attracting 900 million weekly users, that operational cost is projected to reach approximately US$14 billion this year. Over the next eight years, OpenAI has committed roughly US$1.4 trillion to computing power, a massive bet for a company currently generating US$25 billion in annual revenue.
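
    To put these figures side by side, the short sketch below works through the arithmetic. It uses only the numbers quoted above. The one added assumption, flagged in the code, is that the US$1.4 trillion commitment is spread evenly across the eight years, which the article does not state.

```python
# Figures quoted in the article, in billions of US dollars.
SERVING_COST_LAST_YEAR = 8.4
SERVING_COST_PROJECTED = 14.0
COMPUTE_COMMITMENT = 1400.0  # US$1.4 trillion
COMMITMENT_YEARS = 8
ANNUAL_REVENUE = 25.0


def percent_increase(old: float, new: float) -> float:
    """Return the percentage change from old to new."""
    return (new - old) / old * 100.0


# Year-on-year growth in the cost of serving ChatGPT.
cost_growth = percent_increase(SERVING_COST_LAST_YEAR, SERVING_COST_PROJECTED)

# ASSUMPTION: the commitment is spread evenly over the eight years.
avg_annual_commitment = COMPUTE_COMMITMENT / COMMITMENT_YEARS

# How many years of current revenue one average year of commitment represents.
commitment_to_revenue = avg_annual_commitment / ANNUAL_REVENUE

print(f"Serving cost growth: {cost_growth:.1f}%")
print(f"Average annual compute commitment: US${avg_annual_commitment:.0f} billion")
print(f"Commitment-to-revenue ratio: {commitment_to_revenue:.1f}x")
```

    Under that even-spread assumption, the average yearly commitment is a multiple of current annual revenue, which is the gap the custom silicon is meant to help narrow.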

    Designing Hardware for LLM Inference

    The OpenAI Jalapeño chip, dubbed the company’s first “Intelligence Processor”, is built specifically for large language model (LLM) inference rather than general-purpose AI workloads. OpenAI provided the core architectural design based on its specific model roadmaps and serving systems, while Broadcom managed the silicon engineering and high-performance networking integration. 

    TSMC handles the physical manufacturing in Taiwan, and Celestica is tasked with building the board and rack systems. According to OpenAI, early lab samples are already running frontier workloads, including an unreleased GPT-5.3-Codex-Spark model, at target production frequency and power. 

    Richard Ho, head of OpenAI’s hardware program, noted that the architecture minimizes data movement to push realized utilization closer to its theoretical peak performance. Unlike general-purpose accelerators adapted from legacy AI workloads, this architecture specifically balances compute, memory, and networking resources to solve the data-movement bottlenecks native to interactive LLM serving.

    To achieve this at scale, the platform integrates Broadcom’s Tomahawk networking silicon directly into the design, allowing the custom processors to communicate across massive, clustered data center environments.

    The vertical integration flywheel

    By moving into custom silicon, OpenAI shifts from being a mere software layer to a vertically integrated infrastructure company. This full-stack strategy spans the entire pipeline: chip architecture, software kernels, memory systems, network scheduling, and the final application layer. Much like Apple’s tight coupling of proprietary hardware and iOS, OpenAI can now optimize its infrastructure around its exact internal model roadmaps.

    This integration feeds a continuous operational flywheel. Enhanced infrastructure efficiency lowers the cost of both training and serving models. More affordable serving leads to better, more responsive products, which drives user volume and revenue to be reinvested back into the next generation of custom infrastructure.

    Overcoming the late-mover disadvantage

    By introducing its own silicon, OpenAI enters a landscape where its primary competitors have spent nearly a decade developing proprietary hardware. Google began deploying its Tensor Processing Units (TPUs) in 2015 and now controls roughly a quarter of global AI computing capacity outside of Nvidia’s supply chain. 

    Amazon has shipped over one million of its custom chips, while Meta and Microsoft continue to scale their own infrastructure.

    “Jalapeño is part of our long-term full-stack infrastructure strategy to make compute more abundant,” said Greg Brockman, president and co-founder of OpenAI. “By designing more of the stack ourselves, we can serve more intelligence with greater efficiency.”

    To close this timeline gap, OpenAI accelerated the development phase. The OpenAI Jalapeño chip transitioned from a blank-slate design to manufacturing tape-out—the final step before physical production—in just nine months. The engineering teams achieved this timeline by utilizing OpenAI’s own language models to automate and optimize portions of the hardware design process.

    This creates a unique feedback loop where the models served to users are actively being leveraged to build the physical infrastructure that will run future iterations. Initial deployment of the hardware into data centres is scheduled to begin by the end of 2026.

    Broadcom CEO Hock Tan confirmed that the rollout will scale alongside infrastructure partners, including Microsoft, to prepare for gigawatt-scale data centre integration.

    See also: Omio scales travel product development using OpenAI models

    Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is part of TechEx and is co-located with other leading technology events. Click here for more information.

    AI News is powered by TechForge Media. Explore other upcoming enterprise technology events and webinars here.

    The post The math behind the OpenAI Jalapeño chip appeared first on AI News.

  • Anthropic drops ‘workplace AI agents’ directly inside Slack

    Anthropic launched a beta version of its Claude Tag feature for Enterprise and Team tiers, shifting its chat model into shared Slack channels. Moving away from traditional isolated chat boxes, users pull the artificial intelligence model into active group threads by typing @Claude. 

    The integration allows any team member in the channel to delegate a task, review the model’s outputs, and pick up the discussion thread from a previous point. This structural shift follows a US$65 billion Series H funding round that brought Anthropic’s post-money valuation to US$965 billion, positioned above rival OpenAI’s US$852 billion mark. 

    Following a confidential S-1 filing for an initial public offering, market competition for business software placement remains tight. Data from corporate expense platform Ramp’s May 2026 AI Index indicates Anthropic’s enterprise adoption rate reached 34.4%, passing OpenAI’s 32.3% footprint.

    Modifying the channel workstream

    Standard generative software requires enterprise employees to move data between team chats and separate browser instances. Anthropic aims to reduce this back-and-forth movement by restructuring workplace AI agents to work in multiplayer environments.

    “Instead of a private back-and-forth, Claude Tag shows up in the open,” stated Rob Seaman, general manager of Slack, regarding the operational mechanics of the application. This shared visibility alters how context is tracked inside an organisation. Because Claude Tag logs its task status directly inside the communication window, multiple employees can monitor the live execution steps. 

    The system tracks ongoing information from its active channels to build a contextual background. This automated history tracking limits the need for team members to continuously retype foundational company data or project scopes.

    Functional mechanics and asynchronous tasks

    The technical foundation for this channel integration relies on Anthropic’s Opus 4.8 engine. When assigned a request, the model divides the operation into sequential execution phases and utilises connected corporate databases, tools, and code repositories to complete the work.

    The primary operational difference for these workplace AI agents is their capability to function asynchronously without real-time human prompting. If a network administrator activates the tool’s “ambient” configuration, Claude Tag monitors threads and tracks tasks autonomously. The agent checks inactive text threads, signals priority notifications from integrated software extensions, and tracks unresolved assignments across multi-day intervals.

    Cat Wu, head of product for Claude Code, noted that the change centres on user configuration rather than completely new logic. “The form factor of being able to tag it the same way that you would a coworker is really powerful,” Wu told Reuters. Wu explained that connecting her personal Claude Tag agent to her email archive allows the system to analyse incoming communications, categorise urgent entries, and send immediate alerts inside Slack.

    Metrics and administrative controls

    Internal reporting from Anthropic shows that automated code generation has altered engineering activities, with the firm’s internal product group creating 65% of its code through its private version of Claude Tag.

    Beyond software development, the vendor targets non-technical office workforces. Early customer implementations focus on querying database metrics, parsing analytics data, and processing internal IT support tickets.

    This expansion of background agent operations requires a distinct security infrastructure to protect proprietary information. To restrict data access to approved departments, system administrators must establish scoped Claude identities. All localised memories and tool integrations are confined strictly to specific channels authorised by the IT department. 

    Additionally, management portals offer full tracking logs of user queries alongside specific organisational caps to regulate monthly token costs. 
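
    The controls described above (channel-scoped identities, query logs, and token caps) can be illustrated with a minimal sketch. Everything in it is hypothetical: the class `ScopedAgentIdentity`, its fields, and the channel names are invented for illustration and are not Anthropic's or Slack's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class ScopedAgentIdentity:
    """Hypothetical agent identity limited to approved channels and a token budget."""

    name: str
    allowed_channels: frozenset
    monthly_token_cap: int
    tokens_used: int = 0
    audit_log: list = field(default_factory=list)

    def authorize(self, user: str, channel: str, estimated_tokens: int) -> bool:
        """Decide whether a request may run, and record the decision."""
        if channel not in self.allowed_channels:
            decision = "denied: channel not authorised"
        elif self.tokens_used + estimated_tokens > self.monthly_token_cap:
            decision = "denied: monthly token cap reached"
        else:
            self.tokens_used += estimated_tokens
            decision = "allowed"
        # Every query is logged, whether or not it was allowed.
        self.audit_log.append((user, channel, estimated_tokens, decision))
        return decision == "allowed"


# Illustrative identity confined to a single finance channel.
finance_agent = ScopedAgentIdentity(
    name="claude-finance",
    allowed_channels=frozenset({"#finance-ops"}),
    monthly_token_cap=10_000,
)

in_scope = finance_agent.authorize("ana", "#finance-ops", 6_000)
out_of_scope = finance_agent.authorize("ben", "#engineering", 500)
over_budget = finance_agent.authorize("ana", "#finance-ops", 5_000)
```

    In this sketch the channel check runs before the budget check, so a request from an unapproved channel never consumes tokens, and denied requests still appear in the log for later audit.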

    The enterprise calculation: Autonomy vs. governance

    Moving generative tools from individual sandboxes into persistent corporate communication channels presents distinct operational trade-offs. The clear upside is the optimisation of routine knowledge work. By centralising information logs directly inside active threads, companies can lower task friction, capture context across changing project teams, and reduce the time spent on manual codebase tracking or database updates.

    However, delegating cross-app workflows to background agents introduces significant structural risks for IT departments. Permitting automated systems to read chat histories, connect to email accounts, and modify central code repositories expands an organisation’s internal data-exposure risks.

    If access boundaries are misconfigured, sensitive proprietary context could cross into unapproved channels. Furthermore, autonomous asynchronous execution removes direct human verification from intermediate workflow stages, leaving teams vulnerable to systemic errors if the underlying model misinterprets instructions mid-task. 

    Corporate decision-makers must ultimately evaluate whether the productivity gains of channel-based automation outweigh the rigorous auditing, compliance overhead, and channel-by-channel security configurations required to safely govern an always-on agent.

    See also: Anthropic releases Claude Opus 4.8

    The post Anthropic drops ‘workplace AI agents’ directly inside Slack appeared first on AI News.

  • Omio scales travel product development using OpenAI models

    Omio integrates OpenAI models across its engineering operations to accelerate travel product development and launch booking interfaces.

    The multimodal travel platform coordinates operations with over 3,000 transportation providers across 47 countries. Omio explicitly rejects the superficial addition of technology to outdated internal processes. The company’s CTO, Tomas Vocetka, requires all internal functions to completely redesign their operational execution frameworks from the ground up to operate as a native AI enterprise.

    OpenAI Codex integration

    Vocetka initiated the internal deployment by providing base ChatGPT access to the workforce, establishing a baseline familiarity with generative models before executing the primary technical integration.

    Omio subsequently embedded OpenAI Codex directly into its engineering operations, mandating its application across the entire software development lifecycle. Engineers currently apply Codex to preliminary research, architectural planning, active coding, automated testing, code reviews, and ongoing system maintenance.

    The engineering division constructs custom internal connectors to link proprietary data environments directly with these tools. This setup allows developers to bypass basic information retrieval and proceed directly to active task execution within their integrated development environments.

    Vocetka categorises the initial ChatGPT rollout as a preliminary introduction, emphasising that Codex handles the actual production workload. The deployment execution matured beyond the technical divisions. Management actively expands the use of Codex into non-technical corporate functions across the wider organisation. This expansion ensures standard operational procedures adapt to the new capabilities introduced by the engineering team.

    Internal analysis indicates the technical effort required to build specific products now sits at approximately 20 percent of previous levels. Delivery timelines show corresponding compression. Projects demanding the attention of multiple developers over an entire fiscal quarter now require a single engineer operating for roughly one month.

    Faster cycle times allow the engineering teams to test experimental concepts and validate consumer demand with minimal resource expenditure. Management allocates capital and engineering hours with greater precision, relying on prototyping to eliminate unviable features before committing to full-scale production.

    Lowering the time and cost barrier for software creation enables quicker internal decision-making. The technical teams iterate on existing products at a much higher velocity, pushing updates and new interface elements to the live environment at an accelerated pace.

    Conversational commerce built on real-time transport data

    Omio launched one of the earliest conversational travel booking interfaces in 2023 by connecting OpenAI models to its proprietary transportation inventory.

    The system processes natural language queries regarding complex multimodal routes. Travellers can ask for the fastest route from Rome to Florence, or for a comparison of flights and trains between Paris and Barcelona.

    Omio aggregates services spanning trains, buses, ferries, and flights. Legacy travel booking required users to navigate multiple websites, manually compare modes of transport, and independently aggregate itineraries across multiple providers. Omio replaces this fractured process with a unified interface capable of parsing consumer intent.

    The generative models analyse text inputs and ping the booking systems to construct viable travel paths. The application functions by grounding the model responses in live pricing and availability data. The architecture prevents the generation of travel options based on static or outdated training data. The resulting output provides consumers with directly bookable itineraries.

    Omio expanded its initial integration into a dedicated ChatGPT experience. This dedicated application directly accesses the global transportation network maintained by the company. By grounding the user interaction in verified data, the technical team ensures high-fidelity responses. Consumers receive highly personalised journey options rather than generic travel advice.

    Omio defines this structural setup as a new category of conversational commerce. The AI operates as the primary interface layer mediating the interaction between the consumer and the underlying global transportation network. The company views this as a broader departure from legacy search-based interfaces toward native generative customer experiences.

    The deployment points to a future where travel planning relies entirely on interacting with intelligent systems connected directly to live transportation networks.

    Omio’s corporate policy explicitly mandates that human personnel retain full accountability for all deployed code and final business outcomes. Generative tools function strictly as acceleration engines for development, analysis, and decision-making.

    “The responsibility and accountability stay with people. AI helps us develop faster, analyse faster, and make decisions faster, but people stay in charge,” explains Vocetka.

    This governance structure prevents automated systems from independently executing irreversible changes to the booking infrastructure or the core multimodal routing algorithms. The combination of broad employee access to OpenAI tools and rigorous oversight models creates an environment prioritising both speed and systemic stability.

    See also: Mitigating vendor lock-in with Sakana AI Fugu multi-agent models

    The post Omio scales travel product development using OpenAI models appeared first on AI News.

  • Mitigating vendor lock-in with Sakana AI Fugu multi-agent models

    Sakana AI launched Fugu to orchestrate multi-agent operations and mitigate single-vendor dependency risks in enterprise deployments.

    Enterprises face operational vulnerabilities when relying entirely on monolithic AI APIs. Japanese AI firm Sakana AI designed Fugu as a response to these concentration risks by creating an orchestration language model that calls upon a pool of varied models to complete multi-step tasks.

    Users access this ecosystem through a single OpenAI-compatible endpoint. Fugu routes queries internally, deciding whether to resolve a prompt directly or to assemble a coordinated team of expert models for deeper analysis. The system handles model selection, delegation, verification, and synthesis internally. Engineering teams interact with what appears to be one model while a background system of specialists executes the actual computation.

    Sakana AI targets the geopolitical and regulatory risks associated with AI sourcing. Recent export controls affecting Anthropic models like Fable and Mythos demonstrated that access to specific foundational architectures can vanish based on foreign policy decisions.

    Fugu functions as a hedge against these sudden supply chain disruptions. The platform relies on a completely swappable agent pool. Fugu dynamically routes traffic around any restricted or degraded provider to maintain service continuity. Sakana AI states this capability provides the resilient architecture required for AI sovereignty.
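
    The idea of routing around a restricted or degraded provider can be sketched in a few lines. This is not Sakana AI's method: Fugu is described as using learned orchestration, whereas the sketch below swaps in a plain fixed-priority failover loop. The names `Agent`, `ProviderUnavailable`, and `route` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


class ProviderUnavailable(Exception):
    """Raised by a provider call that is degraded or unreachable."""


@dataclass
class Agent:
    name: str
    call: Callable[[str], str]
    restricted: bool = False  # e.g. blocked by an export control


def route(prompt: str, pool: list) -> tuple:
    """Try agents in pool order, skipping restricted ones and failing over on errors."""
    for agent in pool:
        if agent.restricted:
            continue
        try:
            return agent.name, agent.call(prompt)
        except ProviderUnavailable:
            continue  # degraded provider: fall through to the next agent
    raise RuntimeError("no available agent in the pool")


def healthy(prompt: str) -> str:
    return f"answer to: {prompt}"


def degraded(prompt: str) -> str:
    raise ProviderUnavailable("timeout")


# Illustrative pool: one restricted model, one degraded model, one healthy model.
pool = [
    Agent("model-a", healthy, restricted=True),
    Agent("model-b", degraded),
    Agent("model-c", healthy),
]

chosen, reply = route("summarise the incident", pool)
```

    Because the pool is an ordinary list, swapping a provider in or out is a change to data rather than to the calling code, which is the property the swappable agent pool is meant to provide.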

    Fugu deployment tiers

    Two tiers are available to accommodate different operational latency requirements.

    The standard Fugu model prioritises low latency for daily tasks, integrating into standard developer tools like Codex for live coding and code review. Organisations subject to strict data governance or privacy mandates can manually opt specific underlying models out of the standard Fugu routing pool.

    Fugu Ultra targets complex, multi-step analytical problems that demand maximum accuracy. The Ultra variant coordinates a deeper pool of expert agents for intensive tasks such as academic paper reproduction, literature investigations, and patent analysis.

    Sakana AI reports that Fugu Ultra performs competitively against leading closed models like Fable 5 and Mythos Preview across scientific, engineering, and reasoning benchmarks:

    Benchmarks of Sakana AI Fugu standard and Ultra compared to rival frontier models.

    The orchestration method ensures companies can access top-tier computing capabilities without carrying the vendor concentration risk or export control exposure inherent to those closed models.

    Implementation in cybersecurity

    Almost 500 early users tested the system during an extended beta program focused on lengthy, multi-step computational workflows. With cybersecurity a major focus for models like Claude Mythos, engineering teams deployed Fugu Ultra to automate complete security assessment cycles.

    Human operators issued one scoped instruction, and the orchestration engine executed the entire reconnaissance phase. The model successfully conducted cross-site scripting and SQL injection checks alongside thorough authentication reviews.

    A participating cybersecurity engineer confirmed the model stayed strictly within its operational parameters and avoided initiating destructive actions against the target infrastructure. Fugu concluded the automated engagement by generating a clean vulnerability report complete with verifying evidence and exact retest steps for human remediation teams.

    The implementation demonstrated that multi-agent routing maintains strict compliance boundaries while executing complex penetration testing sequences.

    Software development teams also integrated Fugu Ultra into their primary code review pipelines to compare defect detection rates against established monolithic tools. The orchestration engine consistently outperformed baseline models in identifying logic flaws and security vulnerabilities within complex enterprise codebases.

    “For code review, Fugu Ultra is significantly better than GPT-5.5. It gives comprehensive answers and finds the bugs others miss,” reported a software engineer involved in the beta deployment. “Where other tools flag about three issues, Fugu surfaced more than twenty. It’s become the model I run all my reviews through.”

    Automated research and persona stability

    Data science units deployed the system in an almost fully automated research mode. Fugu Ultra successfully explored mathematical hypotheses, executed experimental code runs, interpreted failure states, and revised its own approaches to sustain progress over extended periods with minimal human intervention. This capability directly addresses the operational limitations of single-call models that require constant human prompting to recover from logic errors.

    Leadership at an unnamed enterprise platform company identified long-term persona stability as a primary advantage during these extended sessions. Conventional monolithic architectures often suffer from context degradation and identity drift when processing extensive conversational histories.

    “Raw output quality is on par with top frontier models, but Fugu showed unusually strong persona stability across long sessions, holding its identity where other models drift,” the executive stated. “For agent products, that may matter more than raw benchmark scores.”

    Extended benchmark validation

    Sakana AI built the internal routing logic upon extensive research into learned model orchestration. The technical foundation for the product stems from findings published in the company’s ICLR 2026 papers, specifically the Trinity and Conductor frameworks.

    These academic foundations allow Fugu to process requests by understanding precisely when a task requires delegation versus direct resolution. The internal language model dictates communication protocols between the individual agents and structures the final synthesis of their separate computational outputs.

    Validation testing against frontier AI competitors covered complex, open-ended disciplines ranging from financial time series prediction to mechanical design. Fugu also demonstrated high proficiency in niche physical logic tests and visual interpretation tasks, including solving the Rubik’s Cube and performing Japanese handwriting analysis. The capacity to excel in both quantitative financial modelling and qualitative image processing confirms the efficacy of the multi-agent orchestration approach.

    Sakana AI designed the system to scale organically as the broader AI hardware and software market matures. Because the product relies entirely on learned orchestration logic rather than fixed operational rulesets, it automatically benefits from third-party innovations. Sakana AI plans to continuously expand the available pool of expert agents.

    The engineering team will fold newly released open-source tools and proprietary Sakana AI models into the routing pool as they become available. Both the standard Fugu and Fugu Ultra models are available to enterprise clients today.

    See also: SAP and Google Cloud deploy agentic commerce architecture

    The post Mitigating vendor lock-in with Sakana AI Fugu multi-agent models appeared first on AI News.

  • SAP and Google Cloud deploy agentic commerce architecture

    SAP and Google Cloud are deploying agentic commerce architecture to automate multi-agent marketing and retail operations at enterprise scale.

    SAP research indicates 78 percent of businesses consider AI essential for retaining customers in 2026. However, the same data reveals fewer than two in five companies share customer data across customer experience (37%) or CRM (39%) platforms. 

    Addressing this structural data failure requires direct infrastructure intervention. SAP and Google Cloud expanded their partnership to build an agentic customer experience architecture, connecting data, AI, engagement, and commerce operations.

    The deployment relies on restructuring how AI interacts with backend commercial platforms. Most digital commerce infrastructures rely on fragmented APIs. SAP Commerce Cloud adopts the Universal Commerce Protocol to standardise data exchange among retailers, payment gateways, and autonomous agents. This framework allows software to independently execute the full retail sequence, spanning initial search, transaction processing, and post-sale resolution.

    Deploying the Universal Commerce Protocol

    Engineering teams integrating the Universal Commerce Protocol facilitate direct interactions between intelligent agents and commerce platforms. The standardisation lowers integration costs and accelerates onboarding into AI-driven channels.

    SAP plans to collaborate with Google to ensure merchant products surface organically across the Gemini application and Google Search, specifically incorporating AI Mode functionalities. Consumers interact with these interfaces while the backend architecture processes inventory checks, cart management, and payment processing without requiring retailers to rebuild existing infrastructure.

    SAP Commerce Cloud integrates Google Gemini capabilities to power a designated Shopping Assistant. Brands deploy the assistant directly to their consumers to facilitate chat, voice, and text engagements. State retention remains active throughout the complete shopping cycle. The deployment ingests live behavioural inputs, current warehouse capacities, and active marketing data to assemble distinct merchandise pairings, including full event configurations. By continuously refining recommendations, the application ensures high relevance and strict physical fulfilment capability.

    Enterprise systems often fail when promotional campaigns trigger demand that physical inventory cannot satisfy. Frontend interfaces failing to synchronise with backend warehouse systems frequently halt digital purchases. Users regularly click promotional emails, load the associated mobile application, and face sudden out-of-stock notices during checkout. Fulfilment updates experience severe delays, leaving support agents without a complete operational picture. SAP and Google Cloud engineered their joint solution to correct these specific systemic customer experience failures.

    Instead of managing disconnected points of contact, the architecture unifies the entire sequence. Traditional commercial setups require consumers to repeatedly input previously shared information. Support staff frequently lack access to unified records, preventing them from resolving issues efficiently. The integration targets these operational breakdowns, ensuring the system recognises the user and their precise context instantly across all digital properties.

    Bidirectional data flows

    Marketing execution demands highly accurate data pipelines. SAP Engagement Cloud partners with Google Cloud to formulate an autonomous multi-agent framework. The technical foundation relies on SAP Business Data Cloud Connect for Google BigQuery. The deployment relies on bidirectional, zero-copy data linking secured by strict administrative controls. Leaving vast data stores in place rather than duplicating them reduces storage expenses and network latency.

    BigQuery ingests live variables like weather conditions, precise locations, and active advertising interaction rates. SAP Customer Experience solutions supply the internal behavioural context, tracking customer profiles, exact transaction histories, specific service interactions, and consented engagement records. SAP Engagement Cloud activates the combined intelligence, deploying autonomous agents to orchestrate personalised interactions throughout the customer lifecycle.

    Routing information through the Business Data Cloud while BigQuery handles the logic forces immediate inventory synchronisation. The Shopping Assistant actively queries live warehouse records before displaying any product. Software checks physical supply against consumer requests, verifying availability prior to making the suggestion.
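
    The check-before-suggest step can be sketched as a simple filter. This is an illustrative stand-in, not SAP's or Google's implementation: the `Candidate` type, the `filter_by_live_stock` function, and the in-memory `live_stock` dictionary (standing in for a live warehouse query) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A product the assistant is considering suggesting."""

    sku: str
    quantity: int


def filter_by_live_stock(candidates: list, live_stock: dict) -> list:
    """Keep only candidates whose requested quantity is available right now."""
    return [c for c in candidates if live_stock.get(c.sku, 0) >= c.quantity]


# ASSUMPTION: in a real deployment this would be a live warehouse lookup,
# not a static dictionary.
live_stock = {"tent-4p": 3, "stove-gas": 0, "lantern-led": 12}

candidates = [
    Candidate("tent-4p", 1),
    Candidate("stove-gas", 1),     # out of stock
    Candidate("lantern-led", 2),
    Candidate("cooler-40l", 1),    # unknown SKU, treated as zero stock
]

suggestions = filter_by_live_stock(candidates, live_stock)
print([c.sku for c in suggestions])
```

    Treating an unknown SKU as zero stock is a deliberately conservative choice in this sketch: an item is only suggested when its availability has been positively confirmed.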

    Generative execution in production environments

    Advanced generative models dictate the localised output of the marketing campaigns. Google Gemini models, specifically including the Nano Banana 2 iteration, provide specialised agentic skills. The models dynamically generate localised messaging, customised imagery, and campaign variations based on the exact specifications provided by the bidirectional data flow.

    The deployment upgrades standard text messages into immersive and interactive interfaces via Google Rich Communication Services. Advertising creatives evolve continuously based on incoming engagement data. The system processes the interaction, evaluates the response against the user profile, and instructs the Nano Banana 2 model to adjust the subsequent communication.

    Marketing departments achieve high efficiency by abandoning manual execution. Instead of configuring rigid campaign parameters, teams establish business goals and provide enterprise data access to the SAP Engagement Cloud. The autonomous agents coordinate the necessary steps, segmenting audiences based on Google BigQuery analytics and generating specific content variations through Google Gemini models.

    Evaluating the infrastructure impact

    Deploying the architecture restructures standard commerce operations. Consumers dictate their purchasing intent to search engines and conversational interfaces. The embedded AI agents process the intent, navigate the Universal Commerce Protocol connections, and complete the purchase directly against the enterprise backend.

    Retailers retain full ownership of the customer relationship despite the transaction occurring within a third-party environment. The architecture captures the consented engagement data, feeding the transaction history back into the SAP Customer Experience solutions. The system updates the localised customer profile, providing the Google Gemini models with fresh context prior to the next engagement cycle.

    The system continuously improves campaign performance without requiring direct human intervention. The multi-agent framework evaluates the success of a generated Rich Communication Services text message, adjusting the variables prior to the next automated dispatch.

    See also: Computer vision deployments drive retail productivity gains

    Banner for the AI & Big Data Expo event series.

    Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is part of TechEx and is co-located with other leading technology events including the Cyber Security & Cloud Expo. Click here for more information.

    AI News is powered by TechForge Media. Explore other upcoming enterprise technology events and webinars here.

    The post SAP and Google Cloud deploy agentic commerce architecture appeared first on AI News.

  • Computer vision deployments drive retail productivity gains

    Computer vision deployments are driving retail productivity gains as operators automate physical shelf tracking to protect eroding margins.

    This hardware deployment directly addresses the persistent in-store execution failures currently costing the industry billions. A study authored by Coresight Research – in partnership with technology providers Simbe and RELEX Solutions – calculates the exact cost of these operational shortfalls.

    Inefficiencies consume 6.4 percent of gross sales across the sector. Hardware, mass merchandise, and grocery categories will surrender $196.4 billion to these operational failures in 2026. The monetary value of these losses is jumping 21 percent over the previous year. This deficit vastly outpaces the three percent projected sales growth for the entire sector.
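
    The figures above imply two further numbers that the article does not state. The sketch below derives them, assuming the 6.4 percent share and the $196.4 billion total refer to the same sales base and that the 21 percent jump is measured against the prior year's loss figure; both are assumptions about how the study defines its terms.

```python
LOSS_2026_BN = 196.4   # projected 2026 losses in US$ billions (from the study)
LOSS_SHARE = 0.064     # losses as a share of gross sales (from the study)
LOSS_GROWTH = 0.21     # year-over-year growth in losses (from the study)

# If losses are 6.4% of gross sales, gross sales = losses / 0.064.
implied_gross_sales_bn = LOSS_2026_BN / LOSS_SHARE

# If losses grew 21%, the prior-year figure = current / 1.21.
implied_prior_loss_bn = LOSS_2026_BN / (1 + LOSS_GROWTH)

print(f"Implied gross sales: ${implied_gross_sales_bn:,.0f}bn")
print(f"Implied prior-year losses: ${implied_prior_loss_bn:,.1f}bn")
```

    Under those assumptions, the implied sales base is a little over $3 trillion and the prior-year losses come to roughly $162 billion.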

    Nine in ten retailers report active difficulties managing their shop floors. Empty shelves and inaccurate pricing structures directly suppress operating margins. Margin erosion exceeds five percent for 89 percent of operating businesses.

    Full-scale deployments of store intelligence platforms operate across 60 percent of enterprise footprints. This adoption rate represents an 18-percentage-point jump year-over-year.

    Experimental pilot programmes account for a mere 18 percent of current market activity. The adoption curve skews heavily toward top-tier enterprises. 73 percent of retail companies generating over $5 billion in annual revenue maintain fully scaled deployments.

    Mid-market operators lag behind, with only 42 percent of sub-$1 billion companies achieving similar deployment maturity. Treating physical stores as separate entities from digital channels degrades customer lifetime value. Capital expenditure directly targets out-of-stock tracking, automated pricing, planogram verification, and assortment planning.

    Production deployments in hardware and grocery

    BJ’s Wholesale Club provides a documented case study of applied shelf digitisation. The operator deployed Simbe robotics platforms to monitor inventory and price accuracy across its locations.

    Management used this hardware foundation to generate digital twins of individual warehouse clubs. This application established real-time visibility systems previously absent from their physical operations.

    BJ’s applied these digital models to route planning for online orders and curbside fulfillment. The engineering team recorded a 40 percent year-over-year improvement in picking efficiency through this data application. CEO Bob Eddy reported the technology enabled the company to elevate quality standards within fresh merchandise categories.

    Grocery operator Albertsons applies AI to automate complex retail operations. The grocer targets $1.5 billion in productivity gains spanning three fiscal years. CEO Susan Morris explained: “We will be equipping our merchants with AI-driven insights and automated execution to optimise pricing, promotions, and assortment decisions, transforming category management and driving margin improvement.

    “Our vision is the future where intelligent automation guides these decisions, freeing our people to focus on strategy and innovation.”

    Flaws in deployment sequencing

    Many organisations prioritise the installation of pricing software while ignoring foundational sensor infrastructure. 43 percent of surveyed technology leaders direct their capital toward pricing optimisation software.

    Supplier collaboration platforms rank second in priority, attracting investment from 36 percent of operators. Only 33 percent of these organisations invest in the shelf digitisation hardware required to feed accurate data into those pricing models.

    This hardware includes the sensors and cameras needed to verify physical stock availability. Store intelligence deployments require strict sequencing to function properly. Retailers must first digitise the shelf, then deploy data analytics, then install inventory tracking software, and only then execute pricing automation.
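
    The sequencing requirement can be expressed as a dependency graph. The sketch below reads the four stages as a strict chain, which is an interpretation of the text rather than something the study specifies; the stage labels and the `missing_prerequisites` helper are illustrative names.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it directly depends on.
dependencies = {
    "shelf_digitisation": set(),
    "data_analytics": {"shelf_digitisation"},
    "inventory_tracking": {"data_analytics"},
    "pricing_automation": {"inventory_tracking"},
}

# A topological sort yields an order in which every stage follows
# its prerequisites.
rollout_order = list(TopologicalSorter(dependencies).static_order())
print(rollout_order)


def missing_prerequisites(stage, deployed):
    """Return the direct prerequisites of `stage` that are not yet deployed."""
    return dependencies[stage] - set(deployed)


# A retailer that digitised the shelf but skipped ahead to pricing:
print(missing_prerequisites("pricing_automation", ["shelf_digitisation"]))
```

    A check of this kind makes the inversion visible before capital is committed: pricing automation is flagged as premature while inventory tracking is still missing.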

    This inversion of the technology stack creates downstream data failures. Markdown algorithms process outdated inventory counts when physical tracking sensors are absent. Mispricing rates hit 13 percent in 2026, marking a four-point increase since 2024.

    Pricing and promotional execution dominates the priority list, presenting an active difficulty for 92 percent of operators. Kim Anderson, VP of Store Operations at Schnucks Markets, states that shelf data must precede all other implementations. Without accurate physical inventory monitoring, downstream applications fail to meet their performance targets.

    Out-of-stock events remain severely disruptive, with 52 percent of operators ranking inventory availability as highly demanding. Operators attempt to fix multiple problems simultaneously, with 40 percent directing capital toward three or more operational inefficiencies at once.

    Labour reallocation and efficiency metrics

    Lowe’s demonstrates the financial impact of automating the associate workflow through its ‘Perpetual Productivity Improvement’ initiative. Executive VP of Stores Joseph McFarland directed the deployment of workforce management tools and inventory solutions to eliminate redundant associate tasks.

    The engineering rollout saved 80 non-productive labour hours per store on a weekly basis. Lowe’s advanced the initiative by deploying full shelf replenishment technologies powered by AI to track stock depletion in real-time.

    Management distributed financial bonuses to the workforce based on documented productivity enhancements. The company issued $5,000 to associate store managers and varied payouts to hourly staff.

    Broad industry data validates the performance metrics recorded by Lowe’s. The deployment of intelligence applications drives a 14 percent average reduction in time spent on manual store tasks. 86 percent of organisations record defined decreases in manual assignment hours.

    Retailers report distinct performance disparities based on total revenue. 56 percent of operators generating over $5 billion report advanced reductions in task completion times, compared to only 36 percent of mid-market companies.

    Organisations cite operational efficiency as their primary investment objective, followed closely by the unification of store data. Retailers expect these tools to generate new capital, with 40 percent of leaders seeking to establish alternative revenue streams like retail media networks.

    Securing market competitiveness

    Store intelligence technologies function as an interconnected ecosystem rather than standalone fixes for isolated problems. Deploying these systems without a coherent sequencing plan forces operators to build upon an unstable foundation.

    Establishing real-time, shelf-level visibility proves strictly necessary before attempting to scale downstream software. Pricing automation, supplier collaboration platforms, and inventory forecasting applications require verified physical data to generate accurate outputs.

    Customer behaviour responds directly to correct operational upgrades. Proper deployments increase customer lifetime value by 11 percent across the sector, while conversion rates improve for 50 percent of the operators executing physical automation frameworks.

    48 percent of companies record increased enrollment in their loyalty programmes following system integration. Accurate pricing and consistent stock availability elevate online review metrics for 47 percent of surveyed operators.

    Retailers compounding value through integrated, properly sequenced hardware and software capabilities possess a distinct market advantage over competitors accumulating disconnected applications.

    See also: HSBC expands AI banking partnership with Google Cloud

    The post Computer vision deployments drive retail productivity gains appeared first on AI News.

  • Microsoft sells OpenAI models in China. OpenAI and Anthropic won’t.

    Microsoft has quietly become the main supplier of OpenAI models in China, selling the technology to the country’s largest internet companies even as OpenAI and Anthropic keep their own models out of the market on intellectual-property and misuse grounds. The arrangement, detailed this week by Bloomberg, hands Microsoft a position no other American AI vendor holds: it sells the GPT series to Chinese firms that the model’s own creator will not deal with directly.

    The scale is not trivial. ByteDance has been Microsoft’s largest AI customer in recent years, running largely on OpenAI models, and is on track to spend more than US$1 billion a year on Microsoft’s AI and cloud services, people familiar with the matter told Bloomberg. Ant Group, Meituan and Tencent also buy AI models through Azure, though Ant says it develops its own models and that its core products do not rely on outside systems.

    Inside Microsoft, the growth has been celebrated rather than played down. Azure’s AI revenue in China expanded faster than in any other sales territory, roughly tripling in the financial year to June 2025 after climbing about 400% the year before, then-chief commercial officer Judson Althoff told staff at a July 2025 sales meeting, according to a transcript reviewed by Bloomberg.

    Althoff described Microsoft as the one company “bringing those two places together,” meaning the AI hubs of the US West Coast and China’s east. President Brad Smith has separately told US lawmakers that the China business accounted for roughly 1.5% of the company’s revenue in 2024.

    Why OpenAI models in China run through Microsoft alone

    The reason comes down to Microsoft’s singular contract with OpenAI, which lets it set its own terms for selling GPT models abroad. Both OpenAI and Anthropic have declined to sell into China directly, and Anthropic’s models are absent from Microsoft’s China line-up altogether. That leaves Microsoft acting as the intermediary for models whose makers have decided the Chinese market is too risky to serve.

    Risk is the recurring tension. OpenAI has privately pressed Microsoft to do more to stop Chinese customers from “distilling” its models, Bloomberg reported, a technique that uses one model’s outputs to train another. Microsoft points to automated monitoring and a rule that it sells only to established companies rather than individual developers. 

    Yet sources told Bloomberg that Chinese buyers face no heightened scrutiny, and synthetic data generated from the models is difficult to police. To limit its exposure, Microsoft does not host the OpenAI models on Chinese soil; customers reach them over the internet from data centres elsewhere, Singapore among them.

    The contradiction sharpens when you look at what Microsoft hosts alongside GPT. It added DeepSeek’s R1 to Azure AI Foundry in January 2025, and this month confirmed to Axios that it is testing a fine-tuned, Azure-hosted version of DeepSeek-V4 as a cheaper option for Copilot Cowork, the enterprise agent currently powered by OpenAI and Anthropic models. So Microsoft is selling a Chinese model into Western businesses while selling American models into Chinese ones, taking the margin on both legs of the trade.

    Whether the balancing act survives the politics is another matter. The China business is contentious in Washington, where lawmakers have cast the country’s AI push as a threat to American industry, and OpenAI’s private objections could grow louder. For now, Microsoft owns the market for OpenAI models in China, and it is the only player being paid by both sides.

    See also: China’s DeepSeek V3.2 AI model achieves frontier performance on a fraction of the computing budget

    The post Microsoft sells OpenAI models in China. OpenAI and Anthropic won’t. appeared first on AI News.

  • Google Cloud generative AI automates council planning operations

    Government ministries are deploying Google Cloud generative AI across municipal agencies to automate council planning operations.

    Public sector administration handles vast volumes of unstructured data that delay infrastructure development. The UK central government established a target to construct 1.5 million new homes by 2029. Local planning authorities encounter administrative backlogs caused by dense paperwork, delaying these development timelines.

    To address these constraints, the Ministry of Housing, Communities and Local Government (MHCLG) and the Department for Science, Innovation and Technology (DSIT) expanded two machine learning tools designed to accelerate municipal processing. Speaking at the Google Cloud Summit London, officials confirmed the nationwide deployment of the ‘Extract’ application and the progression of the ‘Augmented Planning Decisions’ (APD) prototype.

    Lila Ibrahim, Chief AI Readiness Officer at Google DeepMind, said: “The UK has an opportunity to build the homes our communities need, but local councils face a mountain of paperwork. That’s why we’re co-creating a sophisticated planning tool directly with councils to solve real-world bottlenecks.

    “This will help significantly cut decision times, freeing up planners to focus on the future to get Britain building faster.”

    Householder applications – which include routine domestic modifications such as loft conversions or property extensions – account for nearly 70 percent of all planning applications submitted annually. Evaluating these standard submissions manually requires planning officers to spend hours cross-referencing regional policy documents, historical archives, and unstructured PDF files.

    Such a repetitive evaluation process consumes administrative hours that would otherwise support major infrastructure and commercial developments. The automation targets this imbalance in how officer time is spent, aiming to reduce application decision timelines by 50 percent.

    Core capabilities of the Google Cloud generative AI tools

    Engineers at MHCLG and the government’s applied AI team, the Incubator for AI (i.AI), built the Extract tool internally using Gemini foundation models. Following trials across more than 20 local planning authorities, administrators expanded the application to every council in England.

    Extract parses unstructured data locked within legacy PDF records, converting hundreds of pages of historical planning documentation into structured digital datasets within minutes. Operational data from the trial phases indicates that the tool will eliminate roughly 255 hours of manual data entry per council annually. This reduction allows local authorities to reallocate personnel to complex evaluation tasks.

    Integrating large language models into public sector workflows requires enterprise-grade security environments. Local authorities process sensitive civic records, requiring strict risk management protocols to prevent data exposure.

    The government hosted the Gemini models on Google Cloud to establish a protected operating environment where data sovereignty is maintained. The cloud environment features active security controls to block malicious inputs, including prompt injection attacks. This technical framework ensures that sensitive municipal data remains secure during both testing and production computing cycles.

    The APD system, meanwhile, acts as an analytical assistant for municipal planning officers by automating four primary administrative tasks:

    1. The system consolidates incoming documentation by pre-processing data backlogs, flagging missing information gaps, and extracting core geographical site data onto a unified user interface for officer review.
    2. The software identifies relevant national and local zoning laws, assesses compliance margins, and appends precise policy citations for manual verification.
    3. The application parses public consultation letters, summarising stakeholder objections or historical legal precedents.
    4. The model generates initial drafts of final evaluation reports, including the technical rationale and recommended approval conditions.

    Protocols dictate that human planning officers retain final decision-making authority over every application. The software does not automate final approvals or rejections independently. Staff members review every line of text generated by the machine learning models, modifying the analytical reasoning before validating the report.

    To maintain regulatory accountability, the APD prototype records its internal processing steps sequentially. This mechanism establishes an auditable chain of thought, creating a verification trail for every processed application to support the officer’s final determination.
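
    A sequential processing record of this kind can be sketched as an append-only log. This is an illustration of the general idea only: the `AuditTrail` class, its field names, the stage labels, and the application reference are hypothetical and do not describe the APD prototype's actual design.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditTrail:
    application_id: str
    steps: list = field(default_factory=list)

    def record(self, stage, summary):
        """Append one processing step with a sequence number and UTC timestamp."""
        self.steps.append({
            "seq": len(self.steps) + 1,
            "stage": stage,
            "summary": summary,
            "at": datetime.now(timezone.utc).isoformat(),
        })


trail = AuditTrail("APP-0001")  # hypothetical application reference
trail.record("consolidate", "Flagged one missing site plan")
trail.record("policy_check", "Cited two local policies for officer review")
trail.record("draft_report", "Generated draft for officer edits")

for step in trail.steps:
    print(step["seq"], step["stage"], step["summary"])
```

    Because entries are only ever appended and carry a sequence number, an officer or auditor can replay the steps in order when reviewing how a draft recommendation was reached.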

    Local council planning trials and scaling timelines

    The development of the APD prototype relies on a collaborative framework linking public sector administrators with engineering teams from Google Cloud, Google DeepMind, and Faculty.

    The alpha version undergoes live testing within three local authorities: the London Borough of Barnet, Dorset Council, and the London Borough of Camden. Testing across these distinct regional jurisdictions provides developers with varied municipal datasets to test the software against diverse local policies. 

    Central planners intend to complete the alpha phase and deploy the APD tool to all 300-plus English local authorities by 2027. Google Cloud provides the elastic computing infrastructure required to manage the thousands of concurrent inferencing queries generated during daily operations.

    Paul Maltby, Director of Public Services at Faculty, commented: “The English planning system is clogged up. Planning officers are forced to spend half their time reviewing applications to convert an attic, putting those for housing estates and warehouses on hold.

    “Built with planning officers, our AI system will take the drudgery out of reviewing simple planning applications so they can make quick decisions. It will let planning officers focus on the major developments that matter, and crucially, let families improve their homes without months of delay and uncertainty.”

    Naisha Polaine, Executive Director for Growth at Barnet Council, added: “The tool’s ability to collect relevant information, undertake a provisional assessment, and draft the foundations of a report has the potential to save significant officer time spent working on the administration of planning applications and direct this to speeding up the decision-making process for residents. In turn, this will contribute significantly to delivering our house building growth targets in the borough.”

    The coordination between MHCLG, i.AI, Google DeepMind, and Faculty establishes a structured division of labour for enterprise software engineering. Public ministries define the policy guidelines and statutory boundaries, while external technical partners engineer and deploy the underlying model architectures.

    The successful integration of these systems demonstrates the feasibility of hosting advanced language models within a secured public cloud infrastructure to process core administrative workloads and modernise public service delivery.

    See also: EU publishes its AI content labelling playbook ahead of the AI Act’s August deadline

    The post Google Cloud generative AI automates council planning operations appeared first on AI News.

  • Insurers pivot AI strategy toward core risk underwriting

    AI investments by insurers are now expected to generate tangible business value beyond mere efficiency.

    According to findings in the 2026 Evident AI Index, insurers are now embedding AI technologies into workflows that directly influence underwriting discipline and capital allocation.

    Christian Preece, Insurance Director at Evident, says: “For years, insurers have competed on AI ambition, but now the focus is shifting from what insurers are building to the value they’re creating. In itself, it’s a sign of AI maturity to have the internal capability to measure these figures and be confident enough to disclose them.

    “As the first industry leaders disclose hard return on investment data, they’re providing the kind of evidence that shareholders and boards have been looking for in light of increasing concerns around the costs of AI, and we can expect to see more insurers going public in the coming year.”

    While the broader insurance workforce experienced a contraction of 2.2 percent over the past year, the AI-specialist headcount expanded by 32 percent across the 30 insurers tracked in the report. This personnel shift highlights a transition from building data foundations to the integration and optimisation of business-specific AI use cases.

    Data engineering remains a component of this investment, yet its relative share of the talent stack is declining as roles focused on AI development and software implementation gain priority. AI specialists now represent one in every 50 employees at insurers included in the Index.

    Executive structures are also adapting to these requirements. Nearly 40 percent of the insurers indexed now designate a senior leader with explicit responsibility for AI. Most of these appointments occurred within the last 12 months, creating a new level of executive oversight for AI-driven growth.

    This governance is vital as firms shift from isolated point solutions toward agentic AI systems that coordinate actions across multiple stages of the policy administration and claims lifecycle. Notably, the adoption of agentic AI has surged, with one in four newly disclosed use cases now showing evidence of agentic orchestration, compared to one in twenty only six months prior.

    Zurich sets an example

    Zurich serves as an example of this transition, rising from 12th position to 4th in the global rankings by emphasising a shared platform model over decentralised experimentation.

    The insurance giant deployed ZurichIQ, a modular generative AI platform integrated into underwriting, claims, legal, and service operations. This architecture provides a unified environment for various functional tools, such as PolicyIQ for contract comparisons and GuidelineIQ for enforcing underwriting standards.

    Hurdles in such deployments typically involve maintaining oversight across diverse business lines. Zurich manages these risks through a dedicated committee that governs AI investment and model risk management. The platform approach allows the insurer to push AI capabilities into daily production while maintaining a consistent governance framework, which is reinforced by internal training programs like the £1.3m AI apprenticeship initiative.

    Ericson Chan, Group Chief Information & Digital Officer at Zurich, said: “Being recognised as the biggest AI growth insurer in the Evident AI Index is not simply a reflection of technology adoption; it signals a broader transformation from use cases to enterprise-wide execution and change.

    “This recognition reinforces our conviction in our AI360 strategy, embedding intelligence into workflows, decisions, and customer outcomes across the value chain. AI is no longer a technology initiative. It is becoming Zurich’s operating system.”

    Focus on risk selection and ROI

    With claims typically accounting for 60 to 80 percent of premium income, even minor improvements in fraud detection and risk selection produce a disproportionate financial impact compared to general administrative cost reduction.
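
    A short worked example shows why the leverage is disproportionate. The claims range of 60 to 80 percent comes from the article; the 10 percent administrative share and the 2 percent improvement are assumptions chosen purely for illustration.

```python
def premium_points_saved(cost_share, improvement):
    """Saving, as a share of premium, from cutting a cost line by `improvement`."""
    return cost_share * improvement


CLAIMS_SHARE_LOW, CLAIMS_SHARE_HIGH = 0.60, 0.80  # from the article
ADMIN_SHARE = 0.10    # assumption, for illustration only
IMPROVEMENT = 0.02    # assumed 2% reduction applied to either cost line

claims_low = premium_points_saved(CLAIMS_SHARE_LOW, IMPROVEMENT)
claims_high = premium_points_saved(CLAIMS_SHARE_HIGH, IMPROVEMENT)
admin = premium_points_saved(ADMIN_SHARE, IMPROVEMENT)

print(f"Claims: {claims_low:.1%} to {claims_high:.1%} of premium")
print(f"Admin: {admin:.1%} of premium")
```

    With these inputs, the same proportional improvement is worth six to eight times more when applied to claims than to administration, simply because the claims line is that much larger.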

    Insurers are now directing venture capital and internal innovation efforts toward data sources that enable more dynamic analysis of climate volatility and cyber threats. A critical marker of this maturity is the ability to quantify and disclose financial returns.

    Manulife, Generali, and Intact Financial have led this effort, publicly reporting AI-driven value. Projections indicate these three firms will generate over $1 billion in AI-driven value by the end of their respective reporting periods. This transparency provides the hard data shareholders demand regarding the costs of AI deployment, effectively mandating more rigorous performance measurement across the sector.

    Success in the next phase of industry adoption depends on the ability to translate these technical investments into better underwriting results. Market leaders Allianz (which now holds the largest AI talent pool in the industry and has registered 900 AI use cases worldwide) and AXA maintain top positions by demonstrating sustained investment across innovation, talent, and transparency pillars.

    Barbara Karuth-Zelle, Member of the Board of Management and Group COO at Allianz, commented: “AI didn’t change our ambition. It accelerates how we deliver on it at scale.

    “Behind this ranking are thousands of moments: a claim processed faster, a customer experience reimagined, a partner better connected, a colleague freed up for what truly matters. And we are determined to keep going—an inspiring, transformative journey.”

    See also: Accenture: Consumers show growing trust in AI shopping agents

    The post Insurers pivot AI strategy toward core risk underwriting appeared first on AI News.

  • HarmonyOS 7 steps into the AI gap Apple left open in China

    Four days after Apple confirmed that Siri AI would not launch in China, Huawei took the stage in Dongguan and declared HarmonyOS 7 the beginning of the agent era. Huawei has moved into the gap Apple could not fill, with an architecture built specifically for it.

    What HarmonyOS 7 actually changes

    The headline change is the HarmonyOS Intelligent Agent Framework 2.0, which restructures the OS around what Huawei calls an “intent-as-service” model, compressing tasks that previously required navigating multiple apps into a single natural-language command.

    At the centre of this is Xiaoyi, Huawei’s AI assistant, rebuilt from a conventional voice tool into what the company describes as a system-level intelligence agent. Xiaoyi now controls over 2,100 system-level capabilities and coordinates with more than 2,000 third-party AI agents developed across Huawei’s developer ecosystem. 

    Richard Yu, chairman of Huawei’s Consumer Business Group, framed the release as a generational inflexion point: “In 2019, HarmonyOS was born. In 2023, native HarmonyOS apps began. In 2026, HarmonyOS enters the Agent era.”

    Underneath sits openPangu 2.0, Huawei’s updated foundation model, with 505 billion parameters in its Pro version and 92 billion in the Flash variant, both supporting 512K context windows. On-device models at 30 billion parameters are due on Kirin chips by autumn 2026. HarmonyOS 7 also delivers a 15%-plus performance improvement over HarmonyOS 6.1, according to Huawei’s own benchmarks. 

    Huawei claims a task execution rate above 90%, though that figure is the company’s own and has not been independently verified.

    The market position is consolidating

    The numbers shared at HDC 2026 reflect a shift that has already happened. In Q1 2026, HarmonyOS held 19% of China’s smartphone OS market against Apple iOS at 16%, with Android at 65%. HarmonyOS first overtook iOS in China in Q2 2025, according to Counterpoint Research.

    That trajectory matters more than any single feature because China is simultaneously the market where Apple cannot currently deploy its AI features and the one Huawei has fully optimised for. The agent network Xiaoyi coordinates includes partnerships with Ctrip for travel planning and Ant Medical for health data analysis, services woven into the Chinese consumer stack that Apple’s architecture does not reach.

    Where the limits are

    The scope of the challenge to Apple needs calibrating. HarmonyOS 7 is currently in developer beta, with the stable consumer release expected this autumn. The 2,000-plus AI agents are anchored in the Chinese app ecosystem. 

    The platform counts more than 400,000 applications and services, which is significant but still a fraction of what Apple’s App Store carries. Huawei’s ambitions to take HarmonyOS international remain aspirational for now.

    There is also a design note that softens any clean divergence narrative: HarmonyOS 7 adopts the same Liquid Glass aesthetic that Apple introduced with iOS 26 and Samsung brought to One UI 9. Visual language converges even as underlying architectures and regulatory environments pull in opposite directions.

    The longer arc

    HarmonyOS exists because of US sanctions. When Huawei lost access to Google’s Android in 2019, it built its own OS from necessity. By January 2026, over 90% of Huawei devices were running the fully homegrown version. That forced independence is now a structural advantage in the one market where Apple cannot currently deploy its headline AI feature.

    Sanctions built the platform. Regulatory friction cleared its path.

    See also: Siri AI arrives with Google inside, and much of the world is locked out

    The post HarmonyOS 7 steps into the AI gap Apple left open in China appeared first on AI News.