
NVIDIA Run:ai Delivers 2x GPU Utilization Gains for AI Inference Workloads

Caroline Bishop Feb 27, 2026 17:35

NVIDIA benchmarks show Run:ai platform doubles GPU utilization while cutting latency 61x for enterprise AI deployments running NIM inference microservices.

NVIDIA has released comprehensive benchmarking data showing its Run:ai orchestration platform can double GPU utilization for enterprises running AI inference workloads, while simultaneously slashing first-request latency by up to 61x compared to traditional cold-start deployments.

The findings come as organizations struggle with a fundamental tension in LLM deployment: small embedding models might consume just a few gigabytes of GPU memory, while 70B+ parameter models demand multiple GPUs. Without intelligent orchestration, teams face an ugly choice between overprovisioning (burning money) and underprovisioning (degrading user experience).
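The scale of that tension is easy to check with back-of-envelope arithmetic. The sketch below assumes fp16 weights (2 bytes per parameter) and counts only the weights themselves, ignoring KV cache, activations, and framework overhead, which all add more:

```python
# Rough GPU memory needed for model weights alone, assuming fp16 (2 bytes/param).
# KV cache, activations, and runtime overhead are not counted here.
def weight_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    return params_billions * 1e9 * bytes_per_param / 1e9

print(weight_gb(7))    # 7B model: ~14 GB of weights
print(weight_gb(70))   # 70B model: ~140 GB, more than a single 80 GB H100
```

Even this crude estimate shows why a 70B+ model cannot share the simple single-GPU deployment path that a small embedding model enjoys.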

The Numbers That Matter

NVIDIA tested three NIM microservices—a 7B LLM, 12B vision-language model, and 30B mixture-of-experts model—on H100 GPUs. The results challenge conventional deployment wisdom.

Using GPU fractions with bin packing, three models that previously required three dedicated H100s were consolidated onto approximately 1.5 H100s. Each NIM retained 91-100% of single-GPU throughput. Mistral-7B fully matched its dedicated-GPU throughput at 834 tokens per second with long-context input.
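The consolidation idea can be illustrated with a classic first-fit-decreasing bin-packing sketch: sort the models' GPU-memory fractions largest first and place each into the first GPU with room. This is illustrative only — Run:ai's scheduler is more sophisticated, and the 0.5 fractions below are assumed numbers, not NVIDIA's published per-model footprints:

```python
# First-fit-decreasing bin packing over GPU-memory fractions.
# Illustrative sketch; not Run:ai's actual placement algorithm.
def pack(fractions):
    gpus = []  # each entry is the used fraction of one GPU
    for f in sorted(fractions, reverse=True):
        for i, used in enumerate(gpus):
            if used + f <= 1.0:
                gpus[i] = used + f  # fits alongside existing tenants
                break
        else:
            gpus.append(f)  # no GPU had room; allocate a new one
    return gpus

# Three NIMs that each held a dedicated GPU, as assumed half-GPU fractions:
placement = pack([0.5, 0.5, 0.5])
print(len(placement), sum(placement))  # 2 physical GPUs, 1.5 GPUs of capacity used
```

Under these assumed fractions the three workloads land on two physical GPUs while consuming 1.5 GPUs of capacity, mirroring the "approximately 1.5 H100s" result.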

Dynamic GPU fractions pushed performance further under heavy load. Nemotron-3-Nano-30B sustained 1,025 tokens per second at 256 concurrent requests—compared to a static-fraction ceiling of just 721 tokens per second at four concurrent requests before instability. That's a 1.4x throughput improvement when traffic spikes hit.

Cold Start Problem Solved

The most dramatic gains came from GPU memory swap, which keeps models in CPU memory and dynamically moves weights to GPU as requests arrive. Scale-from-zero cold starts took 75-93 seconds for first-token generation at 128-token input. GPU memory swap cut that to 1.23-1.61 seconds—a 55-61x improvement.

For longer 2,048-token prompts, cold-start times of 158-180 seconds dropped to under 4 seconds with swap enabled.
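The reported multipliers follow directly from the quoted endpoints; the pairing of cold-start and swap-enabled times across the three models is an assumption here, since the article gives ranges rather than per-model pairs:

```python
# Sanity-check the reported speedups from the quoted time ranges.
# Pairing of fastest-with-fastest and slowest-with-slowest is assumed.
cold_fast, swap_fast = 75.0, 1.23   # seconds to first token, 128-token input
cold_slow, swap_slow = 93.0, 1.61

print(round(cold_fast / swap_fast))  # ~61x
print(round(cold_slow / swap_slow))  # ~58x, within the reported 55-61x range
```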

Market Context

NVIDIA stock trades at $181.24, down 2.42% in the past 24 hours, with a market cap of $4.49 trillion. The company has been aggressively expanding its AI infrastructure partnerships. Red Hat and NVIDIA launched a co-engineered AI Factory platform on February 25, while VAST Data announced a platform tie-up on February 26.

Run:ai's fractional GPU capabilities have shown production-ready results in cloud provider benchmarks. Testing with Nebius demonstrated support for 2x more concurrent users on existing hardware.

What This Means for Enterprise AI

The practical implication: organizations can deploy more models on fewer GPUs without sacrificing latency SLAs. Static fractions work well for predictable, low-concurrency workloads. Dynamic fractions handle variable traffic and high concurrency where KV-cache growth creates memory pressure.
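That guidance can be distilled into a rule of thumb. The sketch below is an assumption-laden illustration, not an NVIDIA recommendation; the concurrency threshold of four echoes the point at which the static-fraction configuration became unstable in the benchmark above:

```python
# Illustrative policy chooser distilled from the guidance above.
# The threshold is an assumption based on the benchmark's instability point.
def fraction_mode(concurrency: int, traffic_variable: bool) -> str:
    if traffic_variable or concurrency > 4:
        return "dynamic"  # KV-cache growth under load needs memory headroom
    return "static"       # predictable, low-concurrency workloads

print(fraction_mode(4, False))   # static
print(fraction_mode(256, True))  # dynamic
```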

GPU memory swap eliminates the penalty for keeping rarely accessed models available—critical for organizations running diverse model portfolios where some endpoints see sporadic traffic.

NVIDIA has published deployment guides for running NIM as native inference workloads on Run:ai. The platform supports single-GPU, multi-GPU, and fractional deployments with Kubernetes-native traffic balancing and autoscaling.
