Aether Computing Launches Serverless GPU Platform, Slashing AI Infrastructure Costs by Up to 50%
SAN FRANCISCO – November 26, 2025 – Aether Computing today announced the general availability of its serverless GPU platform, a fully managed infrastructure solution designed to eliminate resource provisioning complexity for machine learning teams. The service enables developers to deploy training and inference workloads on NVIDIA H100 and A100 GPUs with automatic scaling, sub-second billing, and zero server management, addressing a critical market need as AI model sizes grow 10x annually and infrastructure costs become a primary barrier to innovation.
The launch arrives as enterprises increasingly abandon fixed-capacity GPU clusters in favor of elastic consumption models. According to recent market analysis, serverless GPU adoption has accelerated 340% year-over-year as organizations seek to optimize costs for intermittent ML workloads. This shift reflects a broader industry recognition that traditional cloud provisioning leaves GPUs idle 30-50% of the time during debugging, experimentation, and off-peak periods. Aether Computing’s platform automatically provisions resources within 300 milliseconds of API calls, scales from zero to thousands of GPUs based on demand, and charges only for compute seconds used—eliminating the waste typical of always-on instances while maintaining performance parity with dedicated hardware.
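The idle-capacity economics above can be illustrated with back-of-the-envelope arithmetic. The utilization range (30-50% idle) and the $1.85/hour H100 rate are taken from this release; the 40% midpoint and the monthly hour count are illustrative assumptions, not company figures:

```python
# Illustrative cost comparison: always-on GPU instance vs. per-second billing.
# The hourly rate and idle range come from this release; the 40% idle midpoint
# and 730 hours/month are hypothetical assumptions for the sake of the example.

H100_HOURLY_RATE = 1.85   # $/hour for an H100 SXM5 instance (per this release)
HOURS_PER_MONTH = 730     # average hours in a month (assumption)
IDLE_FRACTION = 0.40      # midpoint of the 30-50% idle time cited above

# Always-on instance: every hour is billed, busy or idle.
always_on_cost = H100_HOURLY_RATE * HOURS_PER_MONTH

# Per-second billing: only the busy hours are billed.
busy_hours = HOURS_PER_MONTH * (1 - IDLE_FRACTION)
serverless_cost = H100_HOURLY_RATE * busy_hours

savings = 1 - serverless_cost / always_on_cost
print(f"Always-on:  ${always_on_cost:,.2f}/month")
print(f"Serverless: ${serverless_cost:,.2f}/month")
print(f"Savings:    {savings:.0%}")
```

Under these assumptions, eliminating idle hours alone yields roughly 40% savings before any per-unit price difference is considered.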
The platform architecture addresses key pain points identified in production ML environments. It supports containerized workloads through native Kubernetes integration and pre-configured ML environments including PyTorch, TensorFlow, and JAX, all optimized with CUDA 12.4 and cuDNN 9.0. For distributed training scenarios, the service provides 3.2 Tbps InfiniBand networking that enables near-linear scaling across multi-node clusters—a critical capability for training models exceeding 100 billion parameters. Automatic checkpointing to persistent NVMe storage occurs every 15 minutes by default, preventing data loss during preemption events. The platform’s intelligent cold-start optimization leverages pre-warmed container pools and model weight caching, reducing initialization latency by 85% compared to traditional serverless offerings and achieving sub-2-second boot times for common ML frameworks.
Customer validation demonstrates compelling economics across diverse use cases. Beta testing with 50 organizations revealed that AI startups fine-tuning Llama-3 variants reduced GPU spend by 48% while cutting deployment cycles from three weeks to under 10 minutes. Enterprise ML teams running episodic batch inference on computer vision models reported 62% cost savings by scaling to zero during inactive periods. Biotech researchers leveraging the platform for AlphaFold2 protein structure predictions eliminated $18,000 monthly overhead from idle A100 clusters, provisioning resources only during active 6-hour computation windows. These results align with industry data showing serverless models can reduce total cost of ownership by 40-70% for workloads with utilization variance exceeding 60%.
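For the biotech case above, the scale of the eliminated overhead can be sanity-checked against the platform's own pricing. Both the $18,000/month figure and the $1.10/hour A100 rate appear in this release; the original cluster's actual pricing may have differed, so this is only an order-of-magnitude sketch:

```python
# Back-of-envelope: what does the $18,000/month idle A100 overhead imply
# at the platform's $1.10/hr A100 rate? Both dollar figures are from this
# release; 730 hours/month is an assumption, and the customer's original
# cluster may have been billed at a different rate.

A100_HOURLY_RATE = 1.10      # $/hour for an A100 PCIe GPU (per this release)
IDLE_OVERHEAD = 18_000       # $/month of idle-cluster spend eliminated
HOURS_PER_MONTH = 730        # average hours in a month (assumption)

idle_gpu_hours = IDLE_OVERHEAD / A100_HOURLY_RATE
implied_gpus = idle_gpu_hours / HOURS_PER_MONTH

print(f"Idle GPU-hours eliminated:  {idle_gpu_hours:,.0f}/month")
print(f"Equivalent always-on A100s: {implied_gpus:.0f}")
```

At that rate, $18,000/month corresponds to roughly 16,000 idle GPU-hours, on the order of two dozen A100s running continuously.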
The competitive landscape underscores Aether’s differentiation. While providers like RunPod and Hyperstack offer container-based serverless GPUs, they typically require manual orchestration configuration and lack enterprise-grade SLAs. Aether provides 99.95% uptime guarantees with 24/7 technical support from ML infrastructure specialists, SOC 2 Type II compliance, and GDPR-ready data residency across 12 global regions. Pricing starts at $1.85/hour for H100 SXM5 instances with 80GB VRAM—40-60% below AWS and Google Cloud—and $1.10/hour for A100 PCIe GPUs. The platform’s inference engine automatically balances workloads across GPU clusters to maintain sub-100ms P95 latency for real-time applications while optimizing for cost efficiency in batch processing modes.
“This fundamentally changes who can build and deploy state-of-the-art AI,” said Dr. Sarah Chen, CEO of Aether Computing. “For too long, GPU infrastructure has been a gatekeeper, requiring specialized DevOps expertise and massive upfront commitments. We’re democratizing access so that a two-person startup can train a 70-billion-parameter model or deploy production inference endpoints with the same infrastructure efficiency as a tech giant—paying only for actual compute, not idle capacity.”
Market projections validate the opportunity. The serverless GPU segment is expected to reach $4.7 billion by 2026, representing 23% of the total cloud AI infrastructure market. Demand is particularly acute in generative AI, where variable user traffic creates utilization volatility exceeding 80% for inference workloads. Aether’s launch includes a free tier providing 10 GPU-hours monthly for experimentation and startup credits worth $5,000 for qualifying AI companies, removing financial barriers for early-stage innovation.
About Aether Computing
Aether Computing delivers high-performance, serverless infrastructure for artificial intelligence and machine learning workloads. Founded in 2021 by ex-Google Cloud and NVIDIA engineers, the company provides elastic GPU compute with data center operations across North America, Europe, and Asia. Aether is backed by $47 million in Series B funding from Andreessen Horowitz and Initialized Capital, and serves customers including Fortune 500 enterprises, research institutions, and Y Combinator-backed startups.
Media Contact:
Sarah Al-Mansoori
Director of Corporate Communications
G42
Email: media@g42.ai
Phone: +971 2555 0100
Website: www.g42.ai