At GTC in March, Jensen Huang told a sold-out SAP Center he expects $1 trillion in purchase orders for Blackwell and Vera Rubin systems through 2027. A year earlier, the figure was $500 billion.

NVIDIA posted $215.9 billion in fiscal 2026 revenue. Vera Rubin is entering production. The demand signal could hardly be louder.

But one important number was missing from the keynote:

How much of that ordered capacity is actually live?

Not announced. Not shipped. Not sitting in a warehouse. Not waiting for power. Not installed but underutilized.

Live.

A GPU only becomes economically real when it is powered, cooled, networked, deployed, validated, scheduled, and serving workloads. Until then, it is inventory.

To us, the unit that truly matters in AI infrastructure is usable cluster capacity.

Shipped capacity is what left the factory.
Installed capacity is what reached a data center and got racked.
Usable capacity is what is actually serving workloads and generating tokens.

The gap between shipped and usable capacity is where the bottlenecks hide. It is also where the analytical edge sits.

An observer who understands those layers can build a much better picture of real-world AI capacity than one who looks only at revenue and capex guidance.
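The three capacity definitions above can be written down as a simple funnel. The sketch below is illustrative only: the class name `CapacityFunnel`, its fields, and the GPU counts are hypothetical, chosen to show how the shipped-to-usable gap could be measured if the three figures were known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityFunnel:
    """GPU counts at each stage. All inputs here are hypothetical."""
    shipped: int    # left the factory
    installed: int  # reached a data center and got racked
    usable: int     # powered, networked, and serving workloads

    def conversion_rate(self) -> float:
        """Share of shipped GPUs that are actually serving workloads."""
        return self.usable / self.shipped if self.shipped else 0.0

    def gaps(self) -> dict[str, int]:
        """Units stuck between consecutive stages."""
        return {
            "shipped_not_installed": self.shipped - self.installed,
            "installed_not_usable": self.installed - self.usable,
        }


# Made-up example numbers, not data from this article.
funnel = CapacityFunnel(shipped=100_000, installed=70_000, usable=50_000)
print(f"conversion rate: {funnel.conversion_rate():.0%}")
print(funnel.gaps())
```

Splitting the gap into two parts matters because they point to different bottlenecks: units that shipped but are not installed suggest site readiness problems, while installed units that are not usable suggest power, networking, or software problems.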

The stack does not move as one system. Bottlenecks migrate.

When wafer supply expands, pressure moves to packaging. When packaging expands, pressure shifts to HBM, optics, power, cooling, deployment, or utilization. A solved bottleneck rarely means abundance. More often, it reveals the next constraint. That is what we track at Tessara.

Where are we today?

Compute Regime Score on Tessara

Tessara’s Compute Regime Score (CRS) tracks whether compute conditions are tightening or easing across demand, supply, and forward-looking indicators. It answers the question: where are we now?

CRS-Forward asks the next question: where is the regime likely heading?

Both are rising. (You can read more on our methodology here.)

Watching aggregate chip demand is no longer enough. A high capex number can coexist with delayed capacity. A large GPU shipment can still fail to become usable compute. A data center can be built before it is energized. A cluster can be installed and still underproduce if it lacks networking, cooling, or orchestration.

The point is not simply whether AI compute is tight.

The point is where it is tight.

This primer maps the eight layers between wafer fabrication and a live cluster. Because to understand the compute supercycle, you have to understand how capacity moves through the stack, where it gets stuck, and who benefits when the bottleneck moves.

Wafer to Revenue: How The Compute Stack is Laid Out

Compute begins as silicon, but silicon is not capacity.

Turning a wafer into revenue-generating AI compute requires a chain of highly specialized steps.

Each one has its own supply dynamics, its own bottlenecks, and its own set of companies extracting value. A constraint at any single layer throttles everything downstream.
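One way to picture that throttling, and the bottleneck migration it produces, is to treat deliverable capacity as the minimum across layers. This is a deliberately simplified sketch: the layer names are a subset of the stack, the capacity numbers are invented, and the common unit (say, thousands of accelerators per month) is an assumption made for illustration.

```python
def binding_layer(capacity: dict[str, float]) -> tuple[str, float]:
    """Return the most constrained layer and the throughput it allows."""
    name = min(capacity, key=capacity.get)
    return name, capacity[name]


# Hypothetical capacities in a common unit (illustrative only).
layers = {"wafers": 120.0, "packaging": 80.0, "hbm": 95.0, "power": 90.0}
print(binding_layer(layers))  # packaging binds at 80.0

# Expand packaging well past the other layers.
expanded = {**layers, "packaging": 130.0}
print(binding_layer(expanded))  # power now binds at 90.0
```

In this toy example packaging capacity grows by 50 units, but deliverable throughput rises by only 10, because power becomes the binding layer as soon as packaging is relieved.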

Here is the full stack:

Layer 1: Wafer Fabrication

Turning silicon (sand) into a functioning AI chip is almost like alchemy.

It requires the most expensive manufacturing process humans have ever built. Machines that cost $300 million each to print circuits at atomic scale.

The output is the scarcest commodity in tech. Leading-edge AI chips (NVIDIA's Blackwell, AMD's MI300 series, custom ASICs from Broadcom) all require the most advanced process nodes available. That means 3nm today and 2nm ramping in late 2026.

N3 wafers on Tessara

Who controls it: TSMC dominates. Samsung Foundry is a distant second. Intel Foundry is trying to compete but remains subscale for leading-edge AI work.
What constrains it: Node capacity. TSMC's 3nm is ramping well and represented 23% of wafer revenue by Q3 2025, while 2nm (N2) entered volume production in late 2025. Every major AI chip designer is fighting for the same wafer starts.
Why it matters: TSMC's order book is a direct readthrough to AI chip supply 6 to 12 months out. When TSMC raises capex guidance, more chips are coming. When utilization tightens, lead times extend and pricing power shifts to the foundry. TSMC's HPC segment (AI and data center) accounted for 57% of revenue in Q3 2025.

Layer 2: Advanced Packaging and HBM

Fabricating the chip is only half the job. The die has to be packaged alongside high-bandwidth memory in a single module. This is where CoWoS (Chip-on-Wafer-on-Substrate) comes in. It is the process that bonds GPU dies to HBM stacks on a silicon interposer. Without it, you have bare silicon and no usable accelerator.

Who controls it: TSMC also dominates CoWoS. NVIDIA alone consumes over 60% of total CoWoS capacity.
What constrains it: Two things here:
- CoWoS capacity sat at roughly 75 to 80K wafers per month at end of 2025. TSMC is targeting 120 to 130K by late 2026.
- HBM3E is sold out through 2026. Samsung and SK Hynix have raised HBM3E contract prices by roughly 20% for 2026 deliveries.
Why it matters: Packaging was the binding constraint of 2024 and 2025. It is easing but not resolved. The transition to HBM4 (expected to begin volume shipments in H2 2026) adds complexity: HBM4 costs roughly 50% more to produce, with per-module pricing reportedly approaching $500 versus $350 for HBM3E.
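The figures quoted above imply some simple ratios worth making explicit. The sketch below uses only the numbers in the text; the helper name `pct_change` is ours, and pairing the low and high ends of each range is an assumption about how to bound the result.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


# CoWoS wafers per month (thousands): 75 to 80 at end of 2025,
# target of 120 to 130 by late 2026.
cowos_growth_low = pct_change(80, 120)   # high starting point, low target
cowos_growth_high = pct_change(75, 130)  # low starting point, high target

# Reported per-module pricing: about $350 (HBM3E) versus about $500 (HBM4).
hbm4_price_premium = pct_change(350, 500)

print(f"CoWoS growth: {cowos_growth_low:.0f}% to {cowos_growth_high:.0f}%")
print(f"HBM4 price premium: {hbm4_price_premium:.0f}%")
```

Taken at face value, the targets imply CoWoS capacity growing by roughly 50% to 73% within a year, and the quoted prices imply an HBM4 premium of about 43%, somewhat below the roughly 50% higher production cost cited.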

Investors should watch CoWoS utilization and HBM contract pricing as the two tightest leading indicators in the supply chain. We wrote in depth about HBM earlier in our memory primer, The Memory Wall.

Layer 3: Board and Server Assembly

Once the GPU module is packaged, it goes to an ODM (original design manufacturer) for integration into server boards and complete server systems. Modern AI systems have moved from 8-GPU server nodes toward 72-GPU rack-scale architectures. CPUs, NVLink, power delivery, thermal management, and chassis design are engineered as a single system.

ODMs on Tessara

Who controls it: Foxconn, Quanta, and Wistron are the big three. NVIDIA's DGX systems and hyperscaler custom designs flow through this layer.
What constrains it: This layer is less constrained than fabrication and packaging but not unconstrained. Lead times have extended as Blackwell server complexity has increased. Thermal management at the board level is more demanding with each generation. Component shortages (capacitors, power delivery ICs) can create intermittent delays.
Why it matters: Assembly is where system-level costs stack up. A full DGX H200 system (8 GPUs) is priced at around $400,000 to $500,000. Margins here are thinner than upstream. The investment angle is volume: ODMs are riding a massive unit growth curve driven by hyperscaler capex.

Layer 4: Networking

A GPU without networking is a space heater. Literally.

Putting it to use in AI training and inference requires ultra-fast interconnection between GPUs within a server (NVLink), between servers in a rack (InfiniBand or Ethernet), and between racks across a data center (optical transceivers and switches).

Who controls it: NVIDIA (InfiniBand/NVLink), Broadcom (custom Ethernet/switching silicon), Arista (switches), Lumentum and Coherent (optical transceivers). NVIDIA's Quantum InfiniBand and ConnectX networking remain the default for large training clusters.
What constrains it: Optical transceiver supply. Lumentum has guided to sold-out capacity through 2027. The shift to 800G and 1.6T transceivers requires a new manufacturing ramp. InfiniBand allocation remains tight and NVIDIA-controlled, meaning non-hyperscaler customers face longer waits.
Why it matters: Networking is one of the fastest-tightening layers right now. A cluster without sufficient network bandwidth is a cluster that cannot scale training runs efficiently. The rise of custom silicon (Broadcom ASICs for Google, Amazon, Meta) is creating new demand for networking that is largely incremental to NVIDIA's GPU-driven demand.

Watch optical transceiver lead times as a barometer for overall cluster buildout pace.

Layer 5: Rack Integration and Deployment

Servers only create value once they are installed and running.

Rack integration means physically installing servers in data center racks with proper power distribution, cabling, and environmental controls. Deployment means bringing a cluster online: firmware, drivers, validation, burn-in testing.

Who controls it: Hyperscalers such as AWS, Azure, and Google handle this internally. Neo-clouds such as CoreWeave, Lambda, and Crusoe deploy capacity for external customers. Colocation providers such as Equinix and Digital Realty supply the facilities.
What constrains it: Physical space and build-out timelines. Construction can take 12 to 18 months under favorable conditions (unless your name is Elon), but securing permits and grid power can extend the timeline considerably.
Why it matters: This is where the shipped-to-usable gap lives most visibly. GPUs can arrive at a site that isn't ready for them. CoreWeave's valuation reflects the market's pricing of the deployment bottleneck: if you can get capacity racked and generating revenue faster than competitors, you capture margin.

Layer 6: Power

AI compute requires large amounts of electricity around the clock. A single major GPU cluster can consume 100 MW or more, and data centers have become the single largest driver of new electricity demand in the United States.

Who controls it: Utilities, independent power producers (Constellation, Vistra, NRG), and the regional grid operators (PJM, ERCOT, CAISO) who manage interconnection. Eaton and Schneider supply power distribution equipment.
What constrains it: Grid interconnection and generation capacity. Around 2.8 TW of projects are waiting in US interconnection queues, with a median wait of 5.2 years. Our tracked pipeline for several major data center providers shows that, of the 4.5 GW of disclosed contracted US pipeline for 2026, only 850 MW is live and energized. Power availability, not construction, is the key constraint.
Why it matters: Power is the slowest-moving bottleneck in the stack. You cannot software-engineer your way around it. You cannot expedite a transmission line. This is the primary reason that not all announced hyperscaler capex will translate to usable capacity on schedule. Companies such as Constellation, Vistra, and Eaton benefit directly, while access to power has become a primary factor in site selection.
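The pipeline figures above reduce to a single conversion ratio. The sketch below uses the 4.5 GW and 850 MW numbers from the text; the function name `live_share` is ours, and the 100 MW cluster size is the lower-bound figure quoted earlier in this section, used here only as a rough yardstick.

```python
def live_share(live_mw: float, contracted_mw: float) -> float:
    """Fraction of a contracted power pipeline that is live and energized."""
    if contracted_mw <= 0:
        raise ValueError("contracted_mw must be positive")
    return live_mw / contracted_mw


contracted_mw = 4_500  # 4.5 GW disclosed contracted pipeline
live_mw = 850          # live and energized

share = live_share(live_mw, contracted_mw)
not_yet_live_mw = contracted_mw - live_mw
# Upper bound on 100 MW clusters the live capacity could host,
# ignoring cooling and other facility overhead.
clusters_at_100mw = live_mw // 100

print(f"live share: {share:.1%}")
print(f"not yet live: {not_yet_live_mw} MW")
print(f"100 MW clusters supportable today: at most {clusters_at_100mw}")
```

On these numbers, under a fifth of the contracted pipeline is energized, leaving 3,650 MW still waiting on power.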

Layer 7: Cooling

AI accelerators generate extraordinary amounts of heat.

A single Blackwell GPU rack can dissipate 120+ kW. Traditional air cooling cannot keep up. The industry is mid-transition to direct liquid cooling (DLC), where coolant removes heat directly from chips through cold plates.

Who controls it: Vertiv and Schneider Electric are major suppliers of data-center cooling and power infrastructure. CoolIT Systems specializes in direct liquid cooling, while companies such as Iceotope and GRC focus on immersion cooling for high-density deployments.
What constrains it: Existing infrastructure. Most data centers were designed for rack densities of roughly 10 to 15 kW. Supporting liquid cooling requires new piping, pumps, coolant distribution units, and heat-rejection systems. New facilities can be designed around liquid cooling, but retrofitting older sites is slower and more expensive.
Why it matters: Cooling limits how much compute can operate within a given facility. Sites that cannot support high-density racks must spread servers across more space or reduce performance. If you cannot cool a rack at 120 kW, you run it at 60 kW and waste half your floor space.

This makes cooling equipment a critical part of the AI infrastructure buildout and creates a multi-year upgrade cycle for suppliers such as Vertiv and Schneider Electric.
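The floor-space arithmetic behind the 120 kW versus 60 kW comparison above can be sketched as follows. The function name `racks_needed` and the 12 MW IT load are hypothetical, and the model ignores everything except the per-rack cooling limit.

```python
import math


def racks_needed(it_load_kw: float, rack_limit_kw: float) -> int:
    """Racks required to host a given IT load at a per-rack cooling limit."""
    return math.ceil(it_load_kw / rack_limit_kw)


it_load_kw = 12_000  # hypothetical 12 MW of accelerator load

full_density = racks_needed(it_load_kw, 120)  # liquid-cooled, 120 kW per rack
half_density = racks_needed(it_load_kw, 60)   # cooling-limited, 60 kW per rack

print(full_density, half_density)
```

Halving the per-rack limit doubles the rack count for the same load, which is the sense in which an under-cooled site wastes half its floor space.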

Layer 8: Software and Orchestration

Once a cluster is installed, software determines how much of that hardware actually gets used. It schedules workloads, allocates GPUs, and monitors performance.

Who controls it: NVIDIA controls the core software layer through CUDA, its platform and libraries for programming GPUs. Hyperscalers operate proprietary platforms, while Kubernetes, Slurm, and Ray provide widely used open-source foundations.
What constrains it: CUDA lock-in remains powerful. The ecosystem is deeply embedded across AI workloads. Switching to AMD's ROCm or Intel's oneAPI involves real porting cost and performance risk.
Why it matters: Better orchestration increases the productive output of each GPU without adding hardware. NVIDIA’s advantage therefore extends beyond silicon into the software ecosystem surrounding it.

Most clusters run at 50 to 70% GPU utilization because orchestration is hard. Getting that number to 85%+ is a material efficiency gain that translates directly to revenue per GPU.
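To put a number on that efficiency gain, the sketch below assumes revenue per GPU scales linearly with utilization at a fixed price per billed GPU-hour. That is a simplification, and the function name `revenue_uplift` is ours.

```python
def revenue_uplift(from_util: float, to_util: float) -> float:
    """Relative revenue gain per GPU when utilization rises.

    Assumes revenue is proportional to utilization at a fixed hourly price.
    """
    if not 0 < from_util <= 1 or not 0 < to_util <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return to_util / from_util - 1


# Starting points span the 50 to 70% range quoted above; target is 85%.
for start in (0.50, 0.60, 0.70):
    print(f"{start:.0%} -> 85%: {revenue_uplift(start, 0.85):+.0%}")
```

Under that assumption, moving to 85% utilization is worth roughly 21% to 70% more revenue per GPU depending on the starting point, with no additional hardware.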

A sustained weakening of CUDA would be one of the most important structural shifts in AI infrastructure. We wait with bated breath for this to happen (if ever).

How Compute Pricing Actually Works

Understanding the compute stack shows where capacity comes from. It does not explain why access to the same GPU can cost less than $1 per hour on one platform and more than $10 on another.

GPU compute looks like a simple number. Dollars per GPU per hour. Pick a provider, check the rate, spin up an instance.

But that single number hides three distinct markets underneath it. The relationship between them tells you more about supply and demand than the price itself ever could.

#1 On-demand

On-demand is the public rate paid without a long-term commitment.

Hyperscalers can charge $7 to $12 per H100-hour, while neo-cloud providers often list comparable capacity closer to $2.

These offers are not identical. Hyperscaler pricing includes enterprise support, networking, storage, security, availability guarantees, and access to a broader cloud ecosystem. Marketplace capacity offers fewer guarantees.

The wide range therefore reflects both market fragmentation and product differences.

#2 Reserved and contract

Large buyers don’t pay public on-demand rates.

One-year and three-year commitments can reduce effective prices substantially, by 30 to 50%, particularly at hyperscalers. These contracts often include minimum spending commitments, reserved capacity, and negotiated service terms.

This is the market that matters most for provider revenue. Contract pricing determines whether announced capacity becomes committed demand and whether new infrastructure earns an acceptable return.
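As a rough illustration of what those discounts mean in dollar terms, the sketch below applies the 30 to 50% range to the $7 to $12 hyperscaler on-demand range quoted earlier. Combining the two ranges this way is our assumption, since actual contract terms are negotiated and not public, and the function name `effective_rate` is hypothetical.

```python
def effective_rate(list_rate: float, discount: float) -> float:
    """Hourly rate after a committed-use discount (discount as a fraction)."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return list_rate * (1 - discount)


# Lowest case: low list price with the deepest discount.
low = effective_rate(7.0, 0.50)
# Highest case: high list price with the shallowest discount.
high = effective_rate(12.0, 0.30)

print(f"implied contract range: ${low:.2f} to ${high:.2f} per H100-hour")
```

On these assumptions, committed hyperscaler pricing would land somewhere between about $3.50 and $8.40 per H100-hour, still above the neo-cloud list rates mentioned earlier.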

#3 Spot

Spot pricing reflects capacity available for immediate use without guaranteed continuity.

Marketplaces such as Vast.ai provide a near-real-time view of available supply. H100 spot prices have fallen sharply from the extreme levels seen during the early 2024 shortage.

Reading the Spread

The spread between spot, neo-cloud, and hyperscaler pricing can reveal changes in market conditions.

Wide spread (spot > on-demand): Acute shortage. Buyers are panic-purchasing at any price. This was the market in early 2024, when H100 spot regularly exceeded on-demand pricing and some secondary market transactions hit $15+ per hour.
Narrowing spread (spot converging toward on-demand): Supply is catching up to demand and the market is normalizing.
Negative spread (spot < on-demand, with prices falling): Available capacity may be growing faster than demand.
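The three readings above can be expressed as a small classifier. This is a minimal sketch, not how any particular index is computed: the function name `read_spread`, the use of just two spot observations, the assumption that the on-demand rate is unchanged between them, and the fallback label "balanced" are all ours.

```python
def read_spread(spot_now: float, spot_prev: float, on_demand: float) -> str:
    """Classify market conditions from spot versus on-demand pricing.

    Assumes the on-demand rate is the same at both observation times.
    """
    premium_now = spot_now - on_demand
    premium_prev = spot_prev - on_demand

    if premium_now > 0:
        # Spot trades above on-demand: shortage, unless the premium is shrinking.
        if premium_now < premium_prev:
            return "normalizing"
        return "acute shortage"
    if spot_now < spot_prev:
        # Spot is at or below on-demand and still falling.
        return "supply may be outpacing demand"
    return "balanced"  # fallback label, not one of the three cases in the text


# Illustrative prices in dollars per GPU-hour (made up).
print(read_spread(spot_now=15.0, spot_prev=14.0, on_demand=10.0))
print(read_spread(spot_now=11.0, spot_prev=15.0, on_demand=10.0))
print(read_spread(spot_now=2.0, spot_prev=3.0, on_demand=10.0))
```

A real signal would need smoothing over many observations and a like-for-like comparison of GPU type and service level, since the offers behind each price are not identical.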

For investors, compute pricing should not be treated as a single benchmark. It is a set of connected markets with different contract structures and supply-demand signals.

Where the Bottlenecks Are Now

Our view is that the visible semiconductor bottlenecks are no longer the whole story. CoWoS and HBM remain tight, but they are already consensus constraints.

The more interesting pressure is moving downstream into power, cooling, optics, deployment speed, and utilization. These are the layers that determine whether AI capex becomes real, monetizable capacity.

The key idea: bottlenecks migrate.

When wafer supply expands, pressure moves to advanced packaging. When packaging catches up, it shifts to HBM, power, cooling, or deployment. Each solved constraint exposes the next one.

The question is never simply, “Is the bottleneck easing?” It is: Where is the pressure moving next?

By the time a bottleneck appears in earnings calls or sell-side research, much of the trade may already be priced. The edge comes from tracking signals that show a layer tightening or loosening before consensus notices.

There are dozens of signals worth watching across the stack. When these move, the constraint map changes.

These indicators are tracked live on Tessara Terminal.

The Bottom Line

We believe that we’re in a generational compute supercycle.

The race to AGI remains wide open, and every serious lab, hyperscaler, sovereign, and enterprise wants more compute than it can reliably access.

But the market is still looking at the wrong unit.

It debates AI capex mostly through chip demand. The scarce asset is not the GPU. It is energized, cooled, networked, scheduled capacity.

That changes the investment map. The edge is shifting from who ships the most chips to who converts ordered capacity into usable clusters fastest.

Power. Interconnection. Cooling. Optics. Deployment. Utilization.

They are the bottlenecks that decide how much AI capacity actually comes online.

The next winners of the compute supercycle will be the companies that close the gap between ordered capacity and usable capacity.

That is the gap Tessara tracks.

Tessara is the research terminal for the AI buildout. We track what is binding across compute, memory, foundry, networking, packaging, and power, then map those constraints to the companies exposed.

This article is for informational and research purposes only. It is not financial advice, investment advice, or a recommendation to buy or sell any security. Tessara Research does not publish price targets. The views expressed here reflect our analysis at the time of publication and may change as new evidence arrives. Readers should do their own research and consult a qualified financial adviser before making investment decisions.

Primer: The Compute Conversion Gap