How can materials shortages slow the growth of AI infrastructure?

AI Infrastructure Constraints: How Power Limits Are Reshaping the AI Economy

I spoke to a semiconductor procurement lead a few months ago. I asked him what was keeping him up at night and he said, “Pick a material. Any material.” He was not joking. He was tired.

People generally discuss AI growth in terms of software, data-center computation, and the energy those huge data centers consume. But there is a huge quantity of physical stuff we typically don’t think about, and it must first be mined and refined. Some of the materials required for current chips have begun to run critically short. This is not yet the kind of crisis that makes big news, but it is enough to cause problems downstream for anyone whose upstream supplies are no longer flowing at normal rates.

Chips don’t grow on trees (obviously, but stay with me)

Many people assume the semiconductor supply chain’s problems have been solved: the COVID-era chip crisis dropped out of the news, so it must have gone away. Unfortunately, the bottlenecks have simply moved downstream, to packaging materials, substrates, specialty glass, and the critical T-glass shortage that few people outside the chip packaging supply chain are aware of.

A single constricted material can seize up the entire packaging line built around it. Chips for the most demanding workloads (large language models, the largest AI training jobs, and a host of other compute-intensive applications) rely on advanced packaging techniques such as chiplets, 2.5D and 3D stacking, and high-bandwidth memory interfaces. These packages demand very precise control of their materials, often down to the micron scale. Substrates must be flat to within microns; dielectric materials (the insulating layers) need a dielectric constant that is just right, because it affects signal speed and integrity; and glass, prized for staying flat and stable in very thin layers, needs just the right thermal expansion. A shortage of any one input can stall production that depends on all the others.
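To see why the thermal expansion has to be “just right,” here is a back-of-the-envelope sketch based on the linear expansion relation ΔL = α · L · ΔT. The coefficient of thermal expansion (CTE) values (about 2.6 ppm/K for silicon, 15 ppm/K for an organic substrate, 4 ppm/K for a glass substrate), the 50 mm span, and the 100 K temperature swing are rounded assumptions for illustration, not specs for any real package.

```python
# Illustrative CTE values in ppm/K. These are assumed, rounded figures for this
# sketch, not datasheet numbers for any specific product.
CTE_PPM_PER_K = {
    "silicon die": 2.6,
    "organic substrate": 15.0,
    "glass substrate": 4.0,
}


def mismatch_um(cte_a_ppm: float, cte_b_ppm: float, span_mm: float, delta_t_k: float) -> float:
    """Differential expansion, in micrometres, between two bonded materials.

    Applies dL = alpha * L * dT to each material, so the mismatch over the
    same span and temperature change is |alpha_a - alpha_b| * L * dT.
    """
    delta_alpha = abs(cte_a_ppm - cte_b_ppm) * 1e-6  # ppm/K -> 1/K
    span_um = span_mm * 1000.0                         # mm -> micrometres
    return delta_alpha * span_um * delta_t_k


if __name__ == "__main__":
    span_mm, delta_t = 50.0, 100.0  # hypothetical 50 mm package, 100 K temperature swing
    die = CTE_PPM_PER_K["silicon die"]
    for name in ("organic substrate", "glass substrate"):
        mm = mismatch_um(die, CTE_PPM_PER_K[name], span_mm, delta_t)
        print(f"{name}: ~{mm:.0f} um of mismatch against the die")
```

With these assumed numbers, the organic substrate expands roughly 62 µm more than the die across the span, versus roughly 7 µm for the glass. That kind of mismatch is what warps a package and stresses its solder joints as it heats and cools, which is why a closely matched CTE matters.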

So a supply chain that had only just found a way to keep up with growing demand starts to constrict. And when the constriction sits deep in the supply chain, everything downstream takes a hit.

What actually breaks when materials get tight

Before I dive into which specific materials are running critically short, I want to describe how shortages generally play out over time: pressure builds step by step until something stops working altogether.

  • Lead times that stretch from weeks to quarters, quietly pushing product launches back
  • Yield degradation when manufacturers substitute lower-grade inputs under pressure
  • Allocation wars between hyperscalers and smaller AI hardware companies, with predictable outcomes for who wins
  • Cost spikes that get absorbed, until they don’t, and then get passed downstream
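The yield bullet has a knock-on effect worth making concrete: lower yield means more units must be started to end up with the same number of good packages, and every extra start consumes more of the inputs that are already scarce. Here is a minimal sketch, assuming a packaging flow of independent process steps whose yields multiply. The six steps, the 98% baseline, and the one step that drops to 90% after a lower-grade substitution are all hypothetical numbers.

```python
from math import ceil, prod


def package_yield(step_yields: list[float]) -> float:
    """Overall yield of a process flow, assuming each step's yield is independent.

    Under that assumption, the fraction of units that survive every step is the
    product of the per-step yields.
    """
    if not step_yields:
        raise ValueError("need at least one step")
    for y in step_yields:
        if not 0.0 < y <= 1.0:
            raise ValueError(f"each step yield must be in (0, 1], got {y}")
    return prod(step_yields)


def starts_needed(good_units: int, step_yields: list[float]) -> int:
    """Units that must be started to expect `good_units` good packages at the end."""
    return ceil(good_units / package_yield(step_yields))


if __name__ == "__main__":
    baseline = [0.98] * 6            # hypothetical six-step flow at 98% per step
    substituted = list(baseline)
    substituted[2] = 0.90            # one step drops to 90% after a lower-grade input swap

    for label, flow in (("baseline", baseline), ("substituted", substituted)):
        print(f"{label:>11}: yield {package_yield(flow):.3f}, "
              f"starts for 1000 good units: {starts_needed(1000, flow)}")
```

With these assumed numbers, overall yield falls from about 0.886 to about 0.814, and getting 1,000 good packages goes from roughly 1,129 starts to roughly 1,230. A substitution made to cope with a shortage ends up consuming more of the scarce material.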

A reduction of just 10% in available substrate capacity can have compounding effects. You might expect it to cut chip output by only 10%, but the reality is much worse: schedules get disrupted, partial deliveries arrive, and design teams may have to rework packages around whatever materials are actually available. All of this takes time, and the AI infrastructure buildout has none to spare.
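Here is a toy model of that 10% example, assuming steady demand of 100 packages per week against substrate-limited capacity cut to 90 per week; both numbers are hypothetical. Throughput only falls 10%, but unfilled orders pile up, and the wait for a newly placed order (estimated here as backlog divided by weekly capacity) keeps growing.

```python
def simulate_backlog(weeks: int, weekly_demand: float, weekly_capacity: float):
    """Toy single-queue model: fixed demand arrives each week; capacity ships what it can.

    Returns a list of (week, shipped, backlog, wait_weeks) tuples, where wait_weeks
    estimates how long a newly placed order waits behind the existing backlog
    (backlog divided by weekly capacity, assuming first-in, first-out shipping).
    """
    if weekly_capacity <= 0:
        raise ValueError("weekly_capacity must be positive")
    backlog = 0.0
    history = []
    for week in range(1, weeks + 1):
        backlog += weekly_demand                  # this week's orders join the queue
        shipped = min(backlog, weekly_capacity)   # capacity caps what goes out
        backlog -= shipped
        wait_weeks = backlog / weekly_capacity
        history.append((week, shipped, backlog, wait_weeks))
    return history


if __name__ == "__main__":
    # Hypothetical: 100 packages/week of demand, substrate-limited capacity cut 10% to 90/week.
    for week, shipped, backlog, wait in simulate_backlog(52, 100, 90):
        if week in (1, 13, 26, 52):
            print(f"week {week:2d}: shipped {shipped:.0f}, backlog {backlog:.0f}, "
                  f"new orders wait ~{wait:.1f} weeks")
```

In this sketch, after 52 weeks the backlog reaches 520 units and a new order waits almost six weeks, with no plateau for as long as the shortfall lasts. Real supply chains add buffer stock, reallocation, and demand growth, none of which this model captures.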

The part most people skip over: substrate and glass

Glass-based substrates are emerging as a leading candidate for next-generation AI accelerator packaging. To support these high-power processors, companies such as Intel and TSMC are developing glass substrates, which offer better thermal stability and can support higher interconnect density than organic substrates. However, high-quality glass substrates for advanced packaging are still produced in low volumes, and ramping up production requires significant investment and time.

As noted above, T-glass is currently in short supply. The term is sometimes used loosely as a catch-all for glass-based substrates, but strictly speaking T-glass refers to a specialized low-thermal-expansion glass fiber cloth that reinforces the core of high-end package substrates, including those used in chiplet-based designs. It is a critical component in many packages for very high-performance AI chips, it is in severe shortage right now, and that shortage is holding up hardware production of those chips.

This is why it is so frustrating to see people and companies treat material shortages as something they can simply work around. People like to say there is always a workaround, but there are only so many workarounds before the laws of physics take over. The engineers who design these chips and their packaging know exactly what I mean. You can design a chip around a material that does not exist yet, but until that material shows up, the design is not going to work. No software patch will solve this problem.

A quick look at where the pressure is concentrated

Material category | Role in AI hardware | Current supply pressure
Glass substrates (T-glass) | Advanced chip packaging, signal integrity | High; limited production capacity globally
High-bandwidth memory (HBM) | Memory stacking for AI accelerators | High; dominated by two or three suppliers
Advanced dielectrics | Insulation in multi-layer packaging | Moderate, but tightening
Silicon carbide (SiC) | Power electronics in data center hardware | Elevated, particularly due to overlapping EV demand

I should also note that the supply of these critical packaging materials is already stretched thin. Production is concentrated in a small number of places, so a single weather event, industrial fire, or similar disruption in one part of the world can become a global problem within days.
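One common way to put a number on supplier concentration is the Herfindahl-Hirschman Index (HHI): the sum of each supplier’s squared market share, in percent, which runs from near 0 for a fragmented market up to 10,000 for a single supplier. The sketch below computes it alongside a cruder measure, the share of supply lost if the largest supplier goes offline. The supplier names and shares are invented for illustration, not market data for any material in the table.

```python
def hhi(shares_pct: dict[str, float]) -> float:
    """Herfindahl-Hirschman Index: sum of squared market shares expressed in percent.

    Ranges from near 0 (many tiny suppliers) to 10,000 (a single supplier).
    """
    total = sum(shares_pct.values())
    if abs(total - 100.0) > 1e-6:
        raise ValueError(f"shares should sum to 100, got {total}")
    return sum(share ** 2 for share in shares_pct.values())


def largest_outage_pct(shares_pct: dict[str, float]) -> float:
    """Percent of total supply lost if the single largest supplier goes offline."""
    return max(shares_pct.values())


if __name__ == "__main__":
    # Invented supplier shares for illustration only, not real market data.
    concentrated = {"Supplier A": 60, "Supplier B": 25, "Supplier C": 15}
    diversified = {f"Supplier {c}": 20 for c in "ABCDE"}

    for label, market in (("concentrated", concentrated), ("diversified", diversified)):
        print(f"{label:>12}: HHI {hhi(market):,.0f}, "
              f"largest single outage removes {largest_outage_pct(market):.0f}% of supply")
```

Here the concentrated market scores 4,450, with 60% of supply exposed to a single outage, versus 2,000 and 20% for five equal suppliers.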

(As an aside, a large part of the current silicon carbide shortage comes from EV manufacturing. Automakers are trying to build very large numbers of cars, and those cars increasingly rely on SiC power electronics, notably in traction inverters and onboard chargers. The industry building AI chips is colliding with this equally large industrial effort, and neither industry is planning around the other’s buildout. For now, the EV side’s material needs are winning out. That is not good.)

Honestly, the software-first framing is part of the problem

Another big error in the software-focused view of AI growth is ignoring the vast quantity of basic physical materials behind every chip. They are mined, processed, and stockpiled around the world; shipped through distribution centers to manufacturing plants; transformed into intermediate forms and stockpiled again; and finally shipped to the point of assembly, where they become a chip or its packaging, which goes into a server and then into a data center. A shortage or bottleneck at any of these stages eventually surfaces downstream as a hardware problem, even though its cause lies upstream in the materials. (It is rather like an airport where planes from upstream cities have stopped arriving, while nobody at the downstream destinations can figure out why their flights never show up.)
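The chain described above can be sketched as a strictly serial pipeline. Under that simplifying assumption (no buffer inventory between stages), throughput is capped by the slowest stage and end-to-end lead time is the sum of the stage lead times, so adding capacity anywhere except the bottleneck does not help. The stage names and numbers below are hypothetical and only loosely follow the stages listed above.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    capacity_per_month: float  # units this stage can process per month (hypothetical)
    lead_time_weeks: float     # time a unit typically spends in this stage (hypothetical)


def pipeline_summary(stages: list[Stage], monthly_demand: float) -> tuple[Stage, float, float]:
    """Summarize a strictly serial supply chain with no buffers between stages.

    Returns (bottleneck stage, monthly shortfall against demand, end-to-end lead time in weeks).
    """
    if not stages:
        raise ValueError("need at least one stage")
    bottleneck = min(stages, key=lambda s: s.capacity_per_month)  # slowest stage caps throughput
    shortfall = max(0.0, monthly_demand - bottleneck.capacity_per_month)
    total_lead_weeks = sum(s.lead_time_weeks for s in stages)      # serial stages add up
    return bottleneck, shortfall, total_lead_weeks


if __name__ == "__main__":
    chain = [
        Stage("mining", 1200, 8),
        Stage("refining", 1100, 6),
        Stage("interim material (e.g. glass cloth)", 800, 10),
        Stage("substrate manufacturing", 1000, 12),
        Stage("chip packaging", 1050, 4),
        Stage("server and data center integration", 1500, 6),
    ]
    bottleneck, shortfall, lead = pipeline_summary(chain, monthly_demand=1000)
    print(f"bottleneck: {bottleneck.name} ({bottleneck.capacity_per_month:.0f}/month)")
    print(f"shortfall vs. demand: {shortfall:.0f}/month")
    print(f"end-to-end lead time: {lead:.0f} weeks")
```

With these numbers, the interim-material stage caps output at 800 units a month, 200 short of demand, even though every other stage could handle more. Real chains hold inventory between stages, which can delay the moment a bottleneck shows up but does not remove it.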

I suspect this blind spot will persist until a very large organization’s AI roadmap slips badly because the material-dependent components behind today’s AI boom fail to arrive. The core problem is that capital is pouring into training large models and building out inference capacity, while the people who set AI strategy at large organizations are mostly fluent in algorithms and cloud architecture. Few of them know what a material’s dielectric constant is, or why substrate flatness matters at the micron level.

That one remark from the procurement lead gave me a glimpse into how his team worked, and into how they had reached the point of telling their engineers, in effect, to pick whatever material they could get for the new AI chip packaging. I now know it was more than a statement of the obvious. His team was well-intentioned, but apparently had not anticipated how the global expansion of AI would run into material constraints. Working through these physical constraints will take time, along with close monitoring of where the bottlenecks are, if AI is to keep growing globally.
