The Open Compute Project Global Summit 2024 (aka OCP24) took place in San Jose October 15-17. OCP is a show dedicated to open datacenter hardware, and participation over the past couple of years has been supercharged by AI. While most of the attendees at OCP24 met to discuss AI architectures, power, liquid cooling, and software, there was a standing-room-only session for those who recognize how important networking and optics are to the current and future scalability of AI nodes. Meta made the point that no one yet knows how to get more accurate results from large AI models without building larger arrays of GPUs (more accuracy requires more parameters, which in turn require larger GPU arrays), which means there is no end in sight to the demand for optical bandwidth in the datacenter.
This year, for the first time, two optics vendors – Accelink and Ciena – had booths on the show floor. With CIOE and ECOC just barely in the rearview mirror, and with OCP not being a traditional optical show, there weren’t any major optical announcements. However, there were intriguing discussions on the future of optics inside the datacenter, and OCP is perhaps the best place to hear from startups with interesting ideas for AI optics.
Topics covered in this note include:
- Co-Packaged Optics (CPO) Inches Closer to Reality
- 400G/Lane Emerges from Ciena – Not Where You Expected It
- Optics Reliability and Stability in AI Nodes Needs to Improve
- LPO – Still Kicking but Questions Persist
- Liquid Cooling Will Change Architectures
- Conclusions
Co-Packaged Optics (CPO) Inches Closer to Reality
For the last few years, CPO has been almost exclusively promoted by Intel and Broadcom, via 51.2T switch demonstrations. Last year at OCP, Micas Networks debuted its commercial switch based on Broadcom’s CPO platform. This year, Micas was back with what is still the only commercial CPO switch, but many others mentioned CPO in their presentations. With recent announcements from TSMC indicating that the world’s leading silicon processing fab is investigating integration of SiPho into its processes, and with encouraging statements from hyperscalers, CPO seems closer than ever to reality – perhaps less than 5 years away from large-scale deployment. However, the adoption of CPO is still somewhat binary – either it will be deemed acceptable for mass deployment by a major customer, kicking off large demand, or it will remain a niche product adopted by a few smaller operators. That pivotal major customer has yet to step forward, but given that AI is a core use case, Nvidia and hyperscaler ASICs are logically positioned to be the early adopters.
The primary benefit touted for CPO is still power (claimed to be less than 5.5W per 800GbE), but it also provides stability and lower latency. The link flaps and occasional errors that plague optical connections in AI clusters are reduced with fewer DSPs in the link, potentially improving link stability at the expense of consistency and a guaranteed low BER. For short links, that tradeoff may be worthwhile. ByteDance reported latency improvements from early trials of up to 600ns for Layer 2 networks and up to 1000ns for Layer 3 networks (ByteDance did not explain why Layer 3 networks would see the larger improvement).
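To put the power claim in perspective, consider a fully loaded 51.2T switch. Below is a minimal back-of-the-envelope sketch, assuming roughly 16W for a conventional DSP-based 800G pluggable – an illustrative figure, not a measured one – against the sub-5.5W CPO claim above.

```python
# Back-of-the-envelope optics power comparison for a 51.2T switch.
# The ~5.5W/800GbE CPO figure was claimed at the show; the ~16W figure
# for a DSP-based 800G pluggable is an assumption for illustration only.

PORTS = 51_200 // 800          # 64 x 800GbE ports on a 51.2T switch

pluggable_w = 16.0             # assumed typical 800G DSP pluggable (W)
cpo_w = 5.5                    # claimed CPO power per 800GbE (W)

pluggable_total = PORTS * pluggable_w   # optics power with pluggables
cpo_total = PORTS * cpo_w               # optics power with CPO

print(f"Pluggable optics: {pluggable_total:.0f} W per switch")
print(f"CPO optics:       {cpo_total:.0f} W per switch")
print(f"Savings:          {pluggable_total - cpo_total:.0f} W "
      f"({1 - cpo_total / pluggable_total:.0%})")
```

Under these assumptions the optics power per switch drops from roughly 1kW to about 350W – meaningful at datacenter scale, though still small next to the GPUs themselves.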
- Micas continues to develop its CPO platforms, currently at 51.2T, but with plans to migrate to 102.4T as soon as Tomahawk 6 is available, implying that Broadcom will develop a CPO board for 102.4T. Micas has shipped a few tens of switches this year, mostly for evaluation, and expects to ship at best a few hundred in 2025 as the search for a major customer continues. Tencent is still a potential customer (Micas hired a senior engineer from the company), but not for large volumes in the short term.
- Broadcom co-presented on CPO with ByteDance, which showed a potential network deployment with CPO switches as the top spine layer, moving to the core layer in the future – putting CPO directly in the main switching infrastructure of the network. ByteDance is trialing a custom version of a commercial platform, presumably from Micas, which has the only commercial platform today. ByteDance said that it has not committed to purchase and deployment and is still evaluating the technology.
- Meta stated that it is investigating CPO for use in the “scale up domain” (inside the rack where copper is used today). As that domain grows beyond the single rack and optics are required, CPO could be a viable option. Meta believes that CPO could provide a more reliable network with fewer link failures due to fewer active components being deployed. Recall that Meta was one of the early proponents of CPO/NPO before abandoning its internal development.
- Even Innolight, the company that stands to lose the most if CPO cannibalizes the pluggables market, presented on CPO at the show. Innolight proposed that CPO needs to be modular, with an interoperable ecosystem that allows multiple sources to contribute – as in the current pluggable market. The only solution today (Broadcom/Micas) is a closed ecosystem, and it’s not clear that a TSMC-based solution would necessarily be more open.
- CPO-type solutions were also shown for the AI back-end network or GPU-to-GPU connections. Intel calls its optical compute interconnect “CPO for XPU interconnect”. Luxshare showed “co-packaged copper”, essentially an integrated flyover cable technology. Nubis’s vertical fiber technology is arguably CPO technology. Several other startups like Ayar Labs and Celestial AI are likewise trying to find a way to put optics next to the processors to replace either copper (for longer distances) or pluggable optics. CPO, perhaps by another name, will almost certainly show up in the back-end network before it is widely deployed in the front end.
400G/Lane Emerges from Ciena – Not Where You Expected It
As Cignal AI reported in our recent ECOC report (ECOC 2024 Show Report), 400G/lane electronics and optics were considered close to being demonstrated publicly. The milestone arrived sooner than expected. At OCP, Ciena demonstrated 400G/lane PAM4 operation using the SERDES from the company’s WaveLogic 6e coherent DSP. It wasn’t a traditional DSP vendor like Marvell or Broadcom, but instead Ciena that was first to publicly demonstrate 400Gbps operation in 3nm silicon. The Ciena demonstration used a test chip, and the company is weighing commercial plans for many of its components – including a 400Gbps/lane PAM4 DSP – in the future.
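The signaling arithmetic behind such a demo is worth spelling out: PAM4 encodes two bits per symbol, so a 400G payload per lane implies a symbol rate a bit over 200 GBaud once FEC overhead is added. The sketch below assumes a KP4-like RS(544,514) FEC expansion purely for illustration; the actual coding scheme for 400G/lane has not been standardized.

```python
# Rough signaling arithmetic for 400G/lane PAM4. The RS(544,514)
# ("KP4-like") FEC overhead is an assumption for illustration; real
# 400G/lane links may use stronger coding.
import math

bits_per_symbol = math.log2(4)     # PAM4: 4 levels -> 2 bits/symbol
payload_gbps = 400                 # payload rate per lane
fec_overhead = 544 / 514           # assumed FEC expansion factor

line_rate = payload_gbps * fec_overhead   # ~423 Gb/s on the wire
baud = line_rate / bits_per_symbol        # ~212 GBaud per lane

print(f"Line rate: {line_rate:.0f} Gb/s -> {baud:.0f} GBaud per lane")
print(f"3.2T module: {3200 // payload_gbps} lanes of {payload_gbps}G")
```

A symbol rate above 200 GBaud is what makes the analog chain – SERDES, drivers, and modulators – so demanding, and it is why the optical material question below matters.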
On the optical side, presentations by Hyperlight on thin-film lithium niobate (TFLN) showed that the material has more than enough bandwidth to support 400G/lane optics. Hyperlight also said that the number of TFLN wafer manufacturers has tripled – from one to three – over the last two years. Silicon photonics (SiPho) almost certainly will not work at 400G/lane, and even InP EMLs may have performance issues. TFLN, while still unproven in large-scale manufacturing, is a strong candidate for 400G/lane 3.2TbE sometime after 2028.
Optics Reliability and Stability in AI Nodes Need to Improve
A topic that Cignal AI first reported on in our CIOE report (CIOE24: Insights into China’s Market) – bit errors and flapping in optics causing AI model failures – came up frequently at OCP. Link errors in an AI cluster can cause the entire model compute cycle to fail and be reloaded from a checkpoint. However, the news for optics is better than originally reported:
- Meta presented data from its models showing that GPUs fail much more often than optical links. Almost 80% of model failures are hardware issues, with 60% of those due to GPUs – meaning GPUs account for roughly half of all failures, at least in early data. Networking is the #4 cause of failure – still not great, but not as bad as originally implied.
- Meta also said that failures in 400GbE modules are mostly due to manufacturing issues, not laser failures (failures at 200GbE were mostly due to the directly modulated lasers, or DMLs, while 400GbE uses more reliable externally modulated lasers, or EMLs). Manufacturing issues should be easier to solve than fundamental semiconductor reliability issues.
- Finally, Meta said that failure rates in all hardware – both optics and ASICs – decline over time, implying infant mortality causes that have yet to be identified. Again, this should be an easier problem to solve than fundamental reliability failures.
- Innolight presented data showing that the reliability of SiPho-based optics has significantly improved over time. The company, which has sold millions of pluggable modules, showed FIT rates (failures per billion device-hours) below 0.4 across its current products, which bodes well for lower-cost SiPho optics at 1.6Tbps rates (see the conversion sketch after this list).
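For context on that FIT figure, even a very large fleet sees few expected failures at FIT 0.4. A minimal conversion sketch, assuming a hypothetical 100,000-module fleet:

```python
# Translate a FIT rate into expected failures for a module fleet.
# FIT = failures per 1e9 device-hours. The 100,000-module fleet size
# is an arbitrary assumption for illustration.

HOURS_PER_YEAR = 8760

def expected_failures(fit: float, modules: int, years: float = 1.0) -> float:
    """Expected failure count for a fleet at a constant FIT rate."""
    device_hours = modules * years * HOURS_PER_YEAR
    return fit * device_hours / 1e9

fleet = 100_000
print(f"FIT 0.4: {expected_failures(0.4, fleet):.2f} failures/year")
# ~0.35 expected optics failures per year across 100k modules --
# hence the optimism about SiPho reliability at 1.6T.
```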
LPO – Still Kicking but Questions Persist
Linear Pluggable Optics (LPO) continues to get airtime at shows, especially shows where Andy Bechtolsheim of Arista is on stage. However, there are still no large customers committed to the technology. Issues around troubleshooting and management remain, even if interoperability is solved. Therefore, despite continuing industry discussion of the technology, Cignal AI’s forecast (less than 10% of the 800GbE market) remains unchanged from over a year ago (The Linear Drive Market Opportunity).
In a presentation in the optical track, Meta said that LPO was “under active investigation”, but it has been almost two years since LPO broke out at OFC23, and investigations have not yet led to deployment. Meta also reported that troubleshooting optical links is inherently difficult: 75% of modules returned as failed are reported as no trouble found (NTF), meaning the optics were not the source of the error. With LPO removing even more telemetry data for link evaluation, the problem can only get worse, although fewer active components in the link could improve overall reliability.
Perhaps the largest indictment of LPO came from Chris Cole, who noted (as we have) that speed of deployment matters far more to current AI operators than saving some power on the optics – so the current MOP (method of procedure, based on DSP-based optics) will continue to be the preferred architecture.
1.6T may offer an opportunity for LPO – or more likely LRO (linear receive optics) – as there are no established MOPs yet. Speakers at OCP24 acknowledged, however, that 200G/lane LPO is much more challenging, which means that deployment is not guaranteed.
Liquid Cooling Will Change Architectures
Next-generation AI builds will require liquid cooling as heat dissipation in a single rack blows past 100kW. Liquid cooling vendors and demonstrations were scattered around the show floor. As discussed in Cignal AI’s ECOC report, liquid cooling will change how equipment is designed. Credo presented on how liquid cooling will also make electrical connections (copper/AECs) more popular: as liquid cooling increases the density of AI nodes, the distances between GPUs shrink, allowing copper to serve more links. There will certainly be other changes in equipment and network designs once air cooling is no longer a constraint.
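The density argument is easy to illustrate with rough numbers. The sketch below is purely hypothetical – the ~2m copper reach and rack dimensions are assumptions, not vendor specifications – but it shows why packing the same GPUs into fewer racks pulls more links within copper reach.

```python
# Illustrative only: how rack density affects copper reach coverage.
# The ~2m AEC reach and 0.6m rack width are assumptions for this sketch.

AEC_REACH_M = 2.0        # assumed usable reach of an active copper cable

def max_gpu_separation(racks: int, rack_width_m: float = 0.6) -> float:
    """Crude worst-case horizontal cable run across a row of racks."""
    return racks * rack_width_m

# Same hypothetical 72-GPU cluster: air-cooled and spread across 9 racks,
# versus liquid-cooled and packed into 2 racks.
for label, racks in [("air-cooled, 9 racks", 9), ("liquid-cooled, 2 racks", 2)]:
    span = max_gpu_separation(racks)
    verdict = "within" if span <= AEC_REACH_M else "beyond"
    print(f"{label}: ~{span:.1f} m worst-case run -> {verdict} copper reach")
```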
Conclusions
Even though OCP is not an optics show, it provides an important view into the AI-driven demands on and developments in optics for the coming years. Copper has a long life within AI nodes, but optics are inevitable as speeds increase and clusters span greater distances. Optical bandwidth requirements continue to grow, and power remains a concern as AI model parameter growth shows no sign of slowing. Innovations in optical interconnects – most of which, no doubt, will never see widespread adoption – have the potential to challenge industry assumptions about what optical interconnects look like. It’s an AI party, and optics are invited to come along for the ride.