Long Wire

The Long Wire Problem

The challenge affecting AI systems and their sensitivity to latency

The issue

The "long wire" problem in semiconductor System-on-Chip (SoC) design refers to the challenge of managing delays and signal integrity issues caused by lengthy interconnects between various components on the chip.

As SoCs become increasingly complex with more functionality integrated onto a single chip, the distances signals need to travel can become significant, leading to increased propagation delays, power consumption, and potential signal degradation.

Traditionally, designers have employed techniques such as buffering, pipelining, and routing optimization to mitigate these issues and ensure reliable operation of the SoC, at the expense of latency.

The introduction of generative AI has intensified this issue by imposing strict latency requirements on systems.

Traditional pipelining solutions are becoming prohibitive (even in source-synchronous applications), limiting the performance and scalability of modern SoCs.

Memory Subsystem as an Example

The Memory Subsystem is a latency-sensitive block.
The System Cache is usually located close to the Cores, while the DDR PHYs are located at the die boundary.

In a modern SoC, the distance between the two blocks can reach up to 15 mm.
The Memory Controller must be placed somewhere between the two blocks:

If the Memory Controller is placed close to the System Cache, latency to the System Cache is minimized at the expense of latency on the DFI interconnect.

If a source-synchronous interconnect is used for this long link, it requires a lot of manual handling, which creates a bottleneck in the schedule (long time to market, TTM).

The maximum clock speed is also limited by clock degradation (<1.5 GHz in current processes).

On the other hand, if the Memory Controller is placed close to the DDR PHYs, latency to the DDR PHYs is minimized at the expense of greater latency on the AMBA CHI/AXI bus.

Other than placement optimization, there is no simple, direct solution that minimizes latency.

Unfortunately, the space close to the System Cache is a very desirable location for the majority of the blocks in the system, which makes placement quite challenging.
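
The placement trade-off described above can be illustrated with a small model. This is a minimal sketch, not a description of any real floorplan: it assumes a conventionally pipelined link needs one register stage for every stretch of wire that a signal can cover in one clock cycle, and the 2 mm-per-cycle reach, the function names, and the sample positions are all hypothetical values chosen for illustration. Only the 15 mm total distance comes from the text.

```python
import math

TOTAL_DISTANCE_MM = 15.0      # System Cache to DDR PHY distance quoted in the text
REACH_MM_PER_CYCLE = 2.0      # ASSUMPTION: wire length coverable in one clock cycle


def pipeline_stages(length_mm: float, reach_mm: float = REACH_MM_PER_CYCLE) -> int:
    """Register stages needed so that no wire segment exceeds one cycle of reach."""
    return math.ceil(length_mm / reach_mm)


def placement_latency(controller_pos_mm: float) -> tuple[int, int]:
    """Return (cache_side_cycles, phy_side_cycles).

    controller_pos_mm is the (hypothetical) distance of the Memory Controller
    from the System Cache, measured along the path to the DDR PHYs.
    """
    cache_side = pipeline_stages(controller_pos_mm)
    phy_side = pipeline_stages(TOTAL_DISTANCE_MM - controller_pos_mm)
    return cache_side, phy_side


if __name__ == "__main__":
    for pos in (1.0, 7.5, 14.0):
        chi_cycles, dfi_cycles = placement_latency(pos)
        print(f"controller at {pos:4.1f} mm: "
              f"CHI/AXI side {chi_cycles} cycles, DFI side {dfi_cycles} cycles, "
              f"total {chi_cycles + dfi_cycles}")
```

Under these assumptions, moving the controller only shifts pipeline stages from one interface to the other; the total stays roughly constant because the overall wire length does not change.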

Memory Subsystem with Chronos

The Chronos Fully Digital SerDes decouples latency performance from clock degradation across the die.

It can achieve best-in-class latency performance (approximately 366 ps/mm).
The throughput is not affected by distance, enabling complete freedom of placement while maintaining a predictable chip development cycle (the SerDes is deployed by standard tools using standard digital flows).
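
To put the quoted figure in context, the following sketch works through the arithmetic for the 15 mm span mentioned earlier. It assumes the approximately 366 ps/mm figure can be applied as a flat per-millimetre rate, which is a simplification; the function names are hypothetical, and the 1.5 GHz clock is borrowed from the source-synchronous limit quoted above purely as a reference for converting picoseconds to cycles.

```python
LATENCY_PS_PER_MM = 366  # approximate figure quoted in the text


def wire_latency_ps(distance_mm: int) -> int:
    """Latency in picoseconds, ASSUMING a flat per-millimetre rate."""
    return LATENCY_PS_PER_MM * distance_mm


def latency_in_cycles(latency_ps: float, clock_ghz: float) -> float:
    """Express a latency as a number of clock periods at the given frequency."""
    period_ps = 1000.0 / clock_ghz  # 1 GHz corresponds to a 1000 ps period
    return latency_ps / period_ps


if __name__ == "__main__":
    latency = wire_latency_ps(15)              # 15 mm span from the text
    cycles = latency_in_cycles(latency, 1.5)   # reference clock: an assumption
    print(f"15 mm at {LATENCY_PS_PER_MM} ps/mm = {latency} ps "
          f"({latency / 1000:.2f} ns, about {cycles:.2f} cycles at 1.5 GHz)")
```

With these inputs, the 15 mm span corresponds to 5490 ps, or roughly 5.5 ns.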

Furthermore, the proprietary on-chip telemetry feature enables superior test and tunability at no extra cost.