Omdia view
Summary
Artificial intelligence (AI) is at the forefront of modern life, reshaping it and ushering in a new era of machine learning (ML) applications. The Massachusetts Institute of Technology (MIT) published a research paper entitled “Lightning: A Reconfigurable Photonic-Electronic SmartNIC for Fast and Energy-Efficient Inference,” which explores using photonic computing to create a reconfigurable photonic-electronic smart network interface card (smartNIC) that serves deep neural network (DNN) inference requests. With the exponential growth of inference-based services within data centers, the demand for fast and energy-efficient systems to handle real-time inference queries has become paramount. For example, in January 2023 ChatGPT processed an astonishing 600 million inference queries while consuming as much energy as 175,000 individuals.
In this dynamic landscape, photonic computing stands as a disruptive force with the potential to redefine the computing paradigm. By harnessing light waves and optical devices, photonic computing can allow rapid and energy-efficient computations in the analog domain. At its core, photonic computing’s claim to the networking space rests on the speed of photonic devices relative to traditional transistors, achieved with significantly less heat generation. This innovation could enhance computational speed and substantially reduce the environmental impact of computing infrastructure if scientists overcome its challenges.
Challenges with photonic computing
Recent studies have showcased the tremendous potential of photonic computation, achieving frequencies exceeding 100GHz while consuming just 40 attojoules per operation. However, these studies also uncover a significant challenge in current photonic computing paradigms: the bottleneck caused by data movement. When the digital datapath latency in existing proposals is accounted for, the end-to-end inference latency increases by five orders of magnitude, nullifying the advantages of photonic computing.
The cause of this issue is the passivity of the photonic computing cores, which lack the memory and instructions to direct the flow of computation data in complex real-world applications. Consequently, previous approaches have resorted to a “stop-and-go” methodology, heavily reliant on control software (e.g., Python scripts) to manage photonic computing operations. Unfortunately, this coupling of the data plane to the control software introduces substantial overhead within the datapath, severely inflating end-to-end latency.
Additionally, the control plane’s slow digital clock frequency relative to the photonic cores worsens the latency issue. The MIT paper gives the example of a photonic computing core operating at 100GHz steered by digital software clocked at a mere 1GHz. In such cases, interactions between the photonic and digital domains (such as packet processing and data reads/writes) momentarily halt the 100GHz operations to accommodate control-plane decisions, further increasing latency.
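The scale of this clock-domain mismatch can be checked with simple arithmetic, sketched below (the figures are the 100GHz and 1GHz clocks quoted above; the "one decision per control-plane cycle" assumption is an illustrative simplification):

```python
# Rough arithmetic for the clock-domain mismatch: a 100GHz photonic core
# performs one operation every 10ps, while a 1GHz control plane takes
# 1ns (1,000ps) per decision, so each control-plane interaction stalls
# the photonic core for roughly 100 operation slots.

photonic_clock = 100e9   # Hz, photonic computing core
control_clock = 1e9      # Hz, digital control software

photonic_op_time = 1 / photonic_clock   # ~10 picoseconds per operation
control_decision = 1 / control_clock    # ~1 nanosecond per decision

stall_ops = control_decision / photonic_op_time
print(stall_ops)   # ~100 photonic cycles lost per control interaction
```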
The proposed solution
The researchers at MIT put forward the concept of co-designing digital and photonic elements, culminating in “Lightning” – an adaptive photonic-electronic smartNIC equipped with high-speed, energy-efficient photonic computing cores. Lightning addresses the data movement delays of current photonic computing strategies through a reconfigurable count-action abstraction, which separates the control and data planes for inference requests by enabling the datapath to maintain a record of the directed acyclic graph (DAG) of each inference request without disrupting the flow of data into and out of the photonic computing cores.
MIT’s count-action abstraction has three components:
- A set of variables to count
- A set of target results
- A set of actions to trigger when a count equals its target value.
The count component keeps track of the operations required for each task of the DAG and triggers the execution of the following tasks immediately after the current task finishes, without involving the control plane. Lightning’s count-action units are reconfigurable at runtime.
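The three components above can be sketched as a tiny model in code. This is a minimal illustration of the count-action idea, not Lightning's actual hardware implementation; the class and method names are invented for this sketch:

```python
# A toy model of the count-action abstraction: count operations toward a
# target and fire an action when the target is reached, with no control
# plane in the loop. Names are illustrative, not from Lightning's design.

class CountActionUnit:
    """Counts completed operations for one DAG task and fires an action
    (e.g., starting the next task) when the target count is reached."""

    def __init__(self, target, action):
        self.count = 0
        self.target = target
        self.action = action

    def record(self):
        """Called from the datapath each time an operation completes."""
        self.count += 1
        if self.count == self.target:
            self.action()   # trigger the next task immediately

    def reconfigure(self, target, action):
        """Count-action units are reconfigurable at runtime."""
        self.count = 0
        self.target = target
        self.action = action


log = []
unit = CountActionUnit(3, lambda: log.append("task B started"))
for _ in range(3):
    unit.record()            # third call reaches the target and fires

# Reconfigure the same unit at runtime for a different task.
unit.reconfigure(2, lambda: log.append("DAG complete"))
for _ in range(2):
    unit.record()
```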
The Lightning smartNIC’s design
For the Lightning smartNIC to function as an option in data center networking, the MIT researchers list requirements that the Lightning smartNIC must satisfy. These are:
- R1: Handle live user traffic arriving from remote users
- R2: Support reconfigurability at runtime to serve inference requests for different DNNs
- R3: Ensure the inference query data from remote users are multiplied correctly with the DNN model parameters
- R4: Distinguish meaningful photonic computing results from noise
- R5: Avoid making non-photonic compute operations a bottleneck.
The researchers at MIT designed the Lightning smartNIC with eight major components, which all work together to provide networking capabilities. MIT’s explanation of how the Lightning smartNIC works is as follows:
- The packet parser: Lightning’s packet parser receives packets from the 100Gbps network interface to handle live user traffic (R1). The parser identifies inference queries from regular packets based on the destination port number field in the incoming packet header. Once the packet parser has identified a packet as an inference query, the parser extracts the DNN model ID and corresponding user data from the header. Depending on the DNN model, the inference query’s data may be in the packet header or the payload. For example, for a traffic classification inference, the packet parser uses header data (e.g., src IP, dst IP) while, for language generation inference, the parser reads the payload as the user data (e.g., a search query typed by the user).
- The DAG configuration loader: Afterwards, the DAG configuration loader reconfigures the datapath based on the computation DAG of the packet’s DNN model. This module decouples the control-plane decisions from the computation operations in the data plane, enabling Lightning to make control decisions in the data plane without stopping the data streams in and out of photonics. The DAG loader uses a key feature in Lightning called a reconfigurable count-action abstraction. This abstraction enables the DAG configuration loader to reconfigure a series of datapath templates (e.g., fully connected layers, convolution layers, attention layers, recurrent layers, adder tree modules, non-linear computation like ReLU and Softmax, etc.) at runtime (R2). Once the datapath is configured with the appropriate counts and actions for each DNN model, packets flow through the system without involving the control plane.
- The memory controller: While the DAG configuration loader reconfigures Lightning’s datapath, it notifies the memory controller module to stream the corresponding DNN model parameters from off-chip memories, such as dynamic random-access memory (DRAM) or high bandwidth memory (HBM). For fully connected layers, the memory controller streams the weight matrices directly into the datapath. To reduce memory access overheads for convolution layers, the memory controller reads the convolution kernel only once and stores it in local register files for subsequent reuse.
- 4 to 7. Pipelined photonic-electronic computing: Components 4 to 7 form a pipeline. A data streamer module sends multiple parallel digital data streams into photonic vector dot product cores via on-chip digital-to-analog converters (DACs). The photonic vector dot product cores compute the vector dot products of the input data streams and return the results to analog-to-digital converters (ADCs), where a preamble detection module distinguishes the results from noise without stopping the data flow. The vector dot product results are then fed into several pipeline-parallel digital computation modules, which perform additional digital operations such as ReLU and Softmax.
- 8. Result generation: The processes occurring in components 4–7 are repeated until the DAG is completed and the inference result is ready. Depending on the inference packet, Lightning creates a response packet and sends it to the user through the Ethernet interface or the PCIe bus.
- Additionally, to satisfy the remaining requirements: Lightning’s synchronous data streamer ensures (R3) is met; Lightning adds a preamble pattern to each vector in the digital domain before streaming its data into the DACs to satisfy (R4); and Lightning performs digital computations using a pipeline-parallel adder module and a pipeline-parallel non-linear function module to satisfy (R5) (see Figure 1).
Figure 1: A high-level annotation of Lightning’s design
Source: MIT
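Lightning itself is a photonic-electronic hardware design, but the component flow described above can be mimicked in a toy software sketch. Everything below – the port number, the template table, the weights, and the function names – is invented purely for illustration and does not reflect Lightning's actual packet format or models:

```python
# Illustrative walk-through of the component flow: packet parser ->
# DAG configuration loader -> memory controller -> pipelined compute ->
# result generation. All values here are toy assumptions.

INFERENCE_PORT = 4791   # assumed port identifying inference queries

def parse_packet(packet):
    """Packet parser (component 1): identify inference queries by the
    destination port and extract the model ID and user data."""
    if packet["dst_port"] != INFERENCE_PORT:
        return None                          # regular traffic
    return packet["model_id"], packet["payload"]

def load_dag(model_id):
    """DAG configuration loader (component 2): select a datapath
    template (fully connected, ReLU, etc.) for the DNN model."""
    templates = {"mlp": ["fc", "relu", "fc"]}   # toy template table
    return templates[model_id]

def stream_weights(model_id):
    """Memory controller (component 3): stream model parameters
    from off-chip memory (toy weights here)."""
    return {"fc": [[0.5, -0.2], [0.1, 0.8]]}

def run_inference(packet):
    query = parse_packet(packet)
    if query is None:
        return None
    model_id, data = query
    dag = load_dag(model_id)
    weights = stream_weights(model_id)
    x = data
    # Components 4-7: pipelined photonic-electronic compute (toy version).
    for task in dag:
        if task == "fc":
            x = [sum(w * v for w, v in zip(row, x))
                 for row in weights["fc"]]
        elif task == "relu":
            x = [max(0.0, v) for v in x]
    return x   # component 8: result generation

packet = {"dst_port": INFERENCE_PORT, "model_id": "mlp", "payload": [1.0, 2.0]}
result = run_inference(packet)
```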
Photonic computing’s capabilities and shortcomings
In compute energy efficiency, previous research has shed light on the capabilities of 8-bit photonic computing, which consumes only 40 attojoules per multiply-accumulate (MAC) operation. By comparison, an 8-bit MAC operation in a 7nm application-specific integrated circuit (ASIC), such as those found in GPUs and TPUs, consumes approximately 0.07 picojoules, while field-programmable gate arrays (FPGAs) require about 15 picojoules for an 8-bit MAC operation using dedicated digital signal processing (DSP) blocks. Photonic MAC operations thus exhibit unparalleled energy efficiency, surpassing ASICs by 1,750 times and FPGAs by 375,000 times.
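The efficiency ratios quoted above follow directly from the per-MAC energy figures and can be verified with a quick calculation:

```python
# Per-MAC energy figures from the text, converted to joules.
photonic_mac = 40e-18    # 40 attojoules per 8-bit photonic MAC
asic_mac = 0.07e-12      # ~0.07 picojoules per 8-bit MAC in a 7nm ASIC
fpga_mac = 15e-12        # ~15 picojoules per 8-bit MAC in FPGA DSP blocks

print(asic_mac / photonic_mac)   # ~1,750x advantage over ASICs
print(fpga_mac / photonic_mac)   # ~375,000x advantage over FPGAs
```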
A compelling advantage of photonic computing lies in its inherent parallelism. Optical modulators can perform parallel multiplications with up to 200 co-propagating wavelengths, unlocking a realm of native parallelism that significantly enhances performance through more simultaneous photonic operations.
The compact design of the photonic components within the Lightning smartNIC is an advantage over traditional smartNICs: they occupy a mere 1,500.01mm2 of space, in stark contrast to the expansive footprint of modern-day GPUs and even smaller DPUs. Notably, Lightning is significantly smaller than the NVIDIA H100 GPU, which spans 29,748mm2.
The big drawback of the Lightning smartNIC is its price. MIT estimates that, even after mass production, a Lightning smartNIC with 100GE capabilities would cost $2,639.95 – around 10 times more than a current 100GE offload NIC, which costs around $230 as of 3Q23. Unless significant advancements decrease manufacturing costs, price will be a significant barrier to photonic computing’s entry into the networking market.
Omdia’s view on photonic computing in the data center networking space
Using photonic computing in data center networking offers several benefits over traditional smartNICs; however, it also has significant drawbacks as it currently stands.
Benefits
Photonic computing allows data transfer at the speed of light, facilitating data transmission significantly faster than traditional electronic components such as transistors. This heightened speed will be essential as Moore’s law approaches a plateau, which will put immense pressure on data centers to meet the escalating compute requirements of our data-hungry world.
However, speed is just the tip of the photonic iceberg. Transferring information with light is more energy efficient than doing so with electronic components, potentially reducing the overall energy usage of data center operations. This energy saving can bring both operating cost savings and lower carbon emissions. Additionally, photonic networks can support high bandwidth, facilitating the seamless transfer of large datasets – a requirement that is growing rapidly in data center networking.
Photonic computing’s components generate substantially less heat during operation than their electronic counterparts, lowering operating temperatures and reducing the need for elaborate cooling systems. Moreover, optical networks do not experience interference to the extent of electrical components, making data transmission more reliable: they are less susceptible to electromagnetic interference (EMI) and radio frequency interference (RFI), keeping data flowing smoothly and without interruption. Furthermore, optical data transmission raises the bar for data protection, as intercepting optical signals without detection is far more complex than intercepting electrical signals.
Finally, the compact nature of photonic components presents an opportunity for data center design and layout flexibility. Unlike their bulkier electronic counterparts, photonic components save space, helping data centers pack more compute power into a smaller footprint, which is essential when considering the data center industry’s land constraints.
Drawbacks
One of the significant challenges with photonic computing entering the data center market is the intricate and expensive nature of its components. Manufacturing these components and integrating them seamlessly into existing data center infrastructure is expensive and time-consuming. An example is the Lightning smartNIC prototype, whose estimated cost, even accounting for mass production, is over $2,500 – around 10 times more expensive than the equivalent electronic smartNIC – and retrofitting the technology into existing data center servers would cost an astronomical amount. Moreover, the intricate photonic components can be a maintenance nightmare, being more susceptible to breaking down and needing replacement. This additional maintenance work is a daunting proposition for smaller data centers with budget constraints.
Photonic computing remains in the very early stages of development; thus far, it exists only as research prototypes. The consequence of being at such an early stage is the lack of the support ecosystem that smartNICs have cultivated over time, which can limit the availability of compatible hardware and skilled personnel, complicating the integration and ongoing maintenance of photonic systems. Additionally, photonic computing’s compatibility with existing networking protocols and standards is not universal, which can limit its application, particularly in environments where compatibility with traditional network equipment is a prerequisite.
Conclusion
Photonic computing’s presence in the data center networking space is still in its infancy. While photonic computing may be an exciting technological phenomenon, it also brings a set of limitations – including complexity, cost, and compatibility concerns – that technology developers must overcome to make it worthwhile for data centers to implement. With further research and development, photonic components may come to replace electronics; for now, however, electronic components remain the status quo.
Appendix
An overview of the scientific inner workings
It is crucial to understand the principles of amplitude modulation and photonic multiplication when looking at the inner workings of photonic computing, as they underpin its networking capabilities. The technique involves two amplitude modulators, each of which applies an input voltage to modulate a light wave, producing a “double-modulated light wave.” In practical terms, light of arbitrary amplitude enters modulator 1, which applies an input voltage 𝑎, creating a light wave with an amplitude proportional to 𝑎. This light wave acts as a carrier signal entering modulator 2, which applies an input voltage 𝑏. The carrier light wave proportional to 𝑎 is thereby multiplied by the voltage 𝑏 in the photonic domain, yielding a double-modulated light wave with an amplitude proportional to 𝑎 × 𝑏. The double-modulated light wave then enters a photodetector, which translates that amplitude into a voltage.
In the example put forward by MIT (see Figure 2), 𝑎 = 0.6 and 𝑏 = 0.85 represent the input voltages, and the initial light wave has an amplitude of 1. Feeding the initial light into the first optical modulator gives an output light wave with an amplitude of 1 × 0.6 = 0.6. This output feeds into the second modulator, giving an output light wave with an amplitude of 0.6 × 0.85 = 0.51. The intensity of the output light from the second modulator is thus proportional to the product of the two input voltages.
Figure 2: A visual representation of amplitude modulation
Source: MIT
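MIT's scalar example above can be reproduced with a toy numeric model. This treats each modulator as an ideal multiplier of amplitude by voltage, which is a deliberate simplification of the real optics:

```python
# Toy model of double amplitude modulation: each idealized modulator
# scales the light amplitude by its input voltage, so the photodetector
# sees an amplitude proportional to a x b.

def modulate(amplitude, voltage):
    """Idealized amplitude modulator: output amplitude = input * voltage."""
    return amplitude * voltage

initial_amplitude = 1.0
a, b = 0.6, 0.85                          # MIT's example input voltages

carrier = modulate(initial_amplitude, a)  # 1 x 0.6 = 0.6
double = modulate(carrier, b)             # 0.6 x 0.85 = 0.51
```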
In real-world situations, the two modulators can have multiple input voltages, giving a unique amplitude for each light wave that enters. MIT gives an example with 𝑎 = [0.1, 0.7, 0.6] and 𝑏 = [1, 0.05, 0.85] representing a series of input voltages entering the modulators (see Figure 3). The first modulator alters the amplitudes of the three initial light waves of amplitude 1, resulting in three output light waves with amplitudes [0.1, 0.7, 0.6]. These output light waves flow into the second modulator, which alters their amplitudes by the second batch of input voltages, leading to the element-wise multiplication of 𝑎 and 𝑏: [𝑎𝑖 × 𝑏𝑖] = [0.1, 0.035, 0.51], or in lay terms: 0.1 × 1 = 0.1, 0.7 × 0.05 = 0.035, and 0.6 × 0.85 = 0.51. The final output light waves enter the photodetector, which integrates the intensities of the double-modulated light over multiple time steps and returns their sum, Σ𝑖 𝑎𝑖 × 𝑏𝑖. In lay terms: 0.1 + 0.035 + 0.51 = 0.645.
Figure 3: A visual representation of amplitude multiplication in the real world
Source: MIT
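Extending the toy model above to the time-multiplexed vector case, the photodetector's integration over time steps amounts to a dot product of the two voltage vectors (again an idealized simplification, with initial amplitude 1 for every wave):

```python
# Toy model of time-multiplexed photonic multiplication: one element
# pair is double-modulated per time step, and the photodetector
# accumulates the intensities, yielding the dot product of a and b.

def photonic_dot(a, b):
    """Sum of double-modulated amplitudes over successive time steps."""
    total = 0.0
    for ai, bi in zip(a, b):        # one (ai, bi) pair per time step
        total += 1.0 * ai * bi      # initial amplitude 1, two modulators
    return total

a = [0.1, 0.7, 0.6]
b = [1.0, 0.05, 0.85]
print(photonic_dot(a, b))   # ≈ 0.645
```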
Another method for performing amplitude modulation in a real-world scenario is to perform the photonic operations in parallel. Each vector element – e.g., x in the vector [x, y, z] – uses a different initial wavelength of light 𝜆𝑖. The two modulators are the same as in the previous examples, resulting in a double-modulated light wave; however, in this case there are three sets of modulator 1 and modulator 2. The three input voltages for 𝑎 and 𝑏 applied to the three modulator pairs are also different, with each 𝑎𝑖 and 𝑏𝑖 being a different voltage. Once double modulation has finished for the three light waves, a wavelength-division multiplexing multiplexer (WDM MUX) combines the double-modulated wavelengths, and the photodetector returns an output voltage proportional to the sum of the incident light intensities. Assuming the values of 𝑎 and 𝑏 remain the same, with 𝑎 = [0.1, 0.7, 0.6] and 𝑏 = [1, 0.05, 0.85], the sum will again equal 0.645 (see Figure 4).
Figure 4: An example of parallel amplitude multiplication
Source: MIT
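The parallel (WDM) variant computes the same dot product, but all modulator pairs act simultaneously rather than over successive time steps. The sketch below mirrors that structural difference; the per-wavelength bookkeeping is an illustrative simplification:

```python
# Toy model of the WDM variant: each vector element rides its own
# wavelength, all three modulator pairs operate at once, and the
# photodetector sums the combined intensities in a single step.

a = [0.1, 0.7, 0.6]     # voltages on the three modulator-1 units
b = [1.0, 0.05, 0.85]   # voltages on the three modulator-2 units

# Each (ai, bi) pair double-modulates its own wavelength simultaneously.
per_wavelength = [ai * bi for ai, bi in zip(a, b)]

# The WDM MUX combines the channels; the photodetector returns a voltage
# proportional to the sum of the incident light intensities.
output_voltage = sum(per_wavelength)
```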
Further reading
Zhou, H., Dong, J., Cheng, J., Dong, W., Huang, C., Shen, Y., Zhang, Q., Gu, M., Qian, C., Chen, H., et al., 2022. “Photonic matrix multiplication lights up photonic accelerator and beyond.” Light: Science & Applications, 11(1), 30.
Zhong, Z., Yang, M., Lang, J., Williams, C., Kronman, L., Sludds, A., Esfahanizadeh, H., Englund, D., and Ghobadi, M., 2023. “Lightning: A Reconfigurable Photonic-Electronic SmartNIC for Fast and Energy-Efficient Inference.” Proceedings of the ACM SIGCOMM 2023 Conference. Available online: https://doi.org/10.1145/3603269.3604821.
Author
Aaron Lewis, Analyst, Cloud and Data Center