Netpreme decouples and extends memory from expensive AI processors (graphics processing units, or GPUs, and tensor processing units, or TPUs), helping customers cut costs by up to five times and run AI models up to tens of times faster. Netpreme’s product connects to existing systems through a standard hardware interface and works transparently with AI applications, enabling drop-in adoption compared with more complex approaches such as co-packaged optics.

FELLOW

Zhizhen Zhong

Zhizhen Zhong is a co-founder and CEO of Netpreme, the photonic-electronic AI infrastructure company. At Netpreme, Zhong and his team are building the world's fastest memory-compute fabric using light for AI supercomputers. Before founding Netpreme, Zhong was a postdoctoral associate at MIT and a research scientist at Meta, working on optical networking and cloud networked systems for large-scale, data-intensive applications.


Nikita Lazarev

Nikita Lazarev, the co-founder and CTO of Netpreme, completed his Ph.D. in electrical and computer engineering at MIT, specializing in hardware design for emerging networking infrastructures and data center systems. Before graduate school, Lazarev worked at Google and Microsoft as a visiting researcher on data center systems. Lazarev also holds a master's degree in computer science from EPFL in Switzerland and an undergraduate degree in electrical engineering from Bauman Moscow State Technical University.


TECHNOLOGY


Critical Need
Generative AI workloads are memory-intensive: they rely on moving large volumes of data between memory and compute units. Currently, on-chip high-bandwidth memory offers limited capacity, while off-chip memory offers large capacity but limited bandwidth. AI processors like GPUs and TPUs therefore fundamentally suffer from this memory bandwidth-capacity tradeoff as model sizes and user applications grow.
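The tradeoff above can be made concrete with a roofline-style check: a kernel is memory-bound when its arithmetic intensity (FLOPs per byte moved) falls below the hardware's ratio of peak compute to memory bandwidth. A minimal sketch, using illustrative accelerator numbers that are assumptions for this example (not Netpreme measurements):

```python
# Roofline-style check: is a workload memory-bound or compute-bound?
# Hardware figures below are illustrative, not tied to any specific product.

def attainable_flops(ai, peak_flops, mem_bw):
    """Attainable FLOP/s at arithmetic intensity `ai` (FLOPs per byte moved)."""
    return min(peak_flops, ai * mem_bw)

# Illustrative accelerator: 1e15 FLOP/s peak, 3e12 B/s memory bandwidth.
PEAK, BW = 1.0e15, 3.0e12
ridge = PEAK / BW  # intensity where the memory and compute roofs cross

# A matrix-vector product, typical of autoregressive decoding, does ~2*N*N
# FLOPs while streaming ~N*N weight bytes once -> arithmetic intensity ~2.
ai_decode = 2.0
bound = "memory-bound" if ai_decode < ridge else "compute-bound"
util = attainable_flops(ai_decode, PEAK, BW) / PEAK

print(f"ridge point: {ridge:.0f} FLOPs/byte")
print(f"decode (AI={ai_decode}) is {bound}, reaching {100 * util:.1f}% of peak")
```

With these numbers the decode kernel can use only a fraction of a percent of peak compute; its speed is set almost entirely by memory bandwidth, which is why expanding memory bandwidth and capacity, rather than adding more FLOPs, is the lever for such workloads.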

Technology Vision
Netpreme is pioneering an expandable memory system for computer processors (such as graphics processing units, or GPUs, and tensor processing units, or TPUs) in AI data centers. In particular, Netpreme's product unlocks the world's fastest memory-compute fabric using light, enabling expandable memory systems with a flexible memory-to-compute ratio for memory-bound workloads with low arithmetic intensity.

Potential for Impact
Netpreme's expandable memory system is designed to be deployed with standard hardware interfaces (e.g., Peripheral Component Interconnect Express, or PCIe), eliminating the complexities of co-packaged optics. When deployed at scale, it enables inference up to tens of times faster than GPUs with remote direct memory access (RDMA) offload, and reduces the GPU count required for the same inference quality by up to five times. It also supports ultra-long contexts (100 million tokens or more) without compromising performance, achieving industry-leading inference speeds and redefining the landscape of artificial general intelligence.
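To see why ultra-long contexts demand expandable memory, consider the key-value (KV) cache a transformer holds per sequence: roughly 2 (keys and values) × layers × KV heads × head dimension × bytes per element × tokens. A back-of-the-envelope sketch with hypothetical model parameters (the 70B-class configuration below is an assumption for illustration, not Netpreme's):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem=2):
    """Approximate KV-cache size for one sequence: K and V per layer per token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens

# Hypothetical 70B-class model: 80 layers, 8 KV heads (grouped-query
# attention), head dimension 128, fp16 (2 bytes per element).
per_token = kv_cache_bytes(80, 8, 128, 1)
ctx_100m = kv_cache_bytes(80, 8, 128, 100_000_000)

print(f"{per_token / 1024:.0f} KiB of KV cache per token")
print(f"{ctx_100m / 1e12:.1f} TB for a 100-million-token context")
```

Under these assumptions a 100-million-token context needs tens of terabytes of KV cache, orders of magnitude beyond the high-bandwidth memory on any single accelerator, which is the gap a decoupled, expandable memory fabric is meant to fill.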

Website
Netpreme