PACE processor released: A small step for Xizhi Technology, a big step for photonic computing

A few days ago, Xizhi Technology announced its latest high-performance photonic computing processor, PACE (Photonic Arithmetic Computing Engine). A single photonic chip integrates more than 10,000 photonic devices and runs at a 1 GHz system clock, and on a specific recurrent neural network it can run hundreds of times faster than current high-end GPUs. According to Xizhi Technology’s official statement, PACE successfully verifies the superiority of photonic computing and marks another major breakthrough for the company in the integrated circuit industry.

PACE and PCI-e boards

Million-fold growth

In April 2019, Xizhi Technology officially released the world’s first photonic chip prototype board and used the photonic chip to run the convolutional neural network model that ships with Google TensorFlow on the MNIST data set. More than 95% of the operations of the entire model were performed on the photonic chip. The accuracy of the photonic chip’s processing was close to that of electronic chips (above 97%). In addition, the photonic chip completed the matrix multiplication in less than 1% of the time taken by the most advanced electronic chip of the day.

The first-generation prototype board contained about 100 photonic devices and operated at 100 kHz. However, as Dr. Meng Huaiyu, CTO of Xizhi Technology, put it, the original prototype board “has not fully released the potential of photonic computing.” For precisely this reason, after two years of research and development, Xizhi Technology has launched a new generation of photonic computing processor, PACE, built on its revolutionary self-developed optoelectronic integration technology.

With a 100-fold increase in the number of photonic devices on a single chip and a 10,000-fold increase in clock frequency, the computing power of the new-generation PACE processor is about a million times that of the first generation. According to Dr. Shen Yichen, CEO of Xizhi Technology, the clock frequency can be raised by another 1 to 2 orders of magnitude in the future.
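The million-fold figure can be checked with a few lines of arithmetic. The sketch below assumes, as the article implies, that throughput scales with the number of devices multiplied by the clock frequency; the function and variable names are illustrative and do not come from Xizhi Technology.

```python
# Figures quoted in the article.
gen1_devices = 100       # approximate device count on the 2019 prototype
gen1_clock_hz = 100e3    # 100 kHz
pace_devices = 10_000    # "more than 10,000" devices
pace_clock_hz = 1e9      # 1 GHz


def throughput_gain(devices_old, clock_old, devices_new, clock_new):
    """Rough gain, assuming throughput scales with devices x clock."""
    return (devices_new / devices_old) * (clock_new / clock_old)


gain = throughput_gain(gen1_devices, gen1_clock_hz, pace_devices, pace_clock_hz)
print(f"{gain:,.0f}x")
```

This is only a scaling estimate, not a measured benchmark.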

Measured computing power increase

To show that the claims for photonic computing are not empty boasting, Xizhi Technology also released measured data.

Running the same specific recurrent neural network algorithm, PACE needs less than 1% of the running time of the Nvidia 3080, a GPU with leading computing power among retail products currently on the market.

PACE can be used to solve combinatorial optimization problems. Through repeated matrix multiplication and the clever use of controlled noise in a tight loop, it achieves low latency while generating high-quality solutions to problems such as Ising and max-cut/min-cut. These problems, which have plagued mathematicians around the world for nearly 50 years, are NP-complete (non-deterministic polynomial-time complete), that is, problems for which no polynomial-time solution method is known.
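To make the idea of "repeated matrix multiplication plus controlled noise" concrete, here is a minimal sketch of a generic noisy iterative heuristic for max-cut. It is not Xizhi Technology's algorithm: the update rule, the linear noise schedule, and the names `cut_value` and `noisy_maxcut` are all assumptions chosen for illustration.

```python
import random


def cut_value(weights, spins):
    """Total weight of edges whose endpoints sit on opposite sides."""
    n = len(spins)
    return sum(
        weights[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if spins[i] != spins[j]
    )


def noisy_maxcut(weights, steps=200, noise0=1.0, seed=0):
    """Heuristic max-cut search; returns (best_spins, best_cut)."""
    rng = random.Random(seed)
    n = len(weights)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    best_spins, best_cut = spins[:], cut_value(weights, spins)
    for step in range(steps):
        noise = noise0 * (1 - step / steps)  # noise level decays linearly
        # Matrix-vector product: the part an optical matrix would compute.
        field = [sum(w * s for w, s in zip(row, spins)) for row in weights]
        # Each spin moves opposite to its noisy neighbourhood field.
        spins = [-1 if f + rng.gauss(0, noise) >= 0 else 1 for f in field]
        cut = cut_value(weights, spins)
        if cut > best_cut:
            best_spins, best_cut = spins[:], cut
    return best_spins, best_cut


# Hypothetical example graph: a ring of four nodes, whose best cut is 4.
ring = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
spins, cut = noisy_maxcut(ring)
print(spins, cut)
```

Every iteration is dominated by one matrix-vector product, which is the kind of operation a photonic matrix is suited to accelerate. The heuristic keeps the best cut it has seen, but it does not guarantee an optimal solution.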

The measurement results show that on the max-cut problem PACE’s running time is only 154 μs, while the GPU takes 18,000 μs.
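As a quick consistency check, the two quoted timings can be turned into a speedup and compared with the "less than 1%" claim. The variable names below are illustrative.

```python
pace_us = 154      # PACE running time quoted in the article, in microseconds
gpu_us = 18_000    # GPU running time quoted in the article, in microseconds

speedup = gpu_us / pace_us
fraction = pace_us / gpu_us
print(f"speedup: {speedup:.0f}x, PACE time is {fraction:.2%} of GPU time")
```

The ratio works out to roughly 117 times faster, or about 0.86% of the GPU's running time, which is consistent with the figure of under 1%.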

Shen Yichen said that the PACE chip was not originally designed to serve all general-purpose neural networks. The max-cut problem was chosen for comparison because NP-complete problems, represented by max-cut/min-cut, are widely applicable in bioinformatics, traffic scheduling, circuit design, materials discovery and other fields. Once one NP-complete problem is solved, it is relatively easy to map the solution to other NP-complete problems.

In addition, Shen Yichen stated that Xizhi Technology will launch more general-purpose products next year, modifying or optimizing more models around the advantages of photonic computing to meet different market needs.

Why is photonic computing so powerful?

Software is eating the world, and AI is eating software.

In fact, AI’s demand for computing power has exploded since 2012: on average, the compute required by models doubles every 3 to 4 months.

As advanced process technology becomes harder and harder to introduce, traditional Moore’s Law is slowing down. Even if it were not slowing, doubling the number of transistors every 18 months could not keep up with the growth of AI models. For this reason, the industry has broadly begun using domain-specific architectures (DSA) to optimize for specific applications, but three bottlenecks remain: computing power, data transmission and storage.
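The gap between the two doubling rates compounds quickly. The sketch below compares them over a three-year window; the window length and the helper name `growth_factor` are assumptions made for illustration, while the doubling periods are the ones quoted above.

```python
def growth_factor(months, doubling_months):
    """Multiplication factor for a quantity that doubles every doubling_months."""
    return 2 ** (months / doubling_months)


horizon = 36  # months; an arbitrary three-year window chosen for illustration
moore = growth_factor(horizon, 18)    # transistor count, doubling every 18 months
ai_slow = growth_factor(horizon, 4)   # AI compute demand, doubling every 4 months
ai_fast = growth_factor(horizon, 3)   # AI compute demand, doubling every 3 months
print(f"transistors: {moore:.0f}x, AI demand: {ai_slow:.0f}x to {ai_fast:.0f}x")
```

Over three years, transistor count grows 4 times while demand grows between 512 and 4,096 times under these assumptions.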

Shen Yichen explained that transistor size is limited by the physical size of atoms and can no longer be scaled down indefinitely. At the same time, because of tunneling in transistors, power consumption cannot be reduced and heat dissipation cannot be solved effectively. When density cannot be increased significantly, total computing power can only be raised by increasing area, but the gain is not linear. For example, the wafer-scale chip launched by Cerebras has indeed raised computing power to 70-80 times that of Nvidia’s, but its power consumption is more than 200 times as high. The additional power is consumed mainly by moving data across the on-chip network. On the inter-chip network, bandwidth limitations also seriously reduce the efficiency of parallel computing: if 100 boards are interconnected, for example, the computing power may be only a little more than 10 times that of a single card. As for storage, latency and bandwidth during data movement remain a problem, the so-called “memory wall”.

In photonic computing, by contrast, the matrix multiplication itself is passive, so the process consumes no energy. In addition, a photonic calculation completes in the time light takes to pass through the matrix, with no transistor switching required, so it can finish in a fraction of a nanosecond. Finally, the high energy efficiency and low latency are independent of the frequency of the input optical signal, which means the optical matrix can support high throughput.

The successful commercialization of optical fiber communications has proven the importance of light in computing networks. Moreover, because distances in photonic computing are short, it is not affected by optical dispersion and loss.

In recent years, in-memory computing architectures that break through the memory wall have also become popular. Like optics, they are based on analog neural network computation, in which a single transistor can serve as a computing unit. Shen Yichen said that photonic matrix computation is likewise analog, so its precision has some limitations, but because the photonic signal is cleaner, it will to some extent outperform electrical analog computation.

At present, photonic computing can meet the 8-bit and 10-bit precision requirements of commonly used AI algorithms. There is room to improve its precision further in the future, and lower-precision support will also be provided.

Explore PACE

Shen Yichen emphasized that PACE is not purely optical computing but a deep hybrid of optical and electronic computing, and that this will remain the mainstream direction for photonic computing in the foreseeable future. So what exactly is inside PACE?

The PACE chip consists of two parts: a silicon photonic chip and a traditional electronic chip. The two are interconnected by flip-chip stacking in a 3D package.

The electronic chip of PACE contains a digital part and an analog part. The digital part includes logic and SRAM: the logic mediates data flow and manages input and output, and the SRAM handles storage.

The analog part is the communication bridge. It comprises a series of signal-chain components, including A/D and D/A converters, amplifiers, drivers and modulation circuits.

The silicon photonic chip includes a 64×64 optical computing matrix and photodetectors; the laser is external. In theory, the closer the laser is to the chip, the better. Intel’s silicon photonics technology integrates lasers, semiconductor optical amplifiers, all-silicon photodetectors and micro-ring modulators on a single chip, and Shen Yichen said this is also the direction in which Xizhi Technology’s technology will evolve.

For each optical matrix multiplication, the input vector is first read from the on-chip SRAM and converted to analog values by digital-to-analog converters. These values are applied to the corresponding optical modulators through the micro bumps between the electronic chip and the photonic chip, and each modulator attenuates the incident light accordingly to form the input light vector. The optical matrix as a whole plays a role similar to the matrix unit of an NPU, and the computation produces a set of optical outputs. The photodetector array converts the light intensities into electrical signals, which return to the analog part through the micro bumps and then pass through transimpedance amplifiers and analog-to-digital converters back into the digital domain.
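The signal chain described above can be sketched as a small numerical model. This is a simplified illustration, not a description of PACE's actual circuitry: it assumes ideal, noise-free devices, unit laser intensity, inputs and weights in the range 0 to 1, and the same bit width for the DAC and ADC. The names `quantize` and `optical_matvec` are hypothetical.

```python
def quantize(x, bits, full_scale=1.0):
    """Clamp x to [0, full_scale] and round it to one of 2**bits levels."""
    steps = (1 << bits) - 1
    x = min(max(x, 0.0), full_scale)
    return round(x / full_scale * steps) / steps * full_scale


def optical_matvec(matrix, vector, bits=8):
    """Idealized DAC -> modulator -> optical matrix -> detector -> ADC chain."""
    # 1. DAC: digital inputs from SRAM become analog drive levels.
    drive = [quantize(v, bits) for v in vector]
    # 2. Modulators attenuate unit-intensity light by the drive level.
    light_in = [1.0 * d for d in drive]
    # 3. The optical matrix forms weighted sums as light passes through.
    light_out = [sum(w * x for w, x in zip(row, light_in)) for row in matrix]
    # 4. Photodetectors and amplifiers scale the signal into the ADC range.
    full_scale = float(len(vector))  # largest possible sum when weights <= 1
    # 5. ADC: analog outputs return to the digital domain.
    return [quantize(y, bits, full_scale) for y in light_out]


# Hypothetical 2x2 example.
weights = [[1.0, 0.0], [0.5, 0.5]]
result = optical_matvec(weights, [1.0, 0.0])
print(result)
```

The output differs slightly from the exact product [1.0, 0.5] because of the ADC step size, which illustrates why precision is a design constraint in analog computing.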

The hybrid optoelectronic approach lets all I/O go through the electronic chip, and the instruction set, compiler and SDK all run on the electronic chip, so the product is compatible with the existing software ecosystem and customers can adopt it faster.

On development, Shen Yichen said that silicon photonic chips and silicon electronic chips are both made with CMOS technology, which solves 90% of the problems. Most electrical and thermal simulation, design and verification tools can be used directly. Wafer production is a modification of the traditional CMOS process that may introduce a few special process steps. Packaging must account for laser packaging or reserve a light-source channel, but most of these techniques are already in mature commercial use.

Simple as this sounds, for Xizhi Technology’s photonic computing chip to succeed, many engineering difficulties must be overcome to solve the remaining 10% of the problems. Shen Yichen said that before Xizhi Technology was founded, the world’s most highly integrated silicon photonic products integrated perhaps a few dozen optical devices. With tens of thousands of optical devices required by the optical computing matrix, the design could no longer be completed by purely manual methods, so Xizhi Technology developed a new design flow for highly integrated photonic chips. As for packaging, because the optical devices are so densely integrated, the traditional approach of controlling optical components from an external board does not apply, and a 3D package for optical control had to be developed. In addition, making optical and electronic signals work together involves many factors, such as hardware-software integration and system architecture design.

To this end, Xizhi Technology also hired Maurice Steinman as vice president of engineering. According to available information, Steinman has worked in the technology industry for more than 30 years, at companies including Digital, Compaq, HP and Intel, and served as Senior Fellow and Chief Architect at AMD. A veteran with more than 24 successful test and product introduction projects, he is an expert in SoC architecture, SoC interconnect, memory subsystems and power management.

Xizhi Technology’s core technology

Xizhi Technology’s photonics technology falls mainly into three parts: oMAC (optical multiply-accumulate), oNOC (on-chip optical network) and oNET (inter-chip optical network).

(1) oMAC, optical multiply-accumulate: an analog computation that uses light in place of traditional electronics for data processing. Data can be encoded on the intensity or phase of light, and computation takes place as the data propagates. The linear operation performed by oMAC can also be understood as a matrix-matrix or matrix-vector multiplication.

It is implemented on a CMOS-compatible silicon photonics process platform with optical-electrical co-design and advanced packaging technology; with high-speed, tunable, compact electro-optic modulators; with a novel computing architecture in which coherent or non-coherent schemes based on the MZI structure make light interfere with light; and, finally, with hardware-algorithm co-optimization.
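For readers unfamiliar with the MZI (Mach-Zehnder interferometer), the textbook model below shows how interference between two light paths implements a tunable 2×2 linear operation. It assumes ideal, lossless 50:50 couplers and two phase shifters; it is a standard idealization, not Xizhi Technology's device design, and the function names are illustrative.

```python
import cmath
import math


def matmul2(a, b):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def phase_shifter(angle):
    """Phase shift applied to the upper arm only."""
    return [[cmath.exp(1j * angle), 0], [0, 1]]


def mzi(theta, phi):
    """Transfer matrix: splitter * shifter(theta) * splitter * shifter(phi)."""
    s = 1 / math.sqrt(2)
    splitter = [[s, 1j * s], [1j * s, s]]  # ideal 50:50 coupler
    m = matmul2(splitter, phase_shifter(phi))
    m = matmul2(phase_shifter(theta), m)
    return matmul2(splitter, m)


u = mzi(theta=0.7, phi=0.3)
# Fraction of light entering port 0 that leaves through each output port.
p0, p1 = abs(u[0][0]) ** 2, abs(u[1][0]) ** 2
print(p0, p1, p0 + p1)
```

In this model the internal phase theta sets the split ratio (the port-0 fraction equals sin²(theta/2)), and the two output powers always sum to the input power because the matrix is unitary. Larger matrices can be composed from meshes of such 2×2 blocks.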

Its advantages are stronger parallelism, energy efficiency comparable to or even better than that of electronic chips, and ultra-low latency. In addition, silicon photonics places very modest demands on the process and costs little: a 65 nm or 45 nm CMOS process, for example, can meet all the requirements of current photonic computing, at a manufacturing cost far below that of electronic chips.

(2) oNOC, on-chip optical network: by replacing copper wires with waveguides, data is carried over the network in the optical chip, enabling data transfer within a single electronic integrated circuit (EIC) and data communication between multiple EICs inside the package.

It is implemented by building a fixed or flexibly adjustable communication network topology on the optical chip and connecting different electronic chips to one or more of its nodes, so that data is exchanged over oNOC. Topologies based on optical broadcasting and on wavelength division multiplexing are used.

Its main advantages are high bandwidth, low energy consumption, low latency and insensitivity to distance. The method is also highly versatile: it can combine different types of electronic chips, provides high-speed, low-energy interconnection between them, and suits application scenarios with high bandwidth requirements.

(3) oNET, inter-chip optical network: the optical chip plays a role similar to an optical bus, gathering the data to be transmitted inside a unit and exchanging it with other units through an optical transmission medium such as optical fiber.

This technology is used mainly to improve communication efficiency between computing units. Compared with traditional electrical interconnects, optical networks offer high energy efficiency, low propagation loss, high bandwidth, low latency and insensitivity to transmission distance.

Beyond its integrated photonics engineering, the company also has a large pool of AI talent. Xizhi Technology has proposed a novel recurrent neural network (RNN) model that combines the memory ability of unitary (rather than general) RNNs with the ability of gated RNNs to effectively forget redundant or irrelevant information in memory.

The optical ecosystem is heating up

In 2017, Shen Yichen published a cover paper in the journal Nature Photonics as first author, showing the world a new starting point for integrated photonic computing for the first time.

This paper gave rise to more than 20 related companies, including Xizhi Technology and Lightmatter. At the same time, a number of giants, including Intel, HPE and IBM, have also entered the field.

In an exclusive interview with MIT Technology Review, Shen Yichen compared the current competitive stage of photonic technology to the era when transistors replaced vacuum tubes. Several transistor companies were advancing by leaps and bounds at the time, but they were competing less with one another than, through innovation, with the incumbent industry. “At this stage, it is beneficial for us to have more competitors engaged in optical computing. We can make a louder voice and form a larger community to expand and enhance the entire optical computing ecosystem.” Shen Yichen said.

Shen Yichen especially emphasized that the success of PACE is also inseparable from the strong support of partners. Xizhi Technology is establishing strategic partnerships with companies such as fabs, packaging plants and internationally renowned EDA design companies to enrich the entire ecosystem.

Shen Yichen said that the traditional silicon photonics ecosystem is too small in volume to be sufficiently attractive to the supply chain. Only with large-scale application scenarios such as photonic computing can the ecosystem develop faster. More importantly, a growing number of customers are increasingly interested in high-performance, low-power AI computing, and they are the most important link in Xizhi Technology’s ecosystem. In fact, Xizhi Technology’s shareholders include many first-tier Internet customers.

Besides photonic computing, technologies such as solid-state lidar and optical sensors are also developing rapidly. Whatever the scenario, they share basic photonic processes, packaging and devices, so the overall expansion of the photonics market can further accelerate the commercialization of photonic computing.

According to an earlier Wired report, the idea of using photons for computation is not new and can be traced back to the 1950s, but electronic computing proved better suited to development and commercialization. In the 1980s, Bell Labs tried to build general-purpose light-based chips but failed because of the difficulty of constructing working optical transistors. Today’s industrial ecosystem is clearly far more developed than it was then.

Xizhi Technology’s future

At present, Xizhi Technology has raised more than 1 billion yuan in total and has nearly 200 full-time employees worldwide, more than 80% of them technical staff. 70% of its chip designers have more than 10 years of experience in the semiconductor industry, forming a complete team that spans silicon photonics to software and analog to digital. This team, which has worked together for four years, is regarded by Shen Yichen as the company’s “biggest wealth”.

Shen Yichen also emphasized that Xizhi Technology is the earliest company in photonic computing and the one with the strongest execution. Unlike digital circuits, photonic computing has no mature design flow, and developing a photonic computing system involves a long break-in period covering device design, packaging, and hardware-software integration. “Any company, even with a market value of hundreds of billions, will take at least three years to make a PACE-like product from now on.”

Speaking of future development plans, Shen Yichen outlined three stages:

In the first stage, over the next 1 to 3 years, application scenarios that showcase the advantages of photonic computing will be put into practice, including cloud computing, intelligent driving, quantitative trading in finance, and biopharmaceutical research and development. Xizhi Technology has already begun in-depth cooperation with the world’s top cloud service providers and major financial institutions.

In the second stage, once the advantages of photonic computing are clearly established, the company will enter the training market, which requires more chip-to-chip collaboration, larger matrix multiplications and more mature software systems.

In the third stage, as the hardware and software systems mature further, the company plans to enter mass markets such as GPUs and automotive chips, which place higher demands on power consumption, reliability and the software ecosystem.

There is still a long way to go, but, as the name PACE suggests, an important first step has been taken, and photonic computing has a bright future ahead.