CHINA'S 1.5 EXAFLOPS SUPERCOMPUTER IS UP FOR THE GORDON BELL PRIZE AGAIN

The Association for Computing Machinery has just put out the finalists for the Gordon Bell Prize that will be awarded at the SC23 supercomputing conference in Denver, and as you might expect, some of the biggest iron assembled on the planet is driving the advanced applications that have their eyes on the prize.

The ACM cautions that the final system sizes and final results of the simulations and models run are not yet complete, but we take a look at one of them because the researchers at China’s National Supercomputing Center in Wuxi have already published a paper that they will formally release in November ahead of the SC23 conference. The work in that paper, Towards Exascale Computation for Turbomachinery Flows, was run on the “Oceanlite” supercomputing system. We first wrote about Oceanlite way back in February 2021. The machine won a Gordon Bell Prize in November 2021 for a quantum simulation across 41.9 million cores, and we speculated about its architecture back in March 2022, when Alibaba Group, Tsinghua University, DAMO Academy, Zhejiang Lab, and Beijing Academy of Artificial Intelligence ran a pretrained machine learning model called BaGuaLu across more than 37 million cores and 14.5 trillion parameters on the Oceanlite machine.

NASA threw down a grand challenge nearly a decade ago to do a time-dependent simulation of a complete jet engine, with aerodynamics and heat transfer simulated, and the Wuxi team, with the help of engineering researchers at a number of universities in China, the United States, and the United Kingdom, has picked up the gauntlet. What we found interesting about the paper is that it confirmed many of our speculations about the Oceanlite machine.

According to the authors of the paper, the system contains more than 100,000 custom SW26010-Pro processors created specifically for the Oceanlite system by China’s NRCPC (National Research Center of Parallel Computer Engineering and Technology). The SW26010-Pro processor is etched by China’s national foundry, Semiconductor Manufacturing International Corp (SMIC), using its 14 nanometer process.

The Sunway chip family is “inspired” by the 64-bit DEC Alpha 21164 processor, which is still one of the best CPUs ever made. The 16-core SW-1 chip made its debut in China in 2006.

There are six core groups in the processor, with each core group having one fatter management processing element (MPE) for managing Linux threads and an eight by eight grid of compute processing element (CPE) cores, each with 256 KB of L2 cache. Each CPE has four logic blocks, which can support FP64 and FP32 math on one pair and FP16 and BF16 on another pair. Each core group in the SW26010-Pro has a DDR4 memory controller, 16 GB of memory, and 51.2 GB/sec of memory bandwidth. As a result, the entire device has 96 GB of main memory and 307.2 GB/sec of bandwidth. The six core groups are linked by a ring interconnect and have two network interfaces that connect them to the outside world using a proprietary interconnect, which we have always believed was heavily inspired by the InfiniBand technology used in the original TaihuLight system. The SW26010-Pro chip is rated at 14.03 teraflops at either FP64 or FP32 precision and 55.3 teraflops at BF16 or FP16 precision.
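The per-chip totals are simple multiples of the per-core-group figures. Here is a minimal Python sketch of that arithmetic. The constant and function names are ours, for illustration only, and the 51.2 GB/sec per core group is an assumption chosen because it is the value consistent with the 307.2 GB/sec total.

```python
# Per-core-group figures for the SW26010-Pro as described in the text.
CORE_GROUPS = 6
CPES_PER_GROUP = 8 * 8        # eight by eight grid of compute cores
MPES_PER_GROUP = 1            # one management core per group
MEM_GB_PER_GROUP = 16
BW_GBS_PER_GROUP = 51.2       # assumption: 307.2 GB/sec total / 6 groups


def chip_totals():
    """Return whole-chip totals derived from the per-group figures."""
    return {
        "cores": CORE_GROUPS * (CPES_PER_GROUP + MPES_PER_GROUP),
        "memory_gb": CORE_GROUPS * MEM_GB_PER_GROUP,
        "bandwidth_gbs": round(CORE_GROUPS * BW_GBS_PER_GROUP, 1),
    }


if __name__ == "__main__":
    for name, value in chip_totals().items():
        print(f"{name}: {value}")
```

With 390 cores per chip, 107,520 single-chip nodes works out to 41.93 million cores, which lines up with the largest Oceanlite configuration we have heard about.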

The largest configuration of Oceanlite that we have heard about had 107,520 nodes (with one SW26010-Pro comprising a node) for a total of 41.93 million cores across 105 cabinets, and the paper just published confirms that the machine has a theoretical peak performance of 1.5 exaflops, which matches the performance we estimated (1.51 exaflops) and closely matches the clock speed (2.2 GHz) we estimated quite a while back. As it turns out, the MPE cores run at 2.1 GHz and the CPE cores run at 2.25 GHz.

We still think that China may have built a bigger Oceanlite machine than this, or certainly could. At 120 cabinets, the machine would scale to 1.72 exaflops peak at FP64 precision, which is a little bigger than the 1.68 exaflops “Frontier” supercomputer at Oak Ridge National Laboratory, and at 160 cabinets, Oceanlite would have just shy of 2.3 exaflops peak at FP64. As noted in the comments below, the Wuxi team will present the Oceanlite machine during a session at SC23 in November, and the description of that session says the machine has 5 exaflops of mixed precision performance across 40 million cores. That implies a 2.5 exaflops peak performance at FP64 and FP32 precision.
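The cabinet scaling figures can be checked with a few lines of Python. This is a rough sketch under two stated assumptions: that every cabinet holds 1,024 nodes (107,520 nodes divided by 105 cabinets) and that each node peaks at 14.03 teraflops at FP64, the per-chip rating that is consistent with a 1.5 exaflops machine. The names are ours.

```python
# Assumption: uniform cabinets, derived from the 107,520-node, 105-cabinet setup.
NODES_PER_CABINET = 107_520 // 105     # 1,024 nodes per cabinet
# Assumption: FP64 peak per single-chip node, in teraflops.
PEAK_TFLOPS_PER_NODE = 14.03


def peak_exaflops(cabinets):
    """Theoretical FP64 peak in exaflops for a given cabinet count."""
    nodes = cabinets * NODES_PER_CABINET
    return nodes * PEAK_TFLOPS_PER_NODE / 1_000_000   # teraflops to exaflops


if __name__ == "__main__":
    for cabinets in (105, 120, 160):
        print(f"{cabinets} cabinets: {peak_exaflops(cabinets):.2f} exaflops")
```

The 105-cabinet case lands on 1.51 exaflops, and the 120 and 160 cabinet cases come out at 1.72 exaflops and just under 2.3 exaflops, respectively.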

Those latter numbers are important if China wants to be a spoiler and try to put a machine in the field that bests the impending “El Capitan” machine at Lawrence Livermore National Laboratory, which is promised to have more than 2 exaflops of FP64 oomph.

The jet engine simulation for the most recent Gordon Bell Prize entry was run on Oceanlite with approximately 58,333 nodes, which equates to more than 22.4 million CPE cores and more than 350,000 MPE cores. That is a little more than half of the largest configuration of Oceanlite that has been reported in a paper. It is interesting that the sustained performance of the application was only 115.8 petaflops.
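To put the 115.8 petaflops in context, here is a back-of-the-envelope Python sketch that compares it to the theoretical peak of the nodes used. This is our arithmetic, not a figure from the paper. It assumes 14.03 teraflops of FP64 peak per node and that FP64 peak is the right yardstick for this run, which may not match how the authors measure efficiency. The names are ours.

```python
# Assumption: FP64 peak per single-chip node, in teraflops.
PEAK_TFLOPS_PER_NODE = 14.03
# Largest Oceanlite configuration mentioned in the text.
FULL_MACHINE_NODES = 107_520
CORE_GROUPS = 6
CPES_PER_GROUP = 64


def run_summary(nodes, sustained_pflops):
    """Core counts, machine share, peak, and sustained/peak ratio for a run."""
    peak_pflops = nodes * PEAK_TFLOPS_PER_NODE / 1_000   # teraflops to petaflops
    return {
        "cpe_cores": nodes * CORE_GROUPS * CPES_PER_GROUP,
        "mpe_cores": nodes * CORE_GROUPS,
        "share_of_machine": nodes / FULL_MACHINE_NODES,
        "peak_pflops": peak_pflops,
        "efficiency": sustained_pflops / peak_pflops,
    }


if __name__ == "__main__":
    summary = run_summary(nodes=58_333, sustained_pflops=115.8)
    print(f"share of machine: {summary['share_of_machine']:.1%}")
    print(f"peak: {summary['peak_pflops']:.1f} petaflops")
    print(f"sustained / peak: {summary['efficiency']:.1%}")
```

Under those assumptions, the run used roughly 54 percent of the machine and sustained about 14 percent of the FP64 peak of those nodes. The core counts come out a hair under 22.4 million and 350,000, which is why the node count is approximate.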

Another Gordon Bell finalist for 2023 is a team at the University of Michigan and the Indian Institute of Science who worked with the team at Oak Ridge on the Frontier system to use a hybrid machine learning and HPC simulation approach to combine density functional theory and the quantum many body problem to do quantum particle simulations. With this work, the resulting software was able to scale across 60 percent of the Frontier system. Do not assume that means this quantum simulation ran at a sustained 1 exaflops; it will probably be more like 650 petaflops, and perhaps a lot less depending on the computational and network efficiency of the Frontier box when it comes to this particular application.

The third finalist for the Gordon Bell Prize consists of researchers at Penn State and the University of Illinois, who worked with teams at Argonne National Laboratory and Oak Ridge to simulate a nuclear reactor. (We got our start in writing at the Penn State NukeE department back in the day, so congrats, Lions.) This simulation included radiation transport along with heat and fluid simulation inside the reactor, and the ACM report says it ran on 8,192 nodes in the Frontier system, which is officially sized at 9,402 nodes, each with one “Trento” custom Epyc CPU and four “Aldebaran” Instinct MI250X GPU accelerators, for a total of 37,608 GPUs.
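The Frontier node and GPU counts can be tied together with a short Python sketch. The function name is ours, and the GPU count for the run simply assumes that all four GPUs on each of the 8,192 nodes were used, which the ACM report as quoted here does not state.

```python
FRONTIER_NODES = 9_402
GPUS_PER_NODE = 4      # four MI250X accelerators per node


def frontier_share(run_nodes):
    """Return total GPUs, GPUs in the run, and the share of nodes used."""
    total_gpus = FRONTIER_NODES * GPUS_PER_NODE
    # Assumption: the run drives every GPU on every node it occupies.
    run_gpus = run_nodes * GPUS_PER_NODE
    return total_gpus, run_gpus, run_nodes / FRONTIER_NODES


if __name__ == "__main__":
    total_gpus, run_gpus, share = frontier_share(8_192)
    print(f"{run_gpus} of {total_gpus} GPUs, {share:.1%} of nodes")
```

By this count, the reactor simulation occupied about 87 percent of the nodes in Frontier.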

Teams from KTH Royal Institute of Technology, Friedrich-Alexander-Universität, Max Planck Computing and Data Facility, and Technische Universität Ilmenau make up the fourth finalist for the 2023 Gordon Bell award. They are scaling Neko, a high-fidelity spectral element code, across 16,384 GPUs on the “Lumi” supercomputer in Finland and the “Leonardo” supercomputer in Italy.

The fifth finalist is a team from King Abdullah University of Science and Technology and Cerebras Systems, who used a cluster of 48 CS-2 wafer-scale systems from Cerebras with a total of 35.8 million cores to perform seismic processing simulations for oil reservoirs. This one is neat because it is bending an AI matrix math machine to do HPC work, something we have explored frequently.

Number six of the 2023 finalists is a team from Harvard University, who used the “Perlmutter” hybrid CPU-GPU system at Lawrence Berkeley National Laboratory to simulate the atomic structure of an HIV virus capsid up to 44 million atoms and several nanoseconds of simulation. They pushed strong scaling to 100 million atoms.

This year, the ACM is also presenting its first Gordon Bell Prize for Climate Modeling. When we covered the SCREAM variant of the Energy Exascale Earth System Model, also known as E3SM, developed and extended by Sandia National Laboratories, in April of this year, we said we hoped this extended-resolution weather model would be up for a prize, and it is. SCREAM is interesting in that parts of the code were started from scratch, using C++ and the Kokkos library to parcel work out to CPUs and GPUs in systems. In this case it was run on the Frontier machine at Oak Ridge, simulating 1.26 years per day for a practical cloud-resolving simulation.

The Sunway Oceanlite system is a finalist here, too, but this one simulated the effects of the underwater volcanic eruption off of Tonga in late 2021 and early 2022, including shock waves, earthquakes, tsunamis, and water and ash dispersal. The combination of simulations and models was able to simulate 400 billion particles and ran across 39 million cores in the Oceanlite system with 80 percent computational efficiency. (We want to see the paper on this one.)

The third Gordon Bell climate modeling finalist is a team of researchers in Japan who got their hands on 11,580 nodes in the “Fugaku” supercomputer at RIKEN lab, around 7 percent of the total nodes in the machine, and did a 1,000 ensemble, 500-meter resolution weather model with a 30 second refresh for the 2021 Tokyo Olympics. This was a real use case, with more than 75,248 weather forecasts distributed over a multi-day period and each short forecast completed in under three minutes.
