For the past two years, Nvidia has launched high-end supercomputing hardware at the International Conference for High Performance Computing, Networking, Storage and Analysis (sensibly abbreviated as SC). Today, at SC14, Nvidia is following that trend, except this time it’s bringing a new dual-GPU product to the table. The new K80 is based on a revised version of the GK110 chip, dubbed GK210, with 13 of Kepler’s 15 SMXs enabled per GPU and a 300W TDP. This may allow the card to hit higher Boost frequencies than the desktop-oriented Titan Z, which struggled to match the scaling of two Titan Blacks in an SLI configuration.
Meet GK210
Nvidia’s GK210 is a fairly significant alteration to the GK110/GK110B. While the maximum number of stream processors remains the same, at 2880, the register file has doubled in size (512KB, up from 256KB) and the L1 cache is now 128KB, up from 64KB. Throughput and performance per clock both appear to be somewhat higher on the new card. Tesla K80 offers roughly 2x the maximum single- and double-precision throughput of Tesla K40 (8.74 TFLOPS vs. 4.29 TFLOPS and 2.91 TFLOPS vs. 1.43 TFLOPS) despite having fewer cores per chip (2496 vs. 2880 on a full Tesla K40).
What does all this mean? Massive scientific computing horsepower, more than any other company has ever brought to market. The card’s 24GB of RAM is divided between the two GPUs, but how that memory pool is tapped should differ between scientific workloads and gaming.
The entire point of having a hardware accelerator in a supercomputing environment is that you can keep data stored locally rather than waiting on the system to deliver new information over the achingly slow PCIe 3.0 bus. Nvidia’s NVLink is expected to address part of this problem when it eventually arrives in concert with IBM, but until that solution is ready, the company is stuck with the relatively pokey bandwidth and latency of PCI Express 3.0. That means you can only solve problems as complex as you can hold in local memory.
When you build a dual-GPU system for graphics, the data in RAM is duplicated on each GPU. A dual-GPU card with 2x4GB of RAM isn’t the same as a card with 8GB of RAM, because all graphics data is copied across both frame buffers to ensure that the game can be rendered smoothly. That’s not automatically the case in scientific computing, where the two GPUs could be assigned two completely different tasks or, in theory, given two halves of the same problem (provided, obviously, that the workload can be split in that fashion).
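The gap between mirrored and partitioned memory can be sketched in a few lines of Python. This is a simplified model rather than anything Nvidia has published: the function names are hypothetical, and it ignores any boundary data that a real partitioned workload might still need to copy to both GPUs.

```python
def usable_memory_gb(per_gpu_gb, gpus, mirrored):
    """Largest distinct working set the card can hold, in GB.

    mirrored=True models graphics-style rendering, where every GPU keeps
    its own copy of the same data. mirrored=False models a compute job
    whose data is partitioned across the GPUs.
    """
    return per_gpu_gb if mirrored else per_gpu_gb * gpus


def split_evenly(n_items, gpus):
    """Return (start, stop) index ranges giving each GPU a contiguous share."""
    base, extra = divmod(n_items, gpus)
    ranges, start = [], 0
    for i in range(gpus):
        # The first `extra` GPUs take one additional item each.
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


# The 2x4GB graphics card from the text: 4GB usable when mirrored.
print(usable_memory_gb(4, 2, mirrored=True))
# K80 with 12GB per GPU, data partitioned across both: 24GB usable.
print(usable_memory_gb(12, 2, mirrored=False))
# Two halves of the same problem, expressed as index ranges.
print(split_evenly(10, 2))
```

In this model the same 24GB card holds a 12GB working set if the data is mirrored and a 24GB one if the problem splits cleanly in two, which is why the memory pool matters more for compute than the graphics comparison suggests.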
The big unknown is whether the GPUs can run at their boost clock of 875MHz for sustained periods of time, as opposed to their base frequency of 562MHz. Even at the base frequency, the quoted core counts work out to roughly a 30% advantage in raw throughput over a single K40, while a sustained boost frequency of 875MHz would make the card literally twice as powerful as Nvidia’s previous top-end supercomputing solution. This is the first card with a full dynamic GPU boost solution in the HPC space, so how that will impact performance over time remains to be seen.
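The stakes of base versus boost clock can be checked with back-of-the-envelope arithmetic on the figures quoted in this article. The sketch below assumes each core retires one fused multiply-add (2 floating-point operations) per clock, which the article does not state; the function name is illustrative.

```python
# Assumption (not stated in the article): one fused multiply-add,
# i.e. 2 floating-point operations, per core per clock.
FLOPS_PER_CORE_PER_CLOCK = 2

K40_SP_TFLOPS = 4.29   # single-precision figure quoted in the article
K80_CORES_PER_GPU = 2496
K80_GPUS = 2


def peak_tflops(cores_per_gpu, gpus, clock_mhz):
    """Peak single-precision throughput of a card, in TFLOPS."""
    flops = cores_per_gpu * gpus * FLOPS_PER_CORE_PER_CLOCK * clock_mhz * 1e6
    return flops / 1e12


k80_base = peak_tflops(K80_CORES_PER_GPU, K80_GPUS, 562)
k80_boost = peak_tflops(K80_CORES_PER_GPU, K80_GPUS, 875)

print(f"K80 at 562MHz: {k80_base:.2f} TFLOPS ({k80_base / K40_SP_TFLOPS:.2f}x K40)")
print(f"K80 at 875MHz: {k80_boost:.2f} TFLOPS ({k80_boost / K40_SP_TFLOPS:.2f}x K40)")
```

Under that assumption, the boost-clock result reproduces the 8.74 TFLOPS quoted above, and the base-clock result comes to about 5.61 TFLOPS, or roughly 1.3x the K40 figure.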
Don’t expect consumer options
The big difference between this card and the previous workstation or scientific computing behemoths that Nvidia has launched is that we don’t expect to see a 28nm consumer variant of this chip taking over the high end of the market. Kepler may have addressed Nvidia’s highest-end consumer needs quite well for several years, but the twin GK210 chips that K80 offers would be beaten by a brace of GTX 980s, likely at much lower prices. Maxwell has taken over the high-end consumer market, and it’s not going to give that space back.
For now, the Tesla family will remain Kepler-only. As of this writing, Nvidia hasn’t specified when it plans to roll out midrange consumer Maxwell-based solutions, or whether it will wait for 20nm to deploy those cards. Doing so could make good financial sense — it wouldn’t be the first time that a company has debuted a midrange part on a new process to iron out the manufacturing issues before transitioning high-end products at a somewhat later date.
This new card should be a monster in its intended environment. AMD’s efforts in the HPC and scientific computing space remain fairly minimal; the company has done a limited amount of proof-of-concept work with APUs and other HPC projects but hasn’t put a sustained push behind scientific computing. Intel has its own Many Integrated Core chip, of course, but Knights Landing won’t ship until the second half of next year. For now, that puts Nvidia firmly in the driver’s seat.