Imagine the business challenges that would be solved with 10 times faster data center server performance. For those of you who design and develop technology for the data center, this has been a goal you’ve strived for and struggled to achieve due to the diminishing rate of improvement in microprocessor technology and design. Until now!
Something big is happening in the industry
Tech leaders including AMD, Dell EMC, Google, Hewlett Packard Enterprise, IBM, Mellanox Technologies, Micron, NVIDIA and Xilinx are joining forces with the goal of building servers that meet the higher performance requirements of data, analytics, AI and other emerging workloads. By coming together, these industry leaders can speed up innovation and bring differentiated products to market faster.
The industry needs specialized hardware accelerators to close the performance gap. While accelerated applications have become common in some areas like high-performance computing (HPC) and deep learning, architectural limitations of the traditional input/output (I/O) subsystem have impeded the diversity of accelerator solutions necessary to meet this challenge.
New interfaces are being designed to eliminate inefficiencies of the traditional model and reduce the effort and complexity required to integrate a hardware accelerator into an application. The challenge cannot be met with proprietary solutions or delivered by a single company or within a single microprocessor architecture. Only open standards can foster a rich ecosystem of accelerated applications, lower the barriers to accelerator development and provide users with choices.
That’s why we are forming the OpenCAPI Consortium.
The OpenCAPI Consortium, a data-centric approach to server design
As described in last month’s blog, IBM and OpenPOWER partners have proven the value of open interfaces, coherently attached accelerators and I/O devices. Results consistently show improved accelerator performance, simplified application programming and significantly reduced CPU overhead compared to attaching devices through a traditional I/O subsystem.
These experiences have also highlighted why a new, truly optimized interface is required. Here are some of the reasons:
- Takes advantage of system memory capacity. Many of the established use cases have focused on specialized memory built into accelerators, but there are also many use cases where the data volume is large enough that the accelerator must take advantage of the capacity of system memory. These use cases can have latency and bandwidth requirements that far exceed the capabilities of mainstream server I/O interfaces. OpenCAPI is built to provide this level of performance.
- Provides flexibility and simplified design. The POWER8 CAPI protocol was built on top of the PCI Express physical layer. The flexibility and generations of backward compatibility of PCI Express require specialized circuitry in an FPGA, limiting choices on the size and cost of an FPGA for a given application. OpenCAPI is architected to put very little burden on the attached device, allowing flexibility in choosing an FPGA and simplifying the hardware design of the accelerator.
- Operates independently but directly. An OpenCAPI device operates solely in the virtual address space of the applications it supports. This allows the application to communicate directly with the device, eliminating the need for kernel software calls that can consume thousands of CPU cycles per operation. It also provides isolation to deter a broken or malicious OpenCAPI device from accessing unauthorized memory locations owned by the operating system or other applications. This also makes it easier for OpenCAPI devices to interoperate with CPUs of different architectures.
- Addresses emerging memory technologies. In the coming years, the industry will see a number of advanced memory technologies emerge. OpenCAPI works as a technology-agnostic interface with outstanding latency and bandwidth, and offers a wide range of possible access semantics to address the unique properties of specific advanced memory technologies.