My interest in NUMA (Non-Uniform Memory Access) has grown considerably of late. This technology has huge potential for large-scale Web hosting and Internet solutions. In brief, NUMA was conceived to get around bus contention. With NUMA, programs are no longer limited by the memory size or disk IOPS of a single host node and can, therefore, harness the power of other nearby servers as if they were part of the same chassis. In fact, IO scales up as a result!
The problem with uniform memory access architectures is that, in a single-chassis multiprocessor system with a large amount of memory, the bus tends to saturate very quickly. Every processor core contends for the same memory bus, which becomes a hot spot. Take a 32-core server with 128GB of DDR3 memory. This server will underperform when it runs a parallel, memory-hungry application, because each task on a core will at some point need to reach memory via the bus. DDR3 itself is simply the third generation of DDR SDRAM; the number of channels is set by the memory controller, so assume a triple-channel configuration here. With 32 tasks all attempting to access memory over three channels, roughly 32/3 ≈ 11 tasks contend for each channel. The end result is a saturated bus and poor performance.
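The contention arithmetic can be checked with a few lines of Python. This is only a rough sketch: it assumes tasks spread evenly across channels, and the function name and the four-node split are made up for illustration rather than taken from any real system.

```python
import math

def tasks_per_channel(tasks, channels):
    """Worst-case number of tasks sharing one channel, assuming an even spread."""
    return math.ceil(tasks / channels)

# Uniform memory access: all 32 tasks share the same three channels.
uma = tasks_per_channel(32, 3)

# Hypothetical NUMA split: 4 nodes of 8 cores, each node with its own
# three channels to local memory (an assumption for illustration).
numa = tasks_per_channel(32 // 4, 3)

print(f"UMA:  up to {uma} tasks per channel")
print(f"NUMA: up to {numa} tasks per channel")
```

Under these assumptions the shared bus carries up to 11 tasks per channel, while each node in the split case carries up to 3.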
NUMA addresses this bus contention problem by splitting the machine into a network of processor nodes, each with its own local memory, and giving each node a weight based on its locality, so that nearby memory is preferred over distant memory. The most interesting aspect is its scalability potential: NUMA is one of the few technologies that can truly scale up. Amazon’s EC2 and most other offerings on the market nowadays can only scale out. In other words, one has to implement load balancing, replication, and data synchronization oneself.
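The idea of weighting nodes by locality can be sketched with a small distance matrix, similar in spirit to the node distances that `numactl --hardware` reports on Linux (where 10 conventionally means local). The matrix values and function names below are hypothetical and chosen only for illustration.

```python
# Hypothetical distances for a 4-node system: DISTANCES[i][j] is the relative
# cost for a CPU on node i to reach memory on node j. 10 means local memory;
# larger values mean the memory is farther away. Values are illustrative.
DISTANCES = [
    [10, 20, 30, 30],
    [20, 10, 30, 30],
    [30, 30, 10, 20],
    [30, 30, 20, 10],
]

def allocation_order(cpu_node, distances=DISTANCES):
    """Return node ids sorted from nearest to farthest memory for cpu_node."""
    row = distances[cpu_node]
    return sorted(range(len(row)), key=lambda node: (row[node], node))

def average_cost(cpu_node, pages_per_node, distances=DISTANCES):
    """Average access cost for cpu_node given how many pages sit on each node."""
    total_pages = sum(pages_per_node)
    weighted = sum(distances[cpu_node][n] * p for n, p in enumerate(pages_per_node))
    return weighted / total_pages

print(allocation_order(2))               # nearest nodes first for a CPU on node 2
print(average_cost(0, [50, 50, 0, 0]))   # half local, half on the neighbouring node
```

A locality-aware allocator would try the nodes in the order returned by `allocation_order`, and `average_cost` shows how the cost rises as pages drift away from the local node.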
Here’s where it gets interesting. IBM has already commoditised NUMA with its x86 offering: the IBM x440. Although one can only chain two x440 chassis together to form one NUMA network, what was once a concept is now practical! The x440s can be chained together via their SMP expansion modules using remote IO cables. Now imagine having the ability to chain 5, 10, or 100 dual quad-core boxes with 16GB each! Not only would one take advantage of over a terabyte of memory, but core compute and IO would scale as well.
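A quick tally of what such chains would offer, assuming resources simply add up with no interconnect overhead (an optimistic simplification). The function name is hypothetical, and "dual quad core" is taken to mean 8 cores per box.

```python
def cluster_totals(boxes, mem_gb_per_box=16, cores_per_box=8):
    """Aggregate memory and cores for a chain of identical boxes."""
    return {
        "memory_gb": boxes * mem_gb_per_box,
        "cores": boxes * cores_per_box,
    }

for n in (5, 10, 100):
    totals = cluster_totals(n)
    print(f"{n:>3} boxes: {totals['memory_gb']} GB memory, {totals['cores']} cores")
```

At 100 boxes the total comes to 1,600 GB of memory and 800 cores.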
A challenge remains. The remote IO cables that allow the formation of a NUMA node cluster are based on proprietary technology. The research and development effort required to deliver such technology is colossal for a smaller firm. Or maybe not!