INTRODUCTION to Rain Technology1. The most general share-nothing model is assumed. There is no shared storage accessible from all computing nodes. The only way for the computing nodes to share state is to communicate via a network. This differentiates RAIN technology from existing back-end server clustering solutions such as SUNcluster, HP MC Serviceguard or Microsoft Cluster Server.
Rainfinity's technology originated in a research project at the California Institute of Technology (Caltech), in collaboration with NASA's Jet Propulsion Laboratory and the Defense Advanced Research Projects Agency (DARPA). The name of the original research project was RAIN, which stands for Reliable Array of Independent Nodes. The goal of the RAIN project was to identify key software building blocks for creating reliable distributed applications using off-the-shelf hardware. The focus of the research was on high-performance, fault-tolerant and portable clustering technology for space-borne computing. Two important assumptions were made, and these two assumptions reflect the differentiations between RAIN and a number of existing solutions both in the industry and in academia:
2. The distributed application is not an isolated system. The distributed protocols interact closely with existing networking protocols so that a RAIN cluster is able to interact with the environment. Specifically, technological modules were created to handle high-volume network-based transactions. This differentiates it from traditional distributed computing projects such as Beowulf.
In short, the RAIN project intended to marry distributed computing with networking protocols. It became obvious that RAIN technology was well-suited for Internet applications. During the RAIN project, key components were built to fulfill this vision. A patent was filed and granted for the RAIN technology. Rainfinity was spun off from Caltech in 1998, and the company has exclusive intellectual property rights to the RAIN technology. After the formation of the company, the RAIN technology has been further augmented, and additional patents have been filed.
The guiding concepts that shaped the architecture are as follows:
1. Network Applications
The architecture goals for clustering data network applications are different from clustering data storage applications. Similar goals apply in the telecom environment that provides the Internet backbone infrastructure, due to the nature of applications and services being clustered.
The shared-storage cluster is the most widely used for database and application servers that store persistent data on disks. This type of cluster typically focuses on the availability of the database or application service, rather than performance. Recovery from failover is generally slow, because restoring application access to disk-based data takes minutes or longer, not seconds. Telecom servers deployed at the edge of the network are often diskless, keeping data in memory for performance reasons, and tolerate low failover time. Therefore, a new type of share-nothing cluster with rapid failure detection and recovery is required. The only way for the shared-nothing cluster to share is to communicate via the network.
While the high-availability cluster focuses on recovery from unplanned and planned downtimes, this new type of cluster must also be able to maximize I/O performance by load balancing across multiple computing nodes. Linear scalability with network throughput is important. In order to maximize the total throughput, load load-balancing decisions must be made dynamically by measuring the current capacity of each computing node in real-time. Static hashing does not guarantee an even distribution of traffic.
A dispatcher-based, master-slave cluster architecture suffers from scalability by introducing a potential bottleneck. A peer-to-peer cluster architecture is more suitable for latency-sensitive data network applications processing shortlived sessions. A hybrid architecture should be considered to offset the need for more control over resource management. For example, a cluster can assign multiple authoritative computing nodes that process traffic in the round-robin order for each network interface that is clustered to reduce the overhead of traffic forwarding