Cloud Compounding: Fault-tolerance for meeting availability requirements

Fault tolerance (FT) matters even more in a mobile cloud than in a conventional cloud because the devices themselves are mobile, i.e. “mobility is inherently hazardous” [2]. Devices can disconnect as users move and as they enter and leave a network. Battery depletion, network signal loss, and hardware failure are other common causes.

Redundancy. The FT support in Hyrax [14] is inherited from the FT mechanisms of Hadoop, on which Hyrax is based. Hadoop recovers from task failure through re-execution and redundancy. If a node failure is anticipated, the task is replicated on one or more other nodes that are deemed stable. In the authors' tests and evaluations, for which applications such as Sort, Grep and Word Count were ported, Hyrax recovered more effectively when the number of nodes was higher.
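The combination of redundancy and re-execution described above can be illustrated with a minimal sketch. It is not Hadoop's or Hyrax's implementation: the names `NodeFailure`, `run_with_recovery` and `make_executor`, the batch-wise scheduling, and the simulated failures are all assumptions made for illustration.

```python
from typing import Any, Callable, List, Set, Tuple


class NodeFailure(Exception):
    """Raised by a (simulated) node that cannot finish a task."""


def run_with_recovery(
    task: Callable[[], Any],
    nodes: List[str],
    execute: Callable[[str, Callable[[], Any]], Any],
    replicas: int = 2,
) -> Tuple[Any, List[str]]:
    """Run `task` on batches of `replicas` nodes until one copy succeeds.

    Redundancy: every node in a batch receives a copy of the task.
    Re-execution: if all copies in a batch fail, the next batch is tried.
    Returns the first successful result and the list of nodes that were used.
    """
    tried: List[str] = []
    for start in range(0, len(nodes), replicas):
        results = []
        for node in nodes[start:start + replicas]:
            tried.append(node)
            try:
                results.append(execute(node, task))
            except NodeFailure:
                pass  # this copy is lost; a sibling replica may still succeed
        if results:
            return results[0], tried
    raise RuntimeError("task failed on every available node")


def make_executor(failing: Set[str]) -> Callable[[str, Callable[[], Any]], Any]:
    """Build a simulated executor in which the nodes in `failing` always fail."""
    def execute(node: str, task: Callable[[], Any]) -> Any:
        if node in failing:
            raise NodeFailure(node)
        return task()
    return execute


if __name__ == "__main__":
    phones = ["phone-a", "phone-b", "phone-c", "phone-d"]
    executor = make_executor(failing={"phone-a", "phone-b"})
    print(run_with_recovery(lambda: sorted([3, 1, 2]), phones, executor))
```

In this sketch a larger pool of nodes leaves more batches to fall back on, which is consistent with the observation that recovery improved as the number of nodes grew.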

FT is not implemented in [12] but is mentioned as future work, where the authors suggest using context awareness for fault-tolerance purposes. Context information would be used to judge whether a node is unstable; if so, the task could be replicated to increase the likelihood that it completes.
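Since [12] leaves this as future work, the following is only a hypothetical sketch of how such a stability judgement might drive replication. The context attributes (`battery_pct`, `signal_bars`, `is_moving`), the thresholds, and the function names are assumptions and are not taken from [12].

```python
from dataclasses import dataclass


@dataclass
class NodeContext:
    """Hypothetical context snapshot reported by a mobile node."""
    battery_pct: float  # remaining battery, 0 to 100
    signal_bars: int    # 0 (no signal) to 4 (full signal)
    is_moving: bool     # True if the device appears to be in motion


def is_unstable(ctx: NodeContext, min_battery: float = 20.0, min_signal: int = 2) -> bool:
    """Judge a node unstable if any indicator crosses its (assumed) threshold."""
    return ctx.battery_pct < min_battery or ctx.signal_bars < min_signal or ctx.is_moving


def replicas_needed(ctx: NodeContext, extra: int = 1) -> int:
    """One copy for a stable node; `extra` additional copies for an unstable one."""
    return 1 + (extra if is_unstable(ctx) else 0)


if __name__ == "__main__":
    print(replicas_needed(NodeContext(battery_pct=80, signal_bars=4, is_moving=False)))
    print(replicas_needed(NodeContext(battery_pct=10, signal_bars=4, is_moving=False)))
```

A simple threshold rule is used here only to keep the example short; any other classifier over the context information could take its place.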

Proxy migration. In [62], FT is achieved by migrating the proxy service. If a proxy node fails, another node in the service cloud takes its place so that disruption to the communication streams is minimal. This was tested on a testbed comprising PlanetLab nodes and hosts on a university intranet, and two strategies were implemented: on-demand backup, where another service is migrated as soon as the system detects a failure, and ready backup, where a backup node is configured by default at the time of service composition. Of the two, ready backup was slightly faster, as can be expected, since in that case the system only needs to reconfigure the relay to forward the stream. The authors also suggest, as future work, using the client middleware to trigger reconfiguration faster.
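The difference between the two strategies can be sketched by listing the work each one has to do after a failure is detected. This is an illustrative model only, not the system in [62]; the `Stream` type, the function names, and the node names are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stream:
    """A communication stream relayed through a proxy node."""
    proxy: str
    backup: Optional[str] = None


def compose(primary: str, spare_nodes: List[str], ready_backup: bool) -> Stream:
    """Compose the service; with ready backup, a spare node is prepared up front."""
    stream = Stream(proxy=primary)
    if ready_backup:
        stream.backup = spare_nodes.pop(0)  # service migrated before any failure
    return stream


def on_proxy_failure(stream: Stream, spare_nodes: List[str]) -> List[str]:
    """Recover from a proxy failure and return the steps performed afterwards."""
    steps: List[str] = []
    if stream.backup is None:
        # On-demand backup: the service is migrated only once failure is detected.
        stream.backup = spare_nodes.pop(0)
        steps.append(f"migrate service to {stream.backup}")
    # Both strategies finish by pointing the relay at the replacement proxy.
    steps.append(f"reconfigure relay to {stream.backup}")
    stream.proxy, stream.backup = stream.backup, None
    return steps


if __name__ == "__main__":
    on_demand = compose("proxy-0", ["proxy-1", "proxy-2"], ready_backup=False)
    ready = compose("proxy-0", ["proxy-1", "proxy-2"], ready_backup=True)
    print(on_proxy_failure(on_demand, ["proxy-1", "proxy-2"]))
    print(on_proxy_failure(ready, ["proxy-2"]))
```

Under this model ready backup performs one step after the failure where on-demand backup performs two, which matches the direction of the reported result, although the sketch says nothing about the size of the difference.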

Resource tracking. In [77], Palmer et al. propose using the Ibis grid computing platform to address similar problems in mobile computing. The Ibis framework enables users to integrate their mobile devices into the grid and take advantage of its computational power. Here, FT is achieved by the Ibis system’s resource tracking model through the ‘JEL’ API, which stands for ‘Join, Elect, Leave’. As the name suggests, the JEL API gives the system malleability, enabling it to adapt as mobile nodes join and leave the network. The ‘Join’ operation notifies the application when a new node connects to the distributed system, allowing the application to scale up. The ‘Elect’ operation is used to elect a node into the coordinating role. Whenever a node is disconnected, whether by choice or by fault, the ‘Leave’ operation notifies the application and triggers an ‘Elect’ to select a new node to fill in.
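The join, elect and leave behaviour described above can be mimicked in a few lines. The sketch below is not the Ibis JEL API: the `Registry` class, its callback signature, the rule of electing the longest-standing member, and the choice to re-elect only when the departing node held the coordinating role are all assumptions made for illustration.

```python
from typing import Callable, List, Optional


class Registry:
    """Toy membership tracker that notifies the application of each event."""

    def __init__(self, on_event: Callable[[str, str], None]) -> None:
        self.members: List[str] = []
        self.coordinator: Optional[str] = None
        self.on_event = on_event  # called as on_event(kind, node)

    def join(self, node: str) -> None:
        """Register a new node and tell the application it may scale up."""
        self.members.append(node)
        self.on_event("join", node)
        if self.coordinator is None:
            self.elect()

    def elect(self) -> Optional[str]:
        """Pick a coordinator; here simply the longest-standing member (assumed rule)."""
        self.coordinator = self.members[0] if self.members else None
        if self.coordinator is not None:
            self.on_event("elect", self.coordinator)
        return self.coordinator

    def leave(self, node: str) -> None:
        """Remove a node (by choice or by fault) and fill the coordinator role if vacated."""
        self.members.remove(node)
        self.on_event("leave", node)
        if node == self.coordinator:
            self.elect()


if __name__ == "__main__":
    registry = Registry(lambda kind, node: print(kind, node))
    registry.join("phone-a")
    registry.join("phone-b")
    registry.leave("phone-a")  # the coordinator leaves, so a new one is elected
```

Because the application hears about every membership change through one callback, it can grow or shrink its work distribution without polling the nodes itself.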