Cloud Computing is a novel paradigm for providing data center resources as on-demand services in a pay-as-you-go manner. It promises significant cost savings by making it possible to consolidate workloads and share infrastructure resources among multiple applications, resulting in higher cost- and energy-efficiency.
However, these benefits come at the cost of increased system complexity and dynamicity, which pose new challenges in providing service dependability and resilience for applications running in a Cloud environment. At the same time, the virtualization of physical resources, inherent in Cloud Computing, opens up opportunities for novel dependability and quality-of-service management techniques that can potentially improve system resilience.
In this chapter, we first discuss in detail the challenges and opportunities introduced by the Cloud Computing paradigm. We then provide a review of the state of the art in dependability and resilience management in Cloud environments, and conclude with an overview of emerging research directions.