FindIaCBug: Create Defect- and Error-Free Infrastructure-as-Code Scripts

Infrastructure-as-Code (IaC) is a model for provisioning and managing a computing environment by explicitly defining the desired state of the environment in source code and applying software engineering principles, methodologies, and tools to that code. As organizations increasingly adopt IaC, tools and methodologies for developing high-quality IaC source code (scripts) become crucial. For example, security issues and misconfigurations in IaC scripts are very common, and their impact is severe [1]. Nearly 200,000 insecure IaC templates were found among the IaC scripts used by a set of enterprises, and 65% of cloud incidents are due to misconfigurations [2]. Detecting and correcting defective and erroneous IaC scripts is therefore of paramount importance.

Objectives

FindIaCBug aims to give deployment engineers the tools they need to develop defect- and error-free IaC scripts. It also aims to provide a set of catalogues on IaC quality, covering good and bad development practices, smells, bugs, and quality metrics.

Architecture

FindIaCBug uses different techniques to detect different types of defects in IaC scripts. For example, it uses ontological reasoning to verify constraints over the structures of TOSCA blueprints and IaC scripts. To detect smells and bugs, FindIaCBug employs three main approaches: informal rules, semantic rules, and data-driven approaches such as machine learning, deep learning, and natural language processing. Details of each technique can be found in the related publications. In this blog post, we present the architecture and workflow of smell and error detection.

Figure 1 shows the high-level architecture and workflow of our approach to detecting smells and errors in IaC deployment model descriptions. More specifically:

Population of the Knowledge Base: Resource Experts populate the knowledge base by creating resource models (ontology instances representing resources/nodes in the infrastructure) using the SODALITE IDE. The Platform Discovery Service can also update the knowledge base (semi-)automatically by creating resource models.
Definition of Smell Detection Rules: We use semantic rules written in SPARQL to detect different smells in deployment models. Rules already exist for common security and implementation smells, and new rules can be defined to detect new types of smells.
Detection of Smells: Application Ops Experts create deployment model instances that represent the deployment models of their applications. Each deployment model is automatically translated into the corresponding ontological representation and saved in the knowledge base. The smell detection rules are then applied over the ontologies in the knowledge base to detect deployment model-level smells. If a smell is detected, its details are returned to the Application Ops Experts and shown in the IDE as warnings. The same flow applies to Resource Ops Experts, who also receive warnings for their resource models.
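
To make the three steps above more concrete, here is a minimal Python sketch of the same flow. It is an illustration under stated assumptions, not the FindIaCBug implementation: a plain set of (subject, predicate, object) triples stands in for the ontology-backed knowledge base, and ordinary Python functions stand in for the SPARQL rules. The model layout, the two example rules, and all names (to_triples, hardcoded_secret, insecure_url, detect_smells, SmellWarning) are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator

# One (subject, predicate, object) triple stands in for an ontology statement.
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class SmellWarning:
    node: str    # node in the deployment model where the smell occurs
    smell: str   # short name of the smell
    detail: str  # message that an IDE could show as a warning


def to_triples(model: dict) -> set[Triple]:
    """Translate a deployment model into triples (hypothetical layout)."""
    triples: set[Triple] = set()
    for node, spec in model.items():
        triples.add((node, "type", spec["type"]))
        for key, value in spec.get("properties", {}).items():
            if isinstance(value, dict):
                # Value supplied through an input reference, not a literal.
                triples.add((node, f"input:{key}", str(value.get("get_input", ""))))
            else:
                triples.add((node, f"property:{key}", str(value)))
    return triples


def hardcoded_secret(kb: set[Triple]) -> Iterator[SmellWarning]:
    """Example rule: a secret-like property that holds a literal value."""
    for node, pred, obj in sorted(kb):
        if not pred.startswith("property:"):
            continue
        name = pred.split(":", 1)[1]
        if obj and any(w in name.lower() for w in ("password", "secret", "token")):
            yield SmellWarning(node, "hard-coded secret", f"'{name}' has a literal value")


def insecure_url(kb: set[Triple]) -> Iterator[SmellWarning]:
    """Example rule: a property value that uses plain HTTP."""
    for node, pred, obj in sorted(kb):
        if pred.startswith("property:") and obj.startswith("http://"):
            name = pred.split(":", 1)[1]
            yield SmellWarning(node, "insecure URL", f"'{name}' uses HTTP instead of HTTPS")


# New rules are added by appending another function to this list.
RULES = [hardcoded_secret, insecure_url]


def detect_smells(model: dict, kb: set[Triple]) -> list[SmellWarning]:
    """Save the translated model in the knowledge base, then apply every rule."""
    kb |= to_triples(model)
    return [warning for rule in RULES for warning in rule(kb)]


# Hypothetical deployment model with two nodes.
model = {
    "db": {"type": "Database",
           "properties": {"password": "admin123", "port": 5432}},
    "web": {"type": "WebServer",
            "properties": {"endpoint": "http://example.org/api",
                           "token": {"get_input": "api_token"}}},
}

knowledge_base: set[Triple] = set()
found = detect_smells(model, knowledge_base)
for warning in found:
    print(f"[{warning.node}] {warning.smell}: {warning.detail}")
```

In this sketch the token property on the web node is not flagged, because its value comes from an input reference rather than a literal. Keeping each rule as a separate function mirrors the idea that new rules can be defined without touching the translation or detection steps.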

Figure 1. An Overview of our Approach to TOSCA Smell and Error Detection

Collaborations

FindIaCBug was developed partly in collaboration with the RADON European Horizon 2020 project [3]. In particular, the IaC quality metrics and the tool that assesses the defect proneness of IaC scripts using machine learning-based techniques were developed in the RADON project.

References

[1] https://go.snyk.io/IaC-Report-2021.html

[2] https://start.paloaltonetworks.com/unit-42-cloud-threat-report

[3] https://radon-h2020.eu/

Related Publications

The following publications are related to FindIaCBug. Several more are under review or in preparation. Stay tuned!

[1] Kumara, Indika, et al. "The do’s and don’ts of infrastructure code: a systematic gray literature review." Information and Software Technology 137 (2021): 106593.

[2] Borovits, Nemania, et al. "DeepIaC: deep learning-based linguistic anti-pattern detection in IaC." Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation. 2020.

[3] Kumara, Indika, et al. "Towards semantic detection of smells in cloud infrastructure code." Proceedings of the 10th International Conference on Web Intelligence, Mining and Semantics. 2020.

[4] Kumara, Indika, et al. "Quality assurance of heterogeneous applications: The sodalite approach." European Conference on Service-Oriented and Cloud Computing. Springer, Cham, 2020.

[5] Gorroñogoitia, Jesús, et al. "A Smart Development Environment for Infrastructure as Code." CEUR Workshop Proceedings (2021).

[6] Dalla Palma, Stefano, Dario Di Nucci, and Damian A. Tamburri. "AnsibleMetrics: A Python library for measuring Infrastructure-as-Code blueprints in Ansible." SoftwareX 12 (2020): 100633.

[7] Dalla Palma, Stefano, et al. "Within-project defect prediction of infrastructure-as-code using product and process metrics." IEEE Transactions on Software Engineering (2021): 1-1.