Understanding Development Process of Machine Learning Systems: Challenges and Solutions

Abstract

Background The number of Machine Learning (ML) systems developed in the industry is increasing rapidly. Since ML systems are different from traditional systems, these differences are clearly visible in different activities pertaining to ML systems software development process. These differences make the Software Engineering (SE) activities more challenging for ML systems because not only the behavior of the system is data dependent, but also the requirements are data dependent. In such scenario, how can Software Engineering better support the development of ML systems? Aim: Our objective is twofold. First, better understand the process that developers use to build ML systems. Second, identify the main challenges that developers face, proposing ways to overcome these challenges. Method: We conducted interviews with seven developers from three software small companies that develop ML systems. Based on the challenges uncovered, we proposed a set of checklists to support the developers. We assessed the checklists by using a focus group. Results: We found that the ML systems development follow a 4-stage process in these companies. These stages are: understanding the problem, data handling, model building, and model monitoring. The main challenges faced by the developers are: identifying the clients’ business metrics, lack of a defined development process, and designing the database structure. We have identified in the focus group that our proposed checklists provided support during identification of the clients business metrics and in increasing visibility of the progress of the project tasks. Conclusions: Our research is an initial step towards supporting the development of ML systems, suggesting checklists that support developers in essential development tasks, and also serve as a basis for future research in the area.

Publication
In 13th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM ), ACM/IEEE.
Date
Links