Written by Dr. Elan Sasson, CEO of Data Science Group (DSG)
As more businesses adopt machine learning (ML) and AI in products, tools, and business processes, the importance of high-quality data becomes increasingly clear. Data and the model (i.e., the algorithm) are two equally important and tightly coupled components of every ML system. The majority of ML projects (86%, according to Gartner) fall short due to data reliability and quality issues associated with drifts in the datasets that regularly flow into the technology pipelines of AI-enabled applications.
Data drifts fall into two major categories: structural changes (e.g., the addition of new features or removal of existing ones) and generative shifts (e.g., a concept drift or distribution shift) in a data stream. The entanglement created by data dependencies in an ML system is also known as the "changing anything changes everything" issue. For example, when an input feature changes, its importance (e.g., weight) may shift, and the importance of the remaining features may shift along with it. Thus, an ML system's behavior is governed not just by the model's code, but also by the behavior it learned from the data.
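To make the two categories concrete, here is a minimal sketch (not the author's tooling) of how each might be checked for tabular data, using a two-sample Kolmogorov-Smirnov test for distribution shift; the DataFrames, column names, and significance threshold are illustrative assumptions:

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

def structural_drift(reference: pd.DataFrame, current: pd.DataFrame):
    """Detect structural changes: features added to or removed from the stream."""
    added = set(current.columns) - set(reference.columns)
    removed = set(reference.columns) - set(current.columns)
    return added, removed

def generative_drift(reference: pd.DataFrame, current: pd.DataFrame, alpha=0.05):
    """Flag a distribution shift per shared numeric feature using a two-sample
    Kolmogorov-Smirnov test; a p-value below alpha suggests drift."""
    drifted = {}
    for col in reference.columns.intersection(current.columns):
        _, p_value = ks_2samp(reference[col], current[col])
        if p_value < alpha:
            drifted[col] = p_value
    return drifted

# Illustrative data: the "current" stream drops feature b, adds feature c,
# and shifts the distribution of feature a.
rng = np.random.default_rng(0)
ref = pd.DataFrame({"a": rng.normal(0, 1, 500), "b": rng.normal(0, 1, 500)})
cur = pd.DataFrame({"a": rng.normal(3, 1, 500), "c": rng.normal(0, 1, 500)})
print(structural_drift(ref, cur))  # c was added, b was removed
print(generative_drift(ref, cur))  # feature a is flagged as drifted
```

In a production pipeline, such checks would run on each incoming batch against a reference window; the alerting threshold and reference window size are exactly the kind of parameters that should be tuned per use case rather than one-size-fits-all.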
The underlying assumption that future data will resemble past observations, and that the distributions of features and targets will remain relatively constant, is typically not valid. Since we expect the world to change over time, model deployment should be treated as a continuous, ongoing process: identifying and tracking model drift over time and, equally important, reducing the time spent fixing system failures related to that drift.
One example is the performance decay of an AI model predicting the deterioration of Covid-19 patients due to shifts in the clinical data set: different waves of the disease introduce different distributions in the electronic medical records (EMR) over time. Another example comes from algo-trading of commodities in the energy sector: in April 2020, a model for predicting crude oil prices stumbled upon unusual and unexpected negative prices, which led to abnormal model behavior and bad trading positions. This eventually resulted in a substantial loss of money.
For these reasons, model drift is more the norm than the exception when dealing with real-world data, and it will significantly degrade the performance of an ML model over time. Such degradation poses a real threat to enterprises developing mission-critical applications in highly regulated industries such as finance and healthcare.
As data changes hourly, weekly, or monthly, production-grade ML applications across diverse verticals will face data quality issues throughout their lifespan. Ensuring high-quality models over time thus becomes an operational risk that must be monitored and mitigated: companies deploying ML-enabled products and services in production environments need to constantly and proactively monitor ML performance metrics. Managing dataset drifts could help address the shortage of deployable, industry-ready ML models. According to an O'Reilly survey, industries with mature ML practices cite "the lack of data or data quality issues" as the main bottleneck holding back further adoption of AI and ML technologies.
As organizations begin to realize the impact of data drifts and the importance of sustainable, reliable ML models, they will need to go beyond simply deploying ML-driven applications in production and adopt ML operational monitoring tools to identify data downtime caused by underperforming models. A growing trend in the ML community is to use ML algorithms themselves to monitor ML applications at scale.
The real challenge lies in the gap between alerting events and the mitigation actions taken to reduce critical AI system downtime. The emphasis is on the time that elapses between identification and mitigation, during which a model is out of action and unavailable and thus poses an operational risk. The goal of the next wave in machine learning and data science is to provide tools and recommendation engines that ensure the continuous, high availability of AI systems deployed in production and operational environments.
Achieving an agreed level of operational performance and uptime relies heavily on maximizing the average time between AI system breakdowns, or mean time between failures (MTBF). System availability in a production environment is often expressed as a percentage of uptime per year, as with business-critical enterprise software applications. When a system must operate continuously, service level agreements (SLAs) often refer to monthly or yearly downtime or availability in order to calculate service credits in proportion to the period of time that the system was unavailable.
Availability is sometimes expressed as a "class of nines," or the number of nines: for example, 99.999% uptime ("five nines") represents 5.26 minutes of downtime per year, while at the other end of the scale, 98% represents 7.31 days of downtime per year. A similar methodology of performance management, with key performance indicators (KPIs) and key risk indicators (KRIs), is equally applicable to the AI space. Nevertheless, most products and tools in the AI monitoring space still focus on alerting and diagnosis mechanisms, based on a one-size-fits-all approach, and less on the SLA measure, which is a key component of any successful AI-driven system in a real-life business context. Putting your organization on this scale in its AI journey will guarantee tangible business value from AI, together with information and data resilience over time.
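The downtime figures behind the "nines" follow directly from the availability percentage. As a quick illustration (assuming a 365.25-day year; the function name is ours):

```python
def downtime_per_year_minutes(availability_pct: float) -> float:
    """Yearly downtime, in minutes, implied by an availability percentage."""
    minutes_per_year = 365.25 * 24 * 60  # 525,960 minutes in an average year
    return (1 - availability_pct / 100) * minutes_per_year

# "Five nines" (99.999%): about 5.26 minutes of downtime per year.
five_nines_minutes = downtime_per_year_minutes(99.999)

# 98% availability: about 7.3 days of downtime per year.
two_nines_days = downtime_per_year_minutes(98) / (24 * 60)

print(f"99.999% -> {five_nines_minutes:.2f} min/year")
print(f"98%     -> {two_nines_days:.2f} days/year")
```

An AI-focused SLA could track model availability the same way, counting the hours a model spends out of action due to drift against an agreed downtime budget.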