Hidden technical debt in machine learning systems can be sneaky, but detecting and repairing it is essential to keeping your models healthy. By some estimates, as many as 70% of machine learning projects fail due to poor model maintenance.
Technical debt in machine learning systems often arises from shortcuts taken during the development process, such as using outdated libraries or ignoring best practices. This can lead to performance issues, data quality problems, and even security vulnerabilities.
Ignoring technical debt can have severe consequences, including model drift, where the model's performance degrades over time, and data poisoning, where the model is intentionally corrupted by malicious data.
Machine Learning System Technical Issues
Machine learning systems can behave in surprising ways when inputs change only slightly, which undermines trust in their reliability.
A 2018 study found that state-of-the-art image classifiers that normally identify a right-side-up school bus correctly failed to do so, on average, 97 percent of the time once the bus was rotated into an unusual pose. This highlights the need for ML systems that are robust to confounding "adversarial" examples.
One of the largest sources of system-level technical debt in machine learning models is the lack of explainability, often called the black-box problem: it is hard to comprehensively understand the inner workings of a model after it has been trained.
Explainable ML attempts to produce human-understandable explanations for models that are otherwise too complex to inspect directly. This is essential for trusting an ML system, avoiding obvious mistakes, and identifying possible biases.
The technical debt anomalies in machine learning systems can be restated as follows:
- The prediction reliability of the ML system (i.e., its output) degrades.
- It becomes harder to train the ML system on new input.
- It becomes harder to comprehend the ML system well enough to maintain it efficiently.
These issues highlight the need for careful management of technical debt in Machine Learning Systems.
Common Anti-Patterns
Underutilized data dependencies can creep into a model in several ways, making an ML system unnecessarily vulnerable to change.
Dead experimental codepaths are another major contributor to technical debt: they make backward compatibility increasingly difficult to maintain and drive an exponential increase in cyclomatic complexity.
Testing the interactions between these codepaths is hard, and untested interactions can cause undesired effects in production.
High-debt design patterns, such as glue code, also accumulate technical debt, making it difficult to manage data pipelines, detect errors, and recover from failures.
The sections below cover the most common anti-patterns to watch out for.
Undeclared Consumers
Undeclared Consumers are a common anti-pattern in Machine Learning systems. They refer to components that use the output of a model without being explicitly declared or accounted for.
These consumers can silently use the output of a given model as an input to another system, making it difficult to track and manage the flow of data. This tight coupling can radically increase the cost and difficulty of making any changes to the system, even if they are improvements.
In the original "Hidden Technical Debt in Machine Learning Systems" paper, undeclared consumers are described as a form of visibility debt: access controls are lacking and consumers are never formally declared, which leads to a complex web of dependencies that is hard to manage.
Here are some key characteristics of Undeclared Consumers:
- They lack access controls, making it difficult to track and manage their use of the model's output.
- They can silently use the output of a given model as an input to another system.
- They can increase the cost and difficulty of making changes to the system.
To avoid Undeclared Consumers, it's essential to implement proper access controls and declare all consumers of a model's output. This will help ensure that the flow of data is transparent and manageable, making it easier to maintain and improve the system over time.
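As a rough illustration of what "declaring consumers" can look like in practice, here is a minimal sketch of an in-process prediction service that refuses to serve output to callers it does not know about. The class and method names (`PredictionService`, `register_consumer`) are illustrative, not taken from any particular library.

```python
# Minimal sketch (illustrative names): force every consumer of a model's output
# to be declared before it can read predictions.

class PredictionService:
    def __init__(self, model):
        self._model = model
        self._declared_consumers = set()  # the explicit registry provides visibility

    def register_consumer(self, consumer_id: str) -> None:
        """Downstream systems must be declared before they may call predict()."""
        self._declared_consumers.add(consumer_id)

    def predict(self, consumer_id: str, features):
        # Reject silent, undeclared consumers instead of letting them quietly
        # couple themselves to this model's output.
        if consumer_id not in self._declared_consumers:
            raise PermissionError(f"Undeclared consumer: {consumer_id!r}")
        return self._model.predict(features)


# service = PredictionService(model)
# service.register_consumer("ranking-service")
# service.predict("ranking-service", features)   # allowed
# service.predict("email-targeting", features)   # raises PermissionError
```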
ML System Anti-Patterns
Underutilized data dependencies make an ML system unnecessarily vulnerable to change, even though they could be removed with no detriment. They can creep into a model in several ways, for example as bundles of strongly correlated features where only one of them is actually causal.
Restated, technical debt in ML systems shows up in three ways: the prediction reliability of the system degrades, it becomes harder to train the system on new input, and it becomes harder to comprehend the system well enough to maintain it efficiently.
Dead experimental codepaths can create a growing debt due to the increasing difficulties of maintaining backward compatibility and an exponential increase in cyclomatic complexity. For any individual change, the cost of experimenting in this manner is relatively low, but over time, these accumulated codepaths can become a significant issue.
Undeclared consumers can silently use the output of a given model as an input to another system, creating tight coupling that can radically increase the cost and difficulty of making any changes to the ML system.
Glue code, which connects the prediction component of an ML system to the rest of the system, can be costly in the long term because it tends to freeze a system to the peculiarities of a specific package, inhibiting improvements and making it harder to take advantage of domain-specific properties.
Here are some common ML system anti-patterns:
- Underutilized data dependencies
- Dead experimental codepaths
- Undeclared consumers
- Glue code
These anti-patterns can have serious consequences, including degrading prediction reliability, making it harder to train the ML system, and making it harder to comprehend the ML system to maintain efficiently. By being aware of these anti-patterns, you can take steps to mitigate their effects and improve the reliability and maintainability of your ML system.
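One way to keep glue code under control, and a practice that reappears in the best-practices list later in this article, is to repackage a general-purpose dependency behind a narrow, domain-specific API. The sketch below assumes a scikit-learn-style estimator exposing `predict_proba`; the `ChurnModel` interface and the class names are hypothetical.

```python
# Sketch: hide package-specific glue behind a narrow, domain-specific interface
# so the rest of the system never depends on one library's peculiarities.
from typing import Protocol, Sequence


class ChurnModel(Protocol):
    """The only surface the wider system is allowed to call."""
    def churn_probability(self, customer_features: Sequence[float]) -> float: ...


class SklearnChurnModel:
    """Adapter that keeps scikit-learn specifics out of the rest of the codebase."""

    def __init__(self, fitted_estimator):
        self._estimator = fitted_estimator  # anything exposing predict_proba

    def churn_probability(self, customer_features):
        # Wrap in a list because scikit-learn expects a 2-D array of samples;
        # [0][1] picks the positive-class probability for the single row.
        return float(self._estimator.predict_proba([customer_features])[0][1])
```

If the underlying package is later swapped out, only the adapter changes; everything that calls `ChurnModel` is untouched.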
System Design and Management
System Design and Management is a crucial aspect of machine learning systems. It's easy to overlook, but ignoring system-level smells can lead to significant technical debt.
Process management debt arises when many models run simultaneously and nothing prevents the whole pipeline from blocking on the slowest one. It can be mitigated by making a habit of checking and comparing runtimes for your machine learning models, a habit that machine learning engineering practice is gradually improving on.
Incorrect configurations can be costly, wasting time and computing resources and causing production problems. Configuration is sensitive and error-prone, partly because of the messiness of real-world data.
The transitive closure of all data dependencies should be thoroughly analyzed to ensure data quality and system accuracy. This involves monitoring, testing, and routinely meeting service level objectives for upstream processes that feed data to the learning system.
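As a concrete illustration, analyzing the transitive closure can be as simple as walking a hand-maintained dependency map. The graph and signal names below are invented for the example.

```python
# Minimal sketch: compute every upstream signal a model transitively depends on,
# given a hand-maintained dependency map. The graph below is illustrative.

def transitive_dependencies(graph: dict[str, set[str]], node: str) -> set[str]:
    """Return all direct and indirect upstream dependencies of `node`."""
    seen: set[str] = set()
    stack = list(graph.get(node, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, ()))
    return seen


data_deps = {
    "churn_model": {"clickstream_features", "billing_features"},
    "clickstream_features": {"raw_click_logs"},
    "billing_features": {"invoices", "payments"},
}

print(transitive_dependencies(data_deps, "churn_model"))
# -> all five upstream signals: clickstream_features, raw_click_logs,
#    billing_features, invoices, payments (set order may vary)
```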
Models Erode Boundaries
Unlike traditional software, machine learning systems do not lend themselves well to abstraction boundaries, which makes it difficult to enforce strict boundaries by prescribing a specific intended behavior.
One reason for this is the CACE principle - Changing Anything Changes Everything. This means that changing the distribution of one feature in a model can change the weights or importance of other features, making it hard to predict how the system will behave.
Traditional software engineering practices, such as encapsulation and modular design, help create maintainable code. However, these practices are difficult to apply to machine learning systems.
In ML systems, models are often cascaded: a new model is learned on top of an existing one. This is a quick solution, but it creates system-level dependencies that make analyzing and improving any individual model expensive, and it can even decrease performance at the system level.
As the rotated-school-bus study mentioned earlier illustrates, robustness cannot be taken for granted; exposing ML systems to many confounding "adversarial" examples during development is one way to harden them against such failures.
Here are some key characteristics of models that erode boundaries:
- Cascading models creates system dependency
- Changing one feature can affect other features
- System behavior is hard to predict
- Models are entangled, making it difficult to understand and maintain them
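The entanglement behind CACE is easy to reproduce on synthetic data. In the hedged sketch below (NumPy only, made-up data), two features are strongly correlated and the model is a ridge regression; rescaling just one feature noticeably shifts the weight learned for the other.

```python
# Sketch of CACE on synthetic data: with correlated features and a regularized
# model, changing the scale of one feature shifts the weight learned for the
# other, correlated feature as well.
import numpy as np

def ridge_weights(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X'X + lam*I)^-1 X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
x0 = rng.normal(size=1000)
x1 = 0.9 * x0 + 0.1 * rng.normal(size=1000)      # x1 strongly correlated with x0
X = np.column_stack([x0, x1])
y = x0 + 0.05 * rng.normal(size=1000)            # the signal really lives in x0

print(ridge_weights(X, y, lam=50.0))             # weight is shared between x0 and x1

X_changed = X.copy()
X_changed[:, 0] *= 10.0                          # only feature 0's scale changes...
print(ridge_weights(X_changed, y, lam=50.0))     # ...but feature 1's weight collapses too
```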
Process Management
Process management debt is what happens when you run many machine learning models at the same time with no plan to keep everything from waiting on the slowest one, wasting both time and resources.
Checking runtimes of your models is crucial to avoid this problem. Make a habit of checking and comparing runtimes for machine learning models, as recommended in the best-practices list at the end of this article.
Ignoring system-level smells leads directly to process management debt, so high-level system design deserves attention in machine learning engineering. This includes monitoring and testing the upstream producers that feed data to your learning system.
Here are some key considerations for process management:
- Monitor and test upstream producers to ensure they meet service level objectives that take the needs of your downstream ML system into account.
- Set and enforce action limits as a sanity check.
- Use a framework such as tf.keras or Chainer that makes it easy to drive multiple models from configuration files.
- Use a package like Click instead of argparse to handle command-line arguments for tuning settings.
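Checking and comparing runtimes, as mentioned above, does not require special tooling; a small helper around `time.perf_counter` is often enough. The `candidate_models` loop below is illustrative usage with placeholder names.

```python
# Sketch: make runtime comparison a habit by timing each candidate model's
# training and prediction steps under the same conditions.
import time

def timed(label, fn, *args, **kwargs):
    """Run fn once and report wall-clock time alongside its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return result

# Illustrative usage, assuming a dict of fitted/unfitted estimators:
# for name, model in candidate_models.items():
#     timed(f"train {name}", model.fit, X_train, y_train)
#     timed(f"predict {name}", model.predict, X_valid)
```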
Adapting to External Changes
Adapting to external changes is crucial for any system, and especially for machine learning (ML) systems: the external world is rarely stable, so the data, or the mapping between inputs and outputs that an ML system relies on, can change at any time.
One common mitigation strategy for unstable data dependencies is to create a versioned copy of a given signal, so the model consumes a frozen snapshot rather than a feed that can shift underneath it.
Monitoring and testing the system are also essential for detecting changes and issues. Upstream producers, such as data feeds, should be thoroughly monitored, tested, and held to a service level objective that takes the downstream ML system's needs into account.
The transitive closure of all data dependencies should be analyzed to understand the flow of data through the system. This can help identify potential issues and areas for improvement.
Here are some key considerations for adapting to external changes:
- Monitor and test upstream producers to ensure they meet the system's needs.
- Analyze the transitive closure of all data dependencies to understand the flow of data.
- Use versioned copies of signals to mitigate unstable data dependencies.
By following these best practices, ML systems can be made more robust and adaptable to changes in the external world.
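A versioned copy of a signal can be as simple as a registry that maps (signal, version) pairs to frozen snapshots. The registry contents, signal names, and paths below are invented for illustration.

```python
# Sketch: pin a model to a frozen, versioned copy of an upstream signal rather
# than consuming it "live". The registry and signal names are illustrative.

SIGNAL_VERSIONS = {
    ("user_engagement_score", "v1"): "s3://feature-store/user_engagement_score/v1/",
    ("user_engagement_score", "v2"): "s3://feature-store/user_engagement_score/v2/",
}

def signal_path(name: str, version: str) -> str:
    """Resolve a (signal, version) pair; refuse to fall back to 'latest' silently."""
    try:
        return SIGNAL_VERSIONS[(name, version)]
    except KeyError:
        raise KeyError(f"No frozen copy of {name!r} at version {version!r}") from None

# The model config pins an exact version, so upgrading to v2 becomes an explicit,
# reviewable change instead of a silent shift in input semantics.
TRAINING_INPUT = signal_path("user_engagement_score", "v1")
```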
Reproducibility
Reproducibility is a crucial aspect of system design and management. Many researchers have encountered issues with code that lacks seed numbers, notebooks written out of order, and repositories without package versions.
Code that never sets random seeds produces different results on every run, making experiments hard to reproduce and findings difficult for other researchers to verify.
A reproducibility checklist can help mitigate these issues; one such checklist was featured on Hacker News.
Using a reproducibility checklist is a best practice when releasing research code. This ensures that others can easily reproduce and verify your results.
Releasing research code without a reproducibility checklist can lead to wasted time and resources. It's essential to prioritize reproducibility in system design and management.
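A small "seed everything" helper is a common first item on such checklists. The sketch below covers Python's `random` module and NumPy; extend it for whichever frameworks you actually use (PyTorch, TensorFlow, etc.).

```python
# Minimal sketch of a seed-everything helper to pair with a reproducibility
# checklist. Extend it for any deep learning frameworks in your stack.
import os
import random

import numpy as np

def seed_everything(seed: int = 42) -> None:
    """Fix the obvious sources of randomness so runs can be reproduced."""
    random.seed(seed)
    np.random.seed(seed)
    # Only affects subprocesses started after this point, not the current one.
    os.environ["PYTHONHASHSEED"] = str(seed)

seed_everything(42)
```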
Static Analysis
Static analysis is a crucial step in system design and management. Tools for static analysis of data dependencies are far less common than those for code dependencies, but they are just as essential for error checking.
These tools help track down consumers and enforce migration and updates. They play a vital role in ensuring the smooth functioning of a system.
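In the absence of dedicated tooling, even a hand-maintained annotation map plus a tiny check can track down consumers before a signal is changed or removed. Everything below (pipeline names, signal names) is illustrative.

```python
# Sketch: a tiny check over hand-declared pipeline annotations that flags every
# consumer of a signal scheduled for removal. All names are illustrative.

PIPELINE_INPUTS = {
    "churn_model": ["clickstream_features", "legacy_session_score"],
    "ranking_model": ["clickstream_features"],
    "email_targeting": ["legacy_session_score"],
}

def consumers_of(signal: str) -> list[str]:
    """Every declared pipeline that would break if `signal` were removed."""
    return [name for name, inputs in PIPELINE_INPUTS.items() if signal in inputs]

print(consumers_of("legacy_session_score"))  # ['churn_model', 'email_targeting']
```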
Best Practices and Tools
Technical debt in machine learning systems is a real thing, and it's not just about code quality. Following the Pareto principle, roughly 20% of technical debt remedies can fix 80% of your problems, so it's worth taking the time to address it.
Here are some best practices to help you identify and fix technical debt in your machine learning systems:
- Use interpretability tools like SHAP values to understand how your models are making decisions.
- Regular code-reviews and automatic code-sniffing tools can help catch technical debt before it becomes a big problem.
- Set up access keys, directory permissions, and service-level-agreements to ensure that your data and models are secure and well-managed.
- Use data versioning tools to track changes to your data and models over time.
- Drop unused files, extraneous correlated features, and use causal inference toolkits to simplify your models and improve their performance.
By following these best practices and using the right tools, you can identify and fix technical debt in your machine learning systems, and improve their reliability and performance.
Cost vs Code
Data dependencies can be a major cost factor in machine learning systems. This is because correlated features can be difficult to detect, and changes in the external world can break pipelines if not addressed.
Blocking changes can be mitigated with traditional software engineering techniques such as decoupling and loose coupling, separation of concerns, bounded contexts, and context mapping.
Behavioural changes, on the other hand, are more subtle and require in-depth end-to-end analysis to detect. This can be time-consuming, but machine learning systems tend to degrade gracefully, which buys more time to fix issues.
Here is how the two kinds of change differ in practice:
- Blocking changes: addressed with decoupling and loose coupling, separation of concerns, bounded contexts, and context mapping.
- Behavioural changes: detected through in-depth end-to-end analysis.
Increasing the data science team's time on analysis can be a more effective alternative to heavy engineering. This is especially true for production-grade systems that require regular analysis to ensure they fit the business problem.
25 Best Practices in One Place
These are all the best practices mentioned throughout this article, gathered in one place. There are likely many more, but tools for fixing technical debt follow the Pareto principle: 20% of the remedies can fix 80% of your problems.
Use interpretability tools like SHAP values to understand how your models are making decisions. This can help you identify potential biases and errors.
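As a hedged example of what that looks like, the sketch below fits a scikit-learn random forest regressor on a toy dataset and inspects it with the `shap` package; the right explainer and plotting call depend on your model type and shap version.

```python
# Sketch: inspect a fitted model with SHAP values. Shown for a tree-based
# regressor; check the shap docs for the explainer suited to your model.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)   # efficient SHAP values for tree models
shap_values = explainer.shap_values(X)  # one row of feature attributions per sample
shap.summary_plot(shap_values, X)       # global view of each feature's impact
```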
Always re-train downstream models to ensure they're working with the latest data. This can help prevent models from becoming outdated and less accurate.
Set up access keys, directory permissions, and service-level-agreements to ensure your data is secure and accessible only to authorized personnel.
Use a data versioning tool to keep track of changes to your data over time. This can help you identify when and where errors occurred.
Check independence assumptions behind models and work closely with security engineers to ensure your models are secure and reliable.
Use regular code-reviews and/or automatic code-sniffing tools to catch errors and inconsistencies in your code.
Here are some specific practices to help you stay on top of technical debt:
- Drop unused files, extraneous correlated features, and maybe use a causal inference toolkit.
- Use any of the countless DevOps tools that track data dependencies.
- Check independence assumptions behind models (and work closely with security engineers).
- Use regular code-reviews (and/or use automatic code-sniffing tools).
- Repackage general-purpose dependencies into specific APIs.
- Get rid of Pipeline jungles with top-down redesign/reimplementation.
- Set regular checks and criteria for removing code, or put the code in a directory or on a disk far-removed from the business-critical stuff.
- Stay up-to-date on abstractions that are becoming more solidified over time.
- Use standard-library modules like typing and decimal, and don't use float32 for every data object.
- Don't leave all works-in-progress in the same directory. Clean them up or toss them out.
- Make sure endpoints are accounted for, and use frameworks that have similar abstractions between languages.
- Make it so you can set your file paths, hyperparameters, layer types and layer order, and other settings from one location (see the sketch after this list).
- Monitor the models’ real-world performance and decision boundaries constantly
- Make sure distribution of predicted labels is similar to distribution of observed labels
- Put limits on real-world decisions that can be made by machine learning systems
- Check assumptions behind input data
- Make sure your data isn’t all noise and no signal by making sure your model is at least capable of overfitting
- Use reproducibility checklists when releasing research code
- Make a habit of checking and comparing runtimes for machine learning models
- Set aside regular, non-negotiable time for dealing with technical debt (whatever form it might take)
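For the "settings in one location" practice above, a plain dataclass loaded from a single file is often enough. The field names and YAML layout below are illustrative rather than any particular framework's schema, and the example assumes PyYAML is installed; `build_model` is a hypothetical helper.

```python
# Sketch: keep paths, hyperparameters, and architecture choices in one config
# object loaded from a single file, instead of scattering them across scripts.
from dataclasses import dataclass

import yaml  # PyYAML


@dataclass
class ExperimentConfig:
    train_data_path: str
    model_dir: str
    learning_rate: float = 1e-3
    batch_size: int = 32
    layers: tuple = ("dense", "dropout", "dense")


def load_config(path: str) -> ExperimentConfig:
    """One file is the single source of truth for every tunable setting."""
    with open(path) as f:
        return ExperimentConfig(**yaml.safe_load(f))

# config = load_config("experiment.yaml")
# model = build_model(config.layers, config.learning_rate)  # hypothetical builder
```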
Sources
The best practices and tools in this article are drawn from the article reviews and summaries of the paper "Hidden Technical Debt in Machine Learning Systems" listed below.
- https://laszlo.substack.com/p/article-review-hidden-technical-debt
- https://matthewmcateer.me/blog/machine-learning-technical-debt/
- https://preetihemant.medium.com/hidden-technical-debt-in-ml-systems-a-summary-22c9124ebd5b
- https://zhangruochi.com/Hidden-Technical-Debt-in-Machine-Learning-Systems/2022/01/18/
- https://www.datasciencecentral.com/technical-debt-in-machine-learning-system-a-model-driven-perspective/