DevOps success: improving application performance with deeper analytics
DevOps CI/CD and app performance management are made simpler and more powerful by using AI, ML, and big data analytics.
“Devops must implement rigorous monitoring and observability processes to ensure that every piece of the application is working correctly and that server processes are running smoothly. By securing this element, the devops teams can gather valuable information to understand how users utilize applications, possibly prevent future issues, make it easier to support customers, and improve business or architecture decisions based on real data.”
Frédéric Harper, director of developer relations at Mindee
The global DevOps market keeps getting bigger and sexier, in the nerdiest ways. Stroll through a hotel tech conference and your senses expand to fully appreciate the cottage industry of Application Performance Monitoring, or APM. Enterprise salespeople are everywhere, peddling their wares like carnival barkers. “Scan the QR code to find out what type of APM client you are!” Uh, no thanks. (Too late, somehow they have your email.) Back at the hotel room, the best-practices blog posts and email newsletters start appearing. Texts are piling in. Apparently this is urgent. Not sure about that... but at least we’re learning more about APM. It’s integrated into the DevOps toolchains and app suites on Azure and AWS, which is fine and all... It all helps. Teams can better monitor, troubleshoot, and optimize the performance of their applications, and that’s all great. APM makes life a bit easier and simpler by using AI, machine learning, and big data analytics to automatically monitor and analyze real-time streaming event logs and tell you only what you want to know. It’s good software.
But the bar keeps rising so quickly that one wonders whether it will ever stop... It surely has not stopped at a height within the reach of humans. Perhaps some Latin sums it up: Ubi sunt homines, eliminentur... (“Where there are men, let them be eliminated.”) As APM advances into early prediction of node failure and root cause analysis, humans are once again left in the dust.
“Prior work on ranking studies tried to improve software analytics by selecting better learners. Our results show that there may be more benefits in exploring data pre-processors like SMOTUNED because we found that no learner was usually ‘best’ across all data sets and all evaluation criteria.... SMOTUNED was consistently used by whatever learner was found to be ‘best’... That is, creating better training data may be more important than the subsequent choice of a classifier. To say that another way, at least for defect prediction, ‘better data’ is better than ‘better data miners’”
Agrawal and Menzies
Sidenote: the above quote comes from the paper “Is 'Better Data' Better Than 'Better Data Miners'?” by Amritanshu Agrawal and Tim Menzies, presented at the International Conference on Software Engineering (ICSE) in 2018. It challenged the prevailing assumption that the way to improve prediction accuracy is to pick a better learner. Working on software defect prediction, the authors found that no single learner was consistently best, while SMOTUNED, a tuned version of the SMOTE technique for rebalancing skewed training data, was consistently part of whichever combination performed best. Their conclusion, as the quote puts it, is that preparing better training data may matter more than the choice of classifier. It is a useful reminder to look at the whole data mining process, pre-processing included, rather than relying solely on ever more complex algorithms and models.
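For the curious, here is roughly what the SMOTE idea behind SMOTUNED looks like. This is a minimal sketch of plain SMOTE-style oversampling, not the paper's tuned version: it picks a sample from the under-represented class, picks one of its nearest neighbors from the same class, and creates a synthetic point somewhere on the line between them. The function name, the parameters, and the defect data are all made up for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbors of the chosen sample (excluding itself)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbors)
        gap = rng.random()  # how far to move along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical defect data: (lines changed, files touched) for buggy commits.
buggy = [(120.0, 8.0), (95.0, 6.0), (150.0, 11.0), (110.0, 7.0)]
new_rows = smote(buggy, n_new=6, k=2)
```

Settings such as the number of neighbors and the number of synthetic rows are the kind of thing a tuned variant would search over instead of leaving at defaults.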
Anyway, back in 2016, AppDynamics, a provider of APM solutions, introduced its AI-powered engine called "App iQ". The engine used machine learning (ML) algorithms to detect anomalies in application performance and to surface potential issues before they caused problems. Its success set the stage for further use of AI and ML for APM in a DevOps context, and the evolution has continued unabated ever since. It has now dovetailed with a growing trend toward deeper analytics to improve application performance in DevOps.
In the last 2 to 3 years, AI and ML techniques for application performance monitoring have been widely adopted. This has enabled organizations to automate performance monitoring and analysis, identify issues more quickly and accurately, and optimize application performance in real-time.
“You have to evolve your metrics - every time you measure something, it changes behavior.”
Jez Humble, co-author of “Accelerate: The Science of Lean Software and DevOps”
High DevOps demand; AI and ML are now deemed essential
A recent market study predicts that the DevOps market will reach $51.18 billion by 2030. The growing demand for fast, high-quality application delivery and the increasing demand for DevOps solutions and services among enterprises are key factors behind the rapid development of the DevOps market. The rise in the adoption of cloud-based computing through PaaS solutions has also played a significant role.
The primary goal of APM is to provide teams with real-time insights into their applications and infrastructure performance. It also enables them to make data-driven decisions. This helps them to optimize their systems by using advanced performance monitoring and analysis tools. The use of artificial intelligence (AI) and machine learning (ML) to automate the analysis of performance data allows organizations to identify patterns and anomalies to improve the system's overall performance.
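To make "identify patterns and anomalies" a little more concrete, here is a minimal sketch of one of the simplest approaches: flag any sample that sits far from the average of the samples just before it. Commercial APM products use far more sophisticated models; the function name, window size, threshold, and latency numbers below are illustrative assumptions.

```python
from collections import deque
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indexes of samples more than `threshold` standard deviations
    away from the mean of the preceding `window` samples."""
    recent = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(samples):
        if len(recent) == window:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(i)
        recent.append(value)
    return anomalies


# Hypothetical response times in milliseconds, with one obvious spike.
latencies_ms = [102, 98, 101, 99, 100, 103, 97, 450, 101, 99]
spikes = find_anomalies(latencies_ms)  # only the 450 ms sample is flagged
```

Note that the spike itself lands in the window afterward and inflates the baseline for a few samples, which is one reason real tools reach for more robust statistics.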
Kubernetes too?
Yes, that word needs to be spoken here as well. Kubernetes is an open-source platform for automating the deployment, scaling, and management of containerized applications. It was originally developed by Google and is now maintained under the Cloud Native Computing Foundation (CNCF), a nonprofit organization that promotes the adoption of cloud-native computing technologies. Kubernetes provides a powerful set of tools for managing the lifecycle of containerized applications, including automatic scaling, rolling updates, and self-healing capabilities, which help reduce the risk of downtime or disruption. It lets DevOps teams manage infrastructure as code, using declarative configuration files to specify how applications should be deployed and scaled.
For IT executives, Kubernetes is a must-know technology for modern DevOps workflows. It enables organizations to deploy applications quickly and reliably at scale. By abstracting away the underlying infrastructure, it provides a consistent environment for deployment and management, so DevOps teams can focus on the application itself rather than the infrastructure it runs on.
So that’s Kubernetes. But wait, there’s more. If you mix 3 cups of Kubernetes with 2 cups of artificial intelligence or 4 tablespoons of machine learning, you can unlock a whole new world of possibilities for a DevOps junkie. Here are some amazing things this combination can do:
- More efficiently and effectively deploy and manage AI/ML workloads
Kubernetes makes it easy to deploy and manage AI/ML workloads, which are typically resource-intensive and require specialized environments. With Kubernetes, DevOps professionals can define and manage the resources required for AI/ML workloads, such as GPUs or specialized hardware, and ensure that they are used efficiently.
- Automate machine learning workflows
Kubernetes can be used to automate the entire machine learning workflow, from data preprocessing to model training and deployment. DevOps professionals can define workflows using Kubernetes' declarative configuration files, and Kubernetes will automatically orchestrate the entire workflow, reducing the risk of errors or downtime.
- Improve scalability and performance
By leveraging Kubernetes' advanced networking and load balancing capabilities, DevOps professionals can ensure that AI/ML workloads are highly scalable and performant. Kubernetes can automatically scale resources up or down based on demand, and route traffic to the appropriate components in the AI/ML stack.
- Streamline model deployment and management
Kubernetes makes it easy to deploy and manage machine learning models in production environments. DevOps professionals can define deployment pipelines using Kubernetes, which can automate the entire process of deploying and updating models, making it more efficient and reliable.
- Enhance model monitoring and debugging
Kubernetes gives teams a solid foundation for monitoring and debugging machine learning models in real time. By combining its logging, resource metrics, and health checks with dedicated monitoring tools, DevOps professionals can quickly identify and resolve issues with the workloads serving their models, improving accuracy and performance over time.
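The "scale up or down based on demand" part is less magical than it sounds. The Kubernetes documentation describes the core rule of the Horizontal Pod Autoscaler as: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance (10% by default). The sketch below reproduces just that arithmetic; the function and parameter names are my own, and the real controller layers more behavior on top, such as stabilization windows and pod readiness handling.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count suggested by the documented autoscaling ratio rule."""
    ratio = current_metric / target_metric
    # Inside the tolerance band the autoscaler leaves things alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


# Hypothetical inference service targeting 60% average CPU per pod.
scale_up = desired_replicas(4, current_metric=90, target_metric=60)    # 6
hold = desired_replicas(4, current_metric=63, target_metric=60)        # 4
scale_down = desired_replicas(4, current_metric=30, target_metric=60)  # 2
capped = desired_replicas(8, current_metric=120, target_metric=60)     # 10
```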
Revolutionizing performance analysis with the introduction of AI and ML
AI and ML techniques can also be used to identify the root cause once an issue is detected. This can be done by analyzing data from sources including logs, traces, and performance metrics. Algorithms assist in determining the most likely cause of issues.
Another benefit of using AI and ML in performance analysis is that these techniques can automate more manual processes. This frees up DevOps teams to focus on more strategic initiatives. For example, AI-powered performance analysis tools can automatically identify performance bottlenecks, diagnose issues, and provide actionable insights.
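As a rough illustration of what "determining the most likely cause" can mean, here is a minimal sketch that ranks candidate metrics by how strongly they move together with the symptom. Correlation is not causation, and real root cause engines also use topology, traces, and deployment events. The metric names and numbers are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series cannot explain anything
    return cov / sqrt(var_x * var_y)


def rank_suspects(symptom, candidates):
    """Sort candidate metrics by absolute correlation with the symptom."""
    scores = {name: pearson(series, symptom) for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical samples taken over the same seven intervals.
checkout_latency_ms = [110, 115, 112, 240, 410, 395, 130]
candidates = {
    "db_connections_in_use": [20, 22, 21, 48, 80, 78, 25],
    "cache_hit_ratio": [0.93, 0.94, 0.92, 0.93, 0.94, 0.92, 0.93],
    "cpu_percent": [35, 40, 38, 42, 37, 41, 39],
}
ranking = rank_suspects(checkout_latency_ms, candidates)
top_suspect = ranking[0][0]  # the database connection pool, in this made-up data
```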
And what about CI/CD? Continuous integration and continuous deployment pipelines are critical in this conversation. They are where the guts of the app are compiled, deployed, and materialized on the network. We build these CI/CD pipelines for our clients at Product Perfect, along with some great visuals that explain the pipelines as well, if needed.
Integrating performance analytics with CI/CD
Integrating performance analytics with other DevOps tools, such as continuous integration (CI) and continuous deployment (CD) pipelines, is becoming increasingly common. This integration enables DevOps teams to comprehensively view the entire application delivery process, from development into production. But before we go further, here is a short explanation of these two software development practices:
- Continuous Integration (CI) is the practice of merging code changes into a central repository frequently, often several times a day. This helps to catch and resolve conflicts and bugs early in the development process, improving the quality of the software and reducing the time and effort required to resolve issues.
- Continuous Deployment (CD) extends continuous delivery by automatically deploying code changes to production once they have been tested and approved. This eliminates manual steps in the deployment process and speeds up the delivery of new features and improvements to end users.
Together, CI and CD pipelines form an automated and streamlined software delivery process that helps to improve the speed, quality, and reliability of software delivery. By automating many manual processes and providing real-time feedback on the state of the software, CI/CD pipelines help to reduce the time and effort required to develop and deliver software and improve the overall efficiency of the software development process.
Another key benefit of integrating performance analytics with CI/CD pipelines is that it enables DevOps teams to detect and resolve performance issues earlier in the development process before they reach production. Performance analytics tools can monitor the performance of applications while they are still in the CI/CD pipeline, using AI to detect anomalies proactively.
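Catching a performance issue inside the pipeline can be as simple as a gate that compares a build's benchmark numbers to a known-good baseline. The sketch below is one hedged way to do it, using median latency and a 10% allowance; the function name, threshold, and sample values are assumptions, and an AI-driven tool would replace the fixed threshold with a learned one.

```python
from statistics import median


def check_regression(baseline_ms, candidate_ms, max_slowdown=0.10):
    """Compare median latencies; return (passed, relative slowdown)."""
    base, cand = median(baseline_ms), median(candidate_ms)
    slowdown = (cand - base) / base
    return slowdown <= max_slowdown, slowdown


# Hypothetical benchmark results from the last release and the current build.
baseline = [120, 118, 125, 122, 119]
candidate = [150, 148, 155, 149, 152]

passed, slowdown = check_regression(baseline, candidate)
if not passed:
    # In a real pipeline step you would exit non-zero here to fail the build.
    print(f"Performance gate failed: median latency up {slowdown:.0%}")
```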
What’s blocking the pursuit of performance?
There are a handful of blockers or challenges that you may face when trying to improve application performance, including some of the expected ones.
- Getting to the Data; Data Collection and Management
One of the biggest challenges is collecting and managing large amounts of performance data from multiple sources, which requires a robust data management infrastructure and the ability to process and store large amounts of data in real time. Planning and execution must ensure that the infrastructure is scalable, reliable, and able to handle the volume and complexity of the collected performance data.
- Integration with DevOps tools
Integrating performance analytics with other DevOps tools, such as CI/CD pipelines, can be complex and time-consuming. It requires a deep understanding of the DevOps process and the tools used, as well as careful planning and testing to ensure the integration is successful. Getting Azure DevOps or AWS CodePipeline to play well with the database rollout, pull the right code from the right repository, and post build updates to Slack - these are the simple yet critical integrations that developers expect to have working well, 100% of the time.
- Lack of skilled resources
Many organizations simply (still) can’t acquire the talent. They lack the technical skills and expertise needed to implement and use performance analytics effectively, so they need to invest in training and development to build that knowledge in-house.
- Data analysis and interpretation
Analyzing performance data and interpreting the results can be extremely challenging, particularly for organizations new to or unfamiliar with true performance analytics. It takes a deep understanding of the performance metrics being used and the ability to identify patterns, anomalies, and root causes of performance issues.
- Cost still too high
Implementing performance analytics can be costly, particularly for organizations starting from scratch. It requires investment in hardware, software, and personnel to implement and maintain the performance analytics infrastructure.
- Culture / resistance to change
Adopting new technologies and processes can be difficult, particularly for organizations that have established ways of working. Effective change management is needed to ensure that the organization is prepared and willing to adopt the new technologies and processes.
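On the data analysis and interpretation point, a tiny example shows why the choice of metric matters. The sketch below, with made-up latency samples and a hand-rolled nearest-rank percentile, compares the average with the percentiles most APM dashboards report.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Hypothetical request latencies in milliseconds; one request hit a cold cache.
latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 13, 14]

mean_ms = sum(latencies_ms) / len(latencies_ms)
summary = {p: percentile(latencies_ms, p) for p in (50, 95, 99)}
```

Here the mean comes out at 37 ms, a number no request actually experienced, while the median is 13 ms and the 95th percentile is 250 ms. Reading the wrong one leads to the wrong conclusion.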
So yes, it is hard to implement APM properly and holistically. You can get it installed in a matter of hours, but it can take months to fine-tune it and get all your integrations running.
DevOps and APM success stories using AI and ML
Though most companies don’t want to share all the juicy details about their servers and automated deployments, some large firms don’t mind bragging about their successes in this area, mainly because it helps them recruit new talent, and it doesn’t hurt the stock price either. Here are just a few examples from a growing list of success stories that arise when AI and ML are used to support Application Performance Monitoring (APM) for DevOps:
- Netflix
Netflix built and open-sourced Vector, an on-host performance monitoring tool that surfaces high-resolution system and application metrics in real time. Tools like this supply the logs, traces, and metrics that pattern and anomaly detection depend on, allowing Netflix's engineers to quickly identify and resolve issues.
- New Relic
New Relic's AI-powered APM tool uses machine learning algorithms to identify patterns and anomalies in performance data. This allows the DevOps team to quickly identify and resolve performance issues, as well as predict future issues before they occur.
- AppDynamics
AppDynamics uses AI and machine learning to provide automated root cause analysis, which can identify the root cause of performance issues. AppDynamics' APM also provides real-time performance monitoring, which can detect issues before they affect end-users.
- Dynatrace
Dynatrace uses AI to provide automated root cause analysis, which can quickly identify the underlying causes of performance issues. Dynatrace's AI-powered APM also provides real-time performance monitoring, which can detect issues before they affect end-users.
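"Predicting issues before they occur" can sound like fortune telling, so here is a deliberately simple sketch of the idea: fit a straight line to recent readings and estimate when it will cross a limit. This is not how any of the vendors above implement it; the function names, hourly sampling interval, and disk numbers are assumptions for illustration.

```python
def fit_line(ys):
    """Least-squares slope and intercept for evenly spaced samples (x = 0, 1, 2, ...)."""
    n = len(ys)
    xs = range(n)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def hours_until(ys, limit):
    """Estimated hours after the latest sample until the trend reaches `limit`."""
    slope, intercept = fit_line(ys)
    if slope <= 0:
        return None  # not trending toward the limit
    crossing = (limit - intercept) / slope  # x position where the line hits the limit
    return max(0.0, crossing - (len(ys) - 1))


# Hypothetical hourly disk usage readings, in percent.
disk_used_pct = [70.0, 72.0, 74.0, 76.0, 78.0]
eta = hours_until(disk_used_pct, limit=90.0)  # 6.0 hours at the current rate
```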
"Our goal is to essentially move all that forward, the ability to know precisely that your infrastructure is working at all times, that you have the situational awareness to do so"
Rick McConnell, CEO of Dynatrace
Key Takeaways
The agility and efficiency of DevOps rest on applying AIOps appropriately, and doing so delivers real, tangible results. The main points to remember:
- The C-suite has to endorse this effort first.
Executive sponsorship is the key to getting anywhere with this. Nobody will get behind the spend unless it is first part of the mission or an organizational priority. A deep understanding of the DevOps process and the tools being used is also crucial to ensure smooth integration and effective use of performance analytics. Beyond that, it takes a unified effort at every level: from top-level leadership to front-line technical teams, everyone must be committed to improving application performance. Only then can the organization realize the full benefits of performance analytics and drive significant improvements in application performance and DevOps success.
- Don’t forget to use Kubernetes.
- The DevOps market has been growing rapidly since 2021, and the need for better visibility into application and system performance has become more evident.
- AI, machine learning, and big data analytics are being used to automate and optimize Application Performance Monitoring (APM) in a DevOps context.
- Integrating performance analytics with other DevOps tools, such as continuous integration and deployment pipelines, is becoming increasingly common and provides real-time insights into application performance.
- AI and ML can be used to automate more manual processes in performance analysis, freeing up DevOps teams to focus on more strategic initiatives.
- Organizations may face challenges with data collection and management, integration with DevOps tools, lack of skilled resources, and data analysis when trying to improve application performance.