
How to Overcome Resistance to Data Science Automation

Solutions Review’s Contributed Content Series is a collection of contributed articles written by thought leaders in enterprise technology. In this feature, dotData VP of Data Science Aaron Cheng offers commentary on how to break barriers and overcome resistance to data science automation.

Technology can change the world, but global adoption is not always a smooth process.  

Ten or 15 years ago, we saw this happen with the introduction of the first electric vehicles. Despite their promise to reduce greenhouse gases and lower fuel costs, they faced market and consumer resistance for years. The same happens in areas like biotechnology, which can improve health, agriculture, and the environment but also draws caution over ethical dilemmas, regulatory gaps, and public distrust.

In 2023, automation powered by artificial intelligence (AI) and machine learning (ML) is undoubtedly one of the most significant technologies to hit the market. While automation has also reached the data science and ML community, concerns around the impact of automating the ML workflow on the data science profession are common.

Automation can enhance data science projects’ productivity, efficiency, and speed. However, many data scientists have concerns about embracing automation, fearing that it will compromise their results, reduce their creativity, or replace their skills. How can an organization overcome these challenges and leverage automation in its data science process?


Taking Ownership and Responsibility for Automation Technologies

No new technology comes without risks, but the consequences of ignoring automation are too great. Gartner estimates that automation could result in a $15 trillion benefit to the global economy by 2030, but they also warn that automation has risks if not used responsibly.  

If left unchecked, automation can reduce the quality of ML models, causing drift, inaccuracies, bias, and even model collapse. The challenge for data science and machine learning teams is to adopt automation as a means of increasing productivity rather than as a means of replacing existing processes.

Data teams should embrace automation tools as a means of improving individual and team performance: use the tools under close supervision, verify their output, and adjust processes and usage to improve the team’s overall performance.
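As one illustration of what close supervision can look like in practice, the sketch below compares the distribution of a feature at training time with its distribution in production using the Population Stability Index (PSI), a widely used drift heuristic. This is a minimal example under stated assumptions, not a method prescribed by the author: the function name `psi`, the equal-width binning, the sample values, and the 0.2 alert threshold (a common rule of thumb) are all illustrative choices.

```python
import math

def psi(expected, actual, bins=4):
    """Population Stability Index between a baseline and a newer sample.

    Uses equal-width bins over the baseline's range; values outside that
    range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if baseline is constant

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical feature values, invented purely for illustration.
baseline = [0, 1, 2, 3, 4, 5, 6, 7]
production = [v + 4 for v in baseline]  # the same feature, shifted upward

score = psi(baseline, production)
ALERT_THRESHOLD = 0.2  # assumed rule-of-thumb cutoff, tune per use case
if score > ALERT_THRESHOLD:
    print(f"PSI {score:.2f}: drift suspected, route to a human reviewer")
```

The point of the check is that it hands the decision back to a person: the automated pipeline keeps running, but a flagged feature is reviewed before the model's output is trusted.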

In the world of data science, automation must be seamlessly combined with manual workflows to provide the best results. For example, feature engineering is at the heart of the data science process and is notoriously slow and manual in nature. While automation can provide significant benefits to the feature engineering process, it must also combine seamlessly with the manual efforts that are well established in the team’s workflows.

When properly developed, deployed, and maintained, automation can improve performance by mitigating human errors and inconsistencies that may impact the quality of the data science process.

Automation can also ensure that the data science process follows the best practices and standards of the industry, such as data quality checks, feature selection methods, and model validation techniques. The technology can also provide traceability and documentation of the data science process, facilitating auditing and compliance.

Finally, automation can also help data scientists communicate their findings and recommendations more effectively, by generating visualizations and dashboards.

Implementing, Deploying, and Piloting Automation: Best Practices

A recent IBM Chief Data Officer (CDO) study found that 63 percent of the top CDOs surveyed are aligned with their business strategy. The study identified an elite 8 percent of CDOs who allocate less revenue but generate greater business value.

This elite group of CDOs focuses on technology and the business, ensuring data is tied to return on investment (ROI). They emphasize business models and business protection and highlight the importance of being internally and externally engaged. 

Aligning automation with business objectives is essential. Organizations must identify the real business problems they are trying to solve with automation and not just randomly deploy these tools.  

For example, if a company wants to increase customer satisfaction, reduce operational costs, or enhance competitive advantage, automation tools must be developed, customized, and deployed to meet those targets. Setting milestones and monitoring performance is essential to understand whether the instrument is efficient and what impact it is having. 

Choosing the right automation tools is also essential. Development teams should have the final say on whether a technology is suitable for their work, evaluating its capabilities and its compatibility with the existing data science infrastructure and workflow. An organization should also consider the automation solution’s ease of use, scalability, and security.

Creating a company-wide culture that encourages communication and cultivates innovation is critical. Data team experts should also be given the opportunity to increase their skill levels. Forward-looking management must constantly seek feedback and suggestions from its data teams to improve automation tools.

When piloting data science automation platforms, organizations should also ensure the testing process is designed to evaluate tools within the right context. Data science teams are notoriously focused on “model quality” – usually measured by the accuracy of predictions. When evaluating automation solutions, however, model quality may not be the best metric for determining if a new platform will save time and accelerate the output of the team.

Determining the right key performance indicators (KPIs) for evaluating data science automation is critical. For example, if the automation solution is designed to accelerate feature discovery and engineering, measuring the cycle time of experimentation and the time needed to discover and evaluate features is far more important than the quality of any given feature, since data science is, by definition, an iterative ‘trial and error’ process when it comes to discovering new features for any ML model.
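To make the cycle-time KPI concrete, here is a minimal sketch of how a pilot team might compare experiment durations before and after introducing an automation tool. The function names, the use of the median, and all timestamps are assumptions made for illustration; a real pilot would pull start and end times from its own experiment tracker.

```python
from datetime import datetime
from statistics import median

def cycle_hours(experiments):
    """Return the duration in hours of each (start, end) experiment pair."""
    return [(end - start).total_seconds() / 3600 for start, end in experiments]

def median_speedup(manual, automated):
    """Ratio of median manual cycle time to median automated cycle time."""
    return median(cycle_hours(manual)) / median(cycle_hours(automated))

# Hypothetical timestamps, invented purely for illustration.
manual_runs = [
    (datetime(2023, 1, 2, 9), datetime(2023, 1, 4, 9)),    # 48 h
    (datetime(2023, 1, 5, 9), datetime(2023, 1, 6, 21)),   # 36 h
    (datetime(2023, 1, 9, 9), datetime(2023, 1, 11, 21)),  # 60 h
]
automated_runs = [
    (datetime(2023, 2, 1, 9), datetime(2023, 2, 1, 21)),   # 12 h
    (datetime(2023, 2, 2, 9), datetime(2023, 2, 2, 17)),   # 8 h
    (datetime(2023, 2, 3, 9), datetime(2023, 2, 4, 1)),    # 16 h
]

speedup = median_speedup(manual_runs, automated_runs)
print(f"Median cycle time speedup: {speedup:.1f}x")  # prints 4.0x for this sample
```

The median is used here rather than the mean so that one unusually long experiment does not dominate the comparison; either choice is defensible as long as the same statistic is applied to both groups.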

Final Thoughts

From resistance and skepticism to fear of losing control and ownership, and from complexity and integration to questions of ethics and accountability, there is much we can learn from misconceptions about automation. There are underlying truths in the pushback against new technologies, and they demand caution and vigilance. However, this vigilance can evolve into something productive. Automation by itself is never a threat but a responsibility and an opportunity.

AI and ML are deeply connected to humans and are, in many ways, a reflection of our work and essence. By overcoming the resistance and confronting risk armed with professional standards, a holistic, safe, efficient, and compliant approach can be developed to leverage automation responsibly.
