Resistance to data science change is futile

By Markus Top, DataRobot partnership manager, KID Group.

Johannesburg, 20 Aug 2024

In the ever-evolving technology landscape, data engineering remains a crucial pillar supporting the infrastructure of modern organisations. While the role and importance of this science haven’t changed, data engineering skills, models and approaches are undergoing radical shifts to keep pace with a changing environment.

The world of data engineering is undergoing a paradigm shift driven by advancements in artificial intelligence (AI), machine learning (ML), cloud computing and big data technologies.

Traditional approaches to data management are giving way to more agile, scalable and automated processes that enable organisations to extract actionable insights from vast and diverse datasets in real time.

These shifts not only demand technical expertise, but also require a strategic understanding of how data can be leveraged to drive business outcomes.

Noteworthy new trends include seamless data sharing without pipelines, data lakehouse modelling, data mesh architecture and low-code data integration, while support for Python continues to grow.

In addition, large language models (LLMs) will act as co-pilots for existing data scientists, engineers and analysts, boosting productivity. It will become easier for data engineers to automate data integration, cleansing and pipeline generation, while BI engineers can leverage LLMs to optimise queries, do complex data analysis and even answer questions directly.

Another trend to look out for is retrieval-augmented generation, which is a technique that improves the accuracy and reliability of generative AI models by incorporating information from external sources.

It addresses a key limitation of LLMs, which otherwise rely solely on the knowledge captured during training, by allowing them to access and cite relevant data from external repositories.
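
To make the idea concrete, the following is a minimal Python sketch of the retrieval step only, not a production implementation. The document store, the document IDs and the function names (retrieve, build_prompt) are hypothetical, and simple keyword overlap stands in for the embedding-based search a real system would more likely use. The sketch stops at building the prompt; sending it to an LLM is left out.

```python
import re
from collections import Counter

# Hypothetical external repository. In practice this would more likely be
# a document store or vector database; the IDs and texts are invented.
DOCUMENTS = {
    "policy-001": "Refunds are processed within 14 days of a return request.",
    "policy-002": "Customer data is retained for five years for compliance purposes.",
    "faq-003": "Support is available on weekdays between 08:00 and 17:00.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question, documents, top_k=2):
    """Return IDs of the documents sharing the most words with the question.

    Keyword overlap is an assumption made to keep the sketch dependency-free.
    """
    question_terms = Counter(tokenize(question))
    scored = []
    for doc_id, text in documents.items():
        overlap = sum((question_terms & Counter(tokenize(text))).values())
        if overlap > 0:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties are broken alphabetically by document ID.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


def build_prompt(question, documents, top_k=2):
    """Place the retrieved passages, tagged with their IDs, ahead of the question."""
    source_ids = retrieve(question, documents, top_k)
    context = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in source_ids)
    return (
        "Answer using only the sources below and cite their IDs.\n"
        f"Sources:\n{context}\n"
        f"Question: {question}"
    )


question = "How long are refunds processed after a return?"
prompt = build_prompt(question, DOCUMENTS)
print(prompt)
```

Because each retrieved passage carries its ID into the prompt, the model can be asked to cite the source it relied on, which is what makes the answer checkable against the external repository.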

DataOps has emerged as a powerful force in breaking down the traditional silos between data producers and consumers.

Implications for career paths and team structure

As data engineering trends evolve, so too must the skill set and roles within data teams. Traditional roles such as data engineers, data analysts and database administrators are expanding to incorporate skills in AI, ML and data science.

Additionally, interdisciplinary roles such as data architects, data scientists and ML engineers are becoming increasingly prevalent as organisations seek to bridge the gap between data engineering and business intelligence.

Moreover, team structures are evolving to foster collaboration and innovation across departments. Cross-functional teams comprising data engineers, data scientists, business analysts and domain experts are becoming the norm, enabling organisations to break down silos and leverage diverse perspectives to solve complex problems.

One of the most significant transformations in the role of data teams is the transition from cost centres to profit centres. Historically viewed as supporting functions focused on data management and compliance, data teams can be at the forefront of driving revenue growth and creating new business opportunities through AI and data-driven insights.

Transforming a data engineering department from a cost centre to a profit centre involves leveraging data assets and capabilities to generate revenue directly or indirectly. Some strategies to achieve this include:

Data monetisation: identify opportunities to monetise proprietary data assets by offering them as subscription-based services, licensing data to external parties, or creating data-driven products and insights that can be sold to customers.

Data-driven products and services: develop offerings that address market needs or complement existing ones. For example, a data engineering department can create data visualisation tools or industry-specific datasets that can be sold as standalone products or bundled with existing offerings.

Partnerships: collaborate with external partners, vendors or industry stakeholders to jointly develop data-driven solutions, co-create new products, or establish data-sharing agreements that generate revenue through revenue sharing or licensing arrangements.

Value-added services: offer data consulting, analytics-as-a-service or custom data solutions to external clients or internal stakeholders.

Navigating the broader tech industry landscape

As data engineering evolves, new technologies emerge and existing ones advance rapidly. Adapting to these changes requires continuous learning and investment in new tools and platforms.

Organisations may struggle to keep up with the pace of technological innovation and face challenges in integrating new solutions with existing systems.

With the exponential growth of data volumes, scalability becomes a critical concern. Data engineering solutions must be able to handle increasingly large and diverse datasets efficiently. Scaling infrastructure and processes to accommodate growing data volumes, while maintaining performance and reliability, can be a significant challenge.

Ensuring data quality and governance becomes more complex as data sources proliferate and data flows become more intricate. Maintaining consistency, accuracy and integrity across disparate datasets and data pipelines requires robust data governance frameworks and meticulous quality assurance processes.

As data becomes more valuable and sensitive, ensuring its security and privacy also becomes paramount. New security risks and compliance requirements are emerging, which organisations need to address proactively.

Skilled data engineers are in high demand, and competition for talent is fierce. As the environment evolves, organisations may struggle to find professionals with the necessary skills and experience to implement and manage complex data infrastructures and systems.

Retaining top talent in a rapidly changing landscape can also be challenging, requiring organisations to invest in training and professional development initiatives.

Implementing and maintaining advanced data engineering solutions can be costly, particularly as organisations scale their data infrastructure and operations. Balancing the need for cutting-edge technology with cost considerations requires careful planning and optimisation of resources, although some of this cost may be offset by revenue generated once the data team operates as a profit centre.

Embracing new data engineering trends often requires organisational change, including shifts in processes, roles and culture.

Resistance to change and siloed organisational structures can hinder adoption and implementation efforts, and overcoming them requires effective change management strategies.

Addressing these challenges requires a proactive approach, strategic planning and a commitment to ongoing innovation and adaptation.
