The Challenge of Data Preparation
Data does not simply appear in the right location for enterprise analytics or AI; it must be carefully prepared and directed through data pipelines. This task falls under the realm of data engineering, which has historically been one of the most tedious and thankless responsibilities for enterprises. However, Google Cloud is now addressing the challenges of data preparation with the introduction of a series of AI agents that enhance the entire data lifecycle.
Innovative AI Agents
The Data Engineering Agent within BigQuery automates the creation of complex data pipelines using natural language commands. The Data Science Agent transforms notebooks into intelligent workspaces capable of autonomously executing machine learning workflows. The enhanced Conversational Analytics Agent now features a Code Interpreter that handles advanced Python analytics for business users.
Yasmeen Ahmad, managing director of data cloud at Google Cloud, told VentureBeat, “When I think about who is doing data engineering today, it’s not just engineers; data analysts and data scientists also find it challenging to access high-quality data.” She noted that most of the workflows users report center on the laborious tasks of data wrangling and engineering.
Automating Data Pipelines
Google designed the Data Engineering Agent in BigQuery to facilitate the creation of complex data pipelines through natural language prompts. Users can describe multi-step workflows, and the agent takes care of the technical execution. This includes ingesting data from cloud storage, applying transformations, and conducting quality checks.
The agent automatically generates complex SQL and Python scripts, manages anomaly detection, schedules pipelines, and troubleshoots failures. These tasks typically require extensive engineering expertise and ongoing maintenance. The agent breaks a natural language request into multiple steps: establishing connections to data sources, creating suitable table structures, loading data, identifying primary keys for joins, addressing data quality issues, and applying cleaning functions.
Ahmad elaborated, “Ordinarily, that entire workflow would have involved writing a lot of complex code for a data engineer to build and manage a complex pipeline. Now, with the data engineering agent, it can create new pipelines from natural language and modify existing ones while troubleshooting issues.”
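To give a sense of the hand-written work Ahmad is describing, here is a minimal Python sketch of the steps listed above: reading a source, creating a table, loading data, finding a candidate key, and applying quality and cleaning rules. It is an illustration only, not output from Google's agent, and it does not use BigQuery. SQLite stands in for the warehouse, and the sample data, the `orders` table, the column names, and the cleaning rules are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; stands in for a file read from cloud storage.
RAW_CSV = """order_id,customer_email,amount
1001, Ana@Example.com ,19.99
1002,bob@example.com,5.00
1002,bob@example.com,5.00
1003,,12.50
1004,ana@example.com,5.00
"""


def load_rows(text):
    """Read the source into a list of dicts, one per CSV row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Apply quality and cleaning rules (assumed for this example):
    require an email, normalize its case and whitespace, drop exact duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        email = row["customer_email"].strip().lower()
        if not email:
            continue  # quality rule: rows without an email are rejected
        record = (int(row["order_id"]), email, float(row["amount"]))
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned


def find_key_columns(conn, table, columns):
    """A column is a candidate key if all of its values are distinct and non-null.
    COUNT(DISTINCT col) ignores NULLs, so equality with the row count covers both."""
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    keys = []
    for col in columns:
        distinct = conn.execute(
            f"SELECT COUNT(DISTINCT {col}) FROM {table}"
        ).fetchone()[0]
        if distinct == total:
            keys.append(col)
    return keys


def run_pipeline():
    rows = load_rows(RAW_CSV)
    cleaned = clean_rows(rows)
    conn = sqlite3.connect(":memory:")
    # Create the table structure, then load the cleaned records.
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, customer_email TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", cleaned)
    keys = find_key_columns(
        conn, "orders", ["order_id", "customer_email", "amount"]
    )
    conn.close()
    return {
        "rows_in": len(rows),
        "rows_loaded": len(cleaned),
        "candidate_keys": keys,
    }


if __name__ == "__main__":
    print(run_pipeline())
```

On this sample data, two of the five input rows are rejected (one duplicate, one missing email), and only `order_id` qualifies as a join key. Even a toy version needs separate logic for each step, which is the maintenance burden the agent is meant to absorb.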
The Role of Data Engineers
Data engineers are typically very hands-on. While the new data engineering agent simplifies many tasks, it does not eliminate the need for the tools used to build data pipelines, including those for data streaming, orchestration, quality control, and transformation. Ahmad emphasized that engineers still appreciate the underlying tools and often view the agent as an expert partner and collaborator. Many data professionals prefer to see the code generated by the agent and visually inspect the pipelines it creates.
Consequently, while the agents can operate autonomously, data engineers have the opportunity to review the agent’s actions and provide additional suggestions for further adjustments or customization of the data pipeline.
The Competitive Landscape
Numerous vendors in the data space are developing agentic AI workflows. Startups like Altimate AI are focusing on specific agents for data workflows, while larger companies such as Databricks, Snowflake, and Microsoft are also advancing their own agentic AI technologies to assist data professionals.
Google’s approach stands out as it builds its agentic AI services for data through its Gemini Data Agents API. This strategy allows developers to incorporate Google’s natural language processing and code interpretation capabilities into their applications, marking a shift from closed, first-party tools to a more extensible platform. Ahmad explained, “Behind the scenes, all of these agents are being developed as a set of APIs, which we intend to make increasingly available to our partners.”