Could Conversational AI-Driven Data Analytics Finally Solve the Data Democratization Riddle?

by Galen Okazaki



Data Democratization: the process of making data accessible to everyone in an organization, regardless of their technical skills.

The democratization of data is a riddle that old-school Ralph Kimball acolytes like myself have been trying to solve for decades. From user-friendly data models (data warehouses) to the plethora of highly evolved business intelligence tools now available, we have come a long way.

And yet the ability to derive new insights from data, for the most part, remains the realm of data analysts, data scientists, and business analysts. For the vast majority of others within business organizations, the technical moat around data (real or imagined) persists.

A Glimmer of Hope?

In late November 2022, OpenAI’s release of ChatGPT enabled the general public (read: non-technical users) to interact with a large language model (LLM) simply by typing a request (prompt) in natural language. Through this conversational user interface, users could prompt the LLM to answer questions about the data it had been ‘trained’ on. In the case of ChatGPT, it was trained on, well… the internet.

ChatGPT put incredible data processing power in the hands of anyone who had access to it. As we became aware of this mechanism’s possibilities, many of us in the data analytics field soon began to ponder its potential impact on our own space.

We didn’t have to ponder for long…

A mere four months after the initial release of ChatGPT to the general public, OpenAI released an alpha version of a ChatGPT plugin called Code Interpreter. With it, anyone could load a dataset into ChatGPT, type a few prompts, and have it invoke Python to perform regression analysis and descriptive statistics, and even create visualizations. All without having to write any code!
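To make that concrete, here is a minimal sketch of the kind of Python Code Interpreter generates behind the scenes from a plain-English prompt such as “describe this dataset and regress sales on ad spend.” The file name and column names here are hypothetical, and the code the plugin actually writes will vary from prompt to prompt.

```python
# Illustrative only: roughly the kind of Python Code Interpreter writes for you.
# The file "sales.csv" and its columns ("ad_spend", "sales") are hypothetical.
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

df = pd.read_csv("sales.csv")

# Descriptive analysis: summary statistics for every numeric column
print(df.describe())

# Regression analysis: ordinary least squares of sales on ad_spend
X = sm.add_constant(df["ad_spend"])   # add an intercept term
model = sm.OLS(df["sales"], X).fit()
print(model.summary())

# Visualization: scatter plot with the fitted regression line
plt.scatter(df["ad_spend"], df["sales"], alpha=0.6, label="observations")
plt.plot(df["ad_spend"], model.predict(X), color="red", label="OLS fit")
plt.xlabel("ad_spend")
plt.ylabel("sales")
plt.legend()
plt.show()
```

The point is not the code itself, but that a user never has to see it: they type the question, and the analysis and chart come back.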

The release of Code Interpreter gave us all a glimpse into how conversational AI-driven data analytics could work. It was mindblowing!

Not long after this, citing ChatGPT’s already established ability to write code (SQL, R, and Python, to name a few) along with the nascent capabilities of Code Interpreter, many began to predict the eventual demise of the data analyst role. (At the time, I begged to differ and even wrote an article about it).

Artwork Created by Galen Okazaki Using Midjourney

Will Generative AI Replace the Need for Data Analysts? (Galen Okazaki, Towards Data Science)

Granted, such a prediction didn’t seem like much of a stretch when you considered the possibility of even the least technically inclined people in your organization deriving insights from data simply by typing, or even speaking, their questions.

So could Conversational AI-driven Data Analytics actually be the key to bridging the technical moat between data and its democratization?

Let’s take a closer look.

The Current State of Conversational AI-driven Data Analytics

So… it’s been almost a year and a half since that alpha version of Code Interpreter was released, and how much progress have we made with conversational AI-driven data analytics? Probably not as much as you might have anticipated.

For example: In July 2023, ChatGPT’s Code Interpreter was rebadged and rereleased as Advanced Data Analysis. Not only was the name of Code Interpreter changed, but so was… umm… err… Well, at least its new name provides a more accurate description of what it actually does. 🤷‍♂️

In all fairness, Code Interpreter/Advanced Data Analysis is a fine tool, but it was never intended to be an enterprise-wide analytics solution. It still works only with static files you upload into it; you can’t connect it to a database.

For a better perspective, let’s look at some currently available analytics tools that have incorporated conversational AI interfaces.

Power BI Q&A

The first attempt at implementing conversational data analytics actually predated the release of ChatGPT. In 2019, Microsoft’s ubiquitous Power BI introduced a feature called “Q&A.” It allows users to type questions about their data in natural language, provided that language is English (currently the only one supported).

This is done through a text box embedded in an existing dashboard or report. Through this interface, users ask natural-language questions about the dataset behind that particular dashboard or report. Power BI uses Natural Language Query (NLQ) to translate the typed question into a query, and the responses are rendered as visualizations.
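Conceptually, NLQ works by mapping the words in a question onto the fields and aggregations of the underlying dataset. The toy Python sketch below illustrates that idea only; the dataset, question, and mapping are hypothetical, and Power BI’s actual engine is far more sophisticated and returns a visualization rather than a table.

```python
# Conceptual illustration only -- not Power BI's actual implementation.
# The idea behind NLQ: resolve a plain-English question to the dataset's
# fields and an aggregation, then run the resulting query.
import pandas as pd

# Hypothetical dataset behind a sales dashboard
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "year":   [2023, 2023, 2024, 2024],
    "amount": [120.0, 95.0, 140.0, 110.0],
})

question = "total sales by region for 2024"

# An NLQ engine maps "total" -> sum, "by region" -> group by region,
# and "2024" -> a filter on the year field, yielding something like:
answer = (
    sales[sales["year"] == 2024]
    .groupby("region", as_index=False)["amount"]
    .sum()
)
print(answer)   # Power BI would render this as a chart rather than a table
```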

While this feature has its uses, it has one significant limitation: Power BI Q&A can only query the dataset behind the report or dashboard being viewed, which is much too narrow a scope if your ultimate goal is the company-wide democratization of data.

Snowflake Cortex Analyst

A more suitable example of conversational AI-driven data analytics that could potentially support data democracy is Snowflake’s Cortex Analyst.

To briefly summarize, Snowflake is an ever-growing, cloud-based SaaS data warehousing and analytics platform that lets clients scale their storage and compute up or down independently as needed. Its architecture also supports high-speed data processing and querying.

Cortex Analyst is Snowflake’s version of conversational AI-driven data analytics. Right off the bat, it has one huge advantage over Power BI’s Q&A: instead of limiting users to the dataset behind an existing report or dashboard, Cortex Analyst lets them query the entire underlying database. It does this by relying on a semantic layer and semantic model to interpret user requests.

This leads us to a critical point.

Having a fully vetted semantic layer in place is an absolute prerequisite to data democracy. It only makes sense that before you empower everyone within your company with the ability to work with data, there must be a universally agreed-upon definition of the data and metrics being used. More on this later.
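To make the idea concrete, a semantic layer is essentially a shared, governed dictionary that maps business terms to physical tables, columns, and calculation logic, so every tool and every user answers a question the same way. The sketch below is a conceptual illustration in Python; the tables, columns, and metric definitions are hypothetical, and this is not Snowflake’s actual semantic-model format.

```python
# Conceptual illustration only -- not Snowflake's actual semantic-model format.
# A semantic layer pins down, in one governed place, what each business term
# means: which table it lives in, which column it maps to, and how metrics
# are calculated. Every name below is hypothetical.
SEMANTIC_MODEL = {
    "tables": {
        "orders": {
            "physical_name": "ANALYTICS.SALES.FCT_ORDERS",
            "dimensions": {
                "region":     "region_name",
                "order_date": "order_date",
            },
            "measures": {
                # The agreed-upon, company-wide definition of "revenue"
                "revenue":         "SUM(net_amount)",
                "order_count":     "COUNT(DISTINCT order_id)",
                # Derived metric defined once, so every tool reports it identically
                "avg_order_value": "SUM(net_amount) / COUNT(DISTINCT order_id)",
            },
        }
    }
}

def build_query(metric: str, by: str, table: str = "orders") -> str:
    """Translate a (metric, dimension) request into SQL using the shared definitions."""
    t = SEMANTIC_MODEL["tables"][table]
    return (
        f'SELECT {t["dimensions"][by]}, {t["measures"][metric]} AS {metric} '
        f'FROM {t["physical_name"]} GROUP BY {t["dimensions"][by]}'
    )

# A conversational tool asked for "revenue by region" resolves to one agreed query:
print(build_query("revenue", by="region"))
```

The point of vetting this layer up front is that when a non-technical user asks for “revenue by region,” the answer is computed from the same agreed-upon definition every time, no matter which tool asks.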

While I’ve only discussed two examples of conversational AI-driven data analytics here, they should be enough to help you envision their potential role in data democratization.

Challenges to Data Democracy

While the ability to ask a question about your data in natural language and get an answer has significant potential, I believe that the biggest challenges to data democracy are not technological.

Let’s start with the prerequisites for successful data democratization: a strong data infrastructure that fully addresses the previously mentioned semantic layer and model, data literacy, data quality, and data governance. Each of these is a significant project in and of itself, and the reality is that, for many companies, they are still works in progress.

That holds especially true for data literacy.

To wit, while 92% of business decision-makers believe that data literacy is important, only 34% of companies currently offer data literacy training (source: Data Literacy Index, Wharton School of Business).

Another challenge is one that I have seen over the entirety of my career in data analysis. In my experience, there has always been a cadre of users (some of them at the C-level) who, for various reasons, refused to use the BI interfaces we created for them. While they were typically a minority, they reminded us that, bells and whistles aside, many people will stubbornly continue to work only with what they are most familiar with.

Summary

A successful data democratization effort cannot be based on a specific technology, regardless of its promise. It requires a visionary, multi-pronged approach that includes a strong data infrastructure and an organizational data-first mindset, in addition to appropriate technologies.

So while conversational AI-driven data analytics cannot in and of itself solve the data democratization riddle, it can most certainly play a significant role in an overall effort.

Sidenote:

As someone who believes in enabling the lines of business to work with data, I see immense value in conversational AI-driven data analytics.

In my view, at least for the moment, the highest and best use of this tool would be in the hands of business analysts. Given their combination of domain knowledge (how the business works) and already established data literacy, they are the best equipped to leverage conversational analytics to get answers without being encumbered by complex code.