Dataiku announced a new platform for operationalizing large language models in the enterprise. By decoupling the application layer from the underlying infrastructure, it offers a distinctive take on generative AI. A public preview will be available soon.
Omdia view
Summary
In late September 2023, Dataiku co-founder and CEO Florian Douetteau took the stage in New York for one of the company’s larger roadshow events, the Dataiku Everyday AI Conference. As has become almost commonplace these days, he announced Dataiku’s own take on a generative artificial intelligence (GAI) architecture capable of operationalizing large language models (LLMs) in the enterprise. Soon to be made available as a public preview, this new platform looks anything but commonplace, thanks to the company’s efforts to decouple the application layer from the underlying infrastructure.
Why this matters
While revving up the audience ahead of the announcement of the company’s new architecture, enigmatically named Dataiku LLM Mesh, Douetteau offered his own equally enigmatic message for conference attendees. He said, “We are living in the time of fast food intelligence ... and those of us working in data and AI have a great responsibility ... to deliver on the promise of AI.”
What does that mean? Well, for this analyst, it means two things:
- First, we are living in an unprecedented time of abundance where AI as a “tool” is readily available to all interested parties. This is particularly apparent in the realm of GAI, where AI outcomes are no further than a chatbot conversation away from both business users and data scientists.
- Second, it means that AI practitioners must not allow this ubiquity to create a false perception of simplicity or capability. With GAI, companies can easily stand up a working proof of concept (PoC) in just a few days, but moving that PoC into production as a secure, performant, governable, coherent (i.e., repeatable), auditable, and trustworthy solution is a different story.
For Dataiku, as one of only a handful of vendors offering an independent operationalized AI development platform (i.e., a machine learning operations [MLOps] platform), such operational concerns are of paramount importance, second only to the concept of choice. Unlike rival GAI MLOps providers Amazon Web Services (AWS; Bedrock), Google Cloud (Vertex AI), and Microsoft (Azure AI), Dataiku cannot afford the luxury of looking inward at platform-specific differentiators such as highly optimized AI hardware acceleration. Rather, Dataiku must focus on GAI MLOps (hereafter referred to as LLMOps) while also supporting those differentiators across “all” platforms.
That which makes an LLM Mesh
This concept of choice has long been a defining quality for Dataiku, which provides its customers with choice—a choice of databases, languages, frameworks and libraries, models, and target deployment platforms. It figures heavily in Dataiku’s newly announced LLM Mesh architecture, which will enter the market with integrations covering the following lengthy list of LLM technology areas and providers:
- Hosted LLM API services
  - Snowflake
  - AI21 Labs
  - Azure OpenAI Service
  - Google Vertex AI
  - AWS Bedrock
  - OpenAI
  - Anthropic
  - Cohere
  - MosaicML
- Self-hosted private LLMs
  - Hugging Face (Llama 2, Falcon, Dolly 2.0, MPT, and thousands of other fully private LLMs)
- Vector stores
  - Pinecone
  - FAISS
  - ChromaDB
- Accelerated computing
  - NVIDIA
Dataiku’s LLM Mesh will bring these technologies under the wing of the company’s well-established, operationalized AI platform, which unifies the entire “traditional” ML development, deployment, and management lifecycle. These integrations on their own do not set Dataiku apart, as most platform players already let practitioners build LLM-based outcomes programmatically. What is different about LLM Mesh, however, is its addition of several platform functions and workflows that are specific to GAI (a brief illustrative sketch follows the list). For example, LLM Mesh incorporates:
- A full audit trail of model experiments and output.
- Native support for the popular retrieval augmented generation (RAG) use case for LLM query context and augmentation.
- Built-in tools to identify personally identifiable information (PII) and handle content moderation.
- Security and permission tools that work across supported services.
- Process caching regardless of underlying platform/services.
- Native routing and orchestration tools to support complex model chains and solution architectures.
- Tools specific to cost estimation and reporting—a hugely under-supported area within the industry.
- Equal access to low/no-code and pro-code tools for all the abovementioned capabilities and native Dataiku offerings.
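How Dataiku implements these services is not yet fully public, but the pattern the list describes, a gateway sitting between applications and interchangeable LLM backends, is straightforward to illustrate. The following minimal Python sketch is this author’s illustration only; the `MeshGateway` class, the backend names, and the naive PII check are assumptions made for the sake of the example, not Dataiku’s API. It shows how a single entry point can layer auditing, PII screening, and caching over swappable backends:

```python
import hashlib
import re
import time
from typing import Callable, Dict, List

class MeshGateway:
    """Hypothetical gateway illustrating the mesh pattern (not Dataiku's API)."""

    PII_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")  # naive email-style PII check

    def __init__(self) -> None:
        self._backends: Dict[str, Callable[[str], str]] = {}
        self._cache: Dict[str, str] = {}
        self.audit_log: List[dict] = []  # full trail of prompts and outputs

    def register(self, name: str, backend: Callable[[str], str]) -> None:
        # A backend is just a callable here; swapping providers means re-registering.
        self._backends[name] = backend

    def query(self, backend_name: str, prompt: str) -> str:
        # Content/PII screen before anything leaves the gateway.
        if self.PII_PATTERN.search(prompt):
            raise ValueError("Prompt appears to contain PII; blocked by policy.")

        # Response caching that works the same regardless of backend.
        key = hashlib.sha256(f"{backend_name}:{prompt}".encode()).hexdigest()
        if key in self._cache:
            return self._cache[key]

        response = self._backends[backend_name](prompt)
        self._cache[key] = response
        self.audit_log.append({"ts": time.time(), "backend": backend_name,
                               "prompt": prompt, "response": response})
        return response


# Stand-in backends; real ones would call a hosted API or a private model.
gateway = MeshGateway()
gateway.register("hosted-llm", lambda p: f"[hosted answer to: {p}]")
gateway.register("private-llm", lambda p: f"[private answer to: {p}]")

print(gateway.query("hosted-llm", "Summarize our Q3 churn drivers."))
```

Because applications talk only to the gateway, policy, caching, and auditing live in one place rather than being re-implemented per application or per provider.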
On an architectural level, these new services are not “add-ons” to the Dataiku platform. Rather, they are core aspects of the overall platform maintained by the company. This approach enables Dataiku to decouple the application layer from the underlying infrastructure services layer, and to do so for both traditional AI and GAI via LLM Mesh. It also frees practitioners to swap models, frameworks, vector databases, and other traditional AI- and LLM-specific assets in and out during all phases of the project lifecycle, even when those assets live on more than one target platform (hyperscaler, database, etc.). With this higher level of abstraction in hand, practitioners are free to experiment, test, and iterate more rapidly.
Practitioners are also welcome to optimize deployed solutions by swapping underlying assets in and out when doing so is deemed financially beneficial. Whether for technological or financial reasons, such freedom enables practitioners to future-proof their AI development investments. Need to move to a new data lakehouse? Not a problem. Interested in pricing out privately hosted open source models against publicly hosted closed source models? Have at it. Want to do some A/B testing across two models, each housed on a different cloud platform? Go to town.
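To make that A/B scenario concrete, here is an equally hypothetical continuation of the gateway sketch above (the `ab_query` helper and the 50/50 split are invented for illustration): two registered backends sit behind one routing function, so the calling application never changes while outputs, costs, and latencies can be compared.

```python
import random

# Hypothetical A/B routing over the MeshGateway sketched above: the application
# keeps calling one function while traffic splits across two real backends.
AB_SPLIT = [("hosted-llm", 0.5), ("private-llm", 0.5)]

def ab_query(prompt: str) -> str:
    roll, cumulative = random.random(), 0.0
    for backend_name, share in AB_SPLIT:
        cumulative += share
        if roll <= cumulative:
            return gateway.query(backend_name, prompt)
    return gateway.query(AB_SPLIT[-1][0], prompt)  # guard against float rounding

print(ab_query("Draft a renewal email for an at-risk account."))
```

Swapping a backend, in this framing, is a one-line configuration change rather than an application rewrite, which is the practical payoff of the abstraction layer described above.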
Building toward everyday AI
It will take time for Dataiku to bring LLM Mesh fully to market, as the company is currently looking to introduce a closed preview before the end of the year. That said, Dataiku has a head start in delivering key aspects of LLM Mesh, including native support for RAG use cases, its own prompt engineering playground user experience (Prompt Studios), several LLM solution recipes to accelerate development, and established partnerships with key players, including NVIDIA and Pinecone.
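For readers less familiar with the RAG pattern referenced throughout, the following stripped-down sketch shows the retrieve-then-augment flow. It is not Dataiku’s implementation; a toy word-overlap score stands in for the embeddings and vector stores (Pinecone, FAISS, ChromaDB) named earlier.

```python
# Toy retrieval-augmented generation (RAG) flow. A real deployment would use
# embeddings plus a vector store (e.g., FAISS or Pinecone), not word overlap.
DOCUMENTS = [
    "Our enterprise plan includes SSO and audit logging.",
    "Support tickets are answered within four business hours.",
    "Annual contracts can be cancelled with 30 days notice.",
]

def retrieve(question: str, k: int = 2) -> list:
    """Rank documents by naive word overlap with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(DOCUMENTS,
                    key=lambda doc: len(q_words & set(doc.lower().split())),
                    reverse=True)
    return ranked[:k]

def rag_prompt(question: str) -> str:
    """Prepend retrieved context to the user question before the LLM call."""
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(rag_prompt("How fast are support tickets answered?"))
```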
In short, LLM Mesh is nothing new to Dataiku. For the company and co-founder and CEO Douetteau, GAI is just another major market innovation/disruption—the same as the introduction of Apache Spark or Kubernetes. Dataiku’s response—creating an abstraction layer for AI practitioners—seems extremely appropriate. It is almost as though Dataiku has been building toward this inflection point for the last decade to prepare for when companies begin treating AI as an engineering concern rather than a scientific endeavor.
Given the incredible rate of innovation and change within the GAI marketplace, where new models, new fine-tuning methodologies, and new model compilation techniques appear daily, Omdia believes that companies must invest in an underlying xOps platform (i.e., a platform capable of supporting both MLOps and LLMOps). Companies that fail to adopt an xOps approach will fail to fully capitalize on emerging innovations like GAI and to turn those innovations into everyday AI capabilities. On a practical level, this means they will not only fail to meet their project goals (timelines, cost, etc.) but will also open themselves up to undue security, privacy, and compliance risks.
Appendix
Further reading
“Event Recap: Google Cloud Next – August 2023” (September 2023)
“AWS extends generative AI platform capabilities with autonomous agents” (August 2023)
Author
Bradley Shimmin, Chief Analyst AI platforms, analytics, and data management