Google welcomed a global audience of virtual attendees to its Google Cloud Next ’22 conference in October 2022 and managed the impossible: despite announcing more than 123 new Google Cloud Platform (GCP) capabilities, it delivered a concise message of openness and interoperability across its AI, data, and analytics portfolio. Key announcements that caught our attention centered on integrating external data sources, embracing unstructured data, and unifying analytics tools.
Turning maps into territories
Whether held in person or online, major technology trade shows all suffer from the same drawback: noise. Companies, especially those with sizable portfolios, generate a lot of noise during these conferences as they fast-forward through announcement after announcement, calling out attractions as though they were a series of distant mountaintops separated by an impenetrable fog obscuring the full landscape below. This leaves attendees with two less-than-appealing choices: focus on a handful of product areas and ignore the rest, or do their best to see and understand what lies between those mountaintops. Either way, it is hard for enterprise buyers to truly grasp where the company is heading, let alone where it stands on larger issues.
True to its history and this market-wide “norm,” Google did indeed take to the stage for its still virtual Google Cloud Next ’22 conference and blitz through a tremendous amount of material, announcing more than 123 new GCP capabilities spanning its data cloud, AI and machine learning (ML), data and analytics, business intelligence (BI), databases, cloud security, and collaboration. And yet, with just one sentence, Google Cloud CEO Thomas Kurian, in his New York keynote address, swept away the fog to reveal the territory connecting its numerous announcements:
Google’s “vision for cloud computing is to simplify all of the technologies that organizations need, making it accessible by simplification to every organization around the world as software platforms that provide the foundation for your business to digitize and accelerate.”
This singular idea of simplification strikes at the heart of the most common enterprise concern – complexity. For enterprise IT practitioners seeking to exploit AI, data, and analytics, this specter of complexity feeds on the growing number of data silos, hybrid/multi-cloud deployments, database specialization, and data fragmentation (disparate formats and models), as well as increasing pressure from the business to democratize data across the organization, and to do so in real time. Keeping up with this constant influx of demands creates an ever-moving goalpost for IT practitioners, one they can never truly reach.
Such pressures are in great part reflected in enterprise buying priorities, which currently overemphasize spending on foundational issues such as integration compared with more forward-looking, greenfield investment opportunities such as data science platforms, tools, and services (see Figure 1).
Figure 1: Supportive vs. innovative investments
More frustrating, investments in forward-looking technologies such as AI and ML tend to introduce further data integration, transformation, and processing complexities, which increases the need to reallocate resources toward those issues. Unfortunately, within data science, this often results in data scientists and ML engineers spending more time on finding, provisioning, and validating data. This leaves fewer resources for them to tackle more impactful endeavors such as working with the business or building, training, and testing ML models.
How can enterprises combat this situation? Simply put, companies investing in AI, data, and analytics can greatly reduce technical debt across all three technology concerns by adopting open and integrated solutions. As but one example, instead of investing in a closed database platform that only speaks one query language and one file format, companies should look for database solutions capable of supporting many data formats, disparate data models, third-party APIs, and embedded third-party software.
Google appears keenly aware of this challenge and the necessity of delivering AI, data, and analytics software that is both open and integrated. Throughout Google Cloud Next ’22, across keynotes, product announcements, and hands-on sessions alike, Google consistently drove this idea home. This view was particularly evident in the company’s continued efforts to integrate and extend the reach of its data and analytics portfolio.
Opening up and extending BigQuery and BigLake
While most major technology companies will endeavor to emphasize new products during their annual conferences, Google instead focused on the two primary means of lowering technical debt – integration and openness.
To begin, Google announced new capabilities for BigQuery that will open up the database considerably, helping users work directly and seamlessly with structured, semi-structured, streaming, and unstructured data. For example, Google announced Datastream for BigQuery (in preview), which lets users easily replicate data from external, operational data sources such as PostgreSQL, MySQL, Oracle, and AlloyDB. Datastream uses an auto-scaling architecture to facilitate extract, load, and transform (ELT) data pipeline workflows without impacting analytical query performance.
Next up, and making for a very welcome addition, is the general availability of search indexes and search functions within BigQuery. These allow data practitioners to use standard SQL to find specific textual information stored in either unstructured text or semi-structured data. In so doing, Google enables users to avoid the time and expense of exporting textual data to a separate search engine such as Google Cloud Search. It also brings Google into alignment with many of its rivals that are actively moving search capabilities into analytical databases, as with full-text search in Microsoft SQL Server or the pairing of Apache Spark and Elasticsearch.
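As a minimal sketch of how this works – using hypothetical dataset, table, and column names – a search index is created over a table’s string columns and then queried with the SEARCH function in standard SQL:

```sql
-- Hypothetical names for illustration; BigQuery search index DDL.
-- Index every string column in an application log table.
CREATE SEARCH INDEX logs_search_idx
ON my_dataset.application_logs (ALL COLUMNS);

-- Find rows containing the token 'timeout' in any indexed column.
SELECT log_time, message
FROM my_dataset.application_logs
WHERE SEARCH(application_logs, 'timeout');
```

The point is that no data leaves BigQuery: the same table that serves analytical queries can now serve point lookups over free text.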
Further, Google announced support for a number of important industry-standard data formats, including Apache Iceberg (available today) as well as Linux Foundation Delta Lake and Apache Hudi (both forthcoming). These sit alongside improved support for Apache Spark within BigQuery, including stored procedures for Apache Spark (in preview). Google also expanded its data integration facilities for popular third-party solutions from Collibra, Databricks, Elastic, Fivetran, MongoDB, Sisu Data, Reltio, and Striim.
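Assuming hypothetical bucket, connection, and table names, querying an Iceberg table from BigQuery via BigLake looks roughly like the following sketch – an external table is declared over the Iceberg metadata and then queried like any native table:

```sql
-- Hypothetical names; BigLake external table over Apache Iceberg data.
CREATE EXTERNAL TABLE my_dataset.iceberg_orders
WITH CONNECTION `us.my_lake_connection`
OPTIONS (
  format = 'ICEBERG',
  uris = ['gs://my-bucket/iceberg/orders/metadata/v1.metadata.json']
);

-- Query the lake-resident Iceberg table with standard SQL.
SELECT order_id, total
FROM my_dataset.iceberg_orders
WHERE total > 100;
```

The practical upshot is that data can stay in open formats on object storage while still participating in BigQuery workloads.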
Taken together, these moves greatly expand the scope of Google BigQuery, which was launched in 2012 as the realization of a science project capable of scaling to support massive data analysis jobs in a highly performant manner. They also further solidify Google BigQuery as the company’s keystone data and analytics platform, supporting its increasingly unified data-driven transformation platform, Google Data Cloud.
Unifying Looker and Data Studio
During Google Cloud Next ’22, the company made good on a long-standing promise to unify its previously bifurcated analytics offerings, Google Looker and Data Studio. Since acquiring Looker in 2020, Google has endeavored to integrate the two solutions while honoring their unique value propositions. From the outset, Looker has represented a traditional approach to business intelligence (BI), while Data Studio has catered to the data democratization ideals of data visualization.
In this way, Looker has focused on data professionals, while Data Studio has gained appeal with business users more broadly. The two offerings were complementary, but unifying their divergent approaches from a technical and use case perspective has proven a difficult task for the company, at least until now.
During Cloud Next ’22, Google announced that it would be recasting these two offerings under the single Looker umbrella brand. Building on existing work to integrate and unify Looker and Data Studio, Google announced that:
- Looker will remain branded as Looker, but as a part of Google Cloud Core, it will now be available from within Google Cloud Console. It will be closely tied in with the company’s core cloud infrastructure capabilities, including security and management services.
- Data Studio will be rebranded as Looker Studio, remaining a freely available product that holds true to the solution’s original focus on providing users with a true self-service analytics capability across more than 800 data sources (600 of those maintained as connectors by Google itself).
- Looker Studio Pro is being launched as a new, paid enterprise version of Looker Studio, featuring enterprise management features, team collaboration capabilities, and service level agreements (SLAs).
Beneath the covers, Looker Studio will take on the ability to access Looker data models (in preview). With the Looker modeling layer serving as a bridge, Looker Studio customers can effectively blend lightweight, ad hoc data with curated, secured, and trusted data that has been fully vetted by IT. In this way, Google is doing more than creating a bridge between the two products. Customers that upgrade from Looker Studio to Looker Studio Pro will gain a host of enterprise-class management features (collaboration, SLAs, etc.), and the shared modeling layer will also help Google more effectively and completely integrate services from across its broad AI and ML portfolio, as with in-database ML within BigQuery.
Both Looker and Data Studio users will find this appealing. However, the real fireworks are still to come, as Google intends soon to integrate Looker and Looker Studio with its broader data and analytics platform, Google Dataplex. Announced in the spring of 2021, Dataplex serves as a data fabric for the enterprise, centralizing access to – and governing, managing, and monitoring – analytical data assets that may be spread across many disparate data silos.
Without unification across these three products (Looker, Data Studio, and Dataplex), Google would remain saddled with a bifurcated analytics story that would fall short of the growing market expectation of analytics tooling that allows for cross-product and cross-platform visibility of all company data assets. In this light, the company’s unification of Looker and Data Studio via a single data model is a crucial, strategic step forward, as it will enable Google to reveal the full power of Dataplex as a unifying data layer.
In evaluating these new capabilities directly announced at Cloud Next ’22, it may seem that Google is simply filling in well-understood gaps in its existing product portfolio, moving each product ahead incrementally. That is true. However, in taking a step back, these new capabilities add up to a much larger sea change for the company. For the first time, Google can begin to promote GCP as a fully unified data and analytics platform.
Delivering a unified yet open data and analytics portfolio will prove crucial for Google as it seeks to match AWS and Microsoft stride for stride in convincing enterprises to move their analytical center of gravity to GCP. It is also imperative if the company wants to draw the attention of enterprise software developers and ISV partners alike.
For example, enabling BigQuery to handle a broader selection of data workloads and data formats directly supports the company’s Google Cloud Ready partner program, which seeks to call attention to partner solutions that create the best possible integration with BigQuery. It is important to note that this ecosystem is not merely limited to analytics players like Tableau and Qlik (both already program members). Early program members include many important AI players such as Dataiku, DataRobot, and Databricks.
Such an ecosystem encourages these partners to view GCP not as a cloud service containing specialist data and AI tools, but rather as a full-service cloud platform capable of carrying their solutions forward within a marketplace defined by innovation and freedom of choice.
Further reading
Omdia Universe: Modern Data Analytics Platform (July 2022)
Dreamforce ’22: Salesforce gets serious about data unification (September 2022)
Software Market Forecasts: Analytics and Data Management, 2021–26 (September 2022)
Bradley Shimmin, Chief Analyst, AI platforms, analytics, and data management