In November 2022, AWS gathered 50,000 customers and partners in Las Vegas for its 11th annual re:Invent conference. While the overall tone reached lofty heights with talk of vast, unfathomable, and extreme possibilities, many announcements made by the cloud giant focused on more down-to-earth yet impactful gains for AI practitioners seeking to build better AI outcomes in the enterprise. Practitioner dividends include better points of integration between key SageMaker tools, new operational capabilities, improved tooling user experiences, access to more data, and newfound responsible AI features – all pointing toward AWS as something more than the sum of its many parts.
AWS re:Invent – Key takeaways
- Recognizing its role in delivering cloud infrastructure services at an extreme scale, AWS emphasized its ability to bring the right integration, governance, and insights to bear in exploring the vast, unfathomable, and extreme realm of enterprise data.
- In support of data and analytics workloads, AWS introduced several platform-wide initiatives, featuring a zero-ETL integration between its data warehouse, Redshift, and its MySQL- and PostgreSQL-compatible database, Aurora.
- Extending these efforts in support of data science workloads, AWS announced several major additions to its SageMaker family of services, most notably a new notebook experience that more closely ties SageMaker Studio to the broader SageMaker suite of tools, as well as a new set of responsible AI capabilities.
- These updates speak to a broader but not overtly defined trend within AWS to evolve its cloud platform into a living, highly responsive, and extensible entity made up of myriad self-managed asynchronous functions, as exemplified by the company’s S3 storage service, which comprises more than 235 distributed microservices.
AWS re:Invent – A conference built for extremes
When Adam Selipsky, Chief Executive Officer of Amazon Web Services (AWS), took to the stage during the company’s 11th annual re:Invent conference in Las Vegas in late November 2022, he spoke of vast, unfathomable, and extreme possibilities. He highlighted the importance of exploration, noting the incredible strides made by science in exploring the unseen universe – an endeavor culminating in the recent work done by the James Webb Space Telescope in exploring the infrared spectrum.
The key to such exploration? According to Mr. Selipsky, customers need the right tools, best integration, correct governance, and timely insights to explore the vast, unfathomable, and extreme realm of enterprise data – capabilities found only within the AWS platform and backed by a track record of supporting the most extreme endeavors. To that end, Selipsky cited several major use cases with corroborating clients like Pinterest, which relies on AWS S3 storage to handle over a million terabytes of data, and Samsung, which manages more than 80,000 transactions per second on AWS.
Solidifying this grand vision as a relatable and actionable venture, Selipsky tackled what is perhaps the biggest form of technical debt plaguing the entire AI, analytics, and data management marketplace, namely extraction, transformation, and load (ETL) processes. Selipsky cited an unnamed customer who branded ETL as nothing more than a “...thankless, unsustainable black hole.” He then positioned AWS as offering a better way to deal with the limitations of ETL.
For example, Selipsky discussed the use of federated queries across numerous AWS analytical tools (e.g., Amazon Athena and Amazon Redshift) and even across clouds as an answer to the ongoing challenge of avoiding the rigidity and fragility of traditional ETL pipelines. Rather than move data from several databases into a central analytical repository via ETL, customers can instead issue a single query that’s capable of returning insight from across several data sources, all without the expense of moving data.
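To make the federated idea concrete, the following is a minimal sketch in Python. It does not use Athena or Redshift; it assumes two in-memory SQLite databases as stand-ins for two independent data sources, and the table names, columns, and the `revenue_by_customer` helper are hypothetical. The point it illustrates is only that one query can join data living in separate stores without first copying it into a central repository.

```python
import sqlite3

# Two separate in-memory databases stand in for two independent data sources.
# (Assumption for illustration only: this is SQLite, not Athena or Redshift.)
conn = sqlite3.connect(":memory:")                  # "main": hypothetical orders store
conn.execute("ATTACH DATABASE ':memory:' AS crm")   # "crm": hypothetical customer store

conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?)",
    [(1, 40.0), (1, 10.0), (2, 25.0)],
)
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)


def revenue_by_customer(connection):
    """Run a single query that spans both sources, with no copy step in between."""
    cursor = connection.execute(
        """
        SELECT c.name, SUM(o.amount)
        FROM main.orders AS o
        JOIN crm.customers AS c ON c.id = o.customer_id
        GROUP BY c.name
        ORDER BY c.name
        """
    )
    return cursor.fetchall()


result = revenue_by_customer(conn)
print(result)
```

In a real deployment the query engine, not the application, would be responsible for reaching into each source; the sketch only mirrors the shape of the request.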
A long-established concept, federated queries are good for analytical databases. The real challenge (and opportunity) lies in creating this same federated synergy between operational and analytical databases. As a vendor with more than 13 operational and analytical databases at the ready, this issue of integration and data movement is not inconsequential for AWS.
Recognizing this, the vendor announced at the show what it terms zero-ETL integration between its data warehouse, Redshift, and its MySQL- and PostgreSQL-compatible database, Aurora. Such integration does away with ETL pipelines and gives analytics practitioners nearly instantaneous access to analytical and operational data as it is being created, not after it has been extracted, transformed, and loaded into an analytics database.
Charting the motivation behind the innovation
Throughout Selipsky’s keynote and the numerous supporting keynotes, AWS introduced a sizable cadre of new AI- and data-centric capabilities akin to its zero-ETL solution. These included RDS read and write workload optimization and new, fully managed deployments for Amazon Aurora with MySQL compatibility, RDS for MySQL, and RDS for MariaDB. Amazon unveiled many similar capabilities during AWS re:Invent. Below is a short, curated list of key moves (see Table 1).
Table 1: Noteworthy announcements made at AWS re:Invent
| Announcement | Details |
| --- | --- |
| Perform shadow tests within SageMaker | Practitioners can now route a copy of live inference requests to challenger (shadow) models. This allows users to move new models into production with greater assurance. It also lets them test out changes to any production asset, such as software patches, new frameworks, and even new AI hardware acceleration services. |
| New SageMaker Studio notebook experience | Features new built-in data preparation capabilities (using Amazon SageMaker Data Wrangler) that enable practitioners to build and operationalize full data pipelines in support of AI. The new notebook experience also features full, collaborative functionality (Shared Spaces), built-in data visualization tools, and notebook automation (turning notebooks into scheduled jobs). This update also deepens and clarifies linkages with SageMaker services such as SageMaker Feature Store and SageMaker AutoML. |
| Asset sharing within Amazon SageMaker JumpStart | Practitioners can now share assets created using Amazon’s pre-built algorithms and pre-trained models the same way they share bespoke ML assets, sending those assets to other data scientists for review or to an operations team for deployment. |
| Full support for geospatial data within Amazon SageMaker | Currently in preview, this new feature builds on Amazon Location Service to enable practitioners to build, train, and deploy ML models using geospatial data. This service also supports many popular geospatial tools such as GeoPandas, Rasterio, and Geospatial Data Abstraction Library (GDAL). |
| New responsible AI capabilities | AWS has introduced several capabilities paramount in the fight to build trust in AI outcomes. In support of this goal, AWS introduced a central model dashboard, new operational user roles (compute and service), and an implementation of the Model Cards standard to promote transparency and explainability. Note that Omdia will cover these announcements in detail shortly. |
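As a rough illustration of the shadow-testing idea listed in Table 1, the sketch below shows the general pattern in plain Python. It is not the SageMaker API: the `ShadowRouter` class and both toy models are hypothetical stand-ins. The caller always receives the production answer, while a copy of each request is also sent to the challenger and the two results are logged for later comparison.

```python
def production_model(features):
    """Hypothetical model currently serving live traffic."""
    return 2.0 * features["x"]


def shadow_model(features):
    """Hypothetical challenger model under evaluation."""
    return 2.0 * features["x"] + 0.5


class ShadowRouter:
    """Serve production predictions while mirroring requests to a shadow model."""

    def __init__(self, production, shadow):
        self.production = production
        self.shadow = shadow
        self.comparisons = []  # one entry per mirrored request

    def predict(self, features):
        live = self.production(features)
        try:
            # The shadow model receives a copy of the request, never the original.
            candidate = self.shadow(dict(features))
            self.comparisons.append(
                {"live": live, "shadow": candidate, "diff": candidate - live}
            )
        except Exception as exc:
            # A failing shadow model must never affect the caller.
            self.comparisons.append({"live": live, "shadow": None, "error": repr(exc)})
        return live  # only the production result is ever returned


router = ShadowRouter(production_model, shadow_model)
answer = router.predict({"x": 3.0})
print(answer, router.comparisons)
```

A managed service would typically mirror traffic asynchronously so the shadow call adds no latency; the synchronous call here simply keeps the sketch short.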
Taken individually, these announcements highlight AWS’ interest in making it easier for enterprise AI, data, and analytics customers to adopt AWS cloud services and to do so using the technologies and tools they are comfortable with. As has become the norm among hyperscale cloud providers, AWS is simply removing complexities, streamlining operations, and adding new functionality. And yet, there appears to be something much more important at play here than the introduction of customary, incremental improvements.
A short foray into logical reasoning and cloud platforms
Just what is this “quintessence” permeating AWS re:Invent? Tuning into the keynote delivered by Dr. Werner Vogels, Amazon.com VP and CTO, revealed an important clue. During his introduction, Dr. Vogels, dressed as hacker Neo from the popular Matrix film franchise, re-enacted a famous scene from the original movie where Neo chooses to see the world as it truly is. But in this reenactment, the world revealed to Neo is that of total synchrony, where everything happens in linear increments, where a restaurant chef must cook an order of fries, one fry at a time.
According to Vogels, this is not how the real, real world works. Linearity is an illusion, a construct we humans impose on the world in an effort to understand complex systems where everything happens all at once, asynchronously. AWS intends to escape the illusion of linearity and instead empower the asynchronous, decentralized reality of complex software systems.
Vogels likened such systems to the decentralized, asynchronous murmurations of starlings, which fly not according to a central control mechanism but instead using local command and control where each bird flies according to two simple rules: stay close to other starlings and stay away from predators. The result is an incredibly complex and beautiful airborne dance with thousands of birds moving as one.
Dr. Vogels further explained using nothing more than his t-shirt, which appropriately featured the lambda symbol. To many, this symbol pays homage to the popular Half-Life science fiction universe. However, given the nature of his keynote, it must surely refer to the lambda calculus as invented by Alonzo Church in the early 1930s. If so, then the lambda symbol must refer to the idea that from a single transformational rule for functions, we humans can build incredibly complex, asynchronous computational machines out of many anonymous functions that only exist programmatically during runtime, executed within higher-order functions.
This idea has gained quite a bit of traction among developers using languages like Lisp and Python as a means of writing very short functions that do not require any state. It has also found its way into AWS’ platform in the form of AWS Lambda, a service that enables developers to run code without provisioning or managing servers – a concept popularized as serverless computing.
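For readers less familiar with the idea, here is a small Python sketch of anonymous functions being handed to higher-order functions. The data and the `compose` helper are made up for illustration; nothing here is specific to AWS Lambda, which borrows the name rather than the syntax.

```python
from functools import reduce

orders = [3, 8, 5, 12]

# Each lambda below is an anonymous function that exists only as an argument
# to a higher-order function (map, filter, reduce).
doubled = list(map(lambda n: n * 2, orders))
large = list(filter(lambda n: n > 10, doubled))
total = reduce(lambda acc, n: acc + n, large, 0)


def compose(f, g):
    """Higher-order function: takes two functions and returns a new anonymous one."""
    return lambda x: f(g(x))


# Built entirely from anonymous functions: add one, then double.
add_one_then_double = compose(lambda x: x * 2, lambda x: x + 1)

print(doubled, large, total, add_one_then_double(4))
```

None of these small functions holds any state of its own, which is what makes them easy to combine into larger behavior.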
For Vogels, technologies like AWS Lambda support an important step in the company’s evolution, a step outlined in the obscure but compelling notion of Gall’s Law, which states that “A complex system that works is invariably found to have evolved from a simple system that worked.” It also represents the way AWS is beginning to work as it continues to evolve its software from monolith to service-oriented architecture (SOA), to microservices, and finally to shared services.
Learning to fly together as one
To put this all into context, consider Amazon Simple Storage Service (Amazon S3). When initially launched back in 2006 (on Pi Day, as fate would have it), the S3 service comprised just eight microservices. Today, that same shared service is made up of more than 235 distributed microservices.
When programmatic access to shared services like Amazon S3 is orchestrated using an event-driven architecture (a key to AWS’ overall cloud-native approach to software), those shared services begin to behave like that flock of starlings. They incorporate new capabilities or adjust to changing conditions in a fluid manner. If AWS needs to fix S3 or wants to add a new service or capability, it does not need to take the service down to apply a security patch, nor does it need to refactor the entire S3 codebase to add a new data connector.
Similarly, enterprise developers can write event-driven software that utilizes S3 to create solutions that are truly abstracted away from the underlying infrastructure and, therefore, able to behave more like the real world, that is, to better handle requests in an asynchronous manner – not just one fry at a time.
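What such event-driven code might look like can be sketched briefly in Python. The example assumes the general shape of an S3 event notification (a `Records` list carrying an `eventName`, a bucket name, and a URL-encoded object key); the bucket, keys, and the `handler` function itself are hypothetical, and a real handler would go on to process each object rather than merely list it.

```python
from urllib.parse import unquote_plus


def handler(event, context=None):
    """Collect (bucket, key) pairs for newly created objects in an S3-style event."""
    uploaded = []
    for record in event.get("Records", []):
        # React only to object-creation events; ignore deletions and the rest.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification payload.
        key = unquote_plus(record["s3"]["object"]["key"])
        uploaded.append((bucket, key))
    return uploaded


# Hypothetical payload, trimmed to the fields the handler reads.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "reports/q4+summary.csv"},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "reports/old.csv"},
            },
        },
    ]
}

processed = handler(sample_event)
print(processed)
```

The function knows nothing about servers, queues, or how many other copies of itself are running, which is the sense in which it is abstracted away from the underlying infrastructure.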
What’s important to note here is that this asynchronous, event-driven approach to shared services is not just limited to Amazon S3 and developers building software to run on AWS. Rather, it permeates the entirety of the AWS portfolio, sometimes on a theoretical level (e.g., containers as starlings), sometimes on a pragmatic level (e.g., developer APIs), and sometimes on a more practical level (e.g., application user experiences).
In looking at the many announcements made at re:Invent, it is easy to see these three levels at work, even on the practical level, with SageMaker’s suite of services and SageMaker Studio coming into closer alignment with one another as but one example. Why is this important? To this market observer, the AWS cloud platform has always suffered under a cloud of complexity stemming from its desire to offer an ever-growing but always disparate and uncoordinated collection of best-of-breed services.
Given the work cited by Dr. Vogels in evolving AWS toward a unified set of event-driven, shared services, such concerns over portfolio complexity have not taken into account the long-term evolution of AWS in accordance with Gall’s Law. In that light, adopting AWS is not about finding the best tool for the job. For AWS, it is about finding the best platform for any job, a platform that is truly more than the sum of its many parts.
Bradley Shimmin, Chief Analyst, AI platforms, analytics, and data management