Google launches a 9 exaflop cluster of Cloud TPU v4 pods into public preview

At its I/O developer conference, Google today announced the public preview of a full cluster of Google Cloud’s new Cloud TPU v4 Pods.

Google’s fourth iteration of its Tensor Processing Units launched at last year’s I/O, and a single TPU pod consists of 4,096 of these chips. Each chip has a peak performance of 275 teraflops, and each pod promises up to 1.1 exaflops of combined compute power. Google now operates a full cluster of eight of these pods in its Oklahoma data center, with up to 9 exaflops of peak aggregate performance. Google believes this makes it “the world’s largest publicly available ML hub in terms of cumulative computing power, while operating at 90% carbon-free energy.”
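Those headline figures are straightforward to sanity-check with back-of-the-envelope arithmetic (these are peak, theoretical numbers, not sustained throughput):

```python
# Rough check of Google's stated figures (approximate peak numbers).
chips_per_pod = 4096
teraflops_per_chip = 275            # peak per TPU v4 chip
pods_in_cluster = 8

pod_exaflops = chips_per_pod * teraflops_per_chip / 1_000_000
cluster_exaflops = pod_exaflops * pods_in_cluster

print(f"Per pod: ~{pod_exaflops:.2f} exaflops")       # ~1.13 exaflops
print(f"Cluster: ~{cluster_exaflops:.1f} exaflops")   # ~9.0 exaflops
```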

“We have done extensive research to compare ML clusters that are publicly disclosed and publicly available (meaning – running on Cloud and available for external users),” a Google spokesperson told me when I asked the company to clarify its benchmark. “Those clusters are powered by supercomputers that have ML capabilities (meaning that they are well-suited for ML workloads such as NLP, recommendation models etc.). The supercomputers are built using ML hardware — e.g. GPUs (graphic processing units) — as well as CPU and memory. With 9 exaflops, we believe we have the largest publicly available ML cluster.”

At I/O 2021, Google’s CEO Sundar Pichai said that the company would soon have “dozens of TPU v4 pods in our data centers, many of which will be operating at or near 90% carbon-free energy. And our TPUv4 pods will be available to our cloud customers later this year.” Clearly, that took a bit longer than planned, but we are in the middle of a global chip shortage and these are, after all, custom chips.

Ahead of today’s announcement, Google worked with researchers to give them access to these pods. “Researchers liked the performance and scalability that TPU v4 provides with its fast interconnect and optimized software stack, the ability to set up their own interactive development environment with our new TPU VM architecture, and the flexibility to use their preferred frameworks, including JAX, PyTorch, or TensorFlow,” Google writes in today’s announcement. No surprise there. Who doesn’t like faster machine learning hardware?

Google says users will be able to slice and dice the new Cloud TPU v4 cluster and its pods to meet their needs, whether that’s access to four chips (the minimum for a TPU virtual machine) or thousands — though not too many, since there are only so many chips to go around.
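For a sense of what that looks like from the developer’s side, here is a minimal sketch, assuming a JAX installation on one of those Cloud TPU VMs; the reported device count simply reflects the size of the slice you requested:

```python
# Minimal sketch: inspect the TPU slice attached to a Cloud TPU VM with JAX.
# Assumes jax[tpu] is installed on the VM; exact counts depend on the slice you asked for.
import jax
import jax.numpy as jnp

print(jax.device_count())   # e.g. 4 for the smallest v4 slice, more for larger slices
print(jax.devices()[0])     # metadata for the first TPU device

# A trivially parallel computation that runs one shard on every chip in the slice:
out = jax.pmap(lambda x: x * 2)(jnp.arange(jax.device_count(), dtype=jnp.float32))
print(out)
```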

As of now, these pods are only available in Oklahoma. “We have run an extensive analysis of various locations and determined that Oklahoma, with its exceptional carbon-free energy supply, is the best place to host such a cluster. Our customers can access it from almost anywhere,” a spokesperson explained.

Gensyn applies a token to distributed computing for AI developers, raises $6.5M

For self-driving cars and other applications developed using AI, you need what’s known as ‘deep learning’, the core concepts of which emerged in the ‘50s. This involves training models whose structure is loosely inspired by the patterns of the human brain. That, in turn, requires a large amount of compute power, as afforded by TPUs (Tensor Processing Units) or GPUs (Graphics Processing Units) running for lengthy periods. However, the cost of this compute power is out of reach of most AI developers, who largely rent it from cloud computing platforms such as AWS or Azure. What is to be done?

Well, one approach is that taken by UK startup Gensyn. It has taken the distributed-computing idea behind older projects such as SETI@home and the COVID-19-focused Folding@home and applied it to this appetite for deep learning among AI developers. The result is a way to get high-performance compute power from a distributed network of computers.

Gensyn has now raised a $6.5 million seed round led by Eden Block, a Web3 VC. Also participating in the round are Galaxy Digital, Maven 11, CoinFund, Hypersphere, Zee Prime and founders from some blockchain protocols. This adds to a previously unannounced pre-seed investment of $1.1 million in 2021, led by 7percent Ventures and Counterview Capital, with participation from Entrepreneur First and id4 Ventures.
 
In a statement, Harry Grieve, co-founder of Gensyn, said: “The ballooning demand for hardware – and fat margins – is why the usual names like AWS and Azure have fought to command such high market share. The result is a market that is expensive and centralized…. We designed a better way – superior on price, with unlimited scalability, and no gatekeepers.” 

To achieve this, Gensyn says it will launch its decentralized compute network for training AI models. This network uses a blockchain to verify that the deep learning tasks have been performed correctly, triggering payments via a token. This then monetizes unused compute power in a verifiable manner. Gensyn also claims it’s a more environmentally conscious solution, because this compute power would otherwise go unused.
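Gensyn hasn’t published implementation details in this announcement, so the flow it describes (train off-chain, verify the work, release a token payment) can only be sketched in broad strokes. Every name in the toy sketch below is hypothetical and merely labels the steps; it is not Gensyn’s actual protocol or API:

```python
# Illustrative only: a toy model of the "train off-chain, verify, pay in tokens" loop
# Gensyn describes. All names are hypothetical stand-ins, not Gensyn's protocol or API.
import hashlib
from dataclasses import dataclass

@dataclass
class Task:
    task_id: str
    reward_tokens: float   # payment escrowed for completing the task

def do_work(task: Task) -> tuple[str, str]:
    """A worker performs the training job (stubbed here) and returns a result
    plus a commitment that the verifier can check."""
    result = f"weights-for-{task.task_id}"              # stand-in for trained weights
    commitment = hashlib.sha256(result.encode()).hexdigest()
    return result, commitment

def verify_and_pay(task: Task, result: str, commitment: str, balances: dict, worker: str):
    """The network checks the commitment; only a valid check releases the tokens.
    A real protocol would verify that the training itself was done correctly
    (via a cryptographic or probabilistic proof), not a simple hash comparison."""
    if hashlib.sha256(result.encode()).hexdigest() == commitment:
        balances[worker] = balances.get(worker, 0) + task.reward_tokens

balances = {}
task = Task(task_id="resnet-run-42", reward_tokens=12.5)
result, commitment = do_work(task)
verify_and_pay(task, result, commitment, balances, worker="0xWORKER")
print(balances)   # {'0xWORKER': 12.5}
```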

Lior Messika, managing partner at Eden Block, commented: “Gensyn’s goal of truly democratizing compute with decentralized technology is perhaps the most ambitious endeavor we’ve come across… The team aims to positively disrupt one of the largest and fastest-growing markets in the world, by drastically reducing the costs and friction associated with training neural networks at scale.” 

Over a call with me, Grieve added: “Our estimate is that it’s up to 80% cheaper in the average price per unit for the kind of standard Nvidia GPU, or 45 cents an hour, compared to about two bucks an hour for other cloud services.”
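Taking those figures at face value, the arithmetic behind the claim works out roughly as follows:

```python
# Rough check of the quoted comparison: $0.45/hr vs. ~$2.00/hr for a comparable GPU.
gensyn_hourly = 0.45
cloud_hourly = 2.00
savings = 1 - gensyn_hourly / cloud_hourly
print(f"~{savings:.0%} cheaper per GPU-hour")   # ~78%, in line with the "up to 80%" claim
```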

Google launches TensorFlow Enterprise with long-term support and managed services

Google open-sourced its TensorFlow machine learning framework back in 2015 and it quickly became one of the most popular platforms of its kind. Enterprises that wanted to use it, however, had to either work with third parties or do it themselves. To help these companies — and capture some of this lucrative market itself — Google is launching TensorFlow Enterprise, which includes hands-on, enterprise-grade support and optimized managed services on Google Cloud.

One of the most important features of TensorFlow Enterprise is that it will offer long-term support. For some versions of the framework, Google will offer patches for up to three years. For what looks to be an additional fee, Google will also offer engineering assistance from its Google Cloud and TensorFlow teams to companies that are building AI models.

All of this, of course, is deeply integrated with Google’s own cloud services. “Because Google created and open-sourced TensorFlow, Google Cloud is uniquely positioned to offer support and insights directly from the TensorFlow team itself,” the company writes in today’s announcement. “Combined with our deep expertise in AI and machine learning, this makes TensorFlow Enterprise the best way to run TensorFlow.”

Google also includes Deep Learning VMs and Deep Learning Containers to make getting started with TensorFlow easier, and the company has optimized the enterprise version for Nvidia GPUs and Google’s own Cloud TPUs.
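None of this changes how TensorFlow code itself is written. A generic snippet like the one below (ordinary TensorFlow APIs, nothing specific to the Enterprise tier) is roughly how a training script picks up the GPUs or Cloud TPUs that those Deep Learning VMs and Containers expose:

```python
# Generic TensorFlow code (not specific to TensorFlow Enterprise) showing how a
# training script detects the accelerators available on a Deep Learning VM or TPU VM.
import tensorflow as tf

print(tf.__version__)                              # the pinned, long-term-supported release
print(tf.config.list_physical_devices("GPU"))      # Nvidia GPUs attached to the VM

# On a Cloud TPU VM, a distribution strategy targets the TPU instead:
try:
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="local")
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
    print("TPU cores:", strategy.num_replicas_in_sync)
except (ValueError, tf.errors.NotFoundError):
    print("No TPU detected; falling back to the default strategy.")
    strategy = tf.distribute.get_strategy()
```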

Today’s launch is yet another example of Google Cloud’s focus on enterprises, a move the company accelerated when it hired Thomas Kurian to run its cloud business. After years of mostly ignoring the enterprise, the company is now clearly looking at what enterprises are struggling with and how it can adapt its products for them.

Google brings in BERT to improve its search results

Google today announced one of the biggest updates to its search algorithm in recent years. By using new neural networking techniques to better understand the intentions behind queries, Google says it can now offer more relevant results for about one in ten searches in the U.S. in English (with support for other languages and locales coming later). For featured snippets, the update is already live globally.

In the world of search updates, where algorithm changes are often far more subtle, an update that affects 10 percent of searches is a pretty big deal (and will surely keep the world’s SEO experts up at night).

Google notes that this update will work best for longer, more conversational queries — and in many ways, that’s how Google would really like you to search these days because it’s easier to interpret a full sentence than a sequence of keywords.


The technology behind this new neural network is called “Bidirectional Encoder Representations from Transformers,” or BERT. Google first talked about BERT last year and open-sourced the code for its implementation and pre-trained models. Transformers are one of the more recent developments in machine learning. They work especially well for data where the sequence of elements is important, which obviously makes them a useful tool for working with natural language and, hence, search queries.
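Because the code and checkpoints are public, the model family is easy to poke at yourself. Here is a minimal sketch that loads a public BERT checkpoint via the Hugging Face transformers library; it illustrates what the model produces, not how Google Search actually serves queries:

```python
# Illustrative only: load one of the publicly available BERT checkpoints with the
# Hugging Face `transformers` library (requires PyTorch). This shows the model family
# in action; it is not how Google Search itself runs BERT.
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("can you get medicine for someone at a pharmacy", return_tensors="pt")
outputs = model(**inputs)

# One contextual vector per token: each token's vector depends on every other word
# in the query, which is the "bidirectional" part that helps with conversational searches.
print(outputs.last_hidden_state.shape)   # (1, sequence_length, 768)
```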

This BERT update also marks the first time Google is using its latest Tensor Processing Unit (TPU) chips to serve search results.

Ideally, this means that Google Search is now better able to understand exactly what you are looking for and provide more relevant search results and featured snippets. The update started rolling out this week, so chances are you are already seeing some of its effects in your search results.

 

Google is making a fast specialized TPU chip for edge devices and a suite of services to support it

In a pretty substantial move toward owning the entire AI stack, Google today announced that it will be rolling out a version of its Tensor Processing Unit — a custom chip optimized for its machine learning framework, TensorFlow — built for inference in edge devices.

That’s a bit of a word salad to unpack, but here’s the end result: Google is looking to have a complete suite of customized hardware for developers looking to build products around machine learning, such as image or speech recognition, that it owns from the device all the way through to the server. Google will have the Cloud TPU (the third version of which will soon roll out) to handle training models for various machine learning-driven tasks, and then run inference from that model on a specialized chip that runs a lighter version of TensorFlow and doesn’t consume as much power. Google is exploiting an opportunity to split model training and inference across two different sets of hardware and dramatically reduce the footprint required in the device that’s actually capturing the data. That would result in faster processing, less power consumption, and potentially more importantly, a dramatically smaller surface area for the actual chip.

Google is also rolling out a new set of services to compile TensorFlow (Google’s machine learning development framework) into a lighter-weight version that can run on edge devices without having to call the server for those operations. That, again, reduces latency and could pay off in any number of ways, from safety (in autonomous vehicles) to just a better user experience (voice recognition). As competition heats up in the chip space, from both the larger companies and the emerging class of startups, nailing these use cases is going to be really important for larger companies. That’s especially true for Google, which also wants to own the actual development framework in a world where there are multiple options like Caffe2 and PyTorch.
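The “lighter-weight version” in question is along the lines of TensorFlow Lite. As a rough sketch of the developer workflow (the Edge TPU itself requires an additional chip-specific compilation step not shown here, and the tooling has evolved since this announcement), converting a trained model into a small, quantized file looks roughly like this:

```python
# Minimal sketch: converting a trained Keras model into a small, quantized
# TensorFlow Lite model suitable for on-device inference. Targeting the Edge TPU
# itself involves a further, chip-specific compilation step.
import tensorflow as tf

# A toy stand-in for a trained model.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # enables post-training quantization
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
print(f"{len(tflite_model) / 1024:.1f} KB on disk")    # far smaller than the float model
```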

Google will be releasing the chip on a kind of modular board not so dissimilar to the Raspberry Pi, which will get it into the hands of developers who can tinker and build unique use cases. But more importantly, it’ll help entice developers who are already working with TensorFlow as their primary machine learning framework with the promise of a chip that’ll run those models even faster and more efficiently. That could open the door to new use cases and ideas and, should it be successful, will lock those developers further into Google’s cloud ecosystem on both the hardware (the TPU) and framework (TensorFlow) levels. While Amazon owns most of the stack for cloud computing (with Azure being the other major player), it looks like Google is looking to own the whole AI stack – and not just offer on-demand GPUs as a stopgap to keep developers operating within that ecosystem.

Thanks to the proliferation of GPUs, machine learning has become increasingly common across a variety of use cases. Those use cases don’t just require the horsepower to train a model to identify what a cat looks like; they also need the ability to take in an image and quickly identify that said four-legged animal is a cat, based on a model trained with tens of thousands (or more) images of cats. GPUs were great for both jobs, but it’s clear that better hardware is necessary with the emergence of use cases like autonomous driving, photo recognition on cameras, and a variety of others, for which even millisecond-level lag is too much and power consumption, or surface area, is a dramatic limiting factor.

The edge-specialized TPU is an ASIC, a breed of chip that’s increasingly popular because it excels at doing one specific thing really well. That has opened up an opportunity to tap niches like cryptocurrency mining (where companies like Bitmain have built their businesses) with chips optimized for exactly those calculations. Edge-focused ML chips of this kind tend to do a lot of low-precision calculations very fast, which makes the whole process of shuttling data between memory and the actual cores significantly less complicated and consumes less power as a result.
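The low-precision point is easy to quantify: storing weights as 8-bit integers rather than 32-bit floats cuts both on-chip memory and the data moved between memory and the cores by roughly a factor of four. A toy calculation:

```python
# Why low precision matters at the edge: int8 weights take a quarter of the space
# (and memory bandwidth) of float32 weights for the same number of parameters.
params = 5_000_000                       # e.g. a small mobile vision model (illustrative)
float32_mb = params * 4 / 1_000_000      # 4 bytes per weight
int8_mb = params * 1 / 1_000_000         # 1 byte per weight
print(f"float32: {float32_mb:.0f} MB, int8: {int8_mb:.0f} MB")   # 20 MB vs. 5 MB
```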

While Google’s entry into this arena has long been a whisper in the Valley, this is a stake in the ground: the company wants to own everything from the hardware all the way up to the end-user experience, passing through the development layer and others along the way. It might not necessarily alter the calculus of the ecosystem; even though the chip ships on a development board to create a playground for developers, Google still has to get the hardware designed into other companies’ products, not just its own, if it wants to rule the ecosystem. That’s easier said than done, even for a juggernaut like Google, but it is a big salvo from the company that could have rather significant ramifications down the line as every big company races to create its own custom hardware stack specialized for its own needs.