Google Colaboratory launches a pay-as-you-go option, premium GPU access

Google Colaboratory (Colab for short), Google’s service designed to allow anyone to write and execute arbitrary Python code through a web browser, is introducing a pay-as-you-go plan. In its first pricing change since Google launched premium Colab plans in 2020, Colab will now give users the option to purchase additional compute time with or without a paid subscription.

Google says that the update won’t affect the free-of-charge Colab tier, which remains in its current form. The only material change is that users can buy access to compute in the form of “compute units,” starting at $9.99 for 100 units or $49.99 for 500.

As Google Colab product lead Chris Perry explains in a blog post:

Paid users now have the flexibility to exhaust compute quota, measured in compute units, at whatever rate they choose. As compute units are exhausted, a user can choose to purchase more with pay-as-you-go at their discretion. Once a user has exhausted their compute units their Colab usage quota will revert to our free of charge tier limits.

In tandem with the pay-as-you-go rollout, Google announced that paid Colab users can now choose between standard or “premium” GPUs in Colab — the latter typically being Nvidia V100 or A100 Tensor Core GPUs. (Standard GPUs in Colab are usually Nvidia T4 Tensor Core GPUs.) However, the company notes that getting a specific GPU chip type assignment isn’t guaranteed and depends on a number of factors, including availability and a user’s paid balance with Colab.

It goes without saying, but premium GPUs will also deplete Colab compute units faster than the standard GPUs.
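
To make the new pricing concrete, here is a rough back-of-the-envelope sketch in Python. The pack prices come from the announcement above; the per-hour consumption rates are purely hypothetical placeholders, since actual burn rates depend on the GPU assigned and Google has not published fixed figures.

```python
# Back-of-the-envelope math for Colab compute units, using the published pack
# prices above ($9.99 for 100 units, $49.99 for 500). The per-hour burn rates
# below are made-up placeholders -- real consumption varies by GPU type and
# availability and isn't fixed publicly.

PACKS = {100: 9.99, 500: 49.99}
HYPOTHETICAL_UNITS_PER_HOUR = {"standard_gpu": 2.0, "premium_gpu": 13.0}  # invented rates

for units, price in PACKS.items():
    print(f"{units} units at ${price:.2f} -> ${price / units:.4f} per unit")
    for gpu, rate in HYPOTHETICAL_UNITS_PER_HOUR.items():
        print(f"  ~{units / rate:.1f} hours on a {gpu} (assumed {rate} units/hour)")
```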

Google began telegraphing the rollout of pay-as-you-go options in Colab several weeks ago, when it notified Colab users via email that it was adopting the aforementioned compute units system for subscribers. It framed the shift as a move toward transparency, allowing users to “have more control over how and when [they] use Colab.”

Some perceived the move as user-hostile — an attempt to charge more for or clamp down on Colab usage. But in a statement to TechCrunch, a Google spokesperson pointed out that limits have always applied to all tiers of Colab usage, paid plans included.

“[T]hese updates are meant to give users more visibility into … limits,” the spokesperson said via email. “Colab will continue supporting its free of charge tier, including basic GPU access.”

The sensitivity around pricing changes reflects how much Colab has grown since it spun out from an internal Google Research project in late 2017. The platform has become the de facto digital breadboard for demos within the AI research community — it’s not uncommon for researchers who’ve written code to include links to Colab pages on or alongside the GitHub repositories hosting the code.

Google Colaboratory launches a pay-as-you-go option, premium GPU access by Kyle Wiggers originally published on TechCrunch

Meta’s Make-A-Video AI achieves a new, nightmarish state of the art

Meta’s researchers have made a significant leap in the AI art generation field with Make-A-Video, the creatively named new technique for — you guessed it — making a video out of nothing but a text prompt. The results are impressive and varied, and all, with no exceptions, slightly creepy.

We’ve seen text-to-video models before — it’s a natural extension of text-to-image models like DALL-E, which output stills from prompts. But while the conceptual jump from still image to moving one is small for a human mind, it’s far from trivial to implement in a machine learning model.

Make-A-Video doesn’t actually change the game that much on the back end — as the researchers note in the paper describing it, “a model that has only seen text describing images is surprisingly effective at generating short videos.”

The AI uses the existing and effective diffusion technique for creating images, which essentially works backwards from pure visual static, “denoising” towards the target prompt. What’s added here is that the model was also given unsupervised training (that is to say, it examined the data itself with no strong guidance from humans) on a bunch of unlabeled video content.

What it knows from the first is how to make a realistic image; what it knows from the second is what sequential frames of a video look like. Amazingly, it is able to put these together very effectively with no particular training on how they should be combined.
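
For readers unfamiliar with the diffusion idea, here is a deliberately tiny toy sketch of the "denoise from static" loop described above. It is not Make-A-Video or any real model: the noise predictor below cheats by peeking at a known target, whereas a real system learns that prediction from data and conditions it on the text prompt.

```python
# A toy illustration of the "denoising" idea behind diffusion models: start from
# pure noise and repeatedly remove a little of it, guided by a predictor of what
# the clean image should look like. Conceptual sketch only -- real models use
# learned neural denoisers, text conditioning and careful noise schedules.
import numpy as np

rng = np.random.default_rng(0)
target = rng.uniform(0, 1, size=(8, 8))   # stand-in for "the image the prompt describes"
x = rng.normal(size=(8, 8))               # start from pure visual static

def predict_noise(x_t, clean):
    # A trained model would estimate the noise from x_t and the prompt;
    # here we cheat and compute it from the known clean image.
    return x_t - clean

steps = 50
for t in range(steps):
    eps_hat = predict_noise(x, target)
    x = x - (1.0 / (steps - t)) * eps_hat  # remove a fraction of the predicted noise

print("mean abs error vs. target:", np.abs(x - target).mean())
```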

“In all aspects, spatial and temporal resolution, faithfulness to text, and quality, Make-A-Video sets the new state-of-the-art in text-to-video generation, as determined by both qualitative and quantitative measures,” write the researchers.

It’s hard not to agree. Previous text-to-video systems used a different approach and the results were unimpressive but promising. Now Make-A-Video blows them out of the water, achieving fidelity in line with images from perhaps 18 months ago in original DALL-E or other past generation systems.

But it must be said: there’s definitely still something off about them. Not that we should expect photorealism or perfectly natural motion, but the results all have a sort of… well, there’s no other word for it: they’re a bit nightmarish, aren’t they?


There’s just some awful quality to them that is both dreamlike and terrible. The quality of the motion is strange, as if it’s a stop-motion movie. The corruption and artifacts give each piece a furry, surreal feel, like the objects are leaking. People blend into one another — there’s no understanding of objects’ boundaries or what something should terminate in or contact.


I don’t say all this as some kind of AI snob who only wants the best high-definition realistic imagery. I just think it’s fascinating that however realistic these videos are in one sense, they’re all so bizarre and off-putting in others. That they can be generated quickly and arbitrarily is incredible — and it will only get better. But even the best image generators still have that surreal quality that’s hard to put your finger on.

Make-A-Video also allows for transforming still images and other videos into variants or extensions thereof, much like how image generators can also be prompted with images themselves. The results are slightly less disturbing.

This really is a huge step up from what existed before, and the team is to be congratulated. It’s not available to the public just yet, but you can sign up here to get on the list for whatever form of access they decide on later.

Meta’s Make-A-Video AI achieves a new, nightmarish state of the art by Devin Coldewey originally published on TechCrunch

Google Search will soon begin translating local press coverage

At a Google Search-focused event this morning, Google announced that it will soon introduce ways to translate news coverage directly from Search. Starting in 2023, English-speaking users, for example, will be able to search and see translated links to news results from Spanish-language publishers in countries such as Mexico, in addition to links to articles written in their preferred language.

“Say you wanted to learn about how people in Mexico were impacted by the more than 7 magnitude earthquake earlier this month,” Google News product manager Itamar Snir and Google Search product manager Lauren Clark explained in a blog post. “With this feature, you’ll be able to search and see translated headlines for news results from publishers in Mexico, in addition to ones written in your preferred language. You’ll be able to read authoritative reporting from journalists in the country, giving you a unique perspective of what’s happening there.”

Google News translations. Image Credits: Google

Building on its earlier translation work, the feature will initially translate headlines and articles from French, German and Spanish into English, on both mobile and desktop.

Google has experimented with news translation before, three years ago adding the ability to display content in two languages together within the Google News app feed. But for the most part, the search giant has left it to users to translate content via tools like Chrome’s translate button and Google Translate. Presumably, should the Google Search news translation feature be well received, that’ll change for more languages in the future.


Google Search will soon begin translating local press coverage by Kyle Wiggers originally published on TechCrunch

CoRise’s approach to up-skilling involves fewer courses and more access

Despite the boom of education technology investment and innovation over the past few years, founder Julia Stiglitz, who broke into the edtech world as an early Coursera employee, thinks there’s a lot of room to grow. Her new startup, CoRise, sells expert-led programming to people who want to up-skill their careers. It’s a fresh play in a crowded sector, with heavyweights including Udemy, Udacity, Guild Education and, well, her former employer.

“We haven’t solved the problems yet, and in fact, they’re growing,” Stiglitz said in an interview with TechCrunch. The edtech veteran is right: the next generation of edtech is still looking for ways to balance motivation and behavior change with an accessible price point and a scalable format. There’s an inherent tradeoff between engagement and scale – an elephant that even the unicorns have not entirely been able to avoid.

Enter CoRise, which wants to do it all. The startup, built by Stiglitz, Sourabh Bajaj and Jacob Samuelson, pairs students who want to learn and improve on highly technical skills, such as devops or data science, with experts. CoRise defines experts as leaders at tech companies; advertised instructors include a data engineering manager at Drizly, a former CTO of Wikimedia and a director of machine learning at ShareChat, for example. Some classes, like this SQL crash course, are even taught by CoRise employees.

As far as early users go, it’s not going for the solopreneur who wants to break into tech. Instead, CoRise is selling to enterprises in need of more tailored solutions for their talent. In talking to learning and development leaders, the founder learned that organizations are either rolling out asynchronous education platforms to the entire staff, or bringing in consultants to do customer training; “there sort of wasn’t anything in between,” she said, so she built it.

Stiglitz doesn’t want CoRise to scale to a place where it hosts 20,000 courses taught by thousands of instructors. Instead, the startup wants to offer one applied machine learning course that teaches 1,000 or 5,000 students at a time.

By focusing on bigger cohorts, CoRise is taking a different approach than some of its competitors. Udemy founder Gagan Biyani, for example, is working on Maven, which offers expert-led programming that divides people up into small groups to nurture collaboration and the exchange of ideas. Stiglitz, meanwhile, thinks that smaller cohorts drive up the expense of the program. Standardized courses with bigger classes are the only way to get programming to “be really accessible,” in her view.

Single course access costs an average of $400, and students can buy an all-access pass to every cohort for around $1,000, she adds. For comparison, a single course on Maven – perhaps this one on founder finance – can cost $2,000.

“We’re trying to figure out how you get outcomes or results for learners at this scale, and still make it really accessible, still have instructors make solid revenue on it,” she said. “We need to figure out how to have lots of people in a cohort and still have a great experience.”

The challenge of big classes and standardized courses, of course, is the lack of personalization. CoRise created a “nudging infrastructure” that looks at how an individual student is interacting with a course, its lectures and due assignments. It also looks at things like whether the student has gone to office hours or submitted their work on time.

CoRise then uses that back-end information to send an automated “nudge,” or push notification, to someone who needs a reminder to seek additional support. The course manager also follows up with a human response so students don’t feel like it’s all robots and automatic messages, the founder explained.
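
For a sense of what such a nudging pipeline might look like, here is a hypothetical sketch. CoRise has not published its implementation; the field names, thresholds and messages below are invented purely for illustration.

```python
# Hypothetical sketch of rule-based "nudging" logic like the kind described above.
# All fields, thresholds and message text are invented for illustration.
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass
class StudentActivity:
    name: str
    last_lecture_watched: date
    assignments_due: int
    assignments_submitted: int
    attended_office_hours: bool

def nudge(student: StudentActivity, today: date) -> Optional[str]:
    # Flag students who are falling behind on lectures or assignments.
    if today - student.last_lecture_watched > timedelta(days=7):
        return f"Hi {student.name}, this week's lectures are waiting for you."
    if student.assignments_submitted < student.assignments_due:
        extra = "" if student.attended_office_hours else " Office hours might help!"
        return f"Hi {student.name}, you have an assignment outstanding.{extra}"
    return None  # on track: no automated nudge; a course manager can still follow up

print(nudge(StudentActivity("Ada", date(2022, 9, 1), 3, 2, False), date(2022, 9, 20)))
```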

Over time, CoRise can get smarter on how to support students who are struggling before they even show up to office hours, a big vision shared among the personalized learning movement.

“A lot of what we’re trying to figure out is like what needs to be human to retain that motivational element? And then what can we scale up on the backend in order to drive scale and keep costs down to make a reasonable price,” she said. Stiglitz says that the average completion rate of its courses is 78%. The startup’s nudge framework is certainly compelling, but it is only one step toward a more customized and engaging experience for learners. And while low costs certainly matter – a lot – there can be a race to the bottom if other competitors also seek to drive prices down to win over customers.

While the startup didn’t disclose the number of learners who have gone through its platform, it did say that they come from over 500 companies including Spotify, Walmart and Lyft. It has a 68 NPS score.

The startup has raised millions to better figure out the above. To date, CoRise tells TechCrunch that it has raised $8.5 million from Greylock, GSV and Cowboy Ventures since launch, with $5.5 million in its first check and a subsequent $3 million given recent traction. Other investors include OpenAI co-founder Greg Brockman and DeepMind co-founder Mustafa Suleyman.

My last question for Stiglitz was an annoying one: how does her focus on fewer classes and instructors sit with her investors? Wouldn’t they want her to always be launching new classes?

“The pressure is going to be scale, scale, scale, but it’s going to be scale, scale, scale, within the class,” she said. “We’re targeting large companies who want to roll out SQL training to 1,000 people, but they’re not going to want to roll out eight different versions of that class. That’s how we get scale.”


CoRise’s approach to up-skilling involves fewer courses and more access by Natasha Mascarenhas originally published on TechCrunch

OpenAI removes the waitlist for DALL-E 2, allowing anyone to sign up

Several months after launching DALL-E 2 as a part of a limited beta, OpenAI today removed the waitlist for the AI-powered image-generating system (which remains in beta), allowing anyone to sign up and begin using it. Pricing will remain the same, with first-time users getting a finite amount of credits that can be put toward generating or editing an image or creating a variation of existing images.

“More than 1.5 million users are now actively creating over 2 million images a day with DALL-E — from artists and creative directors to authors and architects — with about 100,000 users sharing their creations and feedback in our Discord community,” OpenAI wrote in a blog post. “Learning from real-world use has allowed us to improve our safety systems, making wider availability possible today.”

OpenAI has yet to make DALL-E 2 available through an API, though the company notes in the blog post that one is in testing. Brands such as Stitch Fix, Nestlé and Heinz have piloted DALL-E 2 for ad campaigns and other commercial use cases, but so far only in an ad hoc fashion.

As we’ve previously written about, OpenAI’s conservative release cycle appears intended to sidestep the controversy growing around Stability AI’s Stable Diffusion, an image-generating system that’s deployable in an open source format without any restrictions. Stable Diffusion ships with optional safety mechanisms. But the system has been used by some to create objectionable content, like graphic violence and pornographic, nonconsensual celebrity deepfakes.

Stability AI — which already offers a Stable Diffusion API, albeit with restrictions on certain content categories — was the subject of a critical recent letter from U.S. House Representative Anna G. Eshoo (D-CA) to the National Security Advisor (NSA) and the Office of Science and Technology Policy (OSTP). In it, she urged the NSA and OSTP to address the release of “unsafe AI models” that “do not moderate content made on their platforms.”

Heinz bottles as “imagined” by DALL-E 2. Image Credits: Heinz

“I am an advocate for democratizing access to AI and believe we should not allow those who openly release unsafe models onto the internet to benefit from their carelessness,” Eshoo wrote. “Dual-use tools that can lead to real-world harms like the generation of child pornography, misinformation and disinformation should be governed appropriately.”

Indeed, as they march toward ubiquity, countless ethical and legal questions surround systems like DALL-E 2, Midjourney and Stable Diffusion. Earlier this month, Getty Images banned the upload and sale of illustrations generated using DALL-E 2, Stable Diffusion and other such tools, following similar decisions by sites including Newgrounds, PurplePort and FurAffinity. Getty Images CEO Craig Peters told The Verge that the ban was prompted by concerns about “unaddressed rights issues,” as the training datasets for systems like DALL-E 2 contain copyrighted images scraped from the web.

The training data presents a privacy risk as well, as an Ars Technica report last week highlighted. Private medical records — possibly thousands — are among the many photos hidden within the dataset used to train Stable Diffusion, according to the piece. Removing these records is exceptionally difficult because LAION, the dataset in question, isn’t a collection of files itself but merely a set of URLs pointing to images on the web.

In response, technologists like Mat Dryhurst and Holly Herndon are spearheading efforts such as Source+, a standard aiming to allow people to disallow their work or likeness to be used for AI training purposes. But these standards are — and will likely remain — voluntary, limiting their potential impact.

Experiments with DALL-E 2 for different product visualizations. Image Credits: Eric Silberstein

OpenAI has repeatedly claimed to have taken steps to mitigate issues around DALL-E 2, including rejecting image uploads containing realistic faces and attempts to create the likeness of public figures, like prominent political figures and celebrities. The company also says it trained DALL-E 2 on a dataset filtered to remove images that contained obvious violent, sexual or hateful content. And OpenAI says it employs a mix of automated and human monitoring systems to prevent the system from generating content that violates its terms of service.

“In the past months, we have made our filters more robust at rejecting attempts to generate sexual, violent and other content that violates our content policy, and building new detection and response techniques to stop misuse,” the company wrote in the blog post published today. “Responsibly scaling a system as powerful and complex as DALL-E — while learning about all the creative ways it can be used and misused — has required an iterative deployment approach.”

OpenAI removes the waitlist for DALL-E 2, allowing anyone to sign up by Kyle Wiggers originally published on TechCrunch

Arthur.ai machine learning monitoring gathers steam with $42M investment

It’s widely understood that after machine learning models are deployed in production, the accuracy of the results can deteriorate over time. Arthur.ai launched in 2019 with the goal of helping companies monitor their models to ensure they stayed true to their goals. Since then, the company has also added explainability and bias mitigation to the array of services.

The tooling has been resonating in the market, and today the startup announced a hefty $42 million Series B. Company co-founder Adam Wenchel told TechCrunch it’s the largest round ever given to a machine learning monitoring startup.

Accuracy also means guarding against bias, and that’s something the company has been working on since we last spoke to them at the time of its $15 million Series A.

“We’ve worked a lot on the bias side of things. It’s becoming a lot more top of mind for people, like how do you keep these models from being discriminatory? And so we’ve done a lot of novel IP development around how do you automatically adjust the outputs of these models so that they meet whatever fairness constraints the customers want to achieve,” Wenchel said.

Explainability, as the name suggests, is understanding why you got the results you did. Wenchel uses the example of having high blood pressure, which could stem from diet or another controllable factor, or from a hereditary factor you have no control over that might require medication to bring down. Understanding that there isn’t a one-size-fits-all answer can help prevent overgeneralizing what the machine learning model is telling you.

He said he definitely noticed a difference in raising this year versus the last time. “We had to meet with a dozen different investors to get those multiple term sheets as opposed to the frothy environment of 2020 when there were people who were calling every five minutes asking, are you ready? Are you ready? Are you ready yet? But it all worked out well for us,” he said.

Perhaps the company’s growth is one of the reasons for investor interest. The startup has averaged 58% ARR growth over the last four quarters, which looks even better when you consider the economic ups and downs we’ve been experiencing over the last couple of years.

The company has 55 employees today, up from 17 at the time of its Series A, and Wenchel says that diversity remains a company goal, one that they’ve been working on, both at the cap table level and at the employee level.

He says it’s particularly important in the research area, where having a diverse workforce can help prevent bias from creeping into their software. “We’ve published a number of papers and that team in particular is incredibly diverse, and I think a much better team for it,” he said.

Today’s round was led by Acrew Capital and Greycroft. The cap table includes Theresia Gouw from Acrew and Ashley Mayer from Coalition Operators. Gouw will join the board under the terms of the funding.

Arthur.ai machine learning monitoring gathers steam with $42M investment by Ron Miller originally published on TechCrunch

Kumo aims to bring predictive AI to the enterprise with $18M in fresh capital

Kumo, a startup offering an AI-powered platform to tackle predictive problems in business, today announced that it raised $18 million in a Series B round led by Sequoia, with participation from A Capital, SV Angel and several angel investors. Co-founder and CEO Vanja Josifovski says the new funding will be put toward Kumo’s hiring efforts and R&D across the startup’s platform and services, which include data prep, data analytics and model management.

Kumo’s platform works specifically with graph neural networks, a class of AI system for processing data that can be represented as a series of graphs. Graphs in this context refer to mathematical constructs made up of vertices (also called nodes) that are connected by edges (or lines). Graphs can be used to model relations and processes in social, IT and even biological systems. For example, the link structure of a website can be represented by a graph where the vertices stand in for webpages and the edges represent links from one page to another.
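
As a concrete (and entirely made-up) illustration of that webpage example, the snippet below represents a tiny link graph as an adjacency list and performs the simplest possible neighbor aggregation, which is the basic message-passing step that graph neural network layers learn to weight.

```python
# A minimal sketch of the graph idea described above: webpages as vertices, links
# as edges. The pages and links are invented. The final loop shows the bare-bones
# "message passing" operation that graph neural networks build on: each node
# aggregating information from its neighbors.
links = {
    "home":   ["about", "blog"],
    "about":  ["home"],
    "blog":   ["home", "post-1", "post-2"],
    "post-1": ["blog"],
    "post-2": ["blog", "post-1"],
}

# Toy node feature: number of outgoing links per page.
features = {page: float(len(neighbors)) for page, neighbors in links.items()}

aggregated = {}
for page, neighbors in links.items():
    # Average the neighbors' features -- the step a real GNN layer learns to weight.
    aggregated[page] = sum(features[n] for n in neighbors) / len(neighbors)

print(aggregated)
```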

Graph neural networks have powerful predictive capabilities. At Pinterest and LinkedIn, they’re used to recommend posts, people and more to hundreds of millions of active users. But as Josifovski notes, they’re computationally expensive to run — making them cost-prohibitive for most companies.

“Many enterprises today attempting to experiment with graph neural networks have been unable to scale beyond training data sets that fit in a single accelerator (memory in a single GPU), dramatically limiting their ability to take advantage of these emerging algorithmic approaches,” he told TechCrunch in an email interview. “Through fundamental infrastructural and algorithmic advancements, we have been able to scale to datasets in the many terabytes, allowing graph neural networks to be applied to customers with larger and more complicated enterprise graphs, such as social networks and multi-sided marketplaces.”

Using Kumo, customers can connect data sources to create a graph neural network that can then be queried in structured query language (SQL). Under the hood, the platform automatically trains the neural network system, evaluating it for accuracy and readying it for deployment to production.

Josifovski says that Kumo can be used for applications like new customer acquisition, customer loyalty and retention, personalization and next best action, abuse detection and financial crime detection. Previously the CTO of Pinterest and Airbnb Homes, Josifovski worked with Kumo’s other co-founders, former Pinterest chief scientist Jure Leskovec and Hema Raghavan, to develop the graph technology through Stanford and Dortmund University research labs.

“Companies spend millions of dollars storing terabytes of data but are able to effectively leverage only a fraction of it to generate the predictions they need to power forward-looking business decisions. The reason for this is major data science capacity gaps as well as the massive time and effort required to get predictions successfully into production,” Josifovski said. “We enable companies to move to a paradigm in which predictive analytics goes from being a scarce resource used sparingly into one in which it is as easy as writing a SQL query, thus enabling predictions to basically become ubiquitous — far more broadly adapted in use cases across the enterprise in a much shorter timeframe.”

Kumo remains in the pilot stage, but Josifovski says that it has “more than a dozen” early adopters in the enterprise. To date, the startup has raised $37 million in capital.

Kumo aims to bring predictive AI to the enterprise with $18M in fresh capital by Kyle Wiggers originally published on TechCrunch

AI is taking over the iconic voice of Darth Vader, with the blessing of James Earl Jones

From the cringe-inducing Jar Jar Binks to unconvincing virtual Leia and Luke, Disney’s history with CG characters is, shall we say, mixed. But that’s not stopping them from replacing one of the most recognizable voices in cinema history, Darth Vader, with an AI-powered voice replica based on James Earl Jones.

The retirement of Jones, now 91, from the role, is of course well-earned. But if Disney continues to have its way (and there is no force in the world that can stop it), Vader is far from done. It would be unthinkable to recast the character, but if Jones is done, what can they do?

The solution is Respeecher, a Ukrainian company that trains text-to-speech machine learning models with the (licensed and released) recordings of actors who, for whatever reason, will no longer play a part.

Vanity Fair just ran a great story on how the company managed to put together the Vader replacement voice for Disney’s “Obi-Wan Kenobi” — while the country was being invaded by Russia. Interesting enough, but others noted that it serves as confirmation that the iconic voice of Vader would, from now on, officially be rendered by AI.

This is far from the first case where a well-known actor has had their voice synthesized or altered in this way. Another notable recent example is “Top Gun: Maverick,” in which the voice of Val Kilmer (reprising his role as Iceman) was synthesized due to the actor’s medical condition.

That sounded good, but a handful of whispered lines aren’t quite the same as a 1:1 replacement for a voice even children have known (and feared) for decades. Can a small company working at the cutting edge of machine learning tech pull it off?

You can judge for yourself — here’s one compilation of clips — and to me it seems pretty solid. The main criticism of that show wasn’t Vader’s voice, that’s for sure. If you weren’t expecting anything, you would probably just assume it was Jones speaking the lines, not another actor’s voice being modified to fit the bill.

The giveaway is that it doesn’t actually sound like Jones does now — it sounds like he did in the ’70s and ’80s when the original trilogy came out. That’s what anyone seeing Obi-Wan and Vader fight will expect, probably, but it’s a bit strange to think about.

It opens up a whole new can of worms. Sure, an actor may license their voice work for a character, but what about when that character ages? What about a totally different character they voice that bears some similarity to the first? What recourse do they have if their voice synthesis files leak and people are using them willy-nilly?

It’s an interesting new field to work in, but it’s hardly without pitfalls and ethical conundra. Disney has already broken the seal on many transformative technologies in filmmaking and television, and borne the deserved criticism when what it put out did not meet audiences’ expectations.

But they can take the hits and roll with them — maybe even take a page from George Lucas’s book and try to rewrite history, improving the rendering of Grand Moff Tarkin in a bid to make us forget how waxy he looked originally. As long as the technology is used to advance and complement the creativity of writers, directors and everyone else who makes movies magic, and not to save a buck or escape tricky rights situations, I can get behind it.

AI is taking over the iconic voice of Darth Vader, with the blessing of James Earl Jones by Devin Coldewey originally published on TechCrunch

Perceptron: Multilingual, laughing, Pitfall-playing and streetwise AI

Research in the field of machine learning and AI, now a key technology in practically every industry and company, is far too voluminous for anyone to read it all. This column, Perceptron, aims to collect some of the most relevant recent discoveries and papers — particularly in, but not limited to, artificial intelligence — and explain why they matter.

Over the past few weeks, researchers at Google have demoed an AI system, PaLI, that can perform many tasks in over 100 languages. Elsewhere, a Berlin-based group launched a project called Source+ that’s designed as a way of allowing artists, including visual artists, musicians and writers, to opt into — and out of — having their work used as training data for AI.

AI systems like OpenAI’s GPT-3 can generate fairly sensical text, or summarize existing text from the web, ebooks and other sources of information. But they’ve historically been limited to a single language, limiting both their usefulness and reach.

Fortunately, in recent months, research into multilingual systems has accelerated — driven partly by community efforts like Hugging Face’s Bloom. In an attempt to leverage these advances in multilinguality, a Google team created PaLI, which was trained on both images and text to perform tasks like image captioning, object detection and optical character recognition.

Google PaLI. Image Credits: Google

Google claims that PaLI can understand 109 languages and the relationships between words in those languages and images, enabling it to — for example — caption a picture of a postcard in French. While the work remains firmly in the research phases, the creators say that it illustrates the important interplay between language and images — and could establish a foundation for a commercial product down the line.

Speech is another aspect of language that AI is constantly improving in. Play.ht recently showed off a new text-to-speech model that puts a remarkable amount of emotion and range into its results. The clips it posted last week sound fantastic, though they are of course cherry-picked.

We generated a clip of our own using the intro to this article, and the results are still solid.

Exactly what this type of voice generation will be most useful for is still unclear. We’re not quite at the stage where they do whole books — or rather, they can, but it may not be anyone’s first choice yet. But as the quality rises, the applications multiply.

Mat Dryhurst and Holly Herndon — an academic and musician, respectively — have partnered with the organization Spawning to launch Source+, a standard they hope will bring attention to the issue of photo-generating AI systems created using artwork from artists who weren’t informed or asked permission. Source+, which doesn’t cost anything, aims to allow artists to disallow their work to be used for AI training purposes if they choose.

Image-generating systems like Stable Diffusion and DALL-E 2 were trained on billions of images scraped from the web to “learn” how to translate text prompts into art. Some of these images came from public art communities like ArtStation and DeviantArt — not necessarily with artists’ knowledge — and imbued the systems with the ability to mimic particular creators, including artists like Greg Rutkowski.

Samples from Stable Diffusion.

Because of the systems’ knack for imitating art styles, some creators fear that they could threaten livelihoods. Source+ — while voluntary — could be a step toward giving artists greater say in how their art’s used, Dryhurst and Herndon say — assuming it’s adopted at scale (a big if).

Over at DeepMind, a research team is attempting to solve another longstanding problematic aspect of AI: its tendency to spew toxic and misleading information. Focusing on text, the team developed a chatbot called Sparrow that can answer common questions by searching the web using Google. Other cutting-edge systems like Google’s LaMDA can do the same, but DeepMind claims that Sparrow provides plausible, non-toxic answers to questions more often than its counterparts.

The trick was aligning the system with people’s expectations of it. DeepMind recruited people to use Sparrow and then had them provide feedback to train a model of how useful the answers were, showing participants multiple answers to the same question and asking them which answer they liked the most. The researchers also defined rules for Sparrow such as “don’t make threatening statements” and “don’t make hateful or insulting comments,” which they had participants impose on the system by trying to trick it into breaking the rules.
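
The general recipe of learning from pairwise human preferences can be sketched compactly. The toy example below uses a Bradley-Terry-style logistic loss on synthetic features; it illustrates the family of techniques DeepMind describes, not its actual training setup.

```python
# Toy preference learning: participants pick which of two answers they like more,
# and a model is trained to score the preferred answer higher. Features here are
# synthetic; the reward model is a simple linear scorer.
import numpy as np

rng = np.random.default_rng(0)
dim = 4
w = np.zeros(dim)                          # linear "preference model" weights

# Each pair: features of the preferred answer and of the rejected one (synthetic).
preferred = rng.normal(loc=0.5, size=(200, dim))
rejected = rng.normal(loc=-0.5, size=(200, dim))

lr = 0.1
for _ in range(100):
    margin = preferred @ w - rejected @ w  # score difference for each pair
    p = 1.0 / (1.0 + np.exp(-margin))      # probability of preferring the chosen answer
    grad = ((p - 1.0)[:, None] * (preferred - rejected)).mean(axis=0)
    w -= lr * grad                         # gradient step on -log p (logistic loss)

print("fraction of pairs ranked correctly:", (preferred @ w > rejected @ w).mean())
```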

Example of DeepMind’s Sparrow having a conversation.

DeepMind acknowledges that Sparrow has room for improvement. But in a study, the team found the chatbot provided a “plausible” answer supported with evidence 78% of the time when asked a factual question and only broke the aforementioned rules 8% of the time. That’s better than DeepMind’s original dialogue system, the researchers note, which broke the rules roughly three times more often when tricked into doing so.

A separate team at DeepMind tackled a very different domain recently: video games that historically have been tough for AI to master quickly. Their system, cheekily called MEME, reportedly achieved “human-level” performance on 57 different Atari games 200 times faster than the previous best system.

According to DeepMind’s paper detailing MEME, the system can learn to play games by observing roughly 390 million frames — “frames” referring to the still images that refresh very quickly to give the impression of motion. That might sound like a lot, but the previous state-of-the-art technique required 80 billion frames across the same number of Atari games.
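
A quick sanity check on those figures, using the frame counts reported above, shows they roughly match the claimed 200x speed-up:

```python
# Comparing the frames of experience reported above: ~390 million for MEME versus
# ~80 billion for the previous state of the art.
meme_frames = 390_000_000
previous_frames = 80_000_000_000
print(f"reduction in frames of experience: ~{previous_frames / meme_frames:.0f}x")  # ~205x
```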

DeepMind MEME. Image Credits: DeepMind

Deftly playing Atari might not sound like a desirable skill. And indeed, some critics argue games are a flawed AI benchmark because of their abstractness and relative simplicity. But research labs like DeepMind believe the approaches could be applied to other, more useful areas in the future, like robots that more efficiently learn to perform tasks by watching videos or self-improving, self-driving cars.

Nvidia had a field day on the 20th announcing dozens of products and services, among them several interesting AI efforts. Self-driving cars are one of the company’s foci, both powering the AI and training it. For the latter, simulators are crucial and it is likewise important that the virtual roads resemble real ones. They describe a new, improved content flow that accelerates bringing data collected by cameras and sensors on real cars into the digital realm.

A simulation environment built on real-world data.

Things like real-world vehicles and irregularities in the road or tree cover can be accurately reproduced, so the self-driving AI doesn’t learn in a sanitized version of the street. And it makes it possible to create larger and more variable simulation settings in general, which aids robustness. (Another image of it is up top.)

Nvidia also introduced its IGX system for autonomous platforms in industrial situations — human-machine collaboration like you might find on a factory floor. There’s no shortage of these, of course, but as the complexity of tasks and operating environments increases, the old methods don’t cut it any more and companies looking to improve their automation are looking at future-proofing.

Example of computer vision classifying objects and people on a factory floor.

“Proactive” and “predictive” safety are what IGX is intended to help with, which is to say catching safety issues before they cause outages or injuries. A bot may have its own emergency stop mechanism, but if a camera monitoring the area could tell it to divert before a forklift gets in its way, everything goes a little more smoothly. Exactly what company or software accomplishes this (and on what hardware, and how it all gets paid for) is still a work in progress, with the likes of Nvidia and startups like Veo Robotics feeling their way through.

Another interesting step forward was taken in Nvidia’s home turf of gaming. The company’s latest and greatest GPUs are built not just to push triangles and shaders, but to quickly accomplish AI-powered tasks like its own DLSS tech for uprezzing and adding frames.

The issue they’re trying to solve is that gaming engines are so demanding that generating more than 120 frames per second (to keep up with the latest monitors) while maintaining visual fidelity is a Herculean task even powerful GPUs can barely do. But DLSS is sort of like an intelligent frame blender that can increase the resolution of the source frame without aliasing or artifacts, so the game doesn’t have to push quite so many pixels.

In DLSS 3, Nvidia claims it can generate entire additional frames at a 1:1 ratio, so you could be rendering 60 frames naturally and the other 60 via AI. I can think of several reasons that might make things weird in a high performance gaming environment, but Nvidia is probably well aware of those. At any rate you’ll need to pay about a grand for the privilege of using the new system, since it will only run on RTX 40 series cards. But if graphical fidelity is your top priority, have at it.

Illustration of drones building in a remote area.

Last thing today is a drone-based 3D printing technique from Imperial College London that could be used for autonomous building processes sometime in the deep future. For now it’s definitely not practical for creating anything bigger than a trash can, but it’s still early days. Eventually they hope to make it more like the above, and it does look cool, but watch the video below to get your expectations straight.

Perceptron: Multilingual, laughing, Pitfall-playing and streetwise AI by Kyle Wiggers originally published on TechCrunch

OpenAI open-sources Whisper, a multilingual speech recognition system

Speech recognition remains a challenging problem in AI and machine learning. In a step toward solving it, OpenAI today open-sourced Whisper, an automatic speech recognition system that the company claims enables “robust” transcription in multiple languages as well as translation from those languages into English.

Countless organizations have developed highly capable speech recognition systems, which sit at the core of software and services from tech giants like Google, Amazon and Meta. But what makes Whisper different, according to OpenAI, is that it was trained on 680,000 hours of multilingual and “multitask” data collected from the web, which led to improved recognition of unique accents, background noise and technical jargon.

“The primary intended users of [the Whisper] models are AI researchers studying robustness, generalization, capabilities, biases and constraints of the current model. However, Whisper is also potentially quite useful as an automatic speech recognition solution for developers, especially for English speech recognition,” OpenAI wrote in the GitHub repo for Whisper, from where several versions of the system can be downloaded. “[The models] show strong ASR results in ~10 languages. They may exhibit additional capabilities … if fine-tuned on certain tasks like voice activity detection, speaker classification or speaker diarization but have not been robustly evaluated in these area.”
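
For developers who want to try it, the snippet below follows the basic usage example in Whisper's GitHub README; the model size ("base") and the audio filename are placeholders to swap for your own.

```python
# Minimal transcription example per Whisper's README (install with
# `pip install -U openai-whisper`). "base" is one of several available model
# sizes; "audio.mp3" is a placeholder for your own file.
import whisper

model = whisper.load_model("base")      # downloads the model weights on first run
result = model.transcribe("audio.mp3")  # language is auto-detected by default
print(result["text"])
```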

Whisper has its limitations, particularly in the area of text prediction. Because the system was trained on a large amount of “noisy” data, OpenAI cautions Whisper might include words in its transcriptions that weren’t actually spoken — possibly because it’s both trying to predict the next word in audio and trying to transcribe the audio itself. Moreover, Whisper doesn’t perform equally well across languages, suffering from a higher error rate when it comes to speakers of languages that aren’t well-represented in the training data.

That last bit is nothing new to the world of speech recognition, unfortunately. Biases have long plagued even the best systems, with a 2020 Stanford study finding that systems from Amazon, Apple, Google, IBM and Microsoft made far fewer errors with users who are white (a roughly 19% error rate) than with users who are Black (roughly 35%).

Despite this, OpenAI sees Whisper’s transcription capabilities being used to improve existing accessibility tools.

“While Whisper models cannot be used for real-time transcription out of the box, their speed and size suggest that others may be able to build applications on top of them that allow for near-real-time speech recognition and translation,” the company continues on GitHub. “The real value of beneficial applications built on top of Whisper models suggests that the disparate performance of these models may have real economic implications … [W]e hope the technology will be used primarily for beneficial purposes, making automatic speech recognition technology more accessible could enable more actors to build capable surveillance technologies or scale up existing surveillance efforts, as the speed and accuracy allow for affordable automatic transcription and translation of large volumes of audio communication.”

The release of Whisper isn’t necessarily indicative of OpenAI’s future plans. While increasingly focused on commercial efforts like DALL-E 2 and GPT-3, the company is pursuing several purely theoretical research threads, including AI systems that learn by observing videos.

OpenAI open-sources Whisper, a multilingual speech recognition system by Kyle Wiggers originally published on TechCrunch