Microsoft Azure sets its sights on more analytics workloads

Enterprises now amass huge amounts of data, both from their own tools and applications and from the SaaS applications they use. For a long time, that data was basically exhaust. Maybe it was stored for a while to fulfill some legal requirements, but then it was discarded. Now, data is what drives machine learning models, and the more data you have, the better. It’s maybe no surprise, then, that the big cloud vendors started investing in data warehouses and lakes early on. But that’s just a first step. After that, you also need the analytics tools to make all of this data useful.

Today, it’s Microsoft’s turn to shine the spotlight on its data analytics services. The actual news here is pretty straightforward. Two services are moving into general availability: the second generation of Azure Data Lake Storage for big data analytics workloads and Azure Data Explorer, a managed service that makes ad-hoc analysis of massive data volumes easier. Microsoft is also previewing a new feature in Azure Data Factory, its graphical no-code service for building data transformations. Data Factory now features the ability to map data flows.

Those individual news pieces are interesting if you are a user or are considering Azure for your big data workloads, but what’s maybe more important here is that Microsoft is trying to offer a comprehensive set of tools for managing and storing this data — and then using it for building analytics and AI services.

“AI is a top priority for every company around the globe,” Julia White, Microsoft’s corporate VP for Azure, told me. “And as we are working with our customers on AI, it becomes clear that their analytics often aren’t good enough for building an AI platform.” These companies are generating plenty of data, which then has to be pulled into analytics systems. She stressed that she couldn’t remember a customer conversation in recent months that didn’t focus on AI. “There is urgency to get to the AI dream,” White said, but the growth and variety of data present a major challenge for many enterprises. “They thought this was a technology that was separate from their core systems. Now it’s expected for both customer-facing and line-of-business applications.”

Data Lake Storage helps with managing this variety of data since it can handle both structured and unstructured data (and is optimized for the Spark and Hadoop analytics engines). The service can ingest any kind of data — yet Microsoft still promises that it will be very fast. “The world of analytics tended to be defined by having to decide upfront and then building rigid structures around it to get the performance you wanted,” explained White. Data Lake Storage, on the other hand, wants to offer the best of both worlds.

Likewise, White argued that while many enterprises used to keep these services on their on-premises servers, many of them are still appliance-based. But she believes the cloud has now reached the point where the price/performance calculations are in its favor. It took a while to get to this point, though, and to convince enterprises. White noted that for the longest time, enterprises that looked at their analytics projects thought $300 million projects took forever, tied up lots of people and were frankly a bit scary. “But also, what we had to offer in the cloud hasn’t been amazing until some of the recent work,” she said. “We’ve been on a journey — as well as the other cloud vendors — and the price performance is now compelling.” And it sure helps that if enterprises want to meet their AI goals, they’ll now have to tackle these workloads, too.

Fabula AI is using social spread to spot ‘fake news’

UK startup Fabula AI reckons it’s devised a way for artificial intelligence to help user generated content platforms get on top of the disinformation crisis that keeps rocking the world of social media with antisocial scandals.

Even Facebook’s Mark Zuckerberg has sounded a cautious note about AI technology’s capability to meet the complex, contextual, messy and inherently human challenge of correctly understanding every missive a social media user might send, well-intentioned or its nasty flip-side.

“It will take many years to fully develop these systems,” the Facebook founder wrote two years ago, in an open letter discussing the scale of the challenge of moderating content on platforms thick with billions of users. “This is technically difficult as it requires building AI that can read and understand news.”

But what if AI doesn’t need to read and understand news in order to detect whether it’s true or false?

Step forward Fabula, which has patented what it dubs a “new class” of machine learning algorithms to detect “fake news” — in the emergent field of “Geometric Deep Learning”, where the datasets to be studied are so large and complex that traditional machine learning techniques struggle to find purchase on this ‘non-Euclidean’ space.

The startup says its deep learning algorithms are, by contrast, capable of learning patterns on complex, distributed data sets like social networks. So it’s billing its technology as a breakthrough. (It’s written a paper on the approach which can be downloaded here.)

It is, rather unfortunately, using the populist and now frowned upon badge “fake news” in its PR. But it says it’s intending this fuzzy umbrella to refer to both disinformation and misinformation. Which means maliciously minded and unintentional fakes. Or, to put it another way, a photoshopped fake photo or a genuine image spread in the wrong context.

The approach it’s taking to detecting disinformation relies not on algorithms parsing news content to try to identify malicious nonsense but instead looks at how such stuff spreads on social networks — and also therefore who is spreading it.

There are characteristic patterns to how ‘fake news’ spreads vs the genuine article, says Fabula co-founder and chief scientist, Michael Bronstein.

“We look at the way that the news spreads on the social network. And there is — I would say — a mounting amount of evidence that shows that fake news and real news spread differently,” he tells TechCrunch, pointing to a recent major study by MIT academics which found ‘fake news’ spreads differently vs bona fide content on Twitter.

“The essence of geometric deep learning is it can work with network-structured data. So here we can incorporate heterogeneous data such as user characteristics; the social network interactions between users; the spread of the news itself; so many features that otherwise would be impossible to deal with under machine learning techniques,” he continues.

Bronstein, who is also a professor at Imperial College London, with a chair in machine learning and pattern recognition, likens the phenomenon Fabula’s machine learning classifier has learnt to spot to the way infectious disease spreads through a population.

“This is of course a very simplified model of how a disease spreads on the network. In this case network models relations or interactions between people. So in a sense you can think of news in this way,” he suggests. “There is evidence of polarization, there is evidence of confirmation bias. So, basically, there are what is called echo chambers that are formed in a social network that favor these behaviours.”

“We didn’t really go into — let’s say — the sociological or the psychological factors that probably explain why this happens. But there is some research that shows that fake news is akin to epidemics.”

The tl;dr of the MIT study, which examined a decade’s worth of tweets, was that not only does the truth spread slower but also that human beings themselves are implicated in accelerating disinformation. (So, yes, actual human beings are the problem.) Ergo, it’s not all bots doing all the heavy lifting of amplifying junk online.

The silver lining of what appears to be an unfortunate quirk of human nature is that a penchant for spreading nonsense may ultimately help give the stuff away — making a scalable AI-based tool for detecting ‘BS’ potentially not such a crazy pipe-dream.

Although, to be clear, Fabula’s AI remains in development, having so far been tested internally on Twitter data sub-sets. And the claims it’s making for its prototype model remain to be commercially tested with customers in the wild using the tech across different social platforms.

It’s hoping to get there this year, though, and intends to offer an API for platforms and publishers towards the end of this year. The AI classifier is intended to run in near real-time on a social network or other content platform, identifying BS.

Fabula envisages its own role, as the company behind the tech, as that of an open, decentralised “truth-risk scoring platform” — akin to a credit referencing agency just related to content, not cash.

Scoring comes into it because the AI generates a score for classifying content based on how confident it is it’s looking at a piece of fake vs true news.

A visualisation of a fake vs real news distribution pattern; users who predominantly share fake news are coloured red and users who don’t share fake news at all are coloured blue — which Fabula says shows the clear separation into distinct groups, and “the immediately recognisable difference in spread pattern of dissemination”.

In its own tests Fabula says its algorithms were able to identify 93 percent of “fake news” within hours of dissemination — which Bronstein claims is “significantly higher” than any other published method for detecting ‘fake news’. (Their accuracy figure uses a standard aggregate measurement of machine learning classification model performance, called ROC AUC.)
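
For readers unfamiliar with the metric, ROC AUC can be read as the probability that a classifier scores a randomly chosen fake story higher than a randomly chosen true one. The sketch below is only an illustration of that definition, not Fabula’s code: the function name, the labels and the scores are all made up for the example.

```python
def roc_auc(labels, scores):
    """Share of (fake, true) pairs in which the fake story gets the higher score.

    labels: 1 for fake, 0 for true (hypothetical encoding).
    scores: classifier confidence that each story is fake.
    Ties between a fake and a true story count as half a win.
    """
    fakes = [s for y, s in zip(labels, scores) if y == 1]
    trues = [s for y, s in zip(labels, scores) if y == 0]
    if not fakes or not trues:
        raise ValueError("need at least one fake and one true example")

    wins = 0.0
    for f in fakes:
        for t in trues:
            if f > t:
                wins += 1.0
            elif f == t:
                wins += 0.5
    return wins / (len(fakes) * len(trues))


# Invented toy data: three fake stories, three true stories.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

Because the measure is about ranking, a ROC AUC of 0.93 is not quite the same thing as catching 93 out of every 100 fakes; how many get caught in practice depends on where the score threshold is set.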

The dataset the team used to train their model is a subset of Twitter’s network — comprising around 250,000 users and containing around 2.5 million “edges” (aka social connections).

For their training dataset Fabula relied on true/fake labels attached to news stories by third party fact checking NGOs, including Snopes and PolitiFact. And, overall, pulling together the dataset was a process of “many months”, according to Bronstein. He also says that around a thousand different stories were used to train the model, adding that the team is confident the approach works on small social networks, as well as Facebook-sized mega-nets.

Asked whether he’s sure the model hasn’t been trained to identify patterns caused by bot-based junk news spreaders, he says the training dataset included some registered (and thus verified ‘true’) users.

“There is multiple research that shows that bots didn’t play a significant amount [of a role in spreading fake news] because the amount of it was just a few percent. And bots can be quite easily detected,” he also suggests, adding: “Usually it’s based on some connectivity analysis or content analysis. With our methods we can also detect bots easily.”

To further check the model, the team tested its performance over time by training it on historical data and then using a different split of test data.
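
One common way to run that kind of ageing check is a time-based split: train on everything first seen before a cutoff date and evaluate only on what came later. The snippet below is a rough sketch of the idea under that assumption; the record layout, field names and dates are invented and do not describe Fabula’s actual data.

```python
from datetime import date


def temporal_split(stories, cutoff):
    """Split stories into (train, test) by the date they first appeared.

    Stories first seen before `cutoff` go to training; the rest are held
    out, so the model is always evaluated on news newer than anything it
    learned from.
    """
    train = [s for s in stories if s["first_seen"] < cutoff]
    test = [s for s in stories if s["first_seen"] >= cutoff]
    return train, test


# Hypothetical records: label 1 = fake, 0 = true.
stories = [
    {"id": "a", "first_seen": date(2017, 3, 1), "label": 1},
    {"id": "b", "first_seen": date(2017, 9, 15), "label": 0},
    {"id": "c", "first_seen": date(2018, 2, 2), "label": 1},
    {"id": "d", "first_seen": date(2018, 6, 30), "label": 0},
]

train, test = temporal_split(stories, cutoff=date(2018, 1, 1))
print([s["id"] for s in train], [s["id"] for s in test])
```

Moving the cutoff further back while keeping the test window fixed is one way to see how quickly performance decays as the training data gets older.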

“While we see some drop in performance it is not dramatic. So the model ages well, basically. Up to something like a year the model can still be applied without any re-training,” he notes, while also saying that, when applied in practice, the model would be continually updated as it keeps digesting (ingesting?) new stories and social media content.

Somewhat terrifyingly, the model could also be used to predict virality, according to Bronstein — raising the dystopian prospect of the API being used for the opposite purpose to that which it’s intended: i.e. maliciously, by fake news purveyors, to further amp up their (anti)social spread.

“Potentially putting it into evil hands it might do harm,” Bronstein concedes. Though he takes a philosophical view on the hyper-powerful double-edged sword of AI technology, arguing such technologies will create an imperative for a rethinking of the news ecosystem by all stakeholders, as well as encouraging emphasis on user education and teaching critical thinking.

Let’s certainly hope so. And, on the educational front, Fabula is hoping its technology can play an important role — by spotlighting network-based cause and effect.

“People now like or retweet or basically spread information without thinking too much or the potential harm or damage they’re doing to everyone,” says Bronstein, pointing again to the infectious diseases analogy. “It’s like not vaccinating yourself or your children. If you think a little bit about what you’re spreading on a social network you might prevent an epidemic.”

So, tl;dr, think before you RT.

Returning to the accuracy rate of Fabula’s model, while ~93 percent might sound pretty impressive, if it were applied to content on a massive social network like Facebook — which has some 2.3BN+ users, uploading what could be trillions of pieces of content daily — even a seven percent failure rate would still make for an awful lot of fakes slipping undetected through the AI’s net.

But Bronstein says the technology does not have to be used as a standalone moderation system. Rather he suggests it could be used in conjunction with other approaches such as content analysis, and thus function as another string on a wider ‘BS detector’ bow.

It could also, he suggests, further aid human content reviewers — to point them to potentially problematic content more quickly.

Depending on how the technology gets used he says it could do away with the need for independent third party fact-checking organizations altogether because the deep learning system can be adapted to different use cases.

Example use-cases he mentions include an entirely automated filter (i.e. with no human reviewer in the loop); or to power a content credibility ranking system that can down-weight dubious stories or even block them entirely; or for intermediate content screening to flag potential fake news for human attention.

Each of those scenarios would likely entail a different truth-risk confidence score. Though most — if not all — would still require some human back-up. If only to manage overarching ethical and legal considerations related to largely automated decisions. (Europe’s GDPR framework has some requirements on that front, for example.)
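
To make the idea of per-scenario thresholds concrete, here is a minimal sketch of how a platform might map a single score onto the use cases above. Everything in it is an assumption for illustration: the function name, the convention that a higher score means “more likely fake”, and the cut-off values are hypothetical, not anything Fabula has published.

```python
def route_content(score, filter_at=0.95, downweight_at=0.80, review_at=0.50):
    """Map a truth-risk score in [0, 1] to a moderation action.

    Higher scores mean the classifier is more confident the story is
    fake (an assumed convention). Thresholds are illustrative defaults.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= filter_at:
        return "auto_filter"       # fully automated removal, no human in the loop
    if score >= downweight_at:
        return "downweight"        # rank the story lower in feeds
    if score >= review_at:
        return "flag_for_review"   # send to a human reviewer
    return "allow"


decisions = [route_content(s) for s in (0.97, 0.85, 0.60, 0.10)]
print(decisions)
```

The stricter the action, the higher the bar: a platform would presumably want far more confidence before automatically filtering a story than before merely queueing it for a person to look at.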

Facebook’s grave failures around moderating hate speech in Myanmar — which led to its own platform becoming a megaphone for terrible ethnic violence — were very clearly exacerbated by the fact it did not have enough reviewers who were able to understand (the many) local languages and dialects spoken in the country.

So if Fabula’s language-agnostic propagation and user focused approach proves to be as culturally universal as its makers hope, it might be able to raise flags faster than human brains which lack the necessary language skills and local knowledge to intelligently parse context.

“Of course we can incorporate content features but we don’t have to — we don’t want to,” says Bronstein. “The method can be made language independent. So it doesn’t matter whether the news are written in French, in English, in Italian. It is based on the way the news propagates on the network.”

Although he also concedes: “We have not done any geographic, localized studies.”

“Most of the news that we take are from PolitiFact so they somehow regard mainly the American political life but the Twitter users are global. So not all of them, for example, tweet in English. So we don’t yet take into account tweet content itself or their comments in the tweet — we are looking at the propagation features and the user features,” he continues.

“These will be obviously next steps but we hypothesize that it’s less language dependent. It might be somehow geographically varied. But these will be already second order details that might make the model more accurate. But, overall, currently we are not using any location-specific or geographic targeting for the model.

“But it will be an interesting thing to explore. So this is one of the things we’ll be looking into in the future.”

Fabula’s approach being tied to the spread (and the spreaders) of fake news certainly means there’s a raft of associated ethical considerations that any platform making use of its technology would need to be hyper sensitive to.

For instance, if platforms could suddenly identify and label a sub-set of users as ‘junk spreaders’ the next obvious question is how will they treat such people?

Would they penalize them with limits — or even a total block — on their power to socially share on the platform? And would that be ethical or fair given that not every sharer of fake news is maliciously intending to spread lies?

What if it turns out there’s a link between — let’s say — a lack of education and propensity to spread disinformation? As there can be a link between poverty and education… What then? Aren’t your savvy algorithmic content downweights risking exacerbating existing unfair societal divisions?

Bronstein agrees there are major ethical questions ahead when it comes to how a ‘fake news’ classifier gets used.

“Imagine that we find a strong correlation between the political affiliation of a user and this ‘credibility’ score. So for example we can tell with hyper-ability that if someone is a Trump supporter then he or she will be mainly spreading fake news. Of course such an algorithm would provide great accuracy but at least ethically it might be wrong,” he says when we ask about ethics.

He confirms Fabula is not using any kind of political affiliation information in its model at this point — but it’s all too easy to imagine this sort of classifier being used to surface (and even exploit) such links.

“What is very important in these problems is not only to be right — so it’s great of course that we’re able to quantify fake news with this accuracy of ~90 percent — but it must also be for the right reasons,” he adds.

The London-based startup was founded in April last year, though the academic research underpinning the algorithms has been in train for the past four years, according to Bronstein.

The patent for their method was filed in early 2016 and granted last July.

They’ve been funded by $500,000 in angel funding and about another $500,000 in total of European Research Council grants plus academic grants from tech giants Amazon, Google and Facebook, awarded via open research competition awards.

(Bronstein confirms the three companies have no active involvement in the business. Though doubtless Fabula is hoping to turn them into customers for its API down the line. But he says he can’t discuss any potential discussions it might be having with the platforms about using its tech.)

Focusing on spotting patterns in how content spreads as a detection mechanism does have one major and obvious drawback — in that it only works after the fact of (some) fake content spread. So this approach could never entirely stop disinformation in its tracks.

Though Fabula claims detection is possible within a relatively short time frame — of between two and 20 hours after content has been seeded onto a network.

“What we show is that this spread can be very short,” he says. “We looked at up to 24 hours and we’ve seen that just in a few hours… we can already make an accurate prediction. Basically it increases and slowly saturates. Let’s say after four or five hours we’re already about 90 per cent.”

“We never worked with anything that was lower than hours but we could look,” he continues. “It really depends on the news. Some news does not spread that fast. Even the most groundbreaking news do not spread extremely fast. If you look at the percentage of the spread of the news in the first hours you get maybe just a small fraction. The spreading is usually triggered by some important nodes in the social network. Users with many followers, tweeting or retweeting. So there are some key bottlenecks in the network that make something viral or not.”

A network-based approach to content moderation could also serve to further enhance the power and dominance of already hugely powerful content platforms — by making the networks themselves core to social media regulation, i.e. if pattern-spotting algorithms rely on key network components (such as graph structure) to function.

So you can certainly see why — even above a pressing business need — tech giants are at least interested in backing the academic research. Especially with politicians increasingly calling for online content platforms to be regulated like publishers.

At the same time, there are — what look like — some big potential positives to analyzing spread, rather than content, for content moderation purposes.

As noted above, the approach doesn’t require training the algorithms on different languages and (seemingly) cultural contexts — setting it apart from content-based disinformation detection systems. So if it proves as robust as claimed it should be more scalable.

Though, as Bronstein notes, the team have mostly used U.S. political news for training their initial classifier. So some cultural variations in how people spread and react to nonsense online at least remain a possibility.

A more certain challenge is “interpretability” — aka explaining what underlies the patterns the deep learning technology has identified via the spread of fake news.

While algorithmic accountability is very often a challenge for AI technologies, Bronstein admits it’s “more complicated” for geometric deep learning.

“We can potentially identify some features that are the most characteristic of fake vs true news,” he suggests when asked whether some sort of ‘formula’ of fake news can be traced via the data, noting that while they haven’t yet tried to do this they did observe “some polarization”.

“There are basically two communities in the social network that communicate mainly within the community and rarely across the communities,” he says. “Basically it is less likely that somebody who tweets a fake story will be retweeted by somebody who mostly tweets real stories. There is a manifestation of this polarization. It might be related to these theories of echo chambers and various biases that exist. Again we didn’t dive into trying to explain it from a sociological point of view — but we observed it.”
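
One simple way to put a number on that kind of polarization is to count how many retweet links stay inside a community versus crossing between them. The sketch below assumes each user has already been assigned to a group; the user names, groups and edges are invented, and this is not the measure Fabula itself reports.

```python
def within_group_share(edges, group_of):
    """Fraction of edges whose two endpoints belong to the same group.

    edges: iterable of (user, user) pairs, e.g. who retweeted whom.
    group_of: dict mapping each user to a community label.
    """
    edges = list(edges)
    if not edges:
        raise ValueError("need at least one edge")
    same = sum(1 for u, v in edges if group_of[u] == group_of[v])
    return same / len(edges)


# Hypothetical toy network: "red" users mostly share fake stories,
# "blue" users do not.
group_of = {"a": "red", "b": "red", "c": "red", "d": "blue", "e": "blue"}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("c", "d")]

share = within_group_share(edges, group_of)
print(share)
```

A share close to 1 would mean the two communities barely interact, which is the pattern Bronstein describes.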

So while, in recent years, there have been some academic efforts to debunk the notion that social media users are stuck inside filter bubbles bouncing their own opinions back at them, Fabula’s analysis of the landscape of social media opinions suggests they do exist — albeit, just not encasing every Internet user.

Bronstein says the next step for the startup is to scale its prototype to be able to deal with multiple requests so it can get the API to market in 2019 — and start charging publishers for a truth-risk/reliability score for each piece of content they host.

“We’ll probably be providing some restricted access maybe with some commercial partners to test the API but eventually we would like to make it useable by multiple people from different businesses,” says Bronstein. “Potentially also private users — journalists or social media platforms or advertisers. Basically we want to be… a clearing house for news.”

Big companies are not becoming data-driven fast enough

I remember watching MIT professor Andrew McAfee years ago telling stories about the importance of data over gut feeling, whether it was predicting successful wines or making sound business decisions. We have been hearing about big data and data-driven decision making for so long, you would think it has become hardened into our largest organizations by now. As it turns out, new research by NewVantage Partners finds that most large companies are having problems implementing an organization-wide, data-driven strategy.

McAfee was fond of saying that before the data deluge we have today, the way most large organizations made decisions was via the HiPPO — the highest paid person’s opinion. Then he would chide the audience that this was not the proper way to run your business. Data, not gut feelings, even those based on experience, should drive important organizational decisions.

While companies haven’t failed to recognize McAfee’s advice, the NVP report suggests they are having problems implementing data-driven decision making across organizations. There are plenty of technological solutions out there today to help them, from startups all the way to the largest enterprise vendors, but the data (see, you always need to go back to the data) suggests that it’s not a technology problem, it’s a people problem.

Executives can have farsighted vision that their organizations need to be data-driven. They can acquire all of the latest solutions to bring data to the forefront, but unless they combine that with a broad cultural shift and a deep understanding of how to use that data inside business processes, they will continue to struggle.

The study’s authors, Randy Bean and Thomas H. Davenport, wrote about the people problem in their study’s executive summary. “We hear little about initiatives devoted to changing human attitudes and behaviors around data. Unless the focus shifts to these types of activities, we are likely to see the same problem areas in the future that we’ve observed year after year in this survey.”

The survey found that 72 percent of respondents have failed in this regard, reporting they haven’t been able to create a data-driven culture, whatever that means to individual respondents. Meanwhile, 69 percent reported they had failed to create a data-driven organization, although it would seem that these two metrics would be closely aligned.

Perhaps most discouraging of all is that the data is trending the wrong way. The report’s authors say the share of organizations calling themselves data-driven has actually dropped each year, from 37.1% in 2017 to 32.4% in 2018 to 31.0% in the latest survey.

This matters on so many levels, but consider that as companies shift to artificial intelligence and machine learning, these technologies rely on abundant amounts of data to work effectively. What’s more, every organization regardless of its size, is generating vast amounts of data, simply as part of being a digital business in the 21st century. They need to find a way to control this data to make better decisions and understand their customers better. It’s essential.

There is so much talk about innovation and disruption, and understanding and affecting company culture, but so much of all this is linked. You need to be more agile. You need to be more digital. You need to be transformational. You need to be all of these things — and data is at the center of all of it.

Data has been called the new oil often enough to be cliche, but these results reveal that the lesson is failing to get through. Companies need to be data-driven now, this instant. This isn’t something to be working towards at this point. This is something you need to be doing, unless your ultimate goal is to become irrelevant.

Global investor SparkLabs launches a consultancy business for corporates

Global investor SparkLabs is adding another business line after it announced a new consultancy division that’s aimed at working with Fortune 500 companies and other global corporates keen to deepen their position in tech.

Best known for its funds — which cover global deals, a crypto vehicle and a Korea-based fund — and over half a dozen accelerator programs worldwide, the organization is responding to interest it has fielded from LPs, corporates and other businesses keen to tap into its network and insights, SparkLabs Group co-founder Jimmy Kim told TechCrunch.

“We’ll be providing research reports on certain key industries and doing key networking and introductions into startups of their interest,” he said in an interview. “Initially, there will be a handful of staff and then we’ll just scale from there.”

SparkLabs Foundry will be headquartered in San Francisco but it will tap into the group’s global reach, including offices in markets like Singapore and Korea, and insight from a portfolio of more than 220 startups across its various activities.

Kim explained that, particularly for corporations based in Asia, simply opening an office in Silicon Valley doesn’t guarantee that they walk into the right networks for deal flow or gain key insight. That’s where SparkLabs is hoping to make a difference, and it expects that frontier tech including machine learning, blockchain, security and AI will be major focuses.

The new venture will be led by some familiar faces. Scott Sorochak, a long-time mentor with the firm, recently joined from Blarney Ventures, and his team includes chief business officer Jaeson Ma, who co-founded SparkLabs portfolio startup 88Rising. Its list of advisors includes names like Sid Anand, PayPal’s chief data engineer, ex Procter & Gamble CTO Bruce Brown and smart oven startup Brava’s CEO Jon Pleasants.

Play Iconary, a simple drawing game that hides a deceptively deep AI

It may not seem like it takes a lot of smarts to play a game like Pictionary, but in fact it involves a lot of subtle and abstract visual and linguistic skills. This AI built to play a game like it is similarly complex, and its interpretations and creations when you play it (as you can now) may seem eerily human — but it’s also refreshing to have such an agent working collaboratively with you rather than beating you with superhuman skills.

Iconary, as the game’s creators at the Allen Institute for AI decided to call it to avoid lawsuits from Mattel, has you drawing and arranging icons to form phrases, or guessing at the arrangements of the computer player.

For instance, if you were to get the phrase “woman drinking milk from a glass,” you’d probably draw a woman — a stick figure, probably, and then select the “woman” icon from the computer’s interpretations of your sketch. Then you’d draw a glass, and place that near the woman. Then… milk? How do you draw milk? There is actually a milk bottle icon if you look for it, but you could also draw a cow and put that in or next to the glass.

The computer then guesses at what you’ve put together, and after a few tries it would probably get it. You can also play it the other way, where the computer arranges icons and you have to guess.

Now, let’s get this right out of the way: this is very different from Google’s superficially similar “Quick, Draw” game. In that one the system can only guess whether your drawing is one of a few hundred pre-selected objects it’s been specifically trained to recognize.

Not only are there some 75,000 phrases supported in Iconary, with more being added regularly, but there’s no way to train the AI on them — the ways that any one of them can be represented are uncountable.

“When you start bringing in phrases, the problem space explodes,” explained Ali Farhadi, one of the creators of the project; I talked with him and researcher Aniruddha Kembhavi about Iconary ahead of its release. “Sure, you can easily recognize a cat or a dog. But can you recognize a cat purring, or a dog scratching its back? There’s a huge diversity in elements people choose and how they position them.”

Although Pictionary may seem at first like a game that depends on your drawing skill, it’s really much more about arranging ideas and understanding the relationships between them — seeing the intent behind the drawing. How else can some people manage to recognize a word or phrase from a handful of crude shapes and some arrows?

The AI behind Iconary, then, isn’t a drawing recognition engine at all but one that has been trained to recognize relationships between objects, informed by their type, position, number, and everything else. This is, the researchers say, the most significant example of AI collaborating meaningfully with humans yet created.

And this logic is kept fuzzy enough that several “person” icons gathered together could mean women, men, people, group, crowd, team, or anything else. How would you know if it was a “team?” Well, if you put a soccer ball near it or put them on a play field, it becomes obvious. If there’s a blackboard there, it’s probably a class. And so on.

Of course, I say “and so on,” but that small phrase in a way encompasses the entirety of human intuition and years of training on how to view and interpret the visual world. Naturally Iconary isn’t nearly as good at it as we are, but its logic is frequently surprisingly human.

If you can only get part of the answer, you can ask the AI to draw again, and just like we do in Pictionary it will adapt its representation to address your needs.

It was of course trained on human drawings collected via Mechanical Turk, but it isn’t just replicating what people drew. If the only thing it ever saw to represent a scientist was a man next to a microscope, how would it know to recognize the same idea in a woman, or standing next to an atom or rocket? In fact, the model has never been exposed to the phrases you can play with now. As the researchers write:

AllenAI has never before encountered the unique phrases in Iconary, yet our preliminary games have shown that our AI system is able to both successfully depict and understand phrases with a human partner with an often surprising deftness and nuance. This feat requires combining natural language understanding, computer vision, and the use of common sense to reason about phrases and their depictions within the constraints of a small vocabulary of possible icons. Being successful at Iconary requires skills beyond basic pattern recognition, including multi-hop reasoning, abstraction, collaboration, and adaptation.

Instead of simply pairing “ball” with “sport,” it learned why those objects are related, and how to exert some basic common sense, a sort of holy grail in AI, though this is only a small step in that direction. If one person draws “father” as a large man next to a smaller person, it isn’t obvious to the computer that the father is the big one, not the small. And it’s another logical jump that a “mother” would be a similarly-sized woman, or that the small one is a child.

But by observing how people used the objects and how they relate to one another, the AI built up a network of ideas about how different things are represented or related. “Child” is closer to “student” than “horse,” for instance. And “student” is close to “desk” and “laptop.” So if you draw a child by a desk, maybe it’s a student? This kind of robust logic is so simple to us that we don’t even recognize we’re doing it, but incredibly hard to build into a machine learning agent.
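The idea of concepts sitting “closer” to or farther from one another can be sketched with a toy example. The snippet below is only an illustration and not how Iconary actually works: the four-number vectors, the `CONCEPTS` table and the `best_guess` helper are all made up for this sketch, and cosine similarity is just one common way to measure closeness.

```python
import math

# Hypothetical, hand-made concept vectors (NOT Iconary's real representation).
# Axes, in order: person-like, school-related, animal-like, object-like.
CONCEPTS = {
    "child":   [1.0, 0.6, 0.0, 0.0],
    "student": [0.9, 1.0, 0.0, 0.1],
    "horse":   [0.1, 0.0, 1.0, 0.0],
    "desk":    [0.0, 0.9, 0.0, 1.0],
    "laptop":  [0.0, 0.7, 0.0, 1.0],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity(first, second):
    """Closeness of two named concepts in the toy table."""
    return cosine_similarity(CONCEPTS[first], CONCEPTS[second])


def best_guess(drawn_icons, candidates):
    """Average the drawn icons into one 'scene' vector and pick the closest candidate."""
    vectors = [CONCEPTS[name] for name in drawn_icons]
    scene = [sum(axis) / len(vectors) for axis in zip(*vectors)]
    return max(candidates, key=lambda name: cosine_similarity(scene, CONCEPTS[name]))


if __name__ == "__main__":
    print("child vs student:", round(similarity("child", "student"), 3))
    print("child vs horse:  ", round(similarity("child", "horse"), 3))
    print("child + desk ->", best_guess(["child", "desk"], ["student", "horse"]))
```

With these invented numbers, a child drawn next to a desk lands nearer to “student” than to “horse.” A real system would have to learn such relationships from data rather than have them written in by hand.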

This type of AI is deceptively broad and intelligent, but it isn’t flashy the way that the human-destroying AlphaStar or AlphaGo are. It isn’t superhuman; in fact, it’s not even close to human. But board and PC games are tightly bounded problem spaces with set rules and limits. Visual expression of a complex phrase like “crowd celebrating a victory on a street” isn’t a question of how fast the computer can process, but the depth of its understanding of the concepts involved, and how others think about them.

This kind of learning is also more broadly applicable in the real world. Robots and self-driving cars do need to know how to exceed human capacity in some cases, but it’s also massively important to be able to understand the world around them in the same way people do. When it sees a person by a hospital bed holding a book, what does that mean? When a person leaves a knife out next to a whole tomato? And so on.

“Real life problems involve semantics, abstraction, and collaboration,” said Farhadi. “They involve theory of mind.”

Interestingly, the agent is biased a bit (as these things tend to be) owing to the natural bias of our language. Images “read” from left to right, as people tend to draw them, since we also read in that direction, so keep that in mind.

Try playing a couple games both drawing and guessing, and you may be surprised at the cleverness and weirdness of the AI’s suggestions. Don’t feel bad about skipping one — the agent is still learning, and sometimes its attempts to represent ideas are a bit too abstract. But I certainly found myself impressed more than baffled.

If you’d like to learn more, stay tuned: the team behind the system will be publishing a paper on it later this year. I’ll update this post when that happens.

Databricks raises $250M at a $2.75B valuation for its analytics platform

Databricks, the company behind the Apache Spark big data analytics engine, today announced that it has raised a $250 million Series E round led by Andreessen Horowitz. Coatue Management, Microsoft and NEA also participated in this round, which brings the company’s total funding to $498.5 million. Microsoft’s involvement here is probably a bit of a surprise, but it’s worth noting that it also worked with Databricks on the launch of Azure Databricks as a first-party service on the platform, something that’s still a rarity in the Azure cloud.

As Databricks also today announced, its annual recurring revenue now exceeds $100 million. The company didn’t share whether it’s cash flow-positive at this point, but Databricks CEO and co-founder Ali Ghodsi shared that the company’s valuation is now $2.75 billion.

Current customers, which the company says number around 2,000, include the likes of Nielsen, Overstock, Bechtel, Shell and HP.

While Databricks is obviously known for its contributions to Apache Spark, the company itself monetizes that work by offering its Unified Analytics platform on top of it. This platform allows enterprises to build their data pipelines across data storage systems and prepare data sets for data scientists and engineers. To do this, Databricks offers shared notebooks and tools for building, managing and monitoring data pipelines, and for then using that data to build machine learning models, for example. Indeed, training and deploying these models is one of the company’s focus areas these days, which makes sense, given that this is one of the main use cases for big data.

On top of that, Databricks also offers a fully managed service for hosting all of these tools.

“Databricks is the clear winner in the big data platform race,” said Ben Horowitz, co-founder and general partner at Andreessen Horowitz, in today’s announcement. “In addition, they have created a new category atop their world-beating Apache Spark platform called Unified Analytics that is growing even faster. As a result, we are thrilled to invest in this round.”

Ghodsi told me that Horowitz was also instrumental in getting the company to re-focus on growth. The company was already growing fast, of course, but Horowitz asked him why Databricks wasn’t growing faster. Unsurprisingly, given that it’s an enterprise company, that means aggressively hiring a larger sales force — and that’s costly. Hence the company’s need to raise at this point.

As Ghodsi told me, one of the areas the company wants to focus on is the Asia Pacific region, where overall cloud usage is growing fast. The other area the company is focusing on is support for more verticals like mass media and entertainment, federal agencies and fintech firms, which also comes with its own cost, given that the experts there don’t come cheap.

Ghodsi likes to call this “boring AI,” since it’s not as exciting as self-driving cars. In his view, though, the enterprise companies that don’t start using machine learning now will inevitably be left behind in the long run. “If you don’t get there, there’ll be no place for you in the next 20 years,” he said.

Engineering, of course, will also get a chunk of this new funding, with an emphasis on relatively new products like MLflow and Delta, two tools Databricks recently developed that make it easier to manage the life cycle of machine learning models and build the necessary data pipelines to feed them.

Healthcare wearables level up with new moves from Apple and Alphabet

Announcements that Apple has partnered with Aetna health insurance on a new app leveraging data from its Apple Watch, and reports that Verily, one of the health-focused subsidiaries of Google’s parent company, Alphabet, is developing a shoe that can detect weight and movement, indicate increasing momentum around using data from wearables for clinical health applications and treatments.

For venture capital investors, the moves from Apple and Alphabet to show new applications for wearable devices are a step in the right direction, and something that’s been long overdue.

“As a healthcare provider, we talk a lot about the importance of preventative medicine, but the US healthcare system doesn’t have the right incentives in place to pay for it,” writes Cameron Sepah, an entrepreneur in residence at Trinity Ventures. “Since large employers largely pay for health care (outside of Medicaid and Medicare), they usually aren’t incentivized to pay for prevention, since employees don’t stay long enough for them to incur the long-term costs of health behaviors. So most startups in this space end up becoming an expendable wellness perk for companies. However, if an insurer like Aetna keeps its members long enough, there’s better alignment for disseminating this app.”

Sepah sees broader implications for the tie ups between health insurers and the tech companies making all sorts of devices to detect and diagnose conditions.

“Most patients’ relationship with their insurer is just getting paper bills/notifications in the mail, with terrible customer satisfaction (NPS) across the board,” Sepah wrote in an email. “But when there’s a way to build a closer relationship through a device that sits on your wrist, it opens possibilities to partner with other health tech startups that can notify patients when they are having mental health issues before they even recognize it (e.g. Mindstrong); or when they should get treatment for hypertension or sleep apnea (e.g. Cardiogram); or leverage their data into a digital chronic disease treatment program (e.g. Omada Health).”

Aetna isn’t the first insurer to tie Apple Watch data to their policies. In September 2018, John Hancock launched the Vitality program, which also gave users discounts on the latest Apple Watch if they linked it with John Hancock’s app. The company also gave out rewards if users changed their behavior around diet and exercise.

In a study conducted by Rand Europe of 400,000 people in the U.S., the U.K., and South Africa, research showed that users who wore an Apple Watch and participated in the Vitality benefits program averaged a 34 percent increase in physical activity compared to patients without the Apple Watch. It equated to roughly 5 extra days of working out per month.

“[It will] be interesting to see how CVS/Apple deal unfolds. Personalized health guidance based on a combination of individual medical records and real time wearable data is a huge and worthy goal,” wrote Greg Yap, a partner at the venture capital firm Menlo Ventures. But, Yap wrote, “I’m skeptical their first generation app will have enough data or training to deliver value to a broad population, but we’re likely to see some anecdotal benefits, and I find that worthwhile.”

Meanwhile the types of devices that record consumer health information are proliferating — thanks in no small part to Verily.

With the company reportedly working to co-develop shoes with sensors that monitor users’ movement and weight, according to CNBC, Verily is expanding its portfolio of connected devices for health monitoring and management. The company already has a watch that monitors certain patient data — including an FDA approved electrocardiogram — and is developing technologies to track diabetes-related eye disease in patients alongside smart lenses for cataract recovery.

It’s part of a broader push from technology companies to tie themselves closer to consumer health as they look to seize a part of the nearly $3 trillion healthcare industry.

If more data can be collected from wearable devices (or consumer behavior) and then monitored in a consistent fashion, tech companies ideally could suggest interventions faster and provide lower cost treatments to help avoid the need for urgent or emergency care.

These “top of the funnel” communications and monitoring services from tech companies could conceivably divert users and future healthcare patients into an alternative system that is potentially lower-cost with more of a focus on outcomes than on the volume of care and number of treatments prescribed.

Not all physicians are convinced that the use of persistent monitoring will result in better care. Dr. John Ioannidis, a celebrated professor from Stanford University, is skeptical about the utility of monitoring without a better understanding of what the data actually reveals.

“Information is good for you provided you know what it means. For much of that information we have no clue what it means. We have absolutely no idea what to do with it other than creating more anxiety,” Dr. Ioannidis said.

The goal is to provide personalized guidance where machine learning can be used to identify problems and come up in concert with established therapeutic practices, according to investors who back life sciences startups.

“I think startups like Omada, Livongo, Lark, Vida, Virta, and others, can work and are already working on this overall vision of combining real time and personal historical data to deliver personalized guidance. But to be successful, startups need to be more narrowly focused and deliver improved outcomes and financial benefits right away,” according to Yap.


Rana el Kaliouby and Alexei Efros will be speaking at TC Sessions: Robotics + AI April 18 at UC Berkeley

TechCrunch’s third robotics event is just over two and a half months away, and it’s already shaping up to be a doozy. We’ve already announced Anca Dragan, Hany Farid, Melonee Wise and Peter Barrett for our event and have an exciting pair of new names to share with you.

UC Berkeley’s Alexei Efros and Affectiva CEO Rana el Kaliouby will be joining us at Zellerbach Hall on April 18 for TC Sessions: Robotics+AI.

Alexei Efros is a professor in UC Berkeley’s Electrical Engineering and Computer Sciences department and a member of the school’s Artificial Intelligence Research Lab. His work focuses on computer vision, graphics and computational photography, utilizing visual data to help better understand the world. Efros also researches robotics, machine learning and the use of computer vision in the humanities. Prior to joining UC Berkeley, he was a member of CMU’s Robotics Institute.

Rana el Kaliouby is the cofounder and CEO of Affectiva, an MIT Media Lab spinoff that creates software designed to recognize human emotions. el Kaliouby designed the startup’s underlying technology, which helps bring more depth and understanding to facial recognition. Prior to cofounding the company, she worked as an MIT research scientist, cofounding the school’s Autism & Communication Technology Initiative.

Early-Bird tickets are on sale now for just $249 – that’s $100 off full-priced tickets. Buy yours today here. Students can book a deeply discounted ticket for just $45 here.

Tencent moves into automotive with $150M joint venture

China’s internet firms are getting pally with giant state-owned automakers as they look to deploy their artificial intelligence and cloud computing services across traditional industries. Ride-hailing startup Didi Chuxing, which owns Uber China, announced earlier this week a new joint venture with state-owned BAIC. Hot on its heels came another entity set up between Tencent and the GAC Group.

GAC, which is owned by the Guangzhou municipal government in southern China, announced Thursday in a filing that it will jointly establish a mobility company with social media and gaming behemoth Tencent and Guangzhou Public Transport Group, alongside other investors.

The announcement followed an agreement between Tencent and GAC in 2017 to team up on internet-connected cars and smart driving, a deal that saw the carmaker tapping into Tencent’s expertise in mobile payments, social networking, big data and cloud services. Tencent, which is most famous for its instant messenger WeChat, went through a major restructuring last October to place more focus on enterprise-facing services, and the GAC tie-up appears to fit nicely into that pivot.

The fresh venture will bank a capital infusion of 1 billion yuan ($149 million) with GAC owning a 35 percent stake. Tencent and Guangzhou Public Transport will take up 25 percent and 10 percent, respectively.
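For readers who want to see what those percentages mean in yuan, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that each party’s capital contribution is proportional to its stake, which the filing as described here does not confirm; the variable names are made up for this example.

```python
# Figures taken from the article: 1 billion yuan in total capital and three named stakes.
TOTAL_CAPITAL_YUAN = 1_000_000_000

NAMED_STAKES_PERCENT = {
    "GAC": 35,
    "Tencent": 25,
    "Guangzhou Public Transport": 10,
}

# Whatever is not held by the three named parties is left for the other investors.
remaining_percent = 100 - sum(NAMED_STAKES_PERCENT.values())

# ASSUMPTION: contributions scale linearly with ownership percentage.
implied_contributions = {
    name: TOTAL_CAPITAL_YUAN * percent // 100
    for name, percent in NAMED_STAKES_PERCENT.items()
}

if __name__ == "__main__":
    for name, amount in implied_contributions.items():
        print(f"{name}: {amount:,} yuan")
    print(f"Other investors: {remaining_percent} percent")
```

Under that assumption, the three named stakes add up to 70 percent, which leaves 30 percent for the other investors mentioned in the filing.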

A flurry of Chinese internet service providers have made forays into the automotive industry, marketing their digital and machine learning capabilities to old-school automakers. Besides Tencent, GAC has also recruited telecommunications equipment maker Huawei and voice assistant startup iFlytek to upgrade its vehicles. Search titan Baidu, on the other hand, operates an open platform for autonomous driving cars and has chosen state-owned Hongqi to test out its autonomous driving solutions. Ecommerce behemoth Alibaba has also set foot in transportation with a smart sedan jointly developed with state-owned SAIC.

Rigetti launches the public beta of its Quantum Cloud Services

Rigetti Computing, one of the leading startups in the quantum computing space, today announced the public beta of its Quantum Cloud Services (QCS) platform. With this, developers can get access to Rigetti’s quantum processors, as well as all the classical computing resources necessary to build and test quantum algorithms on this hybrid platform.

Beta users will get $5,000 in credits toward running their programs on the platform. The platform itself consists of classical computing resources (you still need those as the quantum chips are essentially specialized co-processors) and Rigetti’s quantum chips, including two of its latest Aspen quantum processors. In order to run your algorithms on those chips, you’ll have to book time using the service’s online booking system.

The core of the user experience, though, is Rigetti’s Quantum Machine Image, which features all of the company’s tools for building quantum algorithms, including its Forest SDK and a simulator for testing code. That image then runs on a regular server in Rigetti’s cloud, but it’s tightly coupled with the company’s quantum computing resources.

Developers also get access to the first set of quantum applications written by various Rigetti partners, like Zapata Computing, a company that specializes in algorithms for quantum computing. Those applications range from a tool for compressing quantum data to QuantumFreeze to some basic machine learning applications. There’s also a game where you help a penguin navigate a frozen lake that’s pocked with holes. You can either make classical moves or split the penguin into a superposition of states. Why not, I guess.