Krisp snags $5M A round as demand grows for its voice-isolating algorithm

Krisp’s smart noise suppression tech, which silences ambient sounds and isolates your voice for calls, arrived just in time. The company got out in front of the global shift to virtual presence, turning early niche traction into real customers and attracting a shiny new $5 million Series A funding round to expand and diversify its timely offering.

We first met Krisp back in 2018 when it emerged from UC Berkeley’s Skydeck accelerator. The company was an early one in the big surge of AI startups, but its straightforward use case and obviously effective tech made it hard to be skeptical about.

Krisp applies a machine learning system, trained on what is and isn’t the human voice, to audio in real time. What isn’t a voice gets carefully removed even during speech, and what remains sounds clearer. That’s pretty much it! There’s very little latency (15 milliseconds is the claim) and a modest computational overhead, meaning it can work on practically any device, especially ones with AI acceleration units like most modern smartphones.
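Krisp hasn’t published its architecture, but the general shape of real-time spectral masking is easy to sketch. In the toy version below, `predict_voice_mask` stands in for the trained network, and the frame size, hop and median-energy heuristic are illustrative assumptions, not Krisp’s actual parameters:

```python
import numpy as np

FRAME = 512        # samples per frame (~32 ms at 16 kHz)
HOP = FRAME // 2   # 50% overlap between frames

def predict_voice_mask(magnitude):
    """Stand-in for a trained model: returns a 0..1 mask per frequency bin.
    A real system would run a neural network here; this toy heuristic just
    keeps bins above the frame's median energy."""
    threshold = np.median(magnitude)
    return (magnitude > threshold).astype(float)

def denoise(samples):
    """Suppress non-voice energy frame by frame via spectral masking."""
    window = np.hanning(FRAME)
    out = np.zeros(len(samples))
    for start in range(0, len(samples) - FRAME, HOP):
        frame = samples[start:start + FRAME] * window
        spectrum = np.fft.rfft(frame)
        mask = predict_voice_mask(np.abs(spectrum))
        cleaned = np.fft.irfft(spectrum * mask, n=FRAME)
        out[start:start + FRAME] += cleaned * window  # overlap-add
    return out
```

The real engineering work is in the model inside the mask predictor, which has to be small and fast enough to stay within that claimed ~15 ms latency budget on commodity hardware.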

The company began by offering its standalone software for free, with a paid tier that removed time limits. It also shipped integrated into the popular social chat app Discord. But the real business is, unsurprisingly, in enterprise.

“Early on our revenue was all pro, but in December we started onboarding enterprises. COVID has really accelerated that plan,” explained Davit Baghdasaryan, co-founder and CEO of Krisp. “In March, our biggest customer was a large tech company with 2,000 employees — and they bought 2,000 licenses, because everyone is remote. Gradually enterprise is taking over, because we’re signing up banks, call centers and so on. But we think Krisp will still be consumer-first, because everyone needs that, right?”

Now even more large companies have signed on, including one call center with some 40,000 employees. Baghdasaryan says the company went from 0 to 600 paying enterprises, and $0 to $4M annual recurring revenue in a single year, which probably makes the investment — by Storm Ventures, Sierra Ventures, TechNexus and Hive Ventures — look like a pretty safe one.

It’s a big win for the Krisp team, which is split between the U.S. and Armenia, where the company was founded, and a validation of a global approach to staffing — world-class talent isn’t just to be found in California, New York, Berlin and other tech centers, but in smaller countries that don’t have the benefit of local hype and investment infrastructure.

Funding is another story, of course, but having raised money, the company is now working to expand its products and team. Krisp’s next move is essentially to monitor and present the metadata of conversations.

“The next iteration will tell you not just about noise, but give you real-time feedback on how you are performing as a speaker,” Baghdasaryan explained. Not in the Toastmasters sense, exactly, but haven’t you ever wondered how much you actually spoke during some call, or whether you interrupted or were interrupted by others, and so on?

“Speaking is a skill that people can improve. Think Grammarly for voice and video,” Baghdasaryan ventured. “It’s going to be subtle about how it gives that feedback to you. When someone is speaking they may not necessarily want to see that. But over time we’ll analyze what you say, give you hints about vocabulary, how to improve your speaking abilities.”

Since architecturally Krisp is privy to all audio going in and out, it can fairly easily collect this data. But don’t worry — like the company’s other products, this will be entirely private and on-device. No cloud required.

“We’re very opinionated here: Ours is a company that never sends data to its servers,” said Baghdasaryan. “We’re never exposed to it. We take extra steps to create and optimize our tech so the audio never leaves the device.”

That should be reassuring for privacy wonks who are suspicious of sending all their conversations through a third party to be analyzed. And the type of advice Krisp is considering can be given without really “understanding” what is said, which also limits its scope. It won’t be coaching you into a modern Cicero, but it might help you speak more consistently or let you know when you’re taking up too much time.
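Metrics like talk share and interruption counts do indeed fall out of voice-activity metadata alone, with no transcript required. This sketch assumes a hypothetical `(speaker, start, end)` segment format, purely for illustration; it is not Krisp’s implementation:

```python
def talk_stats(segments, call_length):
    """Compute per-speaker talk share and interruption counts from
    voice-activity segments given as (speaker, start, end) tuples in
    seconds. Only timing metadata is used -- no audio content."""
    talk = {}
    interruptions = {}
    ordered = sorted(segments, key=lambda s: s[1])
    for i, (speaker, start, end) in enumerate(ordered):
        talk[speaker] = talk.get(speaker, 0.0) + (end - start)
        # Count an interruption when a speaker starts talking before the
        # previous (different) speaker has finished.
        if i and start < ordered[i - 1][2] and speaker != ordered[i - 1][0]:
            interruptions[speaker] = interruptions.get(speaker, 0) + 1
    share = {s: t / call_length for s, t in talk.items()}
    return share, interruptions
```

Since everything here operates on timestamps rather than words, this kind of analysis fits the company’s on-device, no-cloud constraint.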

For the immediate future, though, Krisp is still focused on improving its noise-suppression software, which you can download for free here.

Sight Diagnostics raises $71M Series D for its blood analyzer

Sight Diagnostics, the Israel-based health-tech company behind the FDA-cleared OLO blood analyzer, today announced that it has raised a $71 million Series D round with participation from Koch Disruptive Technologies, Longliv Ventures (which led its Series C round) and crowd-funding platform OurCrowd. With this, the company has now raised a total of $124 million, though it declined to share its current valuation.

With a founding team that used to work at Mobileye, among other companies, Sight made an early bet on using machine vision to analyze blood samples and provide a full blood count comparable to existing lab tests within minutes. The company received FDA clearance late last year, something that surely helped clear the way for this additional round of funding.

Image Credits: Sight Diagnostics

“Historically, blood tests were done by humans observing blood under a microscope. That was the case for maybe 200 years,” Sight CEO and co-founder Yossi Pollak told me. “About 60 years ago, a new technology called FCM — or flow cytometry — started to be used on large volume of blood from venous samples to do it automatically. In a sense, we are going back to the first approach, we just replaced the human eye behind the microscope with machine vision.”

Pollak noted that the tests generate about 60 gigabytes of information (a lot of that is the images, of course) and that he believes that the complete blood count is only a first step. One of the diseases it is looking to diagnose is COVID-19. To do so, the company has placed devices in hospitals around the world to see if it can gather the data to detect anomalies that may indicate the severity of some of the aspects of the disease.

“We just kind of scratched the surface of the ability of AI to help with blood diagnostics,” said Pollak. “Specifically now, there’s so much value around COVID in decentralizing diagnostics and blood tests. Think keeping people — COVID-negative or -positive — outside of hospitals to reduce the busyness of hospitals and reduce the risk for contamination for cancer patients and a lot of other populations that require constant complete blood counts. I think there’s a lot of potential and a lot of value that we can bring specifically now to different markets, and we are definitely looking into additional applications beyond [complete blood count] and also perfecting our product.”

So far, Sight Diagnostics has applied for 20 patents, eight of which have been issued. And while machine learning is obviously at the core of what the company does — with the models running on the OLO machine and not in the cloud — Pollak also stressed that the team has made breakthroughs in sample preparation, allowing the device to automatically prepare a sample for analysis.

Image Credits: Sight Diagnostics

Pollak stressed that the company focused on the U.S. market with this funding round, which makes sense, given that it was still awaiting its FDA clearance. He also noted that this marks Koch Disruptive Technologies’ third investment in Israel, with the other two also being healthcare startups.

“KDT’s investment in Sight is a testament to the company’s disruptive technology that we believe will fundamentally change the way blood diagnostic work is done,” said Chase Koch, President of Koch Disruptive Technologies. “We’re proud to partner with the Sight team, which has done incredible work innovating this technology to transform modern healthcare and provide greater efficiency and safety for patients, healthcare workers, and hospitals worldwide.”

The company now has about 100 employees, mostly in R&D, with offices in London and New York.

Machine Learning for Product Managers – A Quick Primer

Currently, there are thousands of products, apps, and services driven by machine learning (ML) that we use every day. As was reported by Crunchbase, in 2019 there were 8,705 companies and startups that rely on this technology. According to PwC’s research, it’s predicted that ML and AI technologies will contribute about $15.7 trillion to global GDP by 2030. It’s obvious [...]

Read More...

The post Machine Learning for Product Managers – A Quick Primer appeared first on Mind the Product.

The essential revenue software stack

From working with our 90+ portfolio companies and their customers, as well as from frequent conversations with enterprise leaders, we have observed a set of software services emerge and evolve to become best practice for revenue teams. This set of services — call it the “revenue stack” — is used by sales, marketing and growth teams to identify and manage their prospects and revenue.

The evolution of this revenue stack started long before anyone had ever heard the word coronavirus, but now the stakes are even higher as the pandemic has accelerated this evolution into a race. Revenue teams across the country have been forced to change their tactics and tools in the blink of an eye in order to adapt to this new normal — one in which they needed to learn how to sell in not only an all-digital world but also an all-remote one where teams are dispersed more than ever before. The modern “remote-virtual-digital”-enabled revenue team has a new urgency for modern technology that equips them to be just as — and perhaps even more — productive than their pre-coronavirus baseline. We have seen a core combination of solutions emerge as best-in-class to help these virtual teams be most successful. Winners are being made by the directors of revenue operations, VPs of revenue operations, and chief revenue officers (CROs) who are fast adopters of what we like to call the essential revenue software stack.

In this stack, we see four necessary core capabilities, all critically interconnected. The four core capabilities are:

  1. Revenue enablement.
  2. Sales engagement.
  3. Conversational intelligence.
  4. Revenue operations.

These capabilities run on top of three foundational technologies that most growth-oriented companies already use — agreement management, CRM and communications. We will dive into these core capabilities and the emerging leaders in each, and provide general guidance on how to get started.

Revenue enablement

AI is struggling to adjust to 2020

2020 has made every industry reimagine how to move forward in light of COVID-19, civil rights movements, an election year and countless other big news moments. On a human level, we’ve had to adjust to a new way of living. We’ve started to accept these changes and figure out how to live our lives under these new pandemic rules. While humans settle in, AI is struggling to keep up.

The issue with AI training in 2020 is that, all of a sudden, we’ve changed our social and cultural norms. The truths that we have taught these algorithms are often no longer actually true. With visual AI specifically, we’re asking it to immediately interpret the new way we live with updated context that it doesn’t have yet.

Algorithms are still adjusting to new visual cues and trying to understand how to accurately identify them. As visual AI catches up, we also need to place renewed importance on routine updates in the AI training process so inaccurate training datasets and preexisting open-source models can be corrected.

Computer vision models are struggling to appropriately tag depictions of the new scenes or situations we find ourselves in during the COVID-19 era. Categories have shifted. For example, say there’s an image of a father working at home while his son is playing. AI is still categorizing it as “leisure” or “relaxation.” It is not identifying this as “work” or “office,” despite the fact that working with your kids next to you is the very common reality for many families during this time.

Image Credits: Westend61/Getty Images

On a more technical level, we physically have different pixel depictions of our world. At Getty Images, we’ve been training AI to “see.” This means algorithms can identify images and categorize them based on the pixel makeup of that image and decide what it includes. Rapidly changing how we go about our daily lives means that we’re also shifting what a category or tag (such as “cleaning”) entails.

Think of it this way — cleaning may now include wiping down surfaces that already visually appear clean. Algorithms have been previously taught that to depict cleaning, there needs to be a mess. Now, this looks very different. Our systems have to be retrained to account for these redefined category parameters.

This relates on a smaller scale as well. Someone could be grabbing a door knob with a small wipe or cleaning their steering wheel while sitting in their car. What was once a trivial detail now holds importance as people try to stay safe. We need to catch these small nuances so images are tagged appropriately. Then AI can start to understand our world in 2020 and produce accurate outputs.

Image Credits: Chee Gin Tan/Getty Images

Another issue for AI right now is that machine learning algorithms are still trying to understand how to identify and categorize faces with masks. Faces are being detected as solely the top half of the face, or as two faces — one with the mask and a second of only the eyes. This creates inconsistencies and inhibits accurate usage of face detection models.

One path forward is to retrain algorithms to perform better when given solely the top portion of the face (above the mask). The mask problem is similar to classic face detection challenges such as someone wearing sunglasses or detecting the face of someone in profile. Now masks are commonplace as well.
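One mechanical way to pursue that retraining path is to regenerate training crops from existing labeled face boxes, keeping only the region above where a mask typically sits. The split point below is a made-up assumption for illustration, not a published figure:

```python
def upper_face_crop(box, mask_line=0.55):
    """Given a face bounding box (x, y, w, h) in pixels, return the crop
    above the assumed mask line -- eyes and forehead -- for retraining a
    detector on partially covered faces. mask_line is the fraction of the
    box height kept, an illustrative guess rather than a standard value."""
    x, y, w, h = box
    return (x, y, w, int(h * mask_line))
```

Running this over an existing labeled dataset yields masked-era training examples without collecting any new images, though real pipelines would also augment with synthetic masks.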

Image Credits: Rodger Shija/EyeEm/Getty Images

What this shows us is that computer vision models still have a long way to go before truly being able to “see” in our ever-evolving social landscape. The way to counter this is to build robust datasets. Then, we can train computer vision models to account for the myriad different ways a face may be obstructed or covered.

At this point, we’re expanding the parameters of what the algorithm sees as a face — be it a person wearing a mask at a grocery store, a nurse wearing a mask as part of their day-to-day job or a person covering their face for religious reasons.

As we create the content needed to build these robust datasets, we should be aware of potentially increased unintentional bias. While some bias will always exist within AI, we now see imbalanced datasets depicting our new normal. For example, we are seeing more images of white people wearing masks than other ethnicities.

This may be the result of strict stay-at-home orders where photographers have limited access to communities other than their own and are unable to diversify their subjects. It may be due to the ethnicity of the photographers choosing to shoot this subject matter. Or, due to the level of impact COVID-19 has had on different regions. Regardless of the reason, having this imbalance will lead to algorithms being able to more accurately detect a white person wearing a mask than any other race or ethnicity.
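A first-pass audit of that kind of imbalance can be as simple as comparing each group’s share of the dataset to an even split. This sketch is only illustrative; real fairness audits go considerably deeper than headcounts:

```python
from collections import Counter

def representation_gaps(labels, groups):
    """Compare each group's share of a dataset to an even share.
    `labels` holds one group tag per training image; `groups` is the full
    set of groups the model should serve. Values below 1.0 flag
    underrepresented groups."""
    counts = Counter(labels)
    fair_share = len(labels) / len(groups)
    return {g: counts.get(g, 0) / fair_share for g in groups}
```

An even split is itself a simplifying assumption; a production audit would weigh representation against the population the model actually serves.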

Data scientists and those who build products with models have an increased responsibility to check for the accuracy of models in light of shifts in social norms. Routine checks and updates to training data and models are key to ensuring quality and robustness of models — now more than ever. If outputs are inaccurate, data scientists can quickly identify them and course correct.

It’s also worth mentioning that our current way of living is here to stay for the foreseeable future. Because of this, we must be cautious about the open-source datasets we’re leveraging for training purposes. Datasets that can be altered should be; open-source models that cannot be altered need a disclaimer so it’s clear which projects might be negatively impacted by the outdated training data.

Identifying the new context we’re asking the system to understand is the first step toward moving visual AI forward. Then we need more content. More depictions of the world around us — and the diverse perspectives of it. As we’re amassing this new content, take stock of new potential biases and ways to retrain existing open-source datasets. We all have to monitor for inconsistencies and inaccuracies. Persistence and dedication to retraining computer vision models is how we’ll bring AI into 2020.

Using population health analysis to improve patient care brings Sema4 a $1.1 billion valuation

Sema4, the Stamford, Conn.-based digital healthcare company now worth just over $1 billion, takes its name from the system of sending messages via code.

And like its namesake, Sema4 is trying to send messages of its own to the broader healthcare system based on the signals it uncovers in massive datasets of population health that can reveal insights and best practices, according to the company’s founding chief executive, Eric Schadt.

Spun out from the Mt. Sinai Health System in June 2017, Sema4 is the second digital healthcare company in a week to reach a billion-dollar valuation from investors (Ro, too, is now worth over $1 billion). In this case, Sema4’s $121 million financing came from BlackRock, Deerfield and Moore Capital, and follows only twelve months after another $120 million institutional financing from investors including Blackstone, Section 32, Oak HC/FT, Decheng, and the Connecticut Innovation Fund.

The company’s ability to attract capital may have something to do with a business model that’s managed to amass nearly 10 million patient records through partnerships with ten major health systems and several hundred thousand more patients through a strategy that has the company offer direct insights to patients as part of enhanced care services.

“My effort centered on… how do we aggregate bigger and bigger sources of data to better inform patients around their health and wellness,” said Schadt. 

Sema4 chief executive Eric Schadt. Image Credit: Sema4

Sema4 works with physicians to provide analysis of genetic data so doctors can make informed decisions on what care would work best with their patients. “We’re providing a meaningful service on behalf of the physician and it’s a service that the physician wants us to do because they’re generally not adept at the genomics,” said Schadt. 

The company provides screening services for reproductive health and oncology as two of its core competencies, acting as a single point of care to collect and store information in a way that’s easily portable for patients, Schadt said.

“We play in the testing arena as a growth hack engine to engage patients and generate high amounts of quality data, and seek to engage with them at higher scales to build the biggest models to get what [doctors] need on any condition of interest,” he said.

Sema4 is currently working in three areas: reproductive health, precision oncology and now COVID-19. In April, the company had no ability to analyze tests for COVID-19, but did have lab space that was certified to perform the necessary analysis. Now, the company can handle 15,000 tests per day.

As a result of the round, Andrew Elbardissi, a managing partner at Deerfield, has joined Sema4’s board of directors. Other recent additions to the board include Mike Pellini, the former chief executive of Foundation Medicine and current investor at Section 32 (the venture firm launched by former Google Ventures head Bill Maris); former principal deputy commissioner of the Food and Drug Administration, Rachel Sherman; and former Goldman Sachs chief financial officer, Marty Chavez.

“Sema4 is a leader at the forefront of one of the most exciting intersections in healthcare – the application of technology, AI and machine learning to help improve patient outcomes. We are excited to support this talented management team as Sema4 begins its next phase of growth,” said Will Abacassis, Managing Director at BlackRock, in a statement. 

Goldman Sachs acted as a financial advisor to Sema4 on the transaction.

 

Explorium reels in $31M Series B as data discovery platform grows

In a world with growing amounts of data, finding the right set for a particular machine learning model can be a challenge. Explorium has created a platform to make that an easier task, and today the startup announced a $31 million Series B.

The round was led by Zeev Ventures with help from Dynamic Loop, Emerge and F2 Capital. Today’s investment brings the total raised to $50 million, according to the company.

CEO and co-founder Maor Shlomo says the company’s platform is designed to help people find the right data for their model. “The next frontier in analytics will not be about how you fine-tune or improve a certain algorithm, it will be how do you find the right data to fit into those algorithms to make them as useful and impactful as possible,” he said.
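The kind of automated data discovery Shlomo describes can be caricatured as scoring candidate external columns against the prediction target. This toy ranking by absolute correlation is only an illustration of the idea, not Explorium’s method, and the column names are invented:

```python
import numpy as np

def rank_candidate_features(candidates, target):
    """Rank candidate external data columns by the absolute Pearson
    correlation of each column with the prediction target. `candidates`
    maps column names to equal-length numeric sequences."""
    scores = {
        name: abs(np.corrcoef(col, target)[0, 1])
        for name, col in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

A real discovery engine would go far beyond linear correlation — handling joins, leakage checks and nonlinear relevance — but the ranking framing is the core of the pitch.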

He says that companies need this more than ever during the pandemic because this can help customers find more relevant data at a time when their historical data might not be useful to help build predictive models. For instance, if you’re a retailer, your historical shopping data won’t be relevant if you are in an area where you can no longer open your store, he says.

“There are so many environmental factors that are now influencing every business problem that organizations are trying to solve that Explorium is becoming this […] layer where you search for data to solve your business problems to fuel your predictive models,” he said.

When the pandemic hit in March, he worried about how it would affect his company, and he put a hold on hiring, but as he saw business increasing in April and May, he decided to accelerate again. The company currently has 87 employees between offices in Israel and the United States and he plans to be at 100 in the next couple of months.

When it comes to hiring, he says he doesn’t try to have hard and fast hiring rules like you have a certain degree or have gone to a certain school. “The only thing that’s important is getting good people hungry to succeed. The more diverse the culture is, the more diverse the group is, we find the more fun it is for people to discover each other and to discover different cultures,” Shlomo explained.

In terms of fundraising, while the company needs money to fuel its growth, it still had plenty of money in the bank from last year’s round. “We got into the pandemic and we didn’t know how long it’s going to last, and [early on] we didn’t yet know how it would impact the business. Existing investors were always bullish about the company. We decided to just go with that,” he said.

The company was founded in 2017 and previously raised a $19.1 million Series A round last year.

Advertima rings up $17.5M for computer vision-powered behavioral analytics for in-store retail

Swiss computer vision startup, Advertima, has raised a €15 million Series A (~$17.5M) to build out a machine learning platform for physical retail stores to ‘upgrade’ the shopping experience via real-time shopper behavior analytics. The round is led by existing shareholder, Fortimo Group, a Swiss real estate company.

Fed by visual sensors, Advertima’s platform provides physical retail spaces with a real-time view of what’s going on in store — comprised of AI-powered behavioral and demographic analysis, as shoppers move through the space — with the aim of helping retailers better understand and respond dynamically to customers in store.

The startup calls this its “Human Data Layer” — noting that the tech can support features like smart inventory management and autonomous checkout.

Throw in digital signage (which it also offers) and its platform can be used to serve contextually relevant messaging intended for one or just a few pairs of nearby eyeballs — such as product offers for a particular gender or age bracket, or discounts for families — depending on who’s in proximity of the given digital eye.

That said, ‘relevancy’ depends upon the calibre of the AI and the quality of the underlying training data, so it certainly isn’t a given. Ads that seem to personally address you when you make eye contact, meanwhile, have been a sci-fi staple for years, of course. But the reality of ‘smart’ ads informed by AI analytics could very quickly stray into creepy territory.

An example message shown in a demo video on Advertima’s website isn’t great in this regard — the system is shown IDing a stick-figure woman and popping up a targeted message that reads: “Hello young woman. All alone?” (uhhh 😬). So retailers plugging such stuff into their stores need to be hypersensitive to tone and context (and indeed take a robust approach to assessing how accurate the AI is, or isn’t).

Or, well, they could find shoppers fleeing in horror. (tl;dr no one likes to feel watched while they’re shopping. And if the AI misgenders a potential customer that could be a disaster.)

One flashy pledge from Advertima is that its approach to applying AI to guesstimate who’s in the shop and what they’re doing is ‘privacy safe’ — with the startup noting there’s no facial recognition or biometric detection involved in its system, for one thing.

It also specifies that the visual sensors required for the analytics to function do not store any image or video recordings. Instead it claims to “only process minimal anonymized data” — and only evaluate that in “aggregated form”.

“This means that the unintentional identification of a person is technically impossible,” is the top-line claim.

With long-standing data protection laws covering Europe, and EU lawmakers actively considering new rules to wrap around certain applications of artificial intelligence, there’s a legal incentive not to push such tech’s intrusiveness too far (at least for local use-cases). While Switzerland, which is not a Member of the EU (though it is part of the bloc’s single market), also has a reputation for strict domestic privacy laws — so this homegrown startup’s pitch at least reflects that context.

That said, its system appears to generate a “Person ID” (see below screengrab) — so we’ve asked how long it retains these individual-linked IDs for; and whether or not it links (or enables the linking of) the Person ID with any other data that might be gathered from the shopper, such as an email or a device ID. If the Person IDs are persistent it could enable a retailer to re-identify an individual via the Advertima visually tracked behavioral data — and then be in a position to plug these offline shopping behavior ‘insights’ into an identity-linked customer database or link it to an ad profile that’s maintained by a tracking giant or data broker for ad targeting purposes. All of which would be the opposite of ‘privacy safe’ — so we do have questions. We’ll update this report with any response from Advertima to this.

Image credit: Advertima marketing video

Advertima was founded back in 2016 and has so far forged partnerships with Switzerland’s largest retailer, Migros, and the international grocer SPAR, to deploy its tech. It says the system is being used by 14 companies across eight countries at this stage.

It says the new funding will go on further developing its platform, and on scaling so the business can better address the global market for smart retail solutions. It is competing in a space that includes Amazon’s cashierless tech, though, so that’s one Goliath-sized big tech competitor to Advertima’s David.

In a press release announcing the Series A it notes it will be ploughing in €10M of its own revenue too — so touts a total spend of €25M over the next two years on building out its platform.

“We see a world where the physical and digital layers are merged to enhance our daily professional and private lives,” said Advertima Co-Founder and CEO, Iman Nahvi, commenting in a statement.

In a blog post announcing the Series A, he also talked up the autonomous store product — suggesting it will “change how people experience grocery shopping, cinemas, DIY stores, and a whole range of retailers”.

“Delivering smart inventory management, autonomous checkout, in-store analytics, and contextual content on smart digital screens will allow grocers and other retailers to maximize the efficiency of their stores, increase their revenues, and generate greater returns per square meter,” he wrote.

“Retailers can actualize an omnichannel strategy to orchestrate better experiences and relationships with their audience. Soon the standard for retailers will be holistically customer-centric: Cashierless checkouts, no lines, individualized experiences, and real-time product recognition for fast, easy, and fun shopping.”

Given that Amazon began licensing its ‘Just Walk Out’ cashierless tech to other retailers earlier this year, and various tech startups have sprung up to chase the potential of similar systems — such as AiFi, Grabango, Standard Cognition and Zippin — Advertima’s global growth ambitions are tempered by plenty of competition.

Physical retail has also taken a battering from the coronavirus pandemic. Although COVID-19 may, paradoxically, drive demand for cashierless tech — as a way to reduce the risk of viral exposure for staff and shoppers. AI technology being applied to eliminate retail jobs does raise wider socioeconomic questions too.

Also commenting in a supporting statement, Fortimo Group founder Remo Bienz added: “It is clear that the rapid digitalisation of our society is going to have an impact on consumer habits, especially in the retail sector. Advertima is at the cutting-edge of technology in the retail space. As a long-standing shareholder, we know how visionary their technology is, but also how it has been successfully adopted by major, global organisations and already generated significant revenues. We’re excited to be part of Advertima’s journey.”

Four steps for drafting an ethical data practices blueprint

In 2019, UnitedHealthcare’s health-services arm, Optum, rolled out a machine learning algorithm to 50 healthcare organizations. With the aid of the software, doctors and nurses were able to monitor patients with diabetes, heart disease and other chronic ailments, as well as help them manage their prescriptions and arrange doctor visits. Optum is now under investigation after research revealed that the algorithm (allegedly) recommends paying more attention to white patients than to sicker Black patients.

Today’s data and analytics leaders are charged with creating value with data. Given their skill set and purview, they are also in the organizationally unique position to be responsible for spearheading ethical data practices. Lacking an operationalizable, scalable and sustainable data ethics framework raises the risk of bad business practices, violations of stakeholder trust, damage to a brand’s reputation, regulatory investigation and lawsuits.

Here are four key practices that chief data officers/scientists and chief analytics officers (CDAOs) should employ when creating their own ethical data and business practice framework.

Identify an existing expert body within your organization to handle data risks

The CDAO must identify and execute on the economic opportunity for analytics, and with opportunity comes risk. Whether the use of data is internal — for instance, increasing customer retention or supply chain efficiencies — or built into customer-facing products and services, these leaders need to explicitly identify and mitigate risk of harm associated with the use of data.

A good starting point for building ethical data practices is an existing group, such as a data governance board, that already tackles questions of privacy, compliance and cyber-risk. Dovetailing an ethics framework with existing infrastructure increases the probability of successful and efficient adoption. If no such body exists, one should be created with relevant experts from within the organization. Either way, the data ethics governing body should be responsible for formalizing data ethics principles and operationalizing them for products and processes in development or already deployed.

Ensure that data collection and analysis are appropriately transparent and protect privacy

All analytics and AI projects require a data collection and analysis strategy. Ethical data collection must, at a minimum, include: securing informed consent when collecting data from people; ensuring legal compliance, such as adhering to GDPR; anonymizing personally identifiable information so that it cannot reasonably be reverse-engineered to reveal identities; and protecting privacy.
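As a concrete illustration of the anonymization requirement above, here is a minimal sketch of keyed pseudonymization using only Python's standard library. The field names and the `SALT` constant are hypothetical; the point is that a keyed hash (HMAC), unlike a plain hash, cannot be reversed by brute-forcing common values such as email addresses without the secret key.

```python
import hmac
import hashlib

# Hypothetical secret key; in practice, generate it randomly and keep it
# in a secrets manager, never stored alongside the data it protects.
SALT = b"replace-with-a-randomly-generated-secret"

def pseudonymize(value: str) -> str:
    """Replace a direct identifier with a keyed HMAC-SHA256 digest.

    The same input always maps to the same token (so joins still work),
    but the mapping cannot be reversed without the key.
    """
    return hmac.new(SALT, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical record: swap the direct identifier for its pseudonym.
record = {"email": "jane@example.com", "diagnosis": "diabetes"}
safe_record = {**record, "email": pseudonymize(record["email"])}
```

Note that pseudonymization alone is not full anonymization: quasi-identifiers left in the record (age, ZIP code, rare diagnoses) can still re-identify people, which is why the text pairs this step with a broader privacy-protection standard.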

Some of these standards, like privacy protection, do not necessarily have a hard and fast level that must be met. CDAOs need to assess the right balance between what is ethically wise and how their choices affect business outcomes. These standards must then be translated to the responsibilities of product managers who, in turn, must ensure that the front-line data collectors act according to those standards.

CDAOs also must take a stance on algorithmic ethics and transparency. For instance, should an AI-driven search function or recommender system strive for maximum predictive accuracy, providing a best guess as to what the user really wants? Is it ethical to micro-segment, limiting the results or recommendations to what other “similar people” have clicked on in the past? And is it ethical to include results or recommendations that are not, in fact, predictive, but profit-maximizing to some third party? How much algorithmic transparency is appropriate, and how much do users care? A strong ethical blueprint requires tackling these issues systematically and deliberately, rather than pushing these decisions down to individual data scientists and tech developers who lack the training and experience to make them.

Anticipate – and avoid – inequitable outcomes

Division and product managers need guidance on how to anticipate inequitable and biased outcomes. Inequalities and biases can arise simply from data collection imbalances — for instance, a facial recognition tool trained on 100,000 male faces and 5,000 female faces will likely perform far better on men than on women. CDAOs must help ensure balanced and representative data sets.
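The imbalance check described above is straightforward to automate. The sketch below is a hypothetical illustration, not a prescribed audit tool: it flags any group whose share of the training data falls below a configurable fraction of what a uniform split would allocate, using the skewed face-dataset numbers from the text.

```python
from collections import Counter

def representation_report(labels, threshold=0.2):
    """Flag groups whose share of the data is below `threshold`
    times their share under a perfectly uniform split."""
    counts = Counter(labels)
    total = sum(counts.values())
    expected = 1 / len(counts)  # each group's share under a uniform split
    return {
        group: {
            "share": round(n / total, 4),
            "underrepresented": (n / total) < expected * threshold,
        }
        for group, n in counts.items()
    }

# The 100,000-vs-5,000 example from the text: women are ~4.8% of the
# data, far below the 50% a balanced two-group dataset would give them.
report = representation_report(["male"] * 100_000 + ["female"] * 5_000)
```

A check like this only catches imbalance in the attributes you thought to label; it says nothing about biases encoded in the features themselves, which is why the text treats balanced data as necessary but not sufficient.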

Other biases are less obvious, but just as important. In 2019, Apple Card and Goldman Sachs were accused of gender bias when extending higher credit lines to men than women. Though Goldman Sachs maintained that creditworthiness — not gender — was the driving factor in credit decisions, the fact that women have historically had fewer opportunities to build credit likely meant that the algorithm favored men.

To mitigate inequities, CDAOs must help tech developers and product managers alike navigate what it means to be fair. While computer science literature offers myriad metrics and definitions of fairness, developers cannot reasonably choose one in the absence of collaborations with the business managers and external experts who can offer deep contextual understanding of how data will eventually be used. Once standards for fairness are chosen, they must also be effectively communicated to data collectors to ensure adherence.
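To make the “myriad metrics and definitions of fairness” above concrete, here is one of the simplest: the demographic parity gap, the difference in positive-decision rates between groups. The credit-decision data is hypothetical; as the text argues, whether this metric (versus, say, equalized odds) is the right one is a contextual business judgment, not a developer's call.

```python
def demographic_parity_gap(decisions, groups):
    """Largest difference in positive-decision rates between groups.

    decisions: iterable of 0/1 outcomes (e.g. credit approved or not).
    groups: parallel iterable of group labels for each decision.
    Returns 0.0 when all groups see the same approval rate.
    """
    tallies = {}  # group -> (count, positives)
    for d, g in zip(decisions, groups):
        n, pos = tallies.get(g, (0, 0))
        tallies[g] = (n + 1, pos + d)
    rates = {g: pos / n for g, (n, pos) in tallies.items()}
    return max(rates.values()) - min(rates.values())

# Hypothetical decisions: group A approved 3 of 4 times, group B 1 of 4,
# so the parity gap is 0.75 - 0.25 = 0.5.
gap = demographic_parity_gap(
    [1, 1, 1, 0, 1, 0, 0, 0],
    ["A", "A", "A", "A", "B", "B", "B", "B"],
)
```

Different fairness definitions can be mutually incompatible on the same data, which is precisely why the text insists the choice be made with business managers and external experts rather than left to individual developers.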

Align organizational structure with the process for identifying ethical risk

CDAOs often build analytics capacity in one of two ways: via a center of excellence, in service to an entire organization, or a more distributed model, with data scientists and analytics investments committed to specific functional areas, such as marketing, finance or operations. Regardless of organizational structure, the processes and rubrics for identifying ethical risk must be clearly communicated and appropriately incentivized.

Key steps include:

  • Clearly establishing accountability by creating linkages from the data ethics body to departments and teams. This can be done by having each department or team designate its own “ethics champion” to monitor ethics issues. Champions need to be able to elevate concerns to the data ethics body, which can advise on mitigation strategies, such as augmenting existing data, improving transparency or creating a new objective function.
  • Ensuring consistent definitions and processes across teams through education and training around data and AI ethics.
  • Broadening teams’ perspectives on how to identify and remediate ethical problems by facilitating collaborations across internal teams and sharing examples and research from other domains.
  • Creating incentives — financial or other recognitions — to build a culture that values the identification and mitigation of ethical risk.

CDAOs are charged with the strategic use and deployment of data to drive revenue with new products and to create greater internal efficiencies. Too many business and data leaders today attempt to “be ethical” by simply weighing the pros and cons of decisions as they arise. This short-sighted view creates unnecessary reputational, financial and organizational risk. Just as a strategic approach to data requires a data governance program, good data governance requires an ethics program. Simply put, good data governance is ethical data governance.