Deepfakes for all: Uncensored AI art model prompts ethics questions

A new open source AI image generator capable of producing realistic pictures from any text prompt has seen stunningly swift uptake in its first week. Stability AI’s Stable Diffusion, high fidelity but capable of being run on off-the-shelf consumer hardware, is now in use by art generator services like Artbreeder, Pixelz.ai and more. But the model’s unfiltered nature means not all the use has been completely above board.

For the most part, the use cases have been above board. For example, NovelAI has been experimenting with Stable Diffusion to produce art that can accompany the AI-generated stories created by users on its platform. Midjourney has launched a beta that taps Stable Diffusion for greater photorealism.

But Stable Diffusion has also been used for less savory purposes. On the infamous discussion board 4chan, where the model leaked early, several threads are dedicated to AI-generated art of nude celebrities and other forms of generated pornography.

Emad Mostaque, the CEO of Stability AI, called it “unfortunate” that the model leaked on 4chan and stressed that the company was working with “leading ethicists and technologists” on safety and other mechanisms around responsible release. One of these mechanisms is an adjustable AI tool, Safety Classifier, included in the overall Stable Diffusion software package that attempts to detect and block offensive or undesirable images.

However, Safety Classifier — while on by default — can be disabled.

Stable Diffusion is very much new territory. Other AI art-generating systems, like OpenAI’s DALL-E 2, have implemented strict filters for pornographic material. (The license for the open source Stable Diffusion prohibits certain applications, like exploiting minors, but the model itself isn’t fettered on the technical level.) Moreover, many don’t have the ability to create art of public figures, unlike Stable Diffusion. Those two capabilities could be risky when combined, allowing bad actors to create pornographic “deepfakes” that — worst-case scenario — might perpetuate abuse or implicate someone in a crime they didn’t commit.

A deepfake of Emma Watson, created by Stable Diffusion and published on 4chan (Image Credits: Stable Diffusion).

Women, unfortunately, are by far the most likely to be the victims of this. A study carried out in 2019 revealed that, of the 90% to 95% of deepfakes that are non-consensual, about 90% are of women. That bodes poorly for the future of these AI systems, according to Ravit Dotan, an AI ethicist at the University of California, Berkeley.

“I worry about other effects of synthetic images of illegal content — that it will exacerbate the illegal behaviors that are portrayed,” Dotan told TechCrunch via email. “E.g., will synthetic child [exploitation] increase the creation of authentic child [exploitation]? Will it increase the number of pedophiles’ attacks?”

Montreal AI Ethics Institute principal researcher Abhishek Gupta shares this view. “We really need to think about the lifecycle of the AI system which includes post-deployment use and monitoring, and think about how we can envision controls that can minimize harms even in worst-case scenarios,” he said. “This is particularly true when a powerful capability [like Stable Diffusion] gets into the wild that can cause real trauma to those against whom such a system might be used, for example, by creating objectionable content in the victim’s likeness.”

Something of a preview played out over the past year when, at the advice of a nurse, a father took pictures of his young child’s swollen genital area and texted them to the nurse’s iPhone. The photo automatically backed up to Google Photos and was flagged by the company’s AI filters as child sexual abuse material, which resulted in the man’s account being disabled and an investigation by the San Francisco Police Department.

If a legitimate photo could trip such a detection system, experts like Dotan say, there’s no reason deepfakes generated by a system like Stable Diffusion couldn’t — and at scale.

“The AI systems that people create, even when they have the best intentions, can be used in harmful ways that they don’t anticipate and can’t prevent,” Dotan said. “I think that developers and researchers often underappreciated this point.”

Of course, the technology to create deepfakes has existed for some time, AI-powered or otherwise. A 2020 report from deepfake detection company Sensity found that hundreds of explicit deepfake videos featuring female celebrities were being uploaded to the world’s biggest pornography websites every month; the report estimated the total number of deepfakes online at around 49,000, over 95% of which were porn. Actresses including Emma Watson, Natalie Portman, Billie Eilish and Taylor Swift have been the targets of deepfakes since AI-powered face-swapping tools entered the mainstream several years ago, and some, including Kristen Bell, have spoken out against what they view as sexual exploitation.

But Stable Diffusion represents a newer generation of systems that can create incredibly — if not perfectly — convincing fake images with minimal work by the user. It’s also easy to install, requiring no more than a few setup files and a graphics card costing several hundred dollars on the high end. Work is underway on even more efficient versions of the system that can run on an M1 MacBook.
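To give a sense of how low that barrier is, here is a minimal sketch of generating an image locally with the open source Hugging Face diffusers library. The model identifier is the publicly released v1.4 checkpoint; the hardware assumption (a CUDA GPU with enough memory for half-precision weights) is typical rather than a hard requirement.

```python
# Minimal sketch: generating an image locally with Stable Diffusion via the
# Hugging Face diffusers library. Assumes the publicly released v1.4 weights
# have been downloaded and a CUDA GPU with enough memory is available.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,  # half precision so it fits on consumer GPUs
)
pipe = pipe.to("cuda")

prompt = "a watercolor painting of a lighthouse at dawn"
image = pipe(prompt).images[0]  # runs the denoising loop and decodes the result
image.save("lighthouse.png")
```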

A Kylie Kardashian deepfake posted to 4chan (Image Credits: Stable Diffusion).

Sebastian Berns, a Ph.D. researcher in the AI group at Queen Mary University of London, thinks the automation and the possibility of scaling up customized image generation are the big differences with systems like Stable Diffusion — and the main problems. “Most harmful imagery can already be produced with conventional methods but is manual and requires a lot of effort,” he said. “A model that can produce near-photorealistic footage may give way to personalized blackmail attacks on individuals.”

Berns fears that personal photos scraped from social media could be used to condition Stable Diffusion or any such model to generate targeted pornographic imagery or images depicting illegal acts. There’s certainly precedent. After reporting on the rape of an eight-year-old Kashmiri girl in 2018, Indian investigative journalist Rana Ayyub became the target of Indian nationalist trolls, some of whom created deepfake porn with her face on another person’s body. The deepfake was shared by the leader of the nationalist political party BJP, and the harassment Ayyub received as a result became so bad the United Nations had to intervene.

“Stable Diffusion offers enough customization to send out automated threats against individuals to either pay or risk having fake but potentially damaging footage being published,” Berns continued. “We already see people being extorted after their webcam was accessed remotely. That infiltration step might not be necessary anymore.”

With Stable Diffusion out in the wild and already being used to generate pornography — some non-consensual — it might become incumbent on image hosts to take action. TechCrunch reached out to one of the major adult content platforms, OnlyFans, but didn’t hear back as of publication time. A spokesperson for Patreon, which also allows adult content, noted that the company has a policy against deepfakes and disallows images that “repurpose celebrities’ likenesses and place non-adult content into an adult context.”

If history is any indication, however, enforcement will likely be uneven — in part because few laws specifically protect against deepfaking as it relates to pornography. And even if the threat of legal action pulls some sites dedicated to objectionable AI-generated content under, there’s nothing to prevent new ones from popping up.

In other words, Gupta says, it’s a brave new world.

“Creative and malicious users can abuse the capabilities [of Stable Diffusion] to generate subjectively objectionable content at scale, using minimal resources to run inference — which is cheaper than training the entire model — and then publish them in venues like Reddit and 4chan to drive traffic and hack attention,” Gupta said. “There is a lot at stake when such capabilities escape out ‘into the wild’ where controls such as API rate limits, safety controls on the kinds of outputs returned from the system are no longer applicable.”

Will advertisers flee a ‘free speech’ Twitter?

New Twitter owner Elon Musk has emphasized his belief that “free speech” is critical to Twitter’s future, even noting in the press release announcing the deal today that “free speech is the bedrock of a functioning democracy.” But there is one significant complication with running an unmoderated (or only lightly moderated) social platform supported by advertisers: the advertisers themselves. They might leave!

If Twitter were to turn back the dials on content moderation, it could allow more bullying, violent speech, hate speech, misinformation, and other abusive content to gain ground. This may make Twitter less palatable to newcomers who were already wary about posting in a “public square” — an area that impacts Twitter’s ongoing concerns with flat user growth. But it could also disincentivize advertisers from investing their budgets with the platform.

Over the years, Twitter has worked steadily to make the site a less abusive place, with varying degrees of success.

The company has expanded its hate speech rules to include more nuance about what is and is not allowed on the app. In 2020, for example, it added detail to its ban on dehumanizing speech to include other areas like age, disability, and even disease. The latter was a timely addition as some Twitter users had begun making hateful and even racist remarks related to the spread of Covid-19. Last year, it announced it was looking to further improve its content moderation systems by analyzing the reports that users sent in to see where its existing rules could be falling short in terms of user protections. It has touted in biannual reports when its policies have decreased the amount of hate speech on its platform.

According to its latest report, Twitter took action on 4.8 million unique accounts from January to June 2021, a figure that includes targeting bots, spam and other bad actors, in addition to rules violations.

With Musk in charge, it’s an open question where these sorts of efforts may end up. If anything, the pendulum will now swing in the other direction — away from strict moderation. And that could be bad for business.

Advertisers are fairly allergic to having their brand’s name appear alongside hate speech and abusive or dangerous content on social platforms. They’ve proven this time and time again with boycotts for this specific reason.

In 2017, for example, brands and publishers in Europe said they would pull their advertising dollars from Google’s YouTube after their ads were revealed as being displayed alongside videos promoting terrorism and anti-Semitism. Google quickly responded to give the advertisers more control over their programmatic buys. And when top YouTube creators stepped out of line at other times, Google would side with the advertisers’ concerns over the matter. The video platform would immediately demonetize and pull ads from any creator experiencing a backlash for breaching its guidelines over what’s considered “advertiser-friendly” content.

In more recent years, larger advertisers began to throw their weight around in an effort to address Facebook’s content moderation policies. Big-name brands like Verizon, Boeing, Microsoft, Reebok, Patagonia, Hershey’s, Eddie Bauer, Adidas, Levi Strauss, Pfizer, HP, Best Buy, Denny’s, Unilever, and others joined an advertising boycott of the social network and Instagram in an effort to push Facebook (now called Meta) to increase its enforcement in the area of hate speech.

Yes, increase. Not decrease.

Here’s Unilever’s statement from that time, to give you an idea: “Based on the current polarization and the election that we are having in the U.S., there needs to be much more enforcement in the area of hate speech.”

The company was also boycotting Twitter at the time.

In total, over 1,110 companies including major corporations and small businesses had joined that particular boycott. Other corporations including Coca-Cola, Starbucks, Clorox, and Ford also temporarily paused their social ads but didn’t join the full-scale boycott.

Now imagine how these same brands will react if a “free speech” free-for-all really does take place on Twitter. The network already isn’t a top priority, given its comparatively smaller size relative to others like Facebook, Instagram and TikTok. And if it were home to divisive and dangerous speech? It might just be time to go.

This is not a minor concern for Twitter to weigh as it moves forward. The company has been experimenting with a number of new monetizable products — including Twitter Spaces and subscriptions — but its business today is almost entirely supported via advertising. In its fourth quarter, Twitter pulled in $1.57 billion in revenue — a miss on expectations of $1.58 billion because ad spending had slowed slightly in the quarter. Per eMarketer, advertising accounted for 89% of Twitter’s 2021 revenue, and the firm forecasts that share will grow to above 91% by 2023. Ads are how Twitter even exists.

In other words, advertisers may have a lot of power as Twitter moves forward under Musk. If the company does, in fact, tweak or revamp its moderation policies, reinstate banned users (although not Trump, apparently), or allow hate speech and other dangerous and abusive content to return, then advertisers may leave.

And although Musk may very well be a billionaire, it’s not likely he wants to self-fund Twitter to keep it afloat.

Twitter is due to report its first-quarter earnings on Thursday. But it will not host a conference call, it said.

Kickstarter will now hide reported comments pending review

Kickstarter announced today that, in an effort to curb the number of abusive comments visible on the platform, it will now automatically hide from public view any comment reported by a creator until its Trust and Safety team has reviewed it and decided whether it should remain or be deleted.

Creators will also now have the option to select a reason for reporting a comment when flagging it for review, which Kickstarter hopes will allow its team to address abuse more quickly. When revoking commenting privileges, Kickstarter will provide backers with more specific information about when they can expect those privileges to be restored, it says.
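The announced flow is simple to picture. Below is a hypothetical sketch of the comment lifecycle the post describes; the state names, fields and functions are illustrative, not Kickstarter’s actual implementation.

```python
# Hypothetical sketch of the comment lifecycle Kickstarter describes:
# a reported comment is hidden from public view immediately, then either
# restored or deleted after a Trust and Safety review. All names here are
# illustrative, not Kickstarter's actual code.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class CommentState(Enum):
    VISIBLE = auto()
    HIDDEN_PENDING_REVIEW = auto()  # hidden as soon as a creator reports it
    DELETED = auto()


@dataclass
class Comment:
    body: str
    state: CommentState = CommentState.VISIBLE
    report_reason: Optional[str] = None  # creators can now pick a reason


def report(comment: Comment, reason: str) -> None:
    """A creator reports a comment: hide it until Trust and Safety reviews it."""
    comment.report_reason = reason
    comment.state = CommentState.HIDDEN_PENDING_REVIEW


def review(comment: Comment, violates_rules: bool) -> None:
    """Trust and Safety decides whether the comment stays or goes."""
    comment.state = CommentState.DELETED if violates_rules else CommentState.VISIBLE
```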

“With this work, we’re being careful to care for the health of the whole system—creators need to feel safe, and backers need to be able to raise questions and concerns,” Kickstarter said in its blog post outlining the changes.

While there is value in hiding potentially abusive comments for further review, there’s also the possibility that such a system could itself be abused if the reviews don’t take place quickly enough. An unscrupulous creator could leverage the reporting feature to at least temporarily hide negative comments, or those questioning a project’s viability or safety, while continuing to crowdfund.

In order to further improve security, Kickstarter said it will work with the Kickstarter Community Advisory Council, launching in May, which will be made up of creators who are knowledgeable about a range of fields. Applications, which closed on April 6, were open to creators who had run at least one campaign and had an account for at least a year (and were “in good standing”). Members must commit to serving at least a year on the council and attending all six of the year’s two-hour meetings. In exchange, members will receive a $5,000 honorarium for the year. Kickstarter has not yet announced the members of this council.

The council will have “a special focus on helping us prioritize the development of new features that help ensure that the platform is as useful, welcoming, and inclusive as it can possibly be,” Kickstarter said. The council’s responsibilities will also include providing input as the platform navigates moving to blockchain, a plan which prompted backlash late last year, particularly when it came to concerns over the environmental impact of blockchain technology’s energy usage.

Twitter launches beta test of anti-abuse tool ‘Safety Mode,’ adds prompts to enable it

Twitter is broadening access to a feature called Safety Mode, designed to give users a set of tools to defend themselves against the toxicity and abuse that is still far too often a problem on its platform. First introduced to a small group of testers last September, Safety Mode will today launch into beta for more users across English-speaking markets including the U.S., U.K., Canada, Australia, Ireland, and New Zealand.

The company says the expanded access will allow it to collect more insights into how well Safety Mode works and learn what sort of improvements still need to be made. Alongside the rollout, Safety Mode will also prompt users when they may need to enable it, Twitter notes.

As a public social platform, Twitter faces a continual struggle with conversation health. Over the years, it’s rolled out a number of tweaks and updates in an attempt to address this issue — including features that would automatically hide unpleasant and insulting replies behind an extra click; allow users to limit who could reply to their tweets; let users hide themselves from search; and warn users about conversations that are starting to go off the rails, among other things.

But Safety Mode is more of a defensive tool than one designed to proactively nudge conversations in the right direction.

It works by automatically blocking, for seven days, accounts that reply to the original poster with harmful language or send uninvited, repetitive replies — like insults, hateful remarks or unwanted mentions. During the time that Safety Mode is enabled, those blocked accounts will be unable to follow the original poster’s Twitter account, see their tweets and replies, or send them Direct Messages.

Twitter’s algorithms determine which accounts to temporarily block by assessing the language used in the replies and examining the relationship between the tweet’s author and those replying. If the poster follows the replier or interacts with the replier frequently, for example, the account won’t be blocked.
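Put roughly in code, the decision Twitter describes looks something like the sketch below. It is a simplified illustration of the publicly described logic, not Twitter’s implementation; the language classifier and relationship check are hypothetical stand-ins for Twitter’s internal signals.

```python
# Simplified sketch of the Safety Mode decision as Twitter describes it:
# temporarily block repliers who use harmful language, unless the author
# already has a relationship with them. The classifier and relationship
# check below are hypothetical stand-ins for Twitter's internal signals.
from datetime import datetime, timedelta

BLOCK_DURATION = timedelta(days=7)
temporary_blocks: dict[str, datetime] = {}  # replier_id -> block expiry


def looks_harmful(reply_text: str) -> bool:
    """Placeholder for Twitter's assessment of the reply's language."""
    raise NotImplementedError


def has_relationship(author_id: str, replier_id: str) -> bool:
    """Placeholder: does the author follow or frequently interact with the replier?"""
    raise NotImplementedError


def maybe_autoblock(author_id: str, replier_id: str, reply_text: str) -> bool:
    """Block the replier for seven days if the reply qualifies."""
    if has_relationship(author_id, replier_id):
        return False  # existing relationships are exempt from auto-blocking
    if not looks_harmful(reply_text):
        return False
    # While blocked: no following the author, no reading their tweets, no DMs.
    temporary_blocks[replier_id] = datetime.utcnow() + BLOCK_DURATION
    return True
```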

The idea with Safety Mode is to give users under attack a way to quickly put up a defensive system without having to manually block each account that’s harassing them — a near impossibility when a tweet goes viral, exposing the poster to elevated levels of abuse. This situation is one that happens not only to celebs and public figures whose “cancelations” make the headlines, but also to female journalists, members of marginalized communities, and even everyday people, at times.

It’s also not a problem unique to Twitter — Instagram launched a similar set of anti-abuse features last year, after several England footballers were viciously harassed by angry fans after the team’s defeat in the Euro 2020 final.

Based on early testers’ feedback, Twitter learned people want more help identifying when an attack may be getting underway, it says. As a result, the company says the feature will now also prompt users to enable it when the system detects potentially harmful or uninvited replies. These prompts can appear in the user’s Home Timeline or as a device notification if the user is not currently on Twitter. This should save the user from having to dig around in Twitter’s settings to locate the feature.

Image Credits: Twitter

Safety Mode was previously tested by 750 users during the early trials. Twitter will now roll the beta out to around 50% of users (randomly selected) in the supported markets, and says it’s exploring how those users might give feedback directly in the app.

Twitter has not shared when it plans to make Safety Mode publicly available to its global users.

 

Spotify playlist curators complain about ongoing abuse that favors bad actors over innocent parties

A number of Spotify playlist curators are complaining that the streaming music company is not addressing the ongoing issue of playlist abuse, which sees bad actors reporting playlists that have gained a following in order to give their own playlists better visibility. Currently, playlists created by Spotify users can be reported in the app for a variety of reasons — like sexual, violent, dangerous, deceptive, or hateful content, among other things. When a report is submitted, the playlist in question will have its metadata immediately removed, including its title, description, and custom image. There is no internal review process that verifies the report is legitimate before the metadata is removed.

Bad actors have learned how to abuse this system to give themselves an advantage. If they see a rival playlist has more users than their own, they will report their competitors in hopes of giving their playlist a more prominent ranking in search results.

According to the curators affected by this problem, there is no limit to the number of reports these bad actors can submit, either. The curators complain that their playlists are being reported daily, and often multiple times per day.

The problem is not new. Users have been complaining about playlist abuse for years. A thread on Spotify’s community forum about this problem is now some 30 pages deep, in fact, and has accumulated over 330 votes. Victims of this type of harassment have also repeatedly posted to social media about Spotify’s broken system to raise awareness of the problem more publicly. For example, one curator last year noted their playlist had been reported over 2,000 times, and said they were getting a new email about the reports nearly every minute. That’s a common problem and one that seems to indicate bad actors are leveraging bots to submit their reports.

Many curators say they’ve repeatedly reached out to Spotify for help with this issue and were given no assistance.

Curators can only reply to the report emails from Spotify to appeal the takedown, but they don’t always receive a response. When they ask Spotify for help with this issue, the company only says that it’s working on a solution.

While Spotify may suspend the account that abused the system when a report is deemed false, the bad actors simply create new accounts to continue the abuse. Curators on Spotify’s community forums suggested that an easy fix to the bot-driven abuse would be to restrict accounts from being able to report playlists until their accounts had accumulated 10 hours of streaming music or podcasts. This could help to ensure they were a real person before they gained permission to report abuse.

One curator, who maintains hundreds of playlists, said the problem had gotten so bad that they created an iOS app to continually monitor their playlists for this sort of abuse and to reinstate any metadata once a takedown was detected. Another has written code to monitor for report emails, and uses the Spotify API to automatically fix their metadata after the false reports. But not all curators have the ability to build an app or script of their own to deal with this situation.
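For illustration, a watchdog of that kind can lean on the public Spotify Web API’s playlist endpoints. The sketch below assumes a stored copy of the original metadata and an OAuth access token with the playlist-modify scope; it illustrates the approach curators describe rather than reproducing any particular curator’s code.

```python
# Illustrative sketch of the approach curators describe: poll a playlist via
# the public Spotify Web API and re-apply its name and description if a false
# report has wiped them. The token, playlist ID and metadata are placeholders;
# a real script needs an OAuth token with the playlist-modify-public scope.
import time
import requests

API = "https://api.spotify.com/v1"
ACCESS_TOKEN = "..."   # placeholder: obtained through Spotify's OAuth flow
PLAYLIST_ID = "..."    # placeholder: the playlist being monitored
ORIGINAL = {"name": "My Playlist", "description": "Updated every Friday."}

HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}"}


def metadata_missing() -> bool:
    """Check whether the playlist's name or description has been stripped."""
    resp = requests.get(
        f"{API}/playlists/{PLAYLIST_ID}",
        headers=HEADERS,
        params={"fields": "name,description"},
    )
    resp.raise_for_status()
    data = resp.json()
    return not data.get("name") or not data.get("description")


def restore_metadata() -> None:
    """Re-apply the stored name and description via the change-details endpoint."""
    resp = requests.put(f"{API}/playlists/{PLAYLIST_ID}", headers=HEADERS, json=ORIGINAL)
    resp.raise_for_status()


while True:
    if metadata_missing():
        restore_metadata()
    time.sleep(300)  # poll every five minutes
```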

Image Credits: Spotify (screenshot of reporting flow)

TechCrunch asked Spotify what it planned to do about this problem, but the company declined to provide specific details.

“As a matter of practice, we will continue to disable accounts that we suspect are abusing our reporting tool. We are also actively working to enhance our processes to handle any suspected abusive reports,” a Spotify spokesperson told us.

The company said it is currently testing several different improvements to the process to curb the abuse, but would not say what those tests may include, or whether tests were internal or external. It could not provide any ballpark sense of when its reporting system would be updated with these fixes, either. When pressed, the company said it doesn’t share details about specific security measures publicly as a rule, as doing so could make abuse of its systems more effective.

Often, playlists are curated by independent artists and labels who are looking to promote themselves and get their music discovered, only to have their work taken down immediately, without any sort of review process that could sort legitimate reports from bot-driven abuse.

Curators complain that Spotify has been dismissing their cries for help for far too long, and Spotify’s vague and non-committal response about a coming solution only validates those complaints further.

Apple delays plans to roll out CSAM detection in iOS 15

Apple has delayed plans to roll out its child sexual abuse material (CSAM) detection technology that it chaotically announced last month, citing feedback from customers and policy groups.

That feedback, if you recall, has been largely negative. The Electronic Frontier Foundation said this week it had amassed more than 25,000 signatures from consumers. On top of that, close to 100 policy and rights groups, including the American Civil Liberties Union, also called on Apple to abandon plans to roll out the technology.

In a statement on Friday morning, Apple told TechCrunch:

“Last month we announced plans for features intended to help protect children from predators who use communication tools to recruit and exploit them, and limit the spread of Child Sexual Abuse Material. Based on feedback from customers, advocacy groups, researchers and others, we have decided to take additional time over the coming months to collect input and make improvements before releasing these critically important child safety features.”

Apple’s so-called NeuralHash technology is designed to identify known CSAM on a user’s device without Apple having to possess the image or know its contents. Rather than scanning photos after they have been uploaded to iCloud, NeuralHash matches images against a database of known CSAM hashes on the user’s device before upload, which Apple claims is more privacy-friendly than the blanket server-side scanning other cloud providers use.

But security experts and privacy advocates have expressed concern that the system could be abused by highly resourced actors, like governments, to implicate innocent victims or to manipulate the system to detect other materials that authoritarian nation states find objectionable.

Within a few weeks of announcing the technology, researchers said they were able to create “hash collisions” using NeuralHash, effectively tricking the system into thinking two entirely different images were the same.
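NeuralHash itself is a proprietary neural network, but the general idea behind perceptual image hashing, and why collisions matter, can be shown with a much simpler scheme. The toy average hash below (using the Pillow imaging library) is only a stand-in: two different images that happen to produce identical bits would be indistinguishable to any matcher built on top of it.

```python
# Toy perceptual hash (average hash) illustrating the general idea behind
# image-matching systems. NeuralHash is a proprietary neural network; this
# simplified stand-in only shows why a collision (two different images with
# an identical hash) defeats hash-based matching.
from PIL import Image


def average_hash(path: str, size: int = 8) -> int:
    """Shrink to a size x size grayscale grid, then set one bit per pixel above the mean."""
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for pixel in pixels:
        bits = (bits << 1) | (1 if pixel > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hashes; 0 means the hashes collide."""
    return bin(a ^ b).count("1")
```

The NeuralHash collisions researchers produced demonstrated exactly this failure mode: visually unrelated images that the matcher treats as identical.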

iOS 15 is expected out in the next few weeks.

Google updates its kids online safety curriculum with lessons on gaming, video and more

Google announced today it’s updating and expanding its digital safety and citizenship curriculum called Be Internet Awesome, which is aimed at helping school-aged children learn to navigate the internet responsibly. First introduced four years ago, the curriculum now reaches 30 countries and millions of kids, says Google. In the update rolling out today, Google has added nearly a dozen more lessons for parents and educators that tackle areas like online gaming, search engines, video consumption, online empathy, cyberbullying and more.

The company says it had commissioned the University of New Hampshire’s Crimes Against Children Research Center to evaluate its existing program, which had last received a significant update back in 2019, when it added lessons that focused on teaching kids to spot disinformation and fake news.

The review found the program did help children in areas like dealing with cyberbullying, online civility and website safety, but recommended improvements in other areas.

Google then partnered with online safety experts like Committee for Children and The Net Safety Collaborative to revise its teaching materials. As a result, it now has lessons tailored to specific age groups and grade levels, and has expanded its array of subjects and set of family resources.

The new lessons include guidance around online gaming, search engines and video consumption, as well as social-emotional learning lessons aimed at helping students address cyberbullying and online harassment.

For example, some of the new lessons discuss search media literacy — meaning, learning how to use search engines like Google’s and evaluating the links and results they return — as part of an update to the program’s existing media literacy materials.

Other lessons address issues like practicing empathy online, showing kindness, as well as what to do when you see something upsetting or inappropriate, including cyberbullying.

Concepts related to online gaming are woven into the new lessons, too, as kids today have many of their social interactions in online games, which often let them chat and interact with other players in real time.

Here, kids are presented with ideas about verifying an online gamer’s identity — are they really another kid, for example? The materials also explain what sort of private information should not be shared with people online.

Image Credits: Google

Among the new family resources, the updated curriculum now points parents to the recently launched online hub, families.google, which offers a number of tips and information about tools to help families manage their tech usage.

For example, Google updated its Family Link app that lets parents set controls around what apps can be used and when, and view activity reports on screen time usage. It also rolled out parental control features on YouTube earlier this year, aimed at families with tweens and teens who are too old for a YouTube Kids account, but still too young for an entirely unsupervised experience.

Google says the updated curriculum is available today to parents, families, teachers and educators, via the Be Internet Awesome website.

 

Twitter is eyeing new anti-abuse tools to give users more control over mentions

Twitter is looking at adding new features that could help users who are facing abusive situations on its platform as a result of unwanted attention pile-ons, such as when a tweet goes viral for a reason they didn’t expect and a full firehose of counter tweets get blasted their way.

Racist abuse also remains a major problem on Twitter’s platform.

The social media giant says it’s toying with providing users with more controls over the @mention feature to help people “control unwanted attention,” as privacy engineer Dominic Camozzi puts it.

The issue is that Twitter’s notification system will alert a user when they’ve been directly tagged in a tweet — drawing their attention to the contents. That’s great if the tweet is nice or interesting. But if the content is abusive, it’s a shortcut to scaling hateful cyberbullying.

Twitter is badging these latest anti-abuse ideas as “early concepts” — and encouraging users to submit feedback as it considers what changes it might make.

Potential features it’s considering include letting users ‘unmention’ themselves — i.e. remove their name from another’s tweet so they’re no longer tagged in it (and any ongoing chatter around it won’t keep appearing in their mentions feed).

It’s also considering making an unmention action more powerful in instances where an account that a user doesn’t follow mentions them — by providing a special notification to “highlight potential unwanted situations”.

If the user then goes ahead and unmentions themselves, Twitter envisages removing the tweet composer’s ability to tag them again in the future — which looks like it could be a strong tool against strangers who abuse @mentions.

Twitter is also considering adding settings that would let users restrict certain accounts from mentioning them entirely. Which sounds like it would have come in pretty handy when president Trump was on the platform (assuming the setting could be deployed against public figures).

Twitter also says it’s looking at adding a switch that can be flipped to prevent anyone on the platform from @-ing you — for a period of one day; three days; or seven days. So basically a ‘total peace and quiet’ mode.

It says it wants to make changes in this area that can work together to help users by stopping “the situation from escalating further” — such as by providing users with notifications when they’re getting lots of mentions, combined with the ability to easily review the tweets in question and change their settings to shield themselves (e.g. by blocking all mentions for a day or longer).

The known problem of online troll armies coordinating targeted attacks against Twitter users means it can take disproportionate effort for the object of a hate pile-on to shield themselves from the abuse of so many strangers.

Individually blocking abusive accounts or muting specific tweets does not scale in instances when there may be hundreds — or even thousands — of accounts and tweets involved in the targeted abuse.

For now, it remains to be seen whether or not Twitter will move forward and implement the exact features it’s showing off via Camozzi’s thread.

A Twitter spokeswoman confirmed the concepts are “a design mock” and “still in the early stages of design and research”. But she added: “We’re excited about community feedback even at this early stage.”

The company will need to consider whether the proposed features might introduce wider complications on the service. (Such as, for example, what would happen to automatically scheduled tweets that include the Twitter handle of someone who subsequently flips the ‘block all mentions’ setting; does that prevent the tweet from going out entirely or just have it tweet out but without the person’s handle, potentially lacking core context?)

Nonetheless, those are small details and it’s very welcome that Twitter is looking at ways to expand the utility of the tools users can use to protect themselves from abuse — i.e. beyond the existing, still fairly blunt, anti-abuse features (like block, mute and report tweet).

Co-ordinated trolling attacks have, for years, been an unwanted ‘feature’ of Twitter’s platform and the company has frequently been criticized for not doing enough to prevent harassment and abuse.

The simple fact that Twitter is still looking for ways to provide users with better tools to prevent hate pile-ons — here in mid 2021 — is a tacit acknowledgment of its wider failure to clear abusers off its platform. Despite repeated calls for it to act.

A Google search for “* leaves Twitter after abuse” returns numerous examples of high profile Twitter users quitting the platform after feeling unable to deal with waves of abuse — several from this year alone (including a number of footballers targeted with racist tweets).

Other examples date back as long ago as 2013, underlining how Twitter has repeatedly failed to get a handle on its abuse problem, leaving users to suffer at the hands of trolls for well over a decade (or, well, just quit the service entirely).

One recent high profile exit was the model Chrissy Teigen — who had been a long time Twitter user, spending ten years on the platform — but who pulled the plug on her account in March, writing in her final tweets that she was “deeply bruised” and that the platform “no longer serves me positively as it serves me negatively”.

A number of soccer players in the UK have also been campaigning against racism on social media this year — organizing a boycott of services to amp up pressure on companies like Twitter to deal with racist abusers.

While public figures who use social media may be more likely to face higher levels of abusive online trolling than other types of users, it’s a problem that isn’t limited to users with a public profile. Racist abuse, for example, remains a general problem on Twitter. And the examples of celebrity users quitting over abuse that are visible via Google are certainly just the tip of the iceberg.

It goes without saying that it’s terrible for Twitter’s business if highly engaged users feel forced to abandon the service in despair.

The company knows it has a problem. As far back as 2018 it said it was looking for ways to improve “conversational health” on its platform — as well as, more recently, expanding its policies and enforcement around hateful and abusive tweets.

It has also added some strategic friction to try to nudge users to be more thoughtful and take some of the heat out of outrage cycles — such as encouraging users to read an article before directly retweeting it.

Perhaps most notably it has banned some high profile abusers of its service — including, at long last, president troll Trump himself earlier this year.

A number of other notorious trolls have also been booted over the years, although typically only after Twitter had allowed them to carry on coordinating abuse of others via its service, failing to promptly and vigorously enforce its policies against hateful conduct — letting the trolls get away with seeing how far they could push their luck — until the last.

By failing to get a proper handle on abusive use of its platform for so long, Twitter has created a toxic legacy out of its own mismanagement — one that continues to land it unwanted attention from high profile users who might otherwise be key ambassadors for its service.

Google’s Gradient Ventures leads $8.2M Series A for Vault Platform’s misconduct reporting SaaS

Fixing workplace misconduct reporting is a mission that’s snagged London-based Vault Platform backing from Google’s AI-focused fund, Gradient Ventures, which is the lead investor in an $8.2 million Series A that’s being announced today.

Other investors joining the round are Illuminate Financial, along with existing investors including Kindred Capital and Angular Ventures. Its $4.2M seed round was closed back in 2019.

Vault sells a suite of SaaS tools to enterprise-sized or large scale-up companies to help them proactively manage internal ethics and integrity issues. As well as tools for staff to report issues, data and analytics are baked into the platform, so it can support customers’ wider audit and compliance requirements.

In an interview with TechCrunch, co-founder and CEO Neta Meidav said that, as well as being wholly on board with the overarching mission of upgrading legacy reporting tools such as staff hotlines meant to surface conduct-related workplace risks (be that bullying and harassment; racism and sexism; or bribery, corruption and fraud), Gradient Ventures was, as you might expect, interested in the potential for applying AI to further enhance Vault’s SaaS-based reporting tool.

A feature of its current platform, called ‘GoTogether’, consists of an escrow system that holds a user’s misconduct report and only passes it to the relevant internal bodies if that user is not the first or only person to have made a report about the same person — the idea being that this can encourage staff (or outsiders, where open reporting is enabled) to report concerns they may otherwise hesitate to raise, for various reasons.
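The escrow step, as described, is easy to picture; a hypothetical sketch of the matching logic might look like the following (the field names and the two-report threshold are illustrative, not Vault’s implementation).

```python
# Hypothetical sketch of an escrow-style matching step like the one described
# for GoTogether: reports are held back and only released to the relevant
# internal body once more than one report names the same alleged perpetrator.
# The threshold and field names are illustrative, not Vault's implementation.
from collections import defaultdict

RELEASE_THRESHOLD = 2
held_reports: dict[str, list[dict]] = defaultdict(list)


def submit_report(perpetrator_id: str, report: dict) -> list[dict]:
    """Hold the report in escrow; return the full batch once the threshold is met."""
    held_reports[perpetrator_id].append(report)
    if len(held_reports[perpetrator_id]) >= RELEASE_THRESHOLD:
        return held_reports.pop(perpetrator_id)  # release all matched reports
    return []  # still held in escrow
```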

Vault now wants to expand the feature’s capabilities so it can be used to proactively surface problematic conduct that may not just relate to a particular individual but may even affect a whole team or division — by using natural language processing to help spot patterns and potential linkages in the kind of activity being reported.

“Our algorithms today match on an alleged perpetrator’s identity. However many events that people might report on are not related to a specific person — they can be more descriptive,” explains Meidav. “For example if you are experiencing some irregularities in accounting in your department, for example, and you’re suspecting that there is some sort of corruption or fraudulent activity happening.”

“If you think about the greatest [workplace misconduct] disasters and crises that happened in recent years — the Dieselgate story at Volkswagen, what happened in Boeing — the common denominator in all these cases is that there’s been some sort of a serious ethical breach or failure which was observed by several people within the organization in remote parts of the organization. And the dots weren’t connected,” she goes on. “So the capacity we’re currently building and increasing — building upon what we already have with GoTogether — is the ability to connect on these repeated events and be able to connect and understand and read the human input. And connect the dots when repeated events are happening — alerting companies’ boards that there is a certain ‘hot pocket’ that they need to go and investigate.

“That would save companies from great risk, great cost, and essentially could prevent huge loss. Not only financial but reputational, sometimes it’s even loss to human lives… That’s where we’re getting to and what we’re aiming to achieve.”
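What Meidav describes, connecting repeated free-text reports that may not name a single individual, is at heart a text-similarity problem. As a rough illustration only, generic TF-IDF vectors and cosine similarity (via scikit-learn) can group reports that describe the same underlying issue; Vault’s production approach is not public and is presumably far more sophisticated.

```python
# Rough illustration of grouping free-text misconduct reports by similarity so
# that repeated descriptions of the same issue surface together. This uses
# generic TF-IDF and cosine similarity; it is not Vault's actual model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reports = [
    "Irregularities in the department's quarterly accounting figures, suspected fraud",
    "The quarterly accounting numbers in our department do not add up, possible fraud",
    "Manager made repeated demeaning comments in team meetings",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(reports)
similarity = cosine_similarity(vectors)

# Flag pairs of reports that look like the same underlying issue.
THRESHOLD = 0.3
for i in range(len(reports)):
    for j in range(i + 1, len(reports)):
        if similarity[i, j] >= THRESHOLD:
            print(f"Possible repeated issue: reports {i} and {j}")
```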

There is the question of how defensible Vault’s GoTogether feature is — how easily it could be copied — given you can’t patent an idea. So baking in AI smarts may be a way to layer added sophistication to try to maintain a competitive edge.

“There’s some very sophisticated, unique technology there in the backend so we are continuing to invest in this side of our technology. And Gradient’s investment and the specific [support] we’re receiving from Google now will only increase that element and that side of our business,” says Meidav when we ask about defensibility.

Commenting on the funding in a statement, Gradient Ventures founder and managing partner, Anna Patterson, added: “Vault tackles an important space with an innovative and timely solution. Vault’s application provides organizations with a data-driven approach to tackling challenges like occupational fraud, bribery or corruption incidents, safety failures and misconduct. Given their impressive team, technology, and customer traction, they are poised to improve the modern workplace.”

The London-based startup was only founded in 2018 — and while it’s most keen to talk about disrupting legacy hotline systems, which offer only a linear and passive conduit for misconduct reporting, there are a number of other startups playing in the same space. Examples include the likes of LA-based AllVoices, YC-backed Whispli, Hootsworth and Spot, to name a few.

Competition seems likely to continue to increase as regulatory requirements around workplace reporting keep stepping up.

The incoming EU Whistleblower Protection Directive is one piece of regulation Vault expects will increase demand for smarter compliance solutions — aka “TrustTech”, as it seeks to badge it — as it will require companies of more than 250 employees to have a reporting solution in place by the end of December 2021, encouraging European businesses to cast around for tools to help shrink their misconduct-related risk.

She also suggests a platform solution can help bridge gaps between different internal teams that may need to be involved in addressing complaints, as well as helping to speed up internal investigations by offering the ability to chat anonymously with the original reporter.

Meidav also flags the rising attention US regulators are giving to workplace misconduct reporting — noting some recent massive awards by the SEC to external whistleblowers, such as the $28M paid out to a single whistleblower earlier this year (in relation to the Panasonic Avionics consultant corruption case).

She also argues that growing numbers of companies going public (such as via the SPAC trend, where there will have been reduced regulatory scrutiny ahead of the ‘blank check’ IPO) raises reporting requirements generally — meaning, again, more companies will need to have in place a system operated by a third party which allows anonymous and non-anonymous reporting. (And, well, we can only speculate whether companies going public by SPAC may be in greater need of misconduct reporting services vs companies that choose to take a more traditional and scrutinized route to market… )

“Just a few years back I had to convince investors that this category it really is a category — and fast forward to 2021, congratulations! We have a market here. It’s a growing category and there is competition in this space,” says Meidav.

“What truly differentiates Vault is that we did not just focus on digitizing an old legacy process. We focused on leveraging technology to truly empower more misconduct to surface internally and for employees to speak up in ways that weren’t available for them before. GoTogether is truly unique as well as the things that we’re doing on the operational side for a company — such as collaboration.”

She gives an example of how a customer in the oil and gas sector configured the platform to make use of an anonymous chat feature in Vault’s app so they could provide employees with a secure direct-line to company leadership.

“They’re utilizing the anonymous chat that the app enables for people to have a direct line to leadership,” she says. “That’s incredible. That is such a progress, forward looking way to be utilizing this tool.”

Vault Platform’s suite of tools include an employee app and a Resolution Hub for compliance, HR, risk and legal teams (Image credits: Vault Platform)

Meidav says Vault has around 30 customers at this stage, split between the US and EU — its core regions of focus.

And while its platform is geared towards enterprises, its early customer base includes a fair number of scale-ups — with familiar names like Lemonade, Airbnb, Kavak, G2 and OVO Energy on the list.

Scale-ups may be natural customers for this sort of product given the huge pressures that can be brought to bear upon company culture as a startup switches to expanding headcount very rapidly, per Meidav.

“They are the early adopters and they are also very much sensitive to events such as these kind of [workplace] scandals as it can impact them greatly… as well as the fact that when a company goes through a hyper growth — and usually you see hyper growth happening in tech companies more than in any other type of sector — hyper growth is at time when you really, as management, as leadership, it’s really important to safeguard your culture,” she suggests.

“Because it changes very, very quickly and these changes can lead to all sorts of things — and it’s really important that leadership is on top of it. So when a company goes through hyper growth it’s an excellent time for them to incorporate a tool such as Vault. As well as the fact that every company that even thinks of an IPO in the coming months or years will do very well to put a tool like Vault in place.”

Expanding Vault’s own team is also on the cards after this Series A close, as it guns for the next phase of growth for its own business. Presumably, though, it’s not short of a misconduct reporting solution.