Amazon adds ‘Alexa, delete what I said today’ command

Buried in this morning’s Echo Show 5 announcement are a couple of new security features worth highlighting. Alongside the new smart display’s built-in camera shutter, Amazon introduced a pair of commands that let users delete voice recordings by talking to Alexa.

“Alexa, delete what I said today” rolls out to Alexa users starting today. “Alexa, delete what I just said” will arrive for U.S. users in the coming weeks, and in other countries where the smart assistant is available next month. Amazon has offered the ability to delete recordings via the app for some time now, but this brings the functionality to the fore with a simple voice command.

The process works much like deleting recordings via the app, with deletion beginning immediately.

While the company has long contended that it doesn’t actively record conversations and protects recordings on encrypted servers, the always-on nature of Echo and similar smart home products has raised alarms among security analysts and regular users alike.

The addition of the feature is clearly a response to such pushback and an attempt by Amazon to let users be a bit more proactive about controlling how the company treats their conversations.

Amazon defeats shareholder vote on facial recognition by a wide margin

Efforts by shareholders to instruct Amazon to stop selling its facial recognition technology to government customers failed by a wide margin, according to a new corporate filing with regulators.

About 2.4 percent of shareholders voted for the proposal, a fraction of the 50 percent necessary to pass. The measure also fell short of the 5 percent threshold needed for it to be re-introduced to shareholders in future.

A second proposal to ask Amazon to carry out an independent human rights assessment of its facial recognition technology also failed. About 27.5 percent of shareholders voted in favor of the proposal.

Amazon has come under fire for its facial recognition tech, Rekognition, following accusations that it is biased and inaccurate, flaws critics say can be used to racially discriminate against minorities.

The ACLU first raised “profound” concerns about Rekognition last year after it was adopted by police and deployed at airports and in public places. The company has also pitched the product to Immigration and Customs Enforcement.

Although there was growing support from civil liberties groups like the ACLU as well as the public, senior Amazon staff have a majority stake and voting rights — making any dissent from outside shareholders difficult. Amazon founder and chief executive Jeff Bezos retains 12 percent of the company’s stock. The company’s top four institutional shareholders collectively hold about the same amount of voting rights as Bezos.

EU-US Privacy Shield complaint to be heard by Europe’s top court in July

A legal challenge to the EU-US Privacy Shield, a mechanism used by thousands of companies to authorize data transfers from the European Union to the US, will be heard by Europe’s top court this summer.

The General Court of the EU has set a date of July 1 and 2 to hear the complaint brought by French digital rights group La Quadrature du Net against the European Commission’s renegotiated data transfer agreement. The group argues the arrangement is still incompatible with EU law on account of US government mass surveillance practices.

Privacy Shield was only adopted three years ago after its forerunner, Safe Harbor, was struck down by the European Court of Justice in 2015 following the 2013 exposé of US intelligence agencies’ access to personal data, revealed by NSA whistleblower Edward Snowden.

The renegotiated arrangement tightened some elements, and made the mechanism subject to annual reviews by the Commission to ensure it functions as intended. But even before it was adopted it faced fierce criticism — with data protection and privacy experts couching it as an attempt to put lipstick on the same old EU-law breaching pig.

The Shield’s continued survival has also come under added pressure as a consequence of the Trump administration, which has entrenched rather than rolled back privacy-hostile US laws, and dragged its feet on key appointments that the Commission said the arrangement’s survival depends on.

Ahead of last year’s annual Privacy Shield review the EU parliament called for the mechanism to be suspended until the US came into compliance. (The Commission ignored the calls.)

In one particularly embarrassing moment for the mechanism it emerged that disgraced political data company, Cambridge Analytica, had been signed up to self-certify its ‘compliance’ with EU privacy law…

La Quadrature du Net is a longtime critic of Privacy Shield, having filed its complaint back in October 2016, immediately after Privacy Shield got up and running. It argues the mechanism breaches fundamental EU rights and does not provide adequate protection for EU citizens’ data.

It subsequently made a joint petition with a French NGO, in November 2016, for its complaint to be heard before the General Court of the EU. Much back and forth followed, with written exchanges between the two sides laying out the arguments and counter-arguments.

The Commission has been supported in this process by countries including the US, France and the UK and companies including Microsoft and tech industry association, Digitaleurope, whose members include Amazon, Apple, Dropbox, Facebook, Google, Huawei, Oracle and Qualcomm (to name a few).

La Quadrature du Net, meanwhile, has received support from local consumer protection organisation UFC Que Choisir and the American Civil Liberties Union, which it says provided “a detailed description of the US surveillance regime”.

“The General Court of the EU has deemed our complaint serious and grave enough to open proceedings,” La Quadrature du Net says now.

It will be up to the court in Luxembourg to hear and judge the complaint.

A decision on the legality of Privacy Shield will follow some time after July — perhaps in just a handful of months, as the CJEU has been known to move quickly in cases involving the defence of fundamental EU rights. Though it may also take the court longer to issue a judgement.

All companies signed up to the Privacy Shield should be aware of the risk and have contingencies in place in case the arrangement is struck down.

Nor is this the only legal question facing Privacy Shield. A challenge filed against a separate data transfer mechanism in Ireland by privacy campaigner Max Schrems — whose original challenge brought down Safe Harbor — has also now been referred by Irish courts to the CJEU, in what’s known as ‘Schrems II’.

In that case Facebook has attempted to block the court’s referral of questions to the CJEU — by seeking to appeal to Ireland’s Supreme Court, even though there is not normally a right to appeal a referral to the CJEU.

Facebook was granted leave to appeal — and Ireland’s Supreme Court is expected to rule on that appeal early next month. The appeals process has not stayed the referral, though. Nor does it impinge upon La Quadrature du Net’s complaint against Privacy Shield being heard later this summer.

First American site bug exposed 885 million sensitive title insurance records

News just in from security reporter Brian Krebs: Fortune 500 real estate insurance giant First American exposed approximately 885 million sensitive records because of a bug in its website.

Krebs reported that the company’s website was storing and leaking bank account numbers, statements, mortgage and tax records, and Social Security numbers and driver’s license images in an enumerable format — so anyone who knew a valid web address for a document simply had to change the address by one digit to view other documents.

There was no authentication required — such as a password or other checks — to prevent access to other documents.

According to Krebs’ report, the earliest document was labeled “000000075” — with newer documents increasing in numerical order, he said.
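The underlying weakness is a classic one: sequential, guessable document identifiers with no access check, so one valid ID reveals every neighbouring document. A minimal sketch (all values hypothetical) contrasting an enumerable scheme with unguessable random tokens:

```python
import secrets

# Enumerable scheme, as reportedly used here: incrementing a known
# ID by one yields the address of the next document.
doc_id = "000000075"
next_doc = f"{int(doc_id) + 1:09d}"  # trivially guessable: "000000076"

# Safer scheme: a high-entropy random token per document, which is
# infeasible to guess -- and should still sit behind authentication.
safe_token = secrets.token_urlsafe(16)
```

Random tokens alone are not a substitute for authentication, but they at least prevent the one-digit-change enumeration described above.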

The data goes back at least to 2003, said Krebs.

“Many of the exposed files are records of wire transactions with bank account numbers and other information from home or property buyers and sellers,” wrote Krebs. First American is one of the largest real estate title insurance giants in the U.S., earning $5.8 billion in revenue in 2018.

A spokesperson for First American did not immediately respond to a request for comment but told Krebs that its web application was shut down and that there would be “no further comment” until its review was complete.

Although the website was down, many of the documents were still cached in search engines, security researcher John Wethington told TechCrunch. We’re not linking to the exposed data while it is still readable.

It’s the latest breach of sensitive mortgage data in recent months.

TechCrunch exclusively reported in January that a trove of more than 24 million financial and banking documents had been left inadvertently exposed on a public cloud storage server for anyone to access. The data contained loan and mortgage agreements, repayment schedules and other highly sensitive financial and tax documents that offer an intimate insight into a person’s financial life.

Google’s lead EU regulator opens formal privacy probe of its adtech

Google’s lead data regulator in Europe has opened a formal investigation into its processing of personal data in the context of its online Ad Exchange, TechCrunch has learnt.

This follows a privacy complaint pertaining to adtech’s real-time bidding (RTB) system, filed under Europe’s GDPR framework last year.

The statutory inquiry into Google’s adtech being opened by the Irish Data Protection Commission (DPC) cites section 110 of Ireland’s Data Protection Act 2018, which means the watchdog suspects infringement — and will now investigate its suspicions.

The DPC writes that the inquiry is “to establish whether processing of personal data carried out at each stage of an advertising transaction is in compliance with the relevant provisions of the General Data Protection Regulation, including the lawful basis for processing, the principles of transparency and data minimisation, as well as Google’s retention practices”.

We’ve reached out to Google for comment.

As we reported earlier this week, complaints about the RTB system used by online advertisers have been stacking up across Europe.

The relevant complaint in this instance was lodged last fall by Dr Johnny Ryan of private browser Brave, and alleges “wide-scale and systemic breaches of the data protection regime” by Google and others in the behavioral advertising industry.

Where Google is concerned the complaint focuses on its DoubleClick/Authorized Buyers ad system.

In a nutshell, the RTB complaints argue the system is inherently insecure — and that’s incompatible with GDPR’s requirement that personal data is processed “in a manner that ensures appropriate security”.

Commenting on the Irish DPC opening an inquiry in a statement, Ryan said: “Surveillance capitalism is about to become obsolete. The Irish Data Protection Commission’s action signals that now — nearly one year after the GDPR was introduced — a change is coming that goes beyond just Google. We need to reform online advertising to protect privacy, and to protect advertisers and publishers from legal risk under the GDPR”.

Similar complaints against RTB have been filed in the UK, Poland, Spain, Belgium, Luxembourg and the Netherlands.

Ireland is leading the investigation of Google’s adtech as the company designates Google Ireland as the data controller for EU users.

Amazon shareholders reject facial recognition sale ban to governments

Amazon shareholders have rejected two proposals that would have requested the company not to sell its facial recognition technology to government customers.

The breakdown of the votes is not immediately known. A filing with the vote tally is expected later this week.

The first proposal would have requested Amazon to limit the sale of its Rekognition technology to police, law enforcement and federal agencies. A second resolution would have demanded an independent human and civil rights review into the use of the technology.

It followed accusations that the technology is biased and inaccurate, which critics say can be used to racially discriminate against minorities.

The votes were non-binding, allowing the company to reject the outcome of the vote.

But the vote was almost inevitably set to fail. Following his divorce, Amazon founder and chief executive Jeff Bezos retains 12 percent of the company’s stock as well as the voting rights in his ex-wife’s remaining stake. The company’s top four institutional shareholders, including The Vanguard Group, Blackrock, FMR and State Street, collectively hold about the same amount of voting rights as Bezos.

The resolutions failed despite backing from the ACLU, which accused the tech giant of being “non-responsive” to privacy concerns.

In remarks, Shankar Narayan of the ACLU of Washington said: “The fact that there needed to be a vote on this is an embarrassment for Amazon’s leadership team. It demonstrates shareholders do not have confidence that company executives are properly understanding or addressing the civil and human rights impacts of its role in facilitating pervasive government surveillance.”

“While we have yet to see the exact breakdown of the vote, this shareholder intervention should serve as a wake-up call for the company to reckon with the real harms of face surveillance and to change course,” he said.

The civil liberties group rallied investors ahead of the Wednesday annual meeting in Seattle, where the tech giant has its headquarters. In a letter, the group said the sale of Amazon’s facial recognition tech to government agencies “fundamentally alters the balance of power between government and individuals, arming governments with unprecedented power to track, control, and harm people.”

“As shown by a long history of other surveillance technologies, face surveillance is certain to be disproportionately aimed at immigrants, religious minorities, people of color, activists, and other vulnerable communities,” the letter added.

The ACLU said investors and shareholders had the power “to protect Amazon from its own failed judgment.”

Amazon pushed back against claims that the technology is inaccurate, and called on the U.S. Securities and Exchange Commission to block the shareholder proposal prior to its annual shareholder meeting. The government agency blocked Amazon’s efforts to stop the vote, amid growing scrutiny of its product.

Amazon spokesperson Lauren Lynch said on Tuesday, prior to the meeting, that the company operates “in line with our code of conduct which governs how we run our business and the use of our products.”

An email to the company following Wednesday’s meeting had not been returned at the time of writing.

DuckDuckGo founder Gabriel Weinberg is coming to Disrupt

2019 is the year Facebook announced a ‘pivot to privacy’. At the same time, Google is trying to claim that privacy means letting it exclusively store and data-mine everything you do online. So what better time to sit down at Disrupt for a chat about what privacy really means with DuckDuckGo founder and CEO Gabriel Weinberg?

We’re delighted to announce that Weinberg is joining us at Disrupt SF (October 2-4).

The pro-privacy search engine he founded has been on a mission to shrink the shoulder-surfing creepiness of Internet searching for more than a decade, serving contextual keyword-based ads, rather than pervasively tracking users to maintain privacy-hostile profiles. (If you can’t quite believe the decade bit; here’s DDG’s startup Elevator Pitch — which we featured on TC all the way back in 2008.)

It’s a position that looks increasingly smart as big tech comes under sharper political and regulatory scrutiny on account of the volume of information it’s amassing. (Not to mention what it’s doing with people’s data.)

Despite competing as a self-funded underdog against the biggest tech giants around, DuckDuckGo has been profitable and gaining users at a steady clip for years. It also recently took in a chunk of VC to capitalize on what its investors see as a growing international opportunity to help Internet users go about their business without being intrusively snooped on. Which makes a compelling counter narrative to the tech giants.

In more recent developments it has added a tracker blocker to its product mix — and been dabbling in policy advocacy — calling for a revival of a Do Not Track browser standard, after earlier attempts foundered when the industry failed to reach accord.

The political climate around privacy and data protection does look to be pivoting in such a way that Do Not Track could possibly swing back into play. But if — and, yes, it’s a big one — privacy ends up being a baked-in Internet norm, how might a pioneer like DuckDuckGo maintain its differentiating edge?

And, on the flip side, what if tech giants end up moving in on its territory by redefining privacy in their own self-serving image? We have questions and will be searching Weinberg for answers.

There’s also the fact that many a founder would have cut and run just half a decade into pushing against the prevailing industry grain. So we’re also keen to mine his views on entrepreneurial patience, and get a better handle on what makes him tick as a person — to learn how he’s turned a passion for building people-centric, principled products into a profitable business.

Disrupt SF runs October 2 – October 4 at the Moscone Center in San Francisco. Tickets are available here.

Apple has a plan to make online ads more private

For years, the web has been largely free thanks to online ads. The problem is that nobody likes them. When they’re not obnoxiously taking over your entire screen or autoplaying, they’re tracking you everywhere you go online.

Ads can track where you go and which sites you visit and can be used to build up profiles on individuals — even if you never click on one. And when you do, they know what you bought and then they share that with other sites so they know you were up late buying ice cream, cat food, or something a little more private.

The obvious logic would be to use an ad-blocker. But that’s not what keeps the internet thriving and available. Apple says it’s figured out some middle ground that keeps ads alive but without their nefarious ad tracking capabilities.

The tech giant came up with Privacy Preserving Ad Click Attribution. Yes, it’s a mouthful but the tech itself shows promise.

A bit of background: Any time you buy something online, the store that placed the ad knows you bought something and so do the other sites where the ad was placed. When a person clicks on an ad, the store wants to know which site the ad was clicked on so they know where to keep advertising, known as ad attribution. Ads often use tracking images — tiny, near-invisible pixel-sized trackers embedded on websites that know when you’ve opened a webpage. These pixels carry cookies, which make it easy for ads to track users across pages and entire websites. Using these invisible trackers, websites can build up profiles on people — whether they click ads or not — from site to site, such as their interests, what they want to buy, and more.
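The profile-building mechanism described above can be sketched in a few lines. This is a simplified simulation, not any real ad network’s code: assume a hypothetical third-party tracker that receives a pixel request, carrying its own cookie ID, from every page that embeds it.

```python
from collections import defaultdict

# Hypothetical third-party tracker: one cookie ID, many embedding sites.
profiles = defaultdict(list)  # cookie_id -> list of (site, page) views

def on_pixel_request(cookie_id, site, page):
    """Called whenever the 1x1 tracking pixel is fetched from any site."""
    profiles[cookie_id].append((site, page))

# The same cookie follows the user across unrelated sites...
on_pixel_request("abc123", "news.example", "/politics")
on_pixel_request("abc123", "shop.example", "/cat-food")
on_pixel_request("abc123", "health.example", "/symptoms")

# ...letting the tracker assemble a cross-site interest profile
# without the user ever clicking a single ad.
```

The key point is that the linking identifier lives with the tracker, not the sites, which is why the same profile accumulates no matter where the user browses.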

Apple’s thinking, outlined in a blog post Wednesday, is that ads don’t need to share that you bought something from an online store with anyone else. Ads just need to know that someone — and not an identifiable person — clicked on an ad on a site and bought something on another.

By taking the identifiable person out of the equation, Apple says its new technology can help preserve user privacy without reducing the effectiveness of ad campaigns.

Apple’s new web technology, soon to be built into its Safari browser, is broken down into four parts.

Firstly, nobody should be identifiable based off their ad clicks. Ads often use long and unique tracking codes to identify a user visiting various sites and buying things. By limiting the number of campaign IDs to just a few dozen, an advertiser won’t be able to assign unique tracking codes to each ad click, making it far more difficult to track individual users across the web. Secondly, only the website where the ad was clicked will be allowed to measure ad clicks, cutting out third-parties. Thirdly, the browser should delay the sending of ad click and conversion data — such as when someone signs up for a site or buys something — at random by up to two days to further hide the user’s activity. That data is sent through a dedicated private browsing window to ensure it’s not associated with any other browsing data.

Lastly, Apple said it can do this at the browser level, limiting how much data the ad networks and merchants can see.

Instead of knowing exactly who bought what and when, the privacy ad click technology will instead report back ad click and conversion data without identifying the person.

“As more and more browsers acknowledge the problems of cross-site tracking, we should expect privacy-invasive ad click attribution to become a thing of the past,” wrote Apple engineer John Wilander in a blog post.

One of the core features of the technology is limiting the amount of data that ads can collect.

“Today’s practice of ad click attribution has no practical limit on the bits of data, which allows for full cross-site tracking of users using cookies,” explained Wilander. “But by keeping the entropy of attribution data low enough, we believe the reporting can be done in a privacy preserving way.”

Simply put, by restricting the number of campaign and conversion IDs to just 64, advertisers are prevented from using long, unique values that could serve as an identifier to track a user from site to site. Apple says that restricted number will still give advertisers enough information to know how well their ads are performing. Advertisers, for example, can still see that a particular ad campaign leads to more completed purchases, based off a specific conversion ID, than other campaigns run on a specific site in the last 48 hours.

But Apple concedes that real-time tracking of purchases may be a thing of the past if the technology becomes widely adopted. By delaying the ad click and conversion reports by up to two days, advertisers lose real-time insight into who buys what and when. Apple says there’s no way to protect a user’s privacy if attribution reports are sent as soon as someone buys something.
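Based on Apple’s description, the scheme can be sketched roughly as follows. The function name, the report fields and the 24–48 hour window are illustrative assumptions drawn from the article, not Apple’s actual API:

```python
import random

MAX_IDS = 64  # campaign and conversion IDs limited to 64 values (low entropy)

def make_attribution_report(campaign_id, conversion_id):
    """Build a privacy-preserving report: note there is no user identifier."""
    assert 0 <= campaign_id < MAX_IDS and 0 <= conversion_id < MAX_IDS
    return {
        "campaign_id": campaign_id,      # which ad campaign was clicked
        "conversion_id": conversion_id,  # coarse outcome bucket, e.g. a purchase
        # The browser holds the report back for up to two days so that
        # timing can't be used to re-identify the user.
        "send_delay_hours": random.randint(24, 48),
    }

report = make_attribution_report(campaign_id=5, conversion_id=12)
```

With only 64 possible values per field, a report can say “campaign 5 produced conversion 12” but cannot encode a per-user tracking code, which is the entropy limit Wilander describes.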

Apple is set to switch on the privacy feature by default in Safari later this year but knows it can’t go it alone. The company has proposed the technology as a standard to the World Wide Web Consortium in the hope that other browser makers will pick up the torch and run with it.

Anyone with a short memory will know that web standards don’t always take off. The ill-fated Do Not Track web standard was meant to allow browser users to send a signal to websites and ad networks not to be tracked. The major browser makers adopted the feature, but mired in controversy, the standard never took off.

Apple thinks its proposed standard can succeed — chiefly because, unlike Do Not Track, the privacy ad click technology can be enforced in the browser alongside other privacy-minded technology. In Safari’s case, that’s Intelligent Tracking Prevention. Other browsers, like Google Chrome and Mozilla Firefox, are also doubling down on privacy features in an effort to win over the privacy crowd. Apple is also betting on users actively wanting this privacy technology, while balancing the concerns of advertisers who don’t want to be shut out through more drastic measures like users installing ad and content blockers.

The new privacy technology is in its developer-focused Safari Technology Preview 82, released last week, and will be available for web developers later this year.

London’s Tube network to switch on wi-fi tracking by default in July

Transport for London will roll out default wi-fi device tracking on the London Underground this summer, following a trial back in 2016.

In a press release announcing the move, TfL writes that “secure, privacy-protected data collection will begin on July 8” — while touting additional services, such as improved alerts about delays and congestion, which it frames as “customer benefits”, as expected to launch “later in the year”.

As well as offering additional alerts-based services to passengers via its own website/apps, TfL says it could incorporate crowding data into its free open-data API — to allow app developers, academics and businesses to expand the utility of the data by baking it into their own products and services.

It’s not all just added utility though; TfL says it will also use the information to enhance its in-station marketing analytics — and, it hopes, top up its revenues — by tracking footfall around ad units and billboards.

Commuters using the UK capital’s publicly funded transport network who do not want their movements being tracked will have to switch off their wi-fi, or else put their phone in airplane mode when using the network.

To deliver data of the required detail, TfL says detailed digital mapping of all London Underground stations was undertaken to identify where wi-fi routers are located so it can understand how commuters move across the network and through stations.

It says it will erect signs at stations informing passengers that using the wi-fi will result in connection data being collected “to better understand journey patterns and improve our services” — and explaining that to opt out they have to switch off their device’s wi-fi.

Attempts in recent years by smartphone OSes to use MAC address randomization to try to defeat persistent device tracking have been shown to be vulnerable to reverse engineering via flaws in wi-fi set-up protocols. So, er, switch off to be sure.

We covered TfL’s wi-fi tracking beta back in 2017, when we reported that despite claiming the harvested wi-fi data was “de-personalised”, and claiming individuals using the Tube network could not be identified, TfL nonetheless declined to release the “anonymized” data-set after a Freedom of Information request — saying there remains a risk of individuals being re-identified.

As has been shown many times before, reversing ‘anonymization’ of personal data can be frighteningly easy.

It’s not immediately clear from the press release or TfL’s website exactly how it will be encrypting the location data gathered from devices that authenticate to use the free wi-fi at the circa 260 wi-fi enabled London Underground stations.

Its explainer about the data collection does not go into any real detail about the encryption and security being used. (We’ve asked for more technical details.)

“If the device has been signed up for free Wi-Fi on the London Underground network, the device will disclose its genuine MAC address. This is known as an authenticated device,” TfL writes generally of how the tracking will work. (Ergo, this is another instance where ‘free’ wi-fi isn’t actually free — as one security expert we spoke to pointed out.)

“We process authenticated device MAC address connections (along with the date and time the device authenticated with the Wi-Fi network and the location of each router the device connected to). This helps us to better understand how customers move through and between stations — we look at how long it took for a device to travel between stations, the routes the device took and waiting times at busy periods.”

“We do not collect any other data generated by your device. This includes web browsing data and data from website cookies,” TfL adds, saying also that “individual customer data will never be shared and customers will not be personally identified from the data collected by TfL”.

In a section entitled “keeping information secure” it further writes: “Each MAC address is automatically depersonalised (pseudonymised) and encrypted to prevent the identification of the original MAC address and associated device. The data is stored in a restricted area of a secure location and it will not be linked to any other data at a device level.  At no time does TfL store a device’s original MAC address.”

Privacy and security concerns were raised about the location tracking around the time of the 2016 trial — such as why TfL had used a monthly salt key to encrypt the data rather than daily salts, which would have decreased the risk of data being re-identifiable should it leak out.
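To see why the salt rotation frequency matters: with a single monthly salt, the same MAC address produces the same pseudonym all month, so a leaked data-set can be joined across a month of journeys. A hedged sketch of the idea (not TfL’s actual scheme; salt values are hypothetical):

```python
import hashlib

def pseudonymise(mac: str, salt: str) -> str:
    """One-way hash of a MAC address with a rotating secret salt."""
    return hashlib.sha256((salt + mac).encode()).hexdigest()[:16]

mac = "aa:bb:cc:dd:ee:ff"

# Monthly salt: the pseudonym is identical for a whole month of journeys,
# so a device's movement patterns can be linked (and potentially re-identified).
monthly = [pseudonymise(mac, "salt-2016-11") for _ in range(3)]

# Daily salts: the pseudonym changes every day, breaking long-term linkage
# while still allowing within-day journey analysis.
daily = [pseudonymise(mac, f"salt-2016-11-{day:02d}") for day in (1, 2, 3)]
```

The trade-off is analytical: a stable pseudonym supports longitudinal analysis, while frequent rotation limits how much travel history any leak or insider can reconstruct.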

Such concerns persist — and security experts are now calling for full technical details to be released, given TfL is going full steam ahead with a rollout.

A report in Wired suggests TfL has switched from hashing to a system of tokenisation – “fully replacing the MAC address with an identifier that cannot be tied back to any personal information” – which TfL billed as a “more sophisticated mechanism” than it had used before. We’ll update as and when we get more from TfL.

Another question over the deployment at the time of the trial was what legal basis TfL would use for pervasively collecting people’s location data — since the system requires an active opt-out by commuters, a consent-based legal basis would not be appropriate.

In a section on the legal basis for processing the Wi-Fi connection data, TfL writes now that its ‘legal ground’ is two-fold:

  • Our statutory and public functions
  • to undertake activities to promote and encourage safe, integrated, efficient and economic transport facilities and services, and to deliver the Mayor’s Transport Strategy

So, presumably, you can file ‘increasing revenue around adverts in stations by being able to track nearby footfall’ under ‘helping to deliver (read: fund) the mayor’s transport strategy’.

(Or as TfL puts it: “[T]he data will also allow TfL to better understand customer flows throughout stations, highlighting the effectiveness and accountability of its advertising estate based on actual customer volumes. Being able to reliably demonstrate this should improve commercial revenue, which can then be reinvested back into the transport network.”)

On data retention it specifies that it will hold “depersonalised Wi-Fi connection data” for two years — after which it will aggregate the data and retain those non-individual insights (presumably indefinitely, or per its standard data retention policies).

“The exact parameters of the aggregation are still to be confirmed, but will result in the individual Wi-Fi connection data being removed. Instead, we will retain counts of activities grouped into specific time periods and locations,” it writes on that.

It further notes that aggregated data “developed by combining depersonalised data from many devices” may also be shared with other TfL departments and external bodies. So that processed data could certainly travel.

Of the “individual depersonalised device Wi-Fi connection data”, TfL claims it is accessible only to “a controlled group of TfL employees” — without specifying how large this group is, nor what sort of controls and processes will be in place to prevent the risk of A) data being hacked and/or leaking out, or B) data being re-identified by a staff member.

A TfL employee with intimate knowledge of a partner’s daily travel routine might, for example, have access to enough information via the system to be able to reverse the depersonalization.

Without more technical details we just don’t know. Though TfL says it worked with the UK’s data protection watchdog in designing the data collection with privacy front of mind.

“We take the privacy of our customers very seriously. A range of policies, processes and technical measures are in place to control and safeguard access to, and use of, Wi-Fi connection data. Anyone with access to this data must complete TfL’s privacy and data protection training every year,” it also notes elsewhere.

Despite holding individual level location data for two years, TfL is also claiming that it will not respond to requests from individuals to delete or rectify any personal location data it holds, i.e. if people seek to exercise their information rights under EU law.

“We use a one-way pseudonymisation process to depersonalise the data immediately after it is collected. This means we will not be able to single out a specific person’s device, or identify you and the data generated by your device,” it claims.

“This means that we are unable to respond to any requests to access the Wi-Fi data generated by your device, or for data to be deleted, rectified or restricted from further processing.”
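TfL has not published the technical details of its pseudonymisation process. A scheme like the one it describes is commonly built on a keyed one-way hash of the device identifier: the same device always maps to the same token (so journeys can be linked), but the original MAC address cannot be recovered without the secret key. A minimal sketch, assuming an HMAC-SHA256 construction (the key name and MAC address below are invented for illustration):

```python
import hmac
import hashlib

def pseudonymise(mac_address: str, secret_key: bytes) -> str:
    """One-way pseudonymisation of a device MAC address.

    The keyed hash is deterministic, so repeated sightings of the same
    device yield the same token and can be linked into a journey, but
    the MAC itself cannot be read back out of the token.
    """
    return hmac.new(secret_key, mac_address.encode(), hashlib.sha256).hexdigest()

# Hypothetical secret key and device address, for illustration only.
key = b"secret-key-held-by-the-operator"
token = pseudonymise("a4:5e:60:c2:9b:1f", key)
repeat = pseudonymise("a4:5e:60:c2:9b:1f", key)
```

Note that such tokens are only as strong as the key management around them: anyone holding the key can re-hash a known MAC address and match it against stored tokens, which is one form the re-identification risk described above could take.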

Again, the distinctions it is making there are raising some eyebrows.

What’s amply clear is that the volume of data generated by a full rollout of Wi-Fi tracking across the lion’s share of the London Underground will be staggering.

More than 509 million “depersonalised” pieces of data were collected from 5.6 million mobile devices during the four-week 2016 trial alone — comprising some 42 million journeys. And that was a very brief trial covering a much smaller subset of the network.

As big data giants go, TfL is clearly gunning to be right up there.

Millions of Instagram influencers had their contact data scraped and exposed

A massive database containing contact information for millions of Instagram celebrities and the platform’s most valuable users has been found online.

The database, hosted by Amazon Web Services, was left exposed without a password, allowing anyone to look inside. At the time of writing, the database had over 49 million records — and was growing by the hour.

From a brief review of the data, each record contained public data scraped from influencer Instagram accounts, including their bio, profile picture, number of followers, verification status, and location by city and country. But the records also contained private contact information, such as the Instagram account owner’s email address and phone number.

Security researcher Anurag Sen discovered the database and alerted TechCrunch in an effort to find the owner and get the database secured. We traced the database back to Mumbai-based social media marketing firm Chtrbox, which pays influencers to post sponsored content on their accounts. Each record in the database also included a calculated worth for the account, based on its number of followers, engagement, reach, likes and shares. This figure was used as a metric to determine how much the company could pay an Instagram celebrity or influencer to post an ad.
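Chtrbox’s actual pricing formula is not public. Purely for illustration, a scoring function of this kind might combine audience size and engagement signals into a single dollar estimate — every weight below is invented, as are the sample input values:

```python
def estimate_account_worth(followers: int, engagement_rate: float,
                           reach: int, likes: int, shares: int) -> float:
    """Toy influencer-pricing sketch: weight raw audience and
    interaction counts, then scale by engagement rate.

    All weights are hypothetical; the real metric Chtrbox used
    has not been disclosed.
    """
    base = (
        followers * 0.001   # audience size
        + reach * 0.002     # how many people posts actually reach
        + likes * 0.01      # interaction signals
        + shares * 0.05
    )
    return round(base * (1 + engagement_rate), 2)

# Hypothetical mid-size influencer account.
price = estimate_account_worth(
    followers=250_000, engagement_rate=0.04,
    reach=80_000, likes=12_000, shares=1_500,
)
```

The design point is that follower count alone is a weak signal; engagement-weighted metrics like this are why the database paired public profile stats with reach and interaction figures.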

TechCrunch found several high-profile influencers in the exposed database, including prominent food bloggers, celebrities and other social media influencers.

We contacted several people at random whose information was found in the database and provided them their phone numbers. Two of them responded and confirmed that the email address and phone number found in the database were the ones used to set up their Instagram accounts. Neither had any involvement with Chtrbox, they said.

Shortly after we reached out, Chtrbox pulled the database offline. Pranay Swarup, the company’s founder and chief executive, did not respond to a request for comment and several questions, including how the company obtained private Instagram account email addresses and phone numbers.

The scraping effort comes two years after Instagram admitted a security bug in its developer API allowed hackers to scrape the email addresses and phone numbers of six million Instagram accounts. The hackers later sold the data for bitcoin.

Months later, Instagram — now with more than a billion users — choked its API to limit the number of requests apps and developers can make on the platform.

A spokesperson for Facebook, which owns Instagram, said it was looking into the matter. “Scraping data of any kind is prohibited on Instagram,” said the spokesperson. “We’re investigating how and what data was obtained and will share an update soon.”