The Future of Privacy on the Way to a Cookieless World
People have been talking about data for quite a while now: how it is more valuable than oil, how dangerous it can be in the wrong hands, how with big data comes big responsibility, and so on. Companies harvest our data in huge quantities and use it to tell us what we need. Eventually, privacy concerns were raised; officials started to force tech companies’ hands with tighter regulations, while individuals started seeking out products that put privacy first.
The latest development to stir the privacy debate was the reveal of Google’s plan to make third-party cookies obsolete using an approach it calls Privacy Sandbox. This is undoubtedly huge news, considering Google’s Chrome holds 64.1% of the browser market, but let’s not get ahead of ourselves. First, let’s look at some of the key historical points that brought us here.
November 3, 2015 - Mozilla releases its initial Tracking Protection as a feature for its private browsing mode in Firefox.
June 5, 2017 - Apple releases the first iteration of Intelligent Tracking Prevention, limiting the lifetime of cookies.
November 14, 2017 - With the release of Firefox 57, Mozilla adds an option to enable Tracking Protection outside of private browsing (though it is still not turned on by default).
March 17, 2018 - The Facebook/Cambridge Analytica data scandal breaks: it is revealed that Cambridge Analytica harvested personal data from millions of people's Facebook profiles without their consent and used it for political advertising.
May 25, 2018 - Following the landmark 2014 ECJ decision on the Right to be Forgotten and the 2017 ePrivacy Regulation proposal, the General Data Protection Regulation (GDPR), the EU’s strongest privacy protection law yet, goes into effect. This is around the time you probably started noticing cookie consent messages on pretty much every serious website.
August 30, 2018 - Mozilla explains the ongoing change in its approach to anti-tracking.
October 25, 2018 - Facebook is fined the maximum possible amount by the Information Commissioner’s Office (ICO) for data breaches in the Cambridge Analytica scandal.
January 21, 2019 - Google is fined £44m under GDPR for not properly disclosing to users how data is collected across its services.
February 21, 2019 - Apple blocks third-party tracking cookies altogether with Intelligent Tracking Prevention 2.1.
May 8, 2019 - Google announces Chrome’s initial restrictions on how third-party cookies can be used.
June 4, 2019 - Firefox starts blocking third-party tracking cookies by default.
July 12, 2019 - The Federal Trade Commission approves a fine of roughly $5bn against Facebook to settle an investigation into the company’s privacy violations.
July, 2019 - By mid-2019, pretty much every tech giant is facing multiple investigations by official EU bodies.
August 12, 2019 - It is revealed that Facebook faces further fines, this time in the billions, under the EU’s GDPR.
August 22, 2019 - Explaining the downsides of large-scale cookie blocking, Google announces its initial ideas for the Privacy Sandbox approach.
January 14, 2020 - Finally, Google’s more concrete plans for Privacy Sandbox are revealed.
So what is it about Privacy and Cookies?
Cookies are little bits of information that websites leave on our devices, often in order to track our movements around the web. They are one of the methods used to add persistent state to websites. For instance, they can keep you logged in to your frequently used web services, remember your choices (e.g. items you previously added to your shopping cart), and help advertisers and content providers target personalised content at you.
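To make the idea of a cookie as stored state a bit more concrete, here is a minimal sketch in Python using the standard library's http.cookies module. The header value, the cookie name cart_id and the helper name describe_cookie are made up for illustration; real sites choose their own names and attributes.

```python
from http.cookies import SimpleCookie


def describe_cookie(set_cookie_header: str) -> dict:
    """Parse one Set-Cookie header value into a plain dictionary."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    # Assumption: the header carries exactly one cookie.
    name, morsel = next(iter(jar.items()))
    return {
        "name": name,
        "value": morsel.value,                 # the state being remembered
        "max_age_seconds": morsel["max-age"],  # how long the browser keeps it
        "path": morsel["path"],
        "secure": bool(morsel["secure"]),      # only sent over HTTPS
    }


# Hypothetical header a shop might send to remember your cart for one day.
header = "cart_id=abc123; Max-Age=86400; Path=/; Secure; HttpOnly"
print(describe_cookie(header))
```

The browser sends the stored name and value back on later requests to the same site, which is what lets the site recognise a returning visitor.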
What are first-party and third-party cookies?
Cookies that are set and used by the domain of the site you are actually visiting are referred to as first-party cookies. First-party cookies are generally regarded as the friendlier type of cookie. They’re typically used by publishers to monitor how people are using their websites. Cookies from domains other than the current site, on the other hand, are referred to as third-party cookies and are mostly used for advertising purposes such as building profiles of potential customers and retargeting.
In short, a lot of companies depend heavily on third-party cookies to identify you (are you logged in, are you a returning customer, have you seen the latest security notice yet, etc.) and to profile you (what are your interests, are you a power user, are you getting engaged, do you have pets, did that ad you saw on site X eventually lead you to buy a product on site Y, does your behaviour follow a pattern or help identify new patterns, etc.) as you visit different domains every day. As you can already tell, while some of this cross-domain tracking is very helpful for a user’s overall web experience, some of it might raise eyebrows in terms of privacy.
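The distinction can be sketched in a few lines of Python. This is a simplification under a loudly stated assumption: it treats the last two labels of a hostname as "the site", whereas real browsers consult the Public Suffix List. The function names and example domains are hypothetical.

```python
from urllib.parse import urlsplit


def naive_site(host: str) -> str:
    """Approximate a site as the last two labels of the hostname.

    Assumption for illustration only: this is wrong for suffixes such as
    co.uk, which is why browsers use the Public Suffix List instead.
    """
    return ".".join(host.lower().strip(".").split(".")[-2:])


def cookie_party(page_url: str, cookie_domain: str) -> str:
    """Classify a cookie relative to the page shown in the address bar."""
    page_host = urlsplit(page_url).hostname or ""
    if naive_site(page_host) == naive_site(cookie_domain):
        return "first-party"
    return "third-party"


page = "https://www.news-site.example/article"
print(cookie_party(page, "news-site.example"))       # set by the site itself
print(cookie_party(page, ".ads.adnetwork.example"))  # set by an embedded ad
```

The same cookie can be first-party on one page and third-party on another; what matters is whether its domain matches the site the user is currently visiting.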
What is the problem with third-party cookies?
Going back to the beginning, Google’s announcement that it would make third-party cookies obsolete within two years was perceived as the final nail in the coffin for tracking users via this technique. The advertising industry had apparently seen this coming long before, and the problems had been stated on many occasions:
Lack of transparency and bad user experience: Although cookie consent prompts are becoming common practice, opting in and out of tracking is still cumbersome and largely fragmented for users.
Inefficiency in truly tracking the user: Because cookies do not persist reliably across devices, browsers and sometimes even sites, what gets tracked is more an approximation of who the user might be than the actual user.
Cookie overload and internet bloat: Every new vendor, publisher, segment, attribute, platform etc. multiplies the cookie load, slowing page loads for publishers, degrading user experience, creating redundant data and increasing the risk of data leakage.
However, because of legacy issues, progress has been quite slow, and it took major data scandals and stricter regulations to trigger more direct action in the field.
As shown in the timeline above, Apple and Mozilla had already chosen drastic routes to block third-party cookie tracking. Google, however, based its plans for a Privacy Sandbox on the argument that, without an agreed-upon set of standards, attempts to improve user privacy will have unintended consequences, a reference to Apple’s and Mozilla’s courses of action. First, they say that “... large scale blocking of cookies undermine people’s privacy by encouraging opaque techniques such as fingerprinting”, which means users cannot control how their data is collected. Secondly, they mention that “... blocking cookies without another way to deliver relevant ads significantly reduces publishers’ primary means of funding ...”, which might mean that “... we will see much less accessible content for everyone.”
So what is Privacy Sandbox?
What does the future look like?
All these tell us quite a bit about how data might be handled going forward in a cookieless world.
White and Black Lists
It is not hard to predict that some sort of whitelisting mechanism will be part of the future of data tracking. Reliable sources (those that comply with the new standards) will be allowed to interact with the data the user has agreed to share. Similarly, bad actors will be blacklisted by default by the new gatekeepers. Mozilla already uses Disconnect’s tracker lists for its Enhanced Tracking Protection, and Google is considering First-Party Sets to identify different domains owned by the same entity in a first-party context. This means that if abc.com and xyz.com are both owned by the same entity, browsers will treat their cookies (or their data access requests) as if they came from the same source, and can therefore allow cross-domain access between them.
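The abc.com and xyz.com example can be sketched as a simple lookup. This is only an illustration of the idea: the FIRST_PARTY_SETS structure and the same_party function are hypothetical and do not reflect the format or API of Google's actual proposal.

```python
# Hypothetical declaration: each set lists domains owned by one entity.
FIRST_PARTY_SETS = [
    {"abc.com", "xyz.com"},
]


def same_party(domain_a: str, domain_b: str, sets=FIRST_PARTY_SETS) -> bool:
    """Return True if both domains should be treated as one first party."""
    if domain_a == domain_b:
        return True
    # Two different domains count as the same party only if some
    # declared set contains both of them.
    return any(domain_a in s and domain_b in s for s in sets)


print(same_party("abc.com", "xyz.com"))          # same owner
print(same_party("abc.com", "tracker.example"))  # unrelated domain
```

A browser following this model would allow cookie access between the first pair and keep treating the second pair as a cross-site, third-party relationship.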
Data collection will evolve...
Hopefully for the better. As the industry moves towards a more standardised way of collecting data, identification of users will become more reliable, regulation will become easier and fragmentation will decrease for the most part. This might well lead to a greater concentration of data control, but more on that in a second.
As we move our lives more and more into the online space, knowingly or not, we keep creating a generous amount of data through the services and devices we use. You might think you have blocked the tracking cookies, but more covert techniques can still identify you just by looking at how you use your devices, for example through the calibration of your device’s orientation sensors, your preferred language or your battery level.
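A rough sketch of how such cookieless identification can work: a script collects a handful of device and browser attributes and hashes them into an identifier. The attribute names and values below are invented for illustration, and real fingerprinting scripts combine many more signals.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Derive a short, stable identifier from device and browser attributes."""
    # Sort keys so the same attributes always serialise the same way.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical signals a script could read without setting any cookie.
visit_on_site_x = {
    "language": "en-GB",
    "timezone": "Europe/London",
    "screen": "1440x900",
    "orientation_calibration": 0.0031,
}
# The same device seen on another site, attributes collected in another order.
visit_on_site_y = dict(reversed(list(visit_on_site_x.items())))

print(fingerprint(visit_on_site_x) == fingerprint(visit_on_site_y))
```

Because nothing is stored on the device, clearing cookies does not change the identifier, which is why this kind of tracking is so hard for users to see or control.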
All this data has given big data studies a field day, which in turn has triggered a search for practices that prevent the extraction of real identities and anonymize us, as individuals, in the big pool of data.
Data anonymization is the process of removing personally identifiable information from data sets. Techniques such as Differential Privacy, Federated Learning and Homomorphic Encryption are among the methods being researched, proposed and applied to address the issue to some extent. Going forward, a Sandbox environment that doesn't let any identifiable data leave your device and only shares anonymized data should benefit users in terms of privacy and build more trust.
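As a small illustration of the idea behind Differential Privacy, here is a sketch of randomized response, one classic textbook mechanism and not necessarily what any browser vendor uses. Each device adds noise to its own answer before reporting it, yet the aggregate can still be estimated. The population size, the 30% rate and the function names are assumptions made for the example.

```python
import random


def randomized_response(truth: bool, p_truth: float, rng: random.Random) -> bool:
    """Report the true answer with probability p_truth, otherwise a coin flip."""
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(reports: list, p_truth: float) -> float:
    """Undo the known noise: observed = p_truth * true + (1 - p_truth) / 2."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) / 2) / p_truth


rng = random.Random(42)  # fixed seed so the sketch is repeatable
# Hypothetical population: 30% of 20,000 users share some sensitive attribute.
truths = [i < 6000 for i in range(20000)]
reports = [randomized_response(t, 0.5, rng) for t in truths]
print(round(estimate_true_rate(reports, 0.5), 2))
```

No single report reveals much about an individual, since any "yes" may simply be noise, while the estimate over the whole population stays close to the real rate.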
Better transparency and user experience vs. greater power concentration
Regulations force transparency for the benefit of user privacy and control. However, the fragmented structure of current tracking practices makes it very hard for users to feasibly make choices on a large scale. Centralised solutions could therefore offer a better user experience. For instance, widely agreed standards might enable users to control how they share their data through their browser, without having to go through consent pop-ups every time they visit a new website or clear their cookies. In a Sandbox environment they could potentially pick and choose, switch between presets and opt out much more easily.
However, this opens up a new chapter in the heated debate about walled gardens. Walled gardens are closed ecosystems where users can control their data centrally for all the applications and websites that are part of the ecosystem. On the flip side, this creates a huge concentration of user data and therefore a monopoly on how that data is used and who it is shared with. The worries are more than hypothetical. When Facebook, one of the largest walled gardens, was hit by the Cambridge Analytica scandal, people (in the wider sense) saw how a concentrated lump of data can be used for purposes it was never intended or shared for.
Besides, lots of publishers, advertisers and ad providers worry that the proposed Sandbox or walled garden approaches will tilt an already uneven playing field even further, favoring the large tech ecosystems and creating additional challenges for smaller outfits in complying with the new privacy standards.