How Differential Privacy and Federated Learning Will Shape the Future of Personalization in a Post-Cookie Age


Today, nearly 70 percent of digital spending is powered by some form of recommendation engine. Across a broad range of internet services that comprise walled gardens, spanning social networks, search engines, digital entertainment, education, and commerce, we leave behind traces of our data with and without consent, in part or in full. As indicators of our habits and behaviours, these traces have played a key role in how advertisers identify target customers and deliver better-personalised marketing messages.

However, this personalisation comes at a cost. Across a fragmented ecosystem, we are now grappling with duplicated identities. While consumers have benefitted, brands have yet to receive their due return in full, burdened by gross inefficiencies in their marketing investments. The next 20 percent of digital spending goes towards addressing these inefficiencies, yet its effectiveness will drop by at least 50 percent within the next 18 months as third-party cookies head towards certain death. Add to that the “convenience fee”, the so-called “AdTech tax”, which takes away a further 30 percent of these investments, with costs going towards fixing the end-to-end supply chain.

The much-needed provenance of the actual delivery of impressions is equally under scrutiny. Forget about the last 10 percent of digital spending for the moment: we are now in clear and present danger, with almost 90 percent of spend trying to solve either provenance or personalisation. Between GDPR, CCPA, and similar regulations worldwide, nearly six billion consumers will be protected by stringent data protection frameworks. Platforms will need to seek explicit consent for using consumer data and, more importantly, will be held responsible for providing information on how and why the data was used, and what it was used for.

Google Chrome began the new decade by announcing that it would follow in the footsteps of both Safari and Firefox and phase out support for third-party cookies. With that, nearly 90 percent of online browser behaviour will be directly linked to first-party consent when engaging with the World Wide Web.

Welcome to the realities of Web 3.0.

Navigating a balancing act

In 2019, Google released its open-source Differential Privacy Library in anticipation of the announcement made at the start of this year, casting the spotlight on a relatively unknown statistical technique. A privacy-preserving approach designed by cryptographers, differential privacy involves introducing random distortion to a data set at the moment it is collected, in order to obfuscate individual data points before the data is anonymised on a server. Rather than gaining access to data points that can easily be traced back or reverse-engineered to individuals, marketers instead access a new data set that provides a sufficiently accurate approximation on which behavioural profiles can be based. This method is technically probabilistic, but almost deterministic in terms of results.
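To make the idea of distorting data at the moment of collection concrete, here is a minimal sketch of randomised response, one classic differential privacy technique. It is an illustration under stated assumptions, not the method used by Google's library: the function names, the privacy parameter value of 1.0, and the simulated audience in which 30 percent of users hold some attribute are all hypothetical.

```python
import math
import random


def randomize_bit(true_bit: bool, epsilon: float, rng: random.Random) -> bool:
    """Report the true answer with probability e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return true_bit if rng.random() < p_truth else not true_bit


def estimate_rate(reports: list[bool], epsilon: float) -> float:
    """Remove the known bias of the noise to estimate the real share of True answers."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # observed = p_truth * rate + (1 - p_truth) * (1 - rate), solved for rate.
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


# Hypothetical audience: 10,000 users, 30 percent of whom hold the attribute.
rng = random.Random(0)
true_bits = [i < 3000 for i in range(10000)]

# Each user distorts their own answer before it leaves the device.
reports = [randomize_bit(bit, 1.0, rng) for bit in true_bits]

# The server only ever sees the noisy reports, yet can recover the aggregate.
estimate = estimate_rate(reports, 1.0)
print(f"estimated share: {estimate:.3f}")
```

No single report can be trusted, which is what protects the individual, but the aggregate estimate stays close to the true share. A smaller epsilon means more flipped answers: stronger privacy, but a noisier estimate.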

Despite the data anonymisation efforts that we see today, it’s clear that the industry continues to miss the mark, with sensitive data points such as gender, date of birth, or even political leanings and sexual orientation easily uncovered. Mere anonymisation of personally identifiable information (PII) does not automatically make it privacy compliant. Differential privacy, on the other hand, masks individual data with calibrated noise so that advertisers can still deliver ads that remain relevant to their target audiences, while minimising the amount of raw user data shared directly with them. However, the approximation that differential privacy offers can come at the cost of accuracy, with some findings citing that adding noise to a data set can reduce predictive accuracy by almost 70 percent.

As we all know, privacy is only one part of the equation; data aggregation is another matter. To address this, the differential privacy library has also made use of a machine learning technique known as federated learning. Thanks to its decentralised model, federated learning offers unprecedented cost-efficiencies by way of lower storage costs and reduced security exposure, and it ensures that a user’s data never leaves their local storage, staying within the device from which it came. A machine learning model is sent to a device with the consent of the user and is trained on-device, before summaries are pushed out to the server. When combined with differential privacy, data is afforded another layer of security and protection before being aggregated as insights. Apple, most notably, has leveraged federated learning on Siri to train a speaker recognition model across all of its users’ devices without their audio ever leaving local storage.
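The train-on-device, share-only-summaries loop described above can be sketched as a toy version of federated averaging. This is a simplified illustration, not Apple's or Google's implementation: the one-parameter model, the simulated devices, the learning rate, and the `noise_scale` parameter are all assumptions. The optional noise only gestures at the differential privacy step and is not calibrated to any formal privacy guarantee.

```python
import random


def local_update(weight, data, lr=0.1, epochs=5):
    """Train on-device: gradient descent on squared error for y ~ weight * x."""
    w = weight
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(global_weight, devices, rng, noise_scale=0.0):
    """One round: each device returns only its updated weight; the server averages."""
    updates = []
    for data in devices:
        w = local_update(global_weight, data)
        # Illustrative noise on the summary (uncalibrated, see the note above).
        updates.append(w + rng.gauss(0.0, noise_scale))
    return sum(updates) / len(updates)


def make_device(rng, n=20):
    """Hypothetical private data held on one device, following y = 3x plus noise."""
    xs = [rng.uniform(0.0, 2.0) for _ in range(n)]
    return [(x, 3.0 * x + rng.gauss(0.0, 0.1)) for x in xs]


rng = random.Random(42)
devices = [make_device(rng) for _ in range(10)]

weight = 0.0  # the server's initial model
for _ in range(20):
    weight = federated_round(weight, devices, rng)

print(f"learned weight: {weight:.2f}")
```

The server never reads any device's `(x, y)` pairs; it only sees one number per device per round, yet the shared model still converges towards the pattern common to all of them.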

Largely being explored within the AI arms of some of the world’s leading tech manufacturers, federated learning exemplifies the promise of privacy-preserving enhancements even at scale. When blockchain is added to the mix, how these outcomes can be securely stored in a decentralised manner for on-demand provenance is another challenge being explored.

The promise of innovation

The marketing sector would know better than any other that when real-world data is at stake, finding the perfect balance between privacy, security, and utility is a challenge. While innovations in cryptography and machine learning such as differential privacy and federated learning are on the rise, they can only go so far today. Though they offer a glimmer of hope in solving the industry’s privacy woes, applying them at an industry-wide level is another matter entirely. If recent years are any indication, the conversation surrounding data privacy rights will continue to drum on, and as technologies increase in sophistication, so will the nature of the security risks that lie ahead. With federated learning offering a view of the potential of decentralised systems, the very infrastructures we rely on as an industry must gradually evolve away from their centralised foundations: legacy structures built for centralised databases must give way to new-age infrastructures where decentralised data is sure to be the norm. As marketers walk the tightrope of privacy and personalisation, it’s important that we keep our eyes on the horizon, focusing not only on what’s in our immediate path but also on what lies ahead.

Catch G’Man along with fellow panelists David J. Moore, CEO of Britepool, and Sam Goldberg, Co-Founder and President of Lucidity, at Advertising Week Europe 2020 during their session entitled “Privacy, Personalisation, and Provenance: Blockchain to the Rescue” on March 16th.
