Where does the data come from?

An explanation of the origin of the data on this site, and why its collection is private.


Jul 22nd, 2017

We take privacy very seriously. Over the last two years, we have been building a set of privacy tools, including a state-of-the-art anti-tracking technology. It uses an algorithmic, data-driven approach to remove unique identifiers (UIDs) from third-party requests, which we found to perform better than the traditional blocklist approach, maximising protection while minimising site breakage. You can read more about it in 'How does Cliqz Anti-tracking work'.

We block hundreds of millions of cookies and remove tens of millions of UIDs per day. This has given us important insights into the tracker landscape.

The data presented on this site is collected by the Cliqz browser and extension for Firefox, and by the Ghostery extension for users who have enabled 'HumanWeb' data collection. We receive a message for each page loaded in the browser (except in private tabs), which describes the third-party requests required to load that page. We take the following steps to ensure that this data is anonymised:

The data collected was audited by external researchers in April 2017. The audit found some theoretical attacks for linking messages, which affected a small subset of messages. These issues were subsequently fixed to remove the attack vector. For example, we no longer collect the paths of third-party requests, because on some sites specific resources (such as avatars) are only loaded for a specific logged-in user. Such a resource could then be tracked across page loads to build a partial user history for that particular site.
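
To illustrate the path-removal fix, here is a minimal sketch in Python of reducing a third-party request URL to its scheme and host before it is reported. It is an illustration under stated assumptions, not the code that runs in the browser: the function name strip_request_path and the example URL are hypothetical.

```python
from urllib.parse import urlsplit


def strip_request_path(url):
    """Reduce a third-party request URL to scheme and host only.

    Hypothetical helper: the path, query string and fragment are dropped,
    since a user-specific resource (such as an avatar) could otherwise
    act as an identifier across page loads.
    """
    parts = urlsplit(url)
    # .hostname is lower-cased and excludes any credentials or port
    host = parts.hostname or ""
    return f"{parts.scheme}://{host}"


# A user-specific avatar URL collapses to the bare third-party origin.
print(strip_request_path("https://cdn.example.com/avatars/user-12345.png?size=64"))
```

With the path removed, two users loading different avatars from the same host produce the same reported value, so the resource can no longer be used to link their messages.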

This data is primarily used to automatically generate the list of tracking domains on which Cliqz anti-tracking operates. A side effect is that the same data can also be used to generate this census of trackers across the web.
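
As a rough sketch of how per-page messages could be turned into a census, the Python snippet below counts the fraction of page loads on which each third-party domain appears. The message shape (a dict with a "third_parties" list) and the function name tracker_reach are assumptions made for this example. The methodology we actually use is the one in the WhoTracksMe paper.

```python
from collections import Counter


def tracker_reach(page_messages):
    """Return, per third-party domain, the fraction of page loads it appeared on.

    Assumes each message is a dict with a "third_parties" list of domains;
    this shape is hypothetical and only serves the example.
    """
    pages_seen = Counter()
    total_pages = 0
    for message in page_messages:
        total_pages += 1
        # Count each domain at most once per page load
        for domain in set(message["third_parties"]):
            pages_seen[domain] += 1
    if total_pages == 0:
        return {}
    return {domain: count / total_pages for domain, count in pages_seen.items()}


messages = [
    {"third_parties": ["a.example", "b.example"]},
    {"third_parties": ["a.example", "a.example"]},
    {"third_parties": ["a.example"]},
    {"third_parties": []},
]
print(tracker_reach(messages))
```

Counting a domain once per page load, rather than once per request, keeps a page that makes many requests to the same domain from inflating that domain's share.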

Our methodology is outlined in the WhoTracksMe paper.