Super Cookies, Ever Cookies, Zombie Cookies, Oh My!

There has been a series of reports recently surrounding the use of so-called "Super Cookies" as a result of the latest in a string of lawsuits from everyone's favorite privacy warrior, Scott Kamber of Kamber Law. That's right, this isn't Mr. Kamber's first rodeo; he has successfully sued several companies over privacy violations. Given his success, he will likely continue to lurk on sites like Ars Technica waiting for the next report of a privacy violation by another major company, or by a vendor that major companies use. In this latest incident, KISSMetrics was caught out, and the result was a class action lawsuit not just against the company itself, but also against all of the big names that were using its platform. Is this just bad luck, or a lack of due diligence on the part of Hulu, AOL, Spotify, Spokeo, etc.? That's a whole different blog post for another day, but it's certainly worth considering if you're the one deciding which third-party tools to implement on your website.

As I noted earlier, Mr. Kamber has, for better or worse, filed these types of lawsuits in the past --- so why do companies keep getting caught up in this damaging mess? Last time around, companies were being sued over the use of Flash Shared Objects (FSOs) to re-create cookies (with Quantcast, Clearspring, and Interclick being the targets of legal action); this time, they are being sued over the use of ETags to avoid losing a user's identification when cookies are deleted. It turns out these are just forms of the same thing: analytics companies trying to keep people from removing the data needed to uniquely identify a given user. This calls for some quick education!

Analytics companies are keenly interested in identifying users consistently between visits to their clients' web pages --- their businesses are built around being able to do just that. The problem for many analytics companies is that, by default, each page on the internet is completely independent of every other page. In order to save "state" between page views, the website has to carry data from one page to the next, and in order to save data between sessions, the website has to recognize whether you're the same person who visited last week. Websites mainly do this by asking you to identify yourself (login), or by tagging the end user with a unique ID that they can later pick up. In the instance where you volunteer your identity by logging in, the website has to remember you throughout all of your page views. This is an integral part of creating "web applications" that have multiple pages, since forcing someone to type in their “username/password” on every page is simply a terrible user experience.

The smart people who created the internet years ago realized this, and decided that there had to be a way to save some data about you (your unique ID) which they could later read. The answer to this was and remains cookies. It was immediately apparent that letting a website read and write data to your computer could be a major security and privacy issue, so cookies were carefully designed to live in a “security sandbox” that is completely under the control of the end-user. Each website (domain, to be exact) has its own little “sandbox”, and the end-user is free to clean out that sandbox at any time, putting them in control of how they are tracked. Because of this domain-based security restriction, analytics companies use "third party" cookies set on THEIR domain instead of on the domain of the website that uses them, so that they can track people throughout their sessions and across various website visits.
This is part of why Google Analytics is free: Google gets that data and learns how people move from site to site (again, a topic that deserves its own blog post entirely). So, why are "SUPER COOKIES" considered so insidious? Well, they essentially violate the contract between end-user and website, which guarantees that the user has total control over what's stored on their computer, and the ways it can be used. How?
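
For reference, here is roughly what the ordinary, well-behaved cookie handshake described above looks like on the server side. This is a minimal Python sketch and not any vendor's actual code; the cookie name `visitor_id` and the function `identify_visitor` are made up for illustration.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical cookie name, for illustration only


def identify_visitor(cookie_header):
    """Return (visitor_id, set_cookie_value_or_None) for one request.

    cookie_header is the raw value of the request's Cookie header
    (or None / "" if the browser sent no cookies).
    """
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    if COOKIE_NAME in jar:
        # Returning visitor: the browser handed our ID back to us.
        return jar[COOKIE_NAME].value, None

    # New visitor, or one who cleaned out the sandbox: mint a fresh ID
    # and ask the browser to store it for a year.
    visitor_id = uuid.uuid4().hex
    out = SimpleCookie()
    out[COOKIE_NAME] = visitor_id
    out[COOKIE_NAME]["path"] = "/"
    out[COOKIE_NAME]["max-age"] = 60 * 60 * 24 * 365
    return visitor_id, out[COOKIE_NAME].OutputString()
```

Clear your cookies and the next request comes back with a brand-new ID, which is exactly the control the end-user is supposed to have.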

"Super Cookie" is just a new name for an old bag of tricks. The internet, as we noted earlier, was designed to allow for a very narrow allowance of data storage and retrieval on end-user systems. As companies build value around data collection, the motivation to break out of that narrow privacy oriented data protection scheme has also grown. The company that provides website owners with the most relevant and accurate information about how users interact with the website owner’s site has an advantage over other companies looking to do the same thing. This cat and mouse game of companies finding new ways to persist data has been going on for years. I noted earlier that Scott Kamber has sued people in the past for using FSOs to store/persist user data regardless of the user cleaning out their "data sandbox" (cookies), but FSOs are just one of many ways people have created the loosely used term "zombie cookies" in the past. A "zombie cookie" is a cookie that, when deleted, finds a way to rise from the dead. FSOs were a great way to do this in the past, because deleting FSOs is a pain in the butt. Slick companies would copy their important cookie data into an FSO, and if it noticed that its cookie wasn't there, it would check the Flash data to see if one used to be there, and if it found a copy, it would simply re-create the cookie. This latest incident simply used another location to store the important cookie (ETag instead of FSO). So we've now heard of two ways that companies have violated the unsaid contract with end-users regarding a walled garden for data storage/access, what else is out there? It turns out that companies have been investigating and using these types of tricks for years. One developer that might be a little offended by the re-branding of these zombie cookies as "Super cookies" might be Samy Kamkar (famous for writing a JavaScript based virus that took down MySpace years ago). 
A few years ago, Samy created a proof of concept he called “EverCookie” that included all of the old methods of persisting cookies, and also explored several new and extremely clever ways of accomplishing the same thing. Samy's “EverCookie” is in fact the true "Super Cookie", in that it incorporates more than 10 ways to persist data outside of cookies. What people are calling a "Super Cookie" now is just one of the many methods (and one of the oldest) that Samy implements. As we move forward, companies will forget Samy and the lawsuits and "discover" (re-discover) what they think are cool new ways to get higher quality analytics data. Somewhere there is a CTO whose eyes will light up as a young engineer brings this new/old methodology to light. The cycle of discovery/lawsuit will repeat.

Zombie cookies aren't the only way that privacy is being violated, though. For example, the EFF's Panopticlick project set out to uniquely identify users based on what is known about their particular browser (such as installed fonts and plugins). This is essentially an unblockable way of identifying individual hardware/browser setups. Check it out for yourself --- disable cookies and JavaScript and see how good Panopticlick is at identifying you. Scary!
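
The idea behind this kind of fingerprinting can be sketched in a few lines of Python. This is not the EFF's code; the trait names and the helper functions `fingerprint` and `identifying_bits` are assumptions for illustration. The first function boils a set of browser traits down to one stable identifier, and the second uses the standard information-theory measure (surprisal, in bits) of how much a single trait value narrows down who you are.

```python
import hashlib
import math


def fingerprint(attributes):
    """Hash a dict of browser traits into a short, stable identifier."""
    # Sort the keys so the same traits always produce the same string.
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def identifying_bits(share_of_browsers):
    """Bits of identifying information in a trait value that is seen in
    the given fraction (0 < share <= 1) of all browsers."""
    return -math.log2(share_of_browsers)
```

Nothing is stored on the user's machine here, which is why there is nothing for the user to delete. A trait value shared by one browser in 1,024 is worth 10 bits on its own, and the bits from several traits combine quickly.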

Now that we know a little bit more about “super/ever/zombie cookies” --- what can website owners do to ensure their organizations are protected from a liability standpoint and that their users are protected from a privacy standpoint? As the web moves more toward traditional “stateful applications”, you must either be able to trust the websites you visit, or rely on the combination of someone else (like Ars Technica) discovering the violation and lawyers like Mr. Kamber providing the punishment/motivation for site owners to be more responsible. For a website owner looking to protect themselves from unwittingly using tools that employ zombie cookies, the first step is to recognize that third-party applications hosted outside of your internal servers can change at any time. Using Google Analytics? Yeah, they can change that code any time. You have two main options:

  1. Continuous external monitoring/web crawling
  2. Continuous internal monitoring/specialized JavaScript-based solution (Ensighten Privacy Sentinel, for example)
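
As a rough idea of what a monitoring check could look for, here is a small Python sketch. It assumes a hypothetical crawler that loads your pages in one browser profile, records the cookies, deletes them all, reloads, and records them again; the function name `find_respawned_ids` and the length threshold are made up for illustration and do not describe any real product.

```python
def find_respawned_ids(before_clear, after_clear, min_length=8):
    """Return the names of cookies whose values came back unchanged.

    before_clear / after_clear: dicts of cookie name -> value captured
    in the same browser profile before and after deleting all cookies.
    Short values (such as a language setting) are skipped, since they
    legitimately repeat; a long value that reappears character for
    character was probably restored from somewhere outside the cookie jar.
    """
    suspects = []
    for name, value in after_clear.items():
        if len(value) >= min_length and before_clear.get(name) == value:
            suspects.append(name)
    return sorted(suspects)
```

An honest tracker hands out a fresh ID after the wipe, so its cookie never appears in the result; a zombie cookie does.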

Each option requires not only constant monitoring, but also that the way you monitor grows in sophistication as fast as or faster than the methods people are continuously finding and exposing. Alarmingly, there's not much that can be done to prevent people from stealthily using advanced information theory combined with available data to identify hardware (Panopticlick). Again, this leaves us with the need to build and protect trust in a website and brand. SITE OWNERS BEWARE: as end users become more informed and sophisticated, the importance of your ability to build and maintain a clean image grows with them. Being proactive is the only way to do this. Protect yourself or risk years of mistrust. Will you be signing up for KISSMetrics anytime soon (now that they've stopped using ETags)?