How To Develop Websites With User Privacy in Mind
Privacy is important! You don't necessarily know who the visitors to your site are or what their threat model is: what situations they're in, and what consequences they may face if their privacy is violated. For example, you might run a website about protest tactics that someone visits from a country where protest is illegal.
If you include a “share” or “like” button on your site, Facebook gets to see everyone who comes to your site, and it gets to set a cookie on each visitor's browser and track what they visit around the web.
The primary tracking method is cookies: a unique, identifying piece of text that a site can store in your browser. If you're logged into Facebook, they can tie that cookie to your real name and know exactly who you are.
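To make the mechanism concrete, here is a toy model of what a third-party button's server could do, sketched in Python. Everything in it (the `widget_request` function, the `uid` cookie name, the example sites) is hypothetical. On the first request it mints a random ID and sets it as a cookie; on later requests, from any site that embeds the same button, the browser sends that cookie back along with a Referer naming the embedding site, so one ID accumulates a browsing history. Browsers increasingly restrict third-party cookies, so treat this as the classic mechanism rather than a description of how any particular tracker works today.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, List, Optional, Tuple

# Toy tracker state: tracking ID -> list of sites the ID was seen on.
visits_by_id: Dict[str, List[str]] = {}

def widget_request(cookie_header: str, referer: str) -> Tuple[str, Optional[str]]:
    """Handle one embedded-button request; return (tracking_id, Set-Cookie value or None)."""
    jar = SimpleCookie(cookie_header)
    set_cookie = None
    if "uid" in jar:
        uid = jar["uid"].value            # returning browser: same ID as before
    else:
        uid = secrets.token_hex(8)        # new browser: mint a unique ID
        out = SimpleCookie()
        out["uid"] = uid
        out["uid"]["max-age"] = 60 * 60 * 24 * 365  # keep it for a year
        set_cookie = out["uid"].OutputString()
    # The Referer header reveals which site embedded the button.
    visits_by_id.setdefault(uid, []).append(referer)
    return uid, set_cookie

first_uid, set_cookie = widget_request("", "https://protest-tactics.example/")
# The browser stores the cookie and sends it back from a different site.
widget_request(f"uid={first_uid}", "https://news.example/")
print(visits_by_id[first_uid])
# ['https://protest-tactics.example/', 'https://news.example/']
```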
Question: Does it matter if I clear my browser history?
- Probably not. Clearing your history and cookies can help, but there are ways other than cookies to track people around the web.
Host third-party resources locally rather than loading them from third parties. Where that isn't possible, at least get the user's consent, e.g. with a click, before you load that content.
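Before moving things onto your own server, it helps to know what your pages currently load from elsewhere. Below is a small audit sketch using Python's standard `html.parser`; the `third_party_hosts` helper and the chosen tag/`rel` lists are assumptions. It only sees resources referenced in the static HTML, not ones added later by JavaScript or pulled in by CSS `@import`/`url()`.

```python
from html.parser import HTMLParser
from typing import Set
from urllib.parse import urljoin, urlsplit

class ThirdPartyFinder(HTMLParser):
    """Collect hosts of resources a page would fetch from other origins."""

    SRC_TAGS = {"script", "img", "iframe", "source", "video", "audio", "embed"}
    # <link> rel values that make the browser contact the linked host.
    LINK_RELS = {"stylesheet", "icon", "preload", "preconnect", "dns-prefetch"}

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.own_host = urlsplit(page_url).hostname
        self.hosts: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link":
            rels = set((a.get("rel") or "").lower().split())
            if not rels & self.LINK_RELS:
                return  # e.g. rel="canonical" does not trigger a fetch
            url = a.get("href")
        elif tag in self.SRC_TAGS:
            url = a.get("src")
        else:
            return
        if not url:
            return
        # Resolve relative and protocol-relative URLs against the page.
        host = urlsplit(urljoin(self.page_url, url)).hostname
        if host and host != self.own_host:
            self.hosts.add(host)

def third_party_hosts(html_text: str, page_url: str) -> Set[str]:
    """Return the set of other hosts the given HTML would load resources from."""
    finder = ThirdPartyFinder(page_url)
    finder.feed(html_text)
    finder.close()
    return finder.hosts
```

Each host in the result is a candidate for self-hosting or for a click-to-load consent step.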
Embeds: e.g. embedded Google Maps or YouTube videos. Don't embed the thing directly from YouTube. On EFF's website, they put up an image that says “click here to play YouTube video” and a warning that says “clicking this will let YouTube see you're on EFF.org's website”. The user can make an informed decision about whether they want to do that. When they click the button, the content gets replaced with the actual YouTube embed code.
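A minimal server-side sketch of the click-to-play idea, assuming a Python backend. The function name `render_youtube_embed`, the `load_video` query parameter, and the wording are hypothetical. Until the visitor opts in, the page contains only a local placeholder and a link, so the browser makes no request to YouTube; only after the click does the server return the real `<iframe>`. EFF's version swaps the content in place with JavaScript; this no-JavaScript variant reloads the page instead.

```python
import html
from urllib.parse import quote

def render_youtube_embed(video_id: str, consented: bool) -> str:
    """Return HTML for a YouTube video, gated behind an explicit click.

    Before consent: a placeholder served from our own site, with no request
    to YouTube. After consent: the real embed iframe.
    """
    safe_id = quote(video_id, safe="")  # percent-encode so the ID can't break the URL
    if not consented:
        return (
            '<div class="video-placeholder">'
            "<p>Clicking this will let YouTube see that you are on this site.</p>"
            f'<a href="?load_video={html.escape(safe_id)}">Click here to play YouTube video</a>'
            "</div>"
        )
    # The visitor opted in: only now does the page reference YouTube at all.
    src = f"https://www.youtube.com/embed/{safe_id}"
    return (
        f'<iframe src="{html.escape(src)}" width="560" height="315" '
        'title="YouTube video" allowfullscreen></iframe>'
    )
```

A request handler would then render something like `render_youtube_embed(video_id, consented=(query.get("load_video") == video_id))`, where `query` stands in for however your framework exposes query parameters.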
Analytics: tracking where people go and what they do on your website. Lots of people use Google Analytics, which in theory allows Google to track your users across the web (in addition to fonts and ads). There are a lot of open source analytics tools and lots of ways to self-host analytics. EFF uses “Piwik” (since renamed Matomo), a really good set of tools. Its feature set is comparable to Google Analytics, but it's self-hosted on your own server, so data doesn't get sent to Google or anybody else.
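To make "self-hosted" concrete, here is a deliberately tiny first-party counter, assuming Python and SQLite (the upsert syntax needs SQLite 3.24 or newer). It is not Piwik's design or API; the `PageViewCounter` class and its schema are hypothetical. It keeps only daily per-page totals in a database on your own server, with no cookies, IP addresses, or third-party requests. Real analytics tools record far more, so it's worth checking what yours stores.

```python
import sqlite3
from datetime import date
from typing import Optional

class PageViewCounter:
    """Hypothetical first-party counter: stores only (day, path) -> hits.

    No cookies, IP addresses, or user agents are recorded, and nothing is
    sent to a third party.
    """

    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS views ("
            " day TEXT NOT NULL, path TEXT NOT NULL, hits INTEGER NOT NULL,"
            " PRIMARY KEY (day, path))"
        )

    def record(self, path: str, day: Optional[date] = None) -> None:
        """Count one page view for `path` on `day` (default: today)."""
        day_str = (day or date.today()).isoformat()
        self.conn.execute(
            "INSERT INTO views (day, path, hits) VALUES (?, ?, 1) "
            "ON CONFLICT(day, path) DO UPDATE SET hits = hits + 1",
            (day_str, path),
        )
        self.conn.commit()

    def hits(self, path: str, day: date) -> int:
        """Return how many views `path` got on `day`."""
        row = self.conn.execute(
            "SELECT hits FROM views WHERE day = ? AND path = ?",
            (day.isoformat(), path),
        ).fetchone()
        return row[0] if row else 0
```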
Logs: The logs of your site may become interesting to other people someday. The government once went to Riseup.net looking for logs. Riseup didn't have logs, and emails were encrypted on the server, so it was fruitless, but you don't know who your logs may become interesting to. Instead of keeping logs forever, figure out a reasonable amount of time after which you can throw away your old logs. Most hosting providers don't give you control over that. You may be able to ask your hosting provider directly, but it may be out of your hands.
Advertisements: If you have advertisements, consider hosting the ads yourself, so the advertising company doesn't get access to all sorts of data about your users.
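If you do control your own server, one way to enforce a retention window is a small scheduled job (e.g. run daily from cron). A sketch assuming Python and logs stored as files; the 14-day window, the `*.log*` pattern, and the `prune_logs`/`expired` names are placeholder assumptions. If your server already uses a log-rotation tool such as logrotate, its own retention settings are likely the simpler option.

```python
import time
from pathlib import Path
from typing import Iterable, List, Tuple

SECONDS_PER_DAY = 86_400

def expired(entries: Iterable[Tuple[str, float]], now: float, max_age_days: int) -> List[str]:
    """Return the names whose modification time falls outside the retention window."""
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return [name for name, mtime in entries if mtime < cutoff]

def prune_logs(log_dir: str, max_age_days: int = 14, pattern: str = "*.log*") -> List[str]:
    """Delete log files in log_dir older than max_age_days; return the deleted paths."""
    entries = [
        (str(p), p.stat().st_mtime)
        for p in Path(log_dir).glob(pattern)
        if p.is_file()
    ]
    doomed = expired(entries, time.time(), max_age_days)
    for name in doomed:
        Path(name).unlink()
    return doomed
```

Keeping the age check in a pure function (`expired`) makes the policy easy to test without touching real files.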
It all comes down to: keep everything on your own server. That way you have the tools to control your users' privacy and aren't giving it away to Facebook, Google, and all these other companies. If you're using Google Fonts, download the fonts and serve them yourself. This has the added advantage of making your website faster: the browser won't be making calls to external services, and it can shrink the size of what has to be loaded.
Make sure your website loads over Tor. Tor helps anonymize your browsing by hiding your IP address; it's the best way we have to browse the web anonymously. Download the Tor Browser from torproject.org and make sure your site works in it.
There's a thing called Tor hidden services (now usually called onion services), which let you make a site that can only be accessed through Tor. People can be sure they're using Tor when they visit that site, and they get some extra layers of encryption. For the extra-ambitious: set up a Tor hidden service. Facebook even has one.
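One way to enforce "keep everything on your own server" is a Content-Security-Policy header, which tells browsers to refuse resources from other origins. A minimal sketch as Python WSGI middleware; the `with_csp` wrapper, the stand-in `hello` app, and the exact policy string are illustrative assumptions, and a real site may need to loosen individual directives (for example, for a self-hosted analytics endpoint on another subdomain).

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Browsers will only load scripts, styles, fonts, images, frames, etc.
# from this site's own origin; everything else is refused.
CSP = "default-src 'self'; base-uri 'self'; form-action 'self'; frame-ancestors 'self'"

Headers = List[Tuple[str, str]]

def with_csp(app: Callable) -> Callable:
    """WSGI middleware that adds a restrictive Content-Security-Policy header."""
    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        def start(status: str, headers: Headers, exc_info: Optional[tuple] = None):
            # Drop any policy the app set itself, then add ours.
            headers = [h for h in headers if h[0].lower() != "content-security-policy"]
            headers.append(("Content-Security-Policy", CSP))
            return start_response(status, headers, exc_info)
        return app(environ, start)
    return wrapped

def hello(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Stand-in application; replace with your real WSGI app."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<h1>Everything on this page is served from this server.</h1>"]

application = with_csp(hello)
```

Serve `application` with any WSGI server (for a quick local test, the standard library's `wsgiref.simple_server.make_server`). Note that `default-src 'self'` also blocks inline `<script>` blocks and `style` attributes, so those may need to move into files on your server.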
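The hidden service itself is configured in Tor's `torrc` with the `HiddenServiceDir` and `HiddenServicePort` options, and Tor writes the generated address to a `hostname` file in that directory. Once it's running, your regular site can advertise it with an `Onion-Location` response header, which Tor Browser uses to offer visitors the .onion version; as we understand the Tor Project's documentation, it only acts on the header for pages served over HTTPS. A Python sketch; `onion_location_header` and the `ONION_HOST` value are placeholders, not a real service.

```python
from typing import List, Tuple
from urllib.parse import urlsplit, urlunsplit

# Placeholder: replace with the address from your hidden service's "hostname"
# file. This is NOT a real onion service.
ONION_HOST = "exampleonionaddressplaceholder.onion"

def onion_location_header(request_url: str) -> List[Tuple[str, str]]:
    """Return an Onion-Location header mirroring the requested path on the onion site."""
    parts = urlsplit(request_url)
    if parts.scheme != "https":
        return []  # only advertise on HTTPS pages
    onion_url = urlunsplit(("http", ONION_HOST, parts.path or "/", parts.query, ""))
    return [("Onion-Location", onion_url)]
```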
Question: What breaks in Tor?
- The Tor people discourage using just the Tor router, since the Tor Browser does a lot more, like discarding cookies and combating browser fingerprinting.
Question: What is the difference between the Tor browser and the Tor router?
- The Tor router is an onion router: it bounces your traffic through several hops, so the middle computer doesn't know anything, and the endpoint knows someone is looking at a thing but not who. The Tor Browser is a special build of Firefox with extra privacy-enhancing features. You don't want to install add-ons in your Tor Browser, because that makes your browser different from other Tor Browsers, which means something could track you based on your browser's uniqueness, aka browser fingerprinting. One of the advantages of the Tor Browser is that every copy looks the same, so individual users can't be singled out.
- Browser fingerprinting is a way to track people based on the unique characteristics of their browser: what version it is, what fonts it has and in what order they were installed, what size your screen is, whether Java or Flash is installed... Instead of cookies, trackers can follow the uniqueness of your browser. Privacy Badger should not be added to the Tor Browser, because it would make each individual Tor Browser different: Privacy Badger learns which trackers to block as it goes, so each installation ends up slightly different.
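A toy Python sketch of why identical installs protect each other: a tracker can hash whatever traits it can observe into an identifier, with no cookie needed. The trait names and values are made up, and real fingerprinting scripts collect many more signals (and may match approximately rather than by exact hash).

```python
import hashlib
import json
from typing import Dict

def fingerprint(attrs: Dict[str, object]) -> str:
    """Toy fingerprint: a short hash over the browser traits a tracker can read."""
    # sort_keys makes the hash independent of dict key order, but list order
    # (e.g. the order fonts are reported in) still changes the result.
    canonical = json.dumps(attrs, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Made-up traits for a stock browser install.
stock = {
    "browser_version": "ExampleBrowser 1.0",
    "screen": "1000x800",
    "fonts": ["Arial", "Courier New", "Times New Roman"],
    "plugins": [],
}

alice = dict(stock)
bob = dict(stock)
carol = dict(stock, extensions=["privacy-badger"])  # carol installed one add-on

print(fingerprint(alice) == fingerprint(bob))    # True: indistinguishable
print(fingerprint(alice) == fingerprint(carol))  # False: carol stands out
```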
Question: Why do some websites, such as Yelp, ban the Tor Browser? And why, if you go to a bunch of sites hosted on Cloudflare, do they make you jump through hoops and solve CAPTCHAs?
- This is because some malicious users use Tor for spam, abuse, DDoS attacks, etc., so some websites block Tor IP addresses. It's unfortunate, and legitimate users get swept up in it. We want to convince companies not to block Tor users with such a heavy hand.
- Private browsing mode throws away cookies and things called supercookies (ways of storing data in your browser, like a cookie, that are harder to remove) when you close it, but it won't stop browser fingerprinting.
- It is possible to track you across browsers by IP address (although all users behind the same home router share one IP address). Another way is through Flash, the web technology that lets you watch videos and the like: Flash has its own version of the cookie, called a local shared object (a supercookie), which can be read from any browser you run Flash in.
HTTPS: Encrypts the data between the user and your server, and verifies that they're actually communicating with your server and not with someone intercepting the connection. Without HTTPS, anyone between the user and you, for example someone on the same router as the user, could read their passwords or anything else they enter on your site.
- Right now HTTPS certificates are kind of expensive. EFF is working on a project to give away free HTTPS certificates and set them up on your server (this became Let's Encrypt). Once that's out, it's everybody's responsibility to have HTTPS on their website, since all the pain and all the cost will be gone. Compared to the rest of this, HTTPS is the lowest-hanging fruit for protecting your users' privacy.
- When I mention this to clients, they worry about their SEO. Actually, not using HTTPS hurts your search engine ranking: Google prioritizes sites that use HTTPS, so a good HTTPS setup, with secure settings, works in your favor. People worry about it increasing the load on their server, but computers are so powerful now that the overhead of HTTPS is negligible. Netflix uses HTTPS for some video streaming, and they account for a huge share of peak internet traffic (by some estimates, around a third of North American downstream traffic at peak).
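Once a certificate is installed, a common follow-up is to redirect plain-HTTP requests to HTTPS and send a `Strict-Transport-Security` (HSTS) header so browsers stop trying plain HTTP for your domain. A Python WSGI middleware sketch; `https_only` is a hypothetical name, the one-year max-age is a commonly used value rather than a requirement, and it assumes standard ports and a server that sets `wsgi.url_scheme` correctly (behind a reverse proxy it may report `http` even for HTTPS traffic). Many sites do the redirect in the web server's own configuration instead.

```python
from typing import Callable, Iterable, List, Optional, Tuple
from urllib.parse import quote

# Tell browsers to use HTTPS only for this site for the next year.
HSTS = "max-age=31536000"

def https_only(app: Callable) -> Callable:
    """WSGI middleware: redirect plain HTTP to HTTPS; add HSTS to HTTPS responses."""
    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        if environ.get("wsgi.url_scheme") != "https":
            host = environ.get("HTTP_HOST") or environ.get("SERVER_NAME", "")
            # PATH_INFO holds latin-1-decoded bytes in WSGI; re-encode the same way.
            path = quote(environ.get("PATH_INFO") or "/", safe="/;=,", encoding="latin1")
            query = environ.get("QUERY_STRING", "")
            location = f"https://{host}{path}" + (f"?{query}" if query else "")
            start_response("301 Moved Permanently",
                           [("Location", location), ("Content-Length", "0")])
            return [b""]

        def start(status: str, headers: List[Tuple[str, str]], exc_info: Optional[tuple] = None):
            # HSTS is only meaningful (and only honored) on HTTPS responses.
            return start_response(status, list(headers) + [("Strict-Transport-Security", HSTS)], exc_info)

        return app(environ, start)
    return wrapped
```

Usage: wrap your existing WSGI app, e.g. `application = https_only(your_app)`.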