Data privacy for organizations

From DevSummit

Tomas's notes

Collecting data

Facilitator: Aman

Aman consults with orgs about how they can make effective use of the data they collect. He runs a non-profit that helps orgs deal with the same questions.

Intro Questions

  • What data is the right data to collect?
  • Collaborative data-modelling with a standard datamodel (Lisa)
  • How to explain to on-the-ground organisers what data they do and should collect, and the impact of that
    • Do you really need that data?


  • Using data to make evaluations of long-term impact
  • Data minimisation
  • Data collection in general
  • Data-modelling (semantic data)
  • Security/Privacy of collected data (we won't spend time on this)
  • Programme evaluation

Long-term Impact

Participant example:

  • Using statistics to prove impact of student programmes in Haiti
    • Including comparative studies of people who did not enter the programme

The main challenge is in data collection, because you need to track participants over time. Tracking control groups (i.e. people who do not get into the program) is even harder.

They work with CiviCRM and Openflows: collecting data, ensuring the validity of the data, and using that data for analysis/statistics. The data collection is working, but the analysis is not yet operational.

There are tools to allow program alumni to add data to the existing data.

Q: What is the purpose of collecting the data?

  • To evaluate and study why people complete or do not complete the program.
  • Also to study long-term outcomes, e.g. do alumni win awards, have careers etc.
  • In general it is an attempt at proving the value of investing in higher education.

Challenge: How do you measure indirect effects, e.g. what other benefits do the constituents get?

Q: Are you also trying to inform decisions about program design?

A: We try to identify why people leave the program (e.g. undesirable outcomes), including through feedback from employers.

Aman - When you want to map data to desired outcomes, the first question is obviously identifying the desired outcome. Then collect the data needed to measure that outcome. In the example case, the desired outcome is to prove that investment in higher education works.

The value of a control group is in helping to figure out the causality between the program and the outcome.
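As a rough illustration of what a control group adds, the sketch below compares outcome rates between program participants and a comparison group. The records, the group labels and the `outcome_rate` helper are all hypothetical and not from the session; a real study would also need to check that the two groups are comparable.

```python
# Hypothetical records: (group, reached_outcome). Illustrative data only.
records = [
    ("program", True), ("program", True), ("program", True), ("program", False),
    ("control", True), ("control", False), ("control", False), ("control", False),
]


def outcome_rate(rows, group):
    """Share of people in `group` who reached the desired outcome."""
    outcomes = [reached for g, reached in rows if g == group]
    return sum(outcomes) / len(outcomes)


program_rate = outcome_rate(records, "program")
control_rate = outcome_rate(records, "control")

# The difference only says something about causality if the two groups
# are otherwise comparable (e.g. similar applicants, selected at random).
difference = program_rate - control_rate
print(f"program={program_rate:.2f} control={control_rate:.2f} diff={difference:.2f}")
```

Without the control rows, only `program_rate` would be available, and there would be nothing to compare it against.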

  • Re-using existing research about correlation between outcomes

DataKind (Aman)

In conversations with organisations, they try to break data usage and analytics into 3 categories:

  1. Collecting data for funders
  2. Collecting data to prove something they already suspect/know
  3. Collecting data to make better operational decisions (this one is rare)

There are other reasons to collect data, but these are not focused on performing analytics on the data:

  1. Organisations that collect data as part of building institutional memory. Supporting the process of the work rather than the work itself.
  2. Organisations collect data to increase the base set

Decision Theory

  • The belief that quantitative data is objective - NOT TRUE
  • The belief that quantitative data convinces people to make different decisions - NOT TRUE

There is a dichotomy: funders demand quantitative data but do not actually let the data influence decisions.

In this case you need data to tell a specific story that you know in advance.

What is DataKind's process for helping?

Starts with asking questions about the theory of change behind the data collection

There's a map that's part technical and part statistical that helps map the data to the objectives/purpose.

To minimise data you need to understand the Why.

To figure out what you need to collect you need to also understand statistical bias.

Example: You ask people how satisfied they are with a product at the moment they quit. They will likely be dissatisfied. If you ask a broader group, including people who are not quitting, you will get a better understanding of why people are quitting.
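The example above can be sketched with a small made-up survey. The scores and the `mean` helper are hypothetical; the point is only that a quitters-only sample has no baseline to compare against.

```python
# Hypothetical survey data: (customer_quit, satisfaction on a 1-5 scale).
responses = [
    (True, 1), (True, 2), (True, 2), (True, 4),
    (False, 4), (False, 5), (False, 2), (False, 4),
]


def mean(values):
    return sum(values) / len(values)


# Asking only at the moment of quitting yields this single number.
quitters_only = mean([score for quit_, score in responses if quit_])

# Asking the broader group gives a baseline to compare against.
stayers_only = mean([score for quit_, score in responses if not quit_])
everyone = mean([score for _, score in responses])

print(f"quitters={quitters_only:.2f} stayers={stayers_only:.2f} all={everyone:.2f}")
```

With only the quitters' score you cannot tell whether low satisfaction is specific to people who quit or common to everyone; the stayers' score is what makes the gap visible.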

Life cycle length

The challenge is that if you're looking at very long-term outcomes, e.g. if your outcome is 10 years in the future, it is difficult to understand the impact of operational changes, because you have fewer life-cycles to compare with. Sometimes it is worth measuring shorter-term outcomes that can be shown to be reliable proxies for longer-term outcomes.

One idea is state progressions in statistics: breaking a life-cycle into individual states and looking only at the correlation between individual states, not across the whole program. You can make meaningful statistical claims that a correlation exists between individual state changes, but not through the entire chain.

This can be used effectively for story-telling, but may not have the necessary rigor to make effective operational decisions.
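A minimal sketch of the state-by-state idea, assuming a hypothetical four-stage life-cycle and made-up participant data. It computes a simple progression rate for each adjacent pair of states rather than a formal correlation, and deliberately makes no claim about the chain as a whole.

```python
# Hypothetical life-cycle stages, in order, and the furthest stage that
# each participant reached. Names and data are illustrative only.
STAGES = ["enrolled", "completed_year_1", "graduated", "employed"]
furthest_stage = [
    "employed", "graduated", "graduated", "completed_year_1",
    "enrolled", "employed", "completed_year_1", "graduated",
]


def transition_rates(stages, furthest):
    """For each adjacent pair of stages, the share of people who reached
    the earlier stage and also went on to reach the later one."""
    index = {name: i for i, name in enumerate(stages)}
    reached = [sum(1 for f in furthest if index[f] >= i) for i in range(len(stages))]
    return {
        f"{stages[i]} -> {stages[i + 1]}": reached[i + 1] / reached[i]
        for i in range(len(stages) - 1)
    }


rates = transition_rates(STAGES, furthest_stage)
for step, rate in rates.items():
    print(f"{step}: {rate:.2f}")
```

Each rate describes one step only. Shorter steps complete sooner, so there are more finished cycles to compare than when waiting for the final outcome.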


Make sure that the outcome you are looking for (the why?) drives the decisions about what you are collecting.

There is a clear distinction between orgs that collect data for operational reasons and orgs that collect data to tell stories and understand operational outcomes through analytics.

Eric's notes

Data privacy

Themes from intros:

  • Unintended bad results of improving privacy
  • Technology challenges, beyond CRM or website
  • Security of data for community orgs
  • Anonymizing tracking information

Ideas/guiding questions:

How can we do both: be respectful of people’s data and do business in a viable way? What conversations must we have in order to do this well?

Basic needs of data privacy, user agreements

Individual privacy is the only protected type of privacy, but protecting it can create “small harms to large groups” (e.g. when data is de-identified, as with differential privacy, race or gender can’t be used for certain types of analysis, such as health).

How can we do analytics on data in a safe way?

Some of these things already happen…

Advertiser cannot target “African Americans,” but they might target a particular neighborhood, income bracket, etc. “I can look at a population and determine a connection between smoking and cancer, without knowing whether any particular individual smokes or has cancer.”

EU law restricts the storage and collection of information with personal identifiers – potential starting point for discussion

How does anonymity affect equity?

Aggregation is one way to de-personalize data
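A minimal sketch of aggregation, assuming hypothetical individual-level records. Hiding very small groups is an added assumption, not something raised in the session; it is included because a count of one can still point to a single person.

```python
from collections import Counter

# Hypothetical individual-level records. Illustrative data only.
people = [
    {"name": "A", "neighbourhood": "North"},
    {"name": "B", "neighbourhood": "North"},
    {"name": "C", "neighbourhood": "North"},
    {"name": "D", "neighbourhood": "South"},
    {"name": "E", "neighbourhood": "South"},
    {"name": "F", "neighbourhood": "South"},
    {"name": "G", "neighbourhood": "East"},
]

MIN_GROUP_SIZE = 3  # assumed threshold; choose based on your own risk assessment


def aggregate(rows, key, min_size):
    """Count rows per group, dropping every individual-level field.
    Groups smaller than `min_size` are reported as None (suppressed)."""
    counts = Counter(row[key] for row in rows)
    return {group: (n if n >= min_size else None) for group, n in counts.items()}


summary = aggregate(people, "neighbourhood", MIN_GROUP_SIZE)
print(summary)
```

Only `summary` would be shared or published; the `people` list with names stays behind or is deleted.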

Relational database

What do we want to do with data? What are threats to storing data?

  1. the data can be sold
  2. the data can be stolen
  3. the company can use the data in malicious ways

Data minimization

Open Whisper Systems: an app developer that was asked by the federal govt for IP addresses, but they did not have that info on their servers.
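One way to apply the same principle in code is to drop every field you do not need before anything is stored. The sketch below is an assumption-laden illustration: the purpose (sending a newsletter), the field names and the `minimise` helper are all hypothetical.

```python
# Hypothetical allowlist: the only fields needed for the stated purpose
# (here, sending a newsletter in the right language).
NEEDED_FIELDS = {"email", "language"}


def minimise(submission, needed=NEEDED_FIELDS):
    """Keep only the fields needed for the stated purpose; everything else
    is dropped before the record is written to storage."""
    return {key: value for key, value in submission.items() if key in needed}


# Hypothetical form submission, including data the org has no use for.
submission = {
    "email": "person@example.org",
    "language": "fr",
    "ip_address": "203.0.113.7",
    "date_of_birth": "1990-01-01",
}

stored = minimise(submission)
print(stored)
```

Data that was never stored cannot be sold, stolen or handed over on request.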

Data travels – has a journey – need to consider threats at different moments in that journey. When doing a survey, sharing data, etc.

Storing data in encrypted form: the encryption key is stored separately, and data is only decrypted when it's going to be used.

In the EU, many companies are having these conversations: impact assessments, data flow tools, etc. We should all be analyzing our use of data, privacy threats, etc.

Privacy Badger – EFF browser anonymization tool

Will be interesting to see how corporations will conform to new regulations

Non-profits have cause for concern in protecting the privacy of their data, especially if they work with vulnerable populations

Can we use this event to have new discussions about the ethics of data privacy? There are some large companies that all of a sudden have interest and resources.

Development of new tools that are being used by companies – some are crap, but there are some good tools also. Non-profits can use these tools as well, but should also assess threats.

Data brokers – share data amongst companies


Website developers offer Google Analytics; some don’t want to use it, but there are few other options.

Many forces are pushing nonprofits toward risky practices (such as Google)

Some might say that storing data without a clear agreement should be illegal… at least start the conversation here. Foundations and funders should also understand how they might be compromising the communities they want to support by pushing for greater data gathering, analytics, etc.

Sharing best practices by nonprofits: e.g. the Internet Archive, a Bay Area nonprofit, anonymizes its data, and lots of it.

Not every nonprofit can hire someone with a professional understanding of data, but there should be a list of best practices, including risk assessments, etc.

Do we trust large companies who are making promises about data privacy?

MailChimp, for example, has said that it will voluntarily comply with GDPR (General Data Protection Regulation), but will small organizations be able to challenge them legally if they don’t?

What do we do to remain functional as nonprofits?

In the EU, email providers are showing up as viable alternatives.

List of basic resources for data privacy

  • Data Ethics – nonprofit that consults around ethical data storage
  • EFF – “Who Has Your Back” report: 5 questions for service providers that rate their stance on data privacy. Also a report on data anonymization.
  • Privacy Badger
  • “Road map” webinars and toolkits for nonprofits and grassroots groups, how to create threat assessment, etc.
  • Data Ethics canvas (similar to the business model canvas): a tool for thinking about what data you have, how you store it, what the data’s life cycle is, etc. Creative Commons licensed.
  • Connecting more nonprofits to resources like Capital One?
  • Tactical Tech (Europe), Engine Room, consultants that will help non profits with issues of data, analytics, etc.
  • Digital Society Lab, Stanford, help for organizations that want to set up ethical data usage. E.g. use agreement templates, etc.
  • British govt has survey on data usage that provides recommendations