Data privacy for organizations

From DevSummit
Revision as of 23:04, 21 November 2017 by Josh (talk | contribs) (→‎Eric's notes)

Tomas's notes

Collecting data

Facilitator: Aman

Aman consults with orgs about how they can make effective use of the data they collect. Also runs a non-profit that helps orgs deal with the same questions.

Intro Questions

  • What data is the right data to collect?
  • Collaborative data-modelling with a standard datamodel (Lisa)
  • How to explain to on-the-ground organisers what data they do and should collect, and the impact of that
    • Do you really need that data?


  • Using data to make evaluations of long-term impact
  • Data minimisation
  • Data collection in general
  • Data-modelling (semantic data)
  • Security/Privacy of collected data (we won't spend time on this)
  • Programme evaluation

Long-term Impact

Participant example:

  • Using statistics to prove impact of student programmes in Haiti
    • Including comparative studies of people who did not enter the programme

The main challenge is in data collection, because you need to track participants over time. Tracking control groups (ie people who do not get into the program) is even harder.

They work with CiviCRM and Openflows: collecting data, ensuring the validity of the data, and also using that data for analysis/statistics. The data collection is working, but the analysis is not yet operational.

There are tools to allow program alumni to add data to the existing data.

Q: What is the purpose of collecting the data?

  • To evaluate and study why people complete or do not complete the program.
  • Also studying long-term outcomes, eg do alumni win awards, have careers etc.
  • In general it is an attempt at proving the value of investing in higher education.

Challenge: How do you measure indirect effects, eg what other benefits the constituents get?

Q: Are you also trying to inform decisions about program design?

A: We try to identify why people leave the program, eg because of undesirable outcomes. This includes feedback from employers.

Aman - When you want to map data to desired outcomes, the first question is obviously identifying the desired outcome. Then collect the data needed to measure that outcome. In the example case, the desired outcome is to prove that investment in higher education works.

The value of a control group is in helping to figure out the causality between the program and the outcome.
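As a rough illustration of why the control group matters, the comparison can be sketched as a difference in outcome rates. The outcome flags below are invented for the example, and the difference only says something about causality if the two groups are otherwise comparable.

```python
# Hypothetical outcome flags: 1 = achieved the outcome
# (eg employed five years later), 0 = did not.
participants = [1, 1, 0, 1, 1, 0, 1, 1]  # people who went through the program
control = [1, 0, 0, 1, 0, 0, 1, 0]       # people who did not get in


def rate(flags):
    """Share of people in a group who achieved the outcome."""
    return sum(flags) / len(flags)


# Naive estimate of the program's effect: the gap between the two rates.
effect = rate(participants) - rate(control)
print(f"participants: {rate(participants):.3f}")
print(f"control:      {rate(control):.3f}")
print(f"difference:   {effect:.3f}")
```

Without the control list there is only the participants' rate, and no way to tell how much of it would have happened anyway.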

  • Re-using existing research about correlation between outcomes

Datakind (Aman)

In conversations with organisations, they try to break the data usage and analytics down into 3 categories:

  1. Collecting data for funders
  2. Collecting data to prove something they already suspect/know
  3. Collecting data to make better operational decisions (this one is rare)

There are other reasons to collect data, but these are not focused on performing analytics on the data:

  1. Organisations that collect data as part of building institutional memory. Supporting the process of the work rather than the work itself.
  2. Organisations that collect data to increase the base set.

Decision Theory

  • The belief that quantitative data is objective - NOT TRUE
  • The belief that quantitative data convinces people to make different decisions - NOT TRUE

The dichotomy is that funders demand quantitative data but do not actually let the data influence their decisions.

In this case you need data to tell a specific story that you know in advance.

What's Datakind's process for helping?

Starts with asking questions about the theory of change behind the data collection

There's a map that's part technical and part statistical that helps map the data to the objectives/purpose.

To minimise data you need to understand the Why.

To figure out what you need to collect you need to also understand statistical bias.

Example: You ask people how satisfied they are with a product at the moment they quit. They will likely be dissatisfied. If you ask a broader group, including people who are not quitting, you will get a better understanding of why people are quitting.
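A small sketch of that bias, using made-up survey scores (1 to 5) rather than real data: surveying only at the moment of quitting gives a much lower average than surveying everyone, and only the broader survey lets you compare quitters with people who stay.

```python
from statistics import mean

# Hypothetical survey records: (satisfaction score 1-5, did the person quit?)
responses = [
    (2, True), (1, True), (4, True),
    (4, False), (5, False), (2, False), (4, False), (5, False),
]


def mean_satisfaction(records):
    """Average satisfaction score over the given records."""
    return mean(score for score, _ in records)


quitters = [r for r in responses if r[1]]
stayers = [r for r in responses if not r[1]]

exit_only = mean_satisfaction(quitters)        # what an exit-only survey sees
everyone = mean_satisfaction(responses)        # what a broader survey sees
gap = mean_satisfaction(stayers) - exit_only   # only measurable with both groups

print(f"exit-only survey: {exit_only:.2f}")
print(f"broader survey:   {everyone:.2f}")
print(f"stayers minus quitters: {gap:.2f}")
```

Note that one stayer in the sample is as dissatisfied as the quitters; an exit-only survey would never show that dissatisfaction alone does not explain quitting.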

Life cycle length

The challenge is that if you're looking at very long-term outcomes, eg if your outcome is 10 years in the future, it is difficult to understand the impact of operational changes, because you have fewer life-cycles to compare with. Sometimes it is worth measuring shorter-term outcomes that can be proven to be reliable proxies for longer-term outcomes.

The idea of state-progressions in statistics. Breaking a life-cycle into individual states, and looking only at correlation between individual states, not across the whole program. You can make meaningful statistical claims that the correlation exists between individual state changes, but not through the entire chain.
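One possible reading of the state-progression idea, sketched with invented state names and records: count how many participants reach each state, then look at the rate for each single step rather than for the whole chain.

```python
from collections import Counter

# Hypothetical life-cycle states, in order.
STATES = ["enrolled", "completed_year_1", "graduated", "employed"]

# Hypothetical records: the furthest state each participant reached.
furthest_state = [
    "employed", "graduated", "enrolled", "completed_year_1",
    "employed", "graduated", "enrolled", "graduated",
]


def step_rates(states, furthest):
    """For each adjacent pair of states, the share of people who
    reached the first state and also reached the second."""
    rank = {s: i for i, s in enumerate(states)}
    reached = Counter()
    for s in furthest:
        # Reaching a state implies having passed through all earlier ones.
        for i in range(rank[s] + 1):
            reached[states[i]] += 1
    return {
        (a, b): reached[b] / reached[a]
        for a, b in zip(states, states[1:])
        if reached[a]
    }


rates = step_rates(STATES, furthest_state)
for (a, b), r in rates.items():
    print(f"{a} -> {b}: {r:.2f}")
```

Each rate describes one step only. Multiplying them together to describe the whole chain assumes the steps are independent, which is the kind of claim the notes say this approach cannot support.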

This can be used effectively for story-telling, but may not have the necessary rigor to make effective operational decisions.


Make sure that the outcome you are looking for (the why?) drives the decisions about what you are collecting.

There is a clear distinction between orgs that collect data for operational reasons and orgs that collect data to tell stories and understand operational outcomes through analytics.