Facilitating data analysis

Data discussion.

Intro: there's a lot of data that is not quantified (how people value aspects of a community; the history of a town's power dynamics). So there is data that is important but can't be modeled.

There's a wave of fetishization of data and data reporting, but the source of that data is limited.

Mapping can be a valuable community outreach tool and a way to start a discussion. For example: ask "where do we feel unsafe?" and map that. That can drive discussions: why do you feel unsafe? Maybe a store selling Confederate Civil War nostalgia makes part of the community feel unsafe, and others had not noticed that.

How do we quantify "missing data"?
  
Example: a county was having problems getting an early voting place in a location that would be accessible to most people. Working with various orgs, they made maps of where it would be difficult to get to. Once those were digitized, it was clear that there was a major barrier to participation given the current poll locations. The Board of Elections rejected the complaint, but a community discussion had started, which itself has power and has continued. This is taking lived experience and finding a way to quantify it. (Process of mapping: changing the scale of the map based on input, to show the difference between physical distance and the real lived experience of how hard it was to get there.)
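
A crude sketch of that rescaling idea (all neighbourhood names, distances, and times below are invented): plot the same places once by physical distance and once by reported travel time, so the picture stretches where the lived experience is hardest.

```python
# Sketch: the same locations positioned by physical distance vs. by
# reported travel time to the polling place. All numbers are invented.
import matplotlib.pyplot as plt

# (name, physical distance in km, reported travel time in minutes)
places = [("North side", 2.0, 15), ("East side", 3.5, 20),
          ("South side", 2.5, 75), ("West side", 4.0, 30)]

fig, (ax_km, ax_min) = plt.subplots(2, 1, figsize=(7, 4))
for name, km, minutes in places:
    ax_km.plot(km, 0, "o")
    ax_km.annotate(name, (km, 0), textcoords="offset points", xytext=(0, 8))
    ax_min.plot(minutes, 0, "o")
    ax_min.annotate(name, (minutes, 0), textcoords="offset points", xytext=(0, 8))
ax_km.set_xlabel("Physical distance to the poll (km)")
ax_min.set_xlabel("Reported travel time to the poll (min)")
ax_km.set_yticks([])
ax_min.set_yticks([])
plt.tight_layout()
plt.show()
```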
 
  
Even if you know how a political process like a campaign works, it's hard to make some points if the data is not published. Example: the lack of a published ethnography of a campaign.

How can gathering oral history be useful in the process of creating data visualizations?

BUT: when collecting data we must ask ourselves what data should be recorded, and what sharing that data means for people's safety or security. What responsibilities do those who gather data have to the subjects of these explorations?
 
  
Complication: different parts of the community may want different things, or may be concerned about the impact of storing some data. Sometimes this can stall a project.

When datasets are released by community researchers (often not by academics), they tend to be viewed as problematic (not on the same tier as 'officially' gathered numbers, which can also be bad but get an assumed credibility).

Sometimes gathered datasets do not turn into something solid, but that's okay, because the process of gathering data in a community-centric way has its own benefits and outcomes.
 
  
=====

Balancing anonymity vs credit for ideas: it is okay to state that 'some of the data is anonymous, but parts come from (credited people or orgs)'.

What are we going to ask?
Who will be credited?
How will it be gathered?
How will it be stored?
Who owns it afterwards?

======
  
Eco org, ongoing discussion about the content of their outreach list. It's been gathered in ways that are not diverse, and using that list to ask questions can be difficult, since the answers come from a non-diverse community.

Are there ways to mitigate problematic answers that can restrict the organization from doing work that expands the debate? (Such as how to bring racial justice into discussions of environmental issues.)

It's possible to ask a set of questions where some are disguised as ways of exposing bias, in order to figure out how to interpret data results. (Proxy questions, or a push poll of sorts.)
  
Campaign to build a community of formerly incarcerated people to take action. Got people to come to a meeting and listen to each other, in order to bring people together and realize their issues are shared by others. But that can be difficult to quantify for funders ("we built community trust"). The org was able to show that even people who were not served by it can be active in the issue anyway.

Ideas for how to visualize data when these issues exist were discussed: analyse the text of interviews to pull out tendencies not being directly discussed. This might require tech we don't have easy access to, but it could be useful if you can parse this out of your source data.
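
A minimal sketch of that idea, assuming nothing about the actual tooling: plain word counts over transcripts (the snippets and stop-word list here are invented placeholders) can already surface recurring themes nobody raised directly.

```python
# Rough sketch: count recurring terms across interview transcripts to
# surface undiscussed tendencies. Transcripts are invented placeholders.
from collections import Counter
import re

transcripts = [
    "The bus stops running before my shift ends, so I walk home.",
    "I skip the evening meetings because the bus does not run that late.",
]

stop_words = {"the", "a", "i", "my", "so", "that", "does", "not", "before"}

counts = Counter(
    word
    for text in transcripts
    for word in re.findall(r"[a-z']+", text.lower())
    if word not in stop_words
)

# Terms recurring across interviews (here, transit at night) hint at a
# tendency worth following up on even though nobody named it directly.
print(counts.most_common(5))
```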
 
  
Color/hue/saturation can be used effectively for mapping data.
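
One way that can look in practice, sketched with invented points: hue carries a categorical dimension while saturation carries magnitude, so a single map shows two variables at once.

```python
# Sketch: encode category as hue and magnitude as saturation on one
# scatter "map". Coordinates, categories, and magnitudes are invented.
import colorsys
import matplotlib.pyplot as plt

points = [  # (x, y, category index, magnitude in [0, 1])
    (1, 2, 0, 0.9), (2, 1, 0, 0.3),
    (3, 3, 1, 0.7), (4, 2, 1, 0.2),
    (2, 4, 2, 1.0), (5, 5, 2, 0.5),
]
hues = [0.0, 0.33, 0.66]  # one hue per category

colors = [colorsys.hsv_to_rgb(hues[cat], mag, 0.9) for _, _, cat, mag in points]
plt.scatter([p[0] for p in points], [p[1] for p in points], c=colors, s=200)
plt.title("Hue = category, saturation = magnitude")
plt.show()
```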
Most important: you have a commitment to create as accurate a representation as possible, even if the conclusion is not what you want.

whatthefuckviz.com is a resource showing great examples of what you should not do.
 
  
Even seemingly objective data still reflects the bias of those that gathered it.


Collecting data

Facilitator: Aman

Aman consults with orgs about how they can make effective use of the data they collect. He runs a non-profit that helps orgs deal with the same questions.

Intro Questions

  • What data is the right data to collect?
  • Collaborative data-modelling with a standard data model (Lisa)
  • How to explain to on-the-ground organisers what data they do collect and should collect, and the impact of that
    • Do you really need that data?

Topics

  • Using data to make evaluations of long-term impact
  • Data minimisation
  • Data collection in general
  • Data-modelling (semantic data)
  • Security/Privacy of collected data (we won't spend time on this)
  • Programme evaluation

Long-term Impact

Participant example:

  • Using statistics to prove impact of student programmes in Haiti
    • Including comparative studies of people who did not enter the programme

The main challenge is in data collection, because you need to track participants over time. Control groups (i.e. people who do not get into the program) are even harder.
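
As a toy illustration of why tracking is the hard part (all names and records below are invented): any participant you lose contact with disappears from your later measurements.

```python
# Toy longitudinal records with gaps: every follow-up a participant
# misses weakens what you can say about long-term outcomes.
follow_ups = {
    "participant_a": {2015: "enrolled", 2017: "graduated", 2020: "employed"},
    "participant_b": {2015: "enrolled", 2017: "graduated"},  # lost after 2017
    "participant_c": {2015: "enrolled"},                     # lost after 2015
}

for year in (2015, 2017, 2020):
    reached = [p for p, record in follow_ups.items() if year in record]
    print(f"{year}: {len(reached)}/{len(follow_ups)} participants still tracked")
```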

They work with CiviCRM and Openflows: collecting data, ensuring the validity of the data, and also using that data for analysis/statistics. The data collection is working, but the analysis is not yet operational.

There are tools that allow program alumni to add data to the existing dataset.


Q: What is the purpose of collecting the data?

  • To evaluate and study why people complete or do not complete the program.
  • Also studying long-term outcomes, e.g. do alumni win awards, have careers, etc.
  • In general it is an attempt at proving the value of investing in higher education.

Challenge: How do you measure indirect effects, e.g. what other benefits do the constituents get?

Q: Are you also trying to inform decisions about program design?

A: We try to identify why people leave the program, e.g. undesirable outcomes, including feedback from employers.


Aman: When you want to map data to desired outcomes, the first question is obviously identifying the desired outcome; then collect data in service of that outcome. In the example case, the desired outcome is to prove that investment in higher education works.

The value of a control group is in helping to figure out the causality between the program and the outcome (a toy comparison is sketched below).

  • Re-using existing research about correlations between outcomes
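
A minimal sketch of that point, with invented numbers: without the control row there is no way to separate the program's effect from the kind of people who apply to it.

```python
# Toy comparison of program participants against a control group of
# applicants who did not get in. All numbers are invented.
participants = [1, 1, 0, 1, 1, 0, 1, 1]  # 1 = good outcome five years later
control = [1, 0, 0, 1, 0, 0, 1, 0]       # same measurement, no program

def rate(group):
    return sum(group) / len(group)

# The difference, not the raw participant rate, is what supports a
# causal claim about the program.
print(f"participants: {rate(participants):.0%}")
print(f"control:      {rate(control):.0%}")
print(f"difference:   {rate(participants) - rate(control):+.0%}")
```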

DataKind (Aman)

In conversations with organisations, they try to break data usage and analytics into 3 categories:

  1. Collecting data for funders
  2. Collecting data to prove something they already suspect/know
  3. Collecting data to make better operational decisions (this one is rare)

There are other reasons to collect data, but these are not focused on performing analytics on the data:

  1. Organisations that collect data as part of building institutional memory. Supporting the process of the work rather than the work itself.
  2. Organisations that collect data to increase the base set

Decision Theory

  • The belief that quantitative data is objective - NOT TRUE
  • The belief that quantitative data convinces people to make different decisions - NOT TRUE

The dichotomy: funders demand quantitative data, but do not actually let the data influence decisions.

In this case you need data to tell a specific story that is known in advance.

What's DataKind's process for helping?

It starts with asking questions about the theory of change behind the data collection.

There's a mapping exercise, part technical and part statistical, that helps connect the data to the objectives/purpose.

To minimise data you need to understand the Why.

To figure out what you need to collect, you also need to understand statistical bias.

Example: You ask people how satisfied they are with a product at the moment they quit. They will likely be dissatisfied. If you ask a broader group, including people who are not quitting, you will get a better understanding of why people are quitting.
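
A toy simulation of that sampling bias (the population and probabilities are invented): surveying only quitters pulls the satisfaction estimate well below what the broader user base would report.

```python
# Toy simulation of selection bias: satisfaction measured only among
# people who quit vs. among everyone. All probabilities are invented.
import random

random.seed(0)
users = []
for _ in range(10_000):
    satisfaction = random.random()                 # 0 = unhappy, 1 = happy
    quit_product = random.random() > satisfaction  # unhappy users quit more
    users.append((satisfaction, quit_product))

quitters = [s for s, did_quit in users if did_quit]
everyone = [s for s, _ in users]

# Quitters average roughly 0.33 while the full base averages about 0.50,
# so the quit-time survey alone badly misstates overall satisfaction.
print(f"avg satisfaction, quitters only: {sum(quitters) / len(quitters):.2f}")
print(f"avg satisfaction, everyone:      {sum(everyone) / len(everyone):.2f}")
```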

Life cycle length

The challenge is that if you're looking at very long-term outcomes, e.g. if your outcome is 10 years in the future, it is difficult to understand the impact of operational changes, because you have fewer life-cycles to compare with. Sometimes the answer is to measure shorter-term outcomes that can be shown to reliably proxy longer-term outcomes.

The idea of state progressions in statistics: break a life-cycle into individual states and look only at the correlation between adjacent states, not across the whole program. You can make meaningful statistical claims that correlation exists between individual state changes, but not through the entire chain.
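
A rough sketch of the state-progression idea, with invented stage names and counts: estimate each adjacent transition rate separately rather than one end-to-end rate.

```python
# Sketch: per-step transition rates through a program life-cycle,
# estimated independently per adjacent pair of states. Counts invented.
stages = ["enrolled", "graduated", "first_job", "career_5yr"]
counts = {"enrolled": 200, "graduated": 140, "first_job": 98, "career_5yr": 60}

for current, nxt in zip(stages, stages[1:]):
    rate = counts[nxt] / counts[current]
    print(f"{current} -> {nxt}: {rate:.0%}")

# Each printed rate is a defensible local claim; chaining them into one
# end-to-end figure assumes an independence the data may not support.
```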

This can be used effectively for story-telling, but may not have the necessary rigor to make effective operational decisions.

Summary

Make sure that the outcome you are looking for (the why?) drives the decisions about what you are collecting.

There is a clear distinction between orgs that collect data for operational reasons and orgs that collect data to tell stories and understand operational outcomes through analytics.