Facilitating data analysis

Data discussion.

Intro: there's a lot of data that is not quantified (how people value aspects of a community; the history of a town's power dynamics). So there is data that is important but can't be modeled.

There's a wave of fetishization of data and data reporting, but the source of that data is limited.

Mapping can be a valuable community outreach tool and a way to start a discussion. For example: ask "where do we feel unsafe?" and map that. That can drive discussions: why do you feel unsafe? Maybe a store selling Confederate Civil War nostalgia makes part of the community feel unsafe, and others had not noticed that.

How do we quantify "missing data"?
  
Example: a county was having problems getting an early voting place in a location that would be accessible to most people. Working with various orgs, they made maps of where it would be difficult to get to. Once those were digitized, it was clear that there was a major barrier to participation given the current poll locations. The Board of Elections rejected the complaint, but a community discussion had started, which itself has power and has continued. This is taking lived experience and finding a way to quantify it. (Process of mapping: changing the scale of the map based on input, to show the difference between physical distance and the real lived experience of how hard it was to get there.)
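
A crude sketch of that rescaling idea (all neighbourhood names, distances, and times below are invented): plot the same places once by physical distance and once by reported travel time, so the picture stretches where the lived experience is hardest.

```python
# Sketch: the same locations positioned by physical distance vs. by
# reported travel time to the polling place. All numbers are invented.
import matplotlib.pyplot as plt

# (name, physical distance in km, reported travel time in minutes)
places = [("North side", 2.0, 15), ("East side", 3.5, 20),
          ("South side", 2.5, 75), ("West side", 4.0, 30)]

fig, (ax_km, ax_min) = plt.subplots(2, 1, figsize=(7, 4))
for name, km, minutes in places:
    ax_km.plot(km, 0, "o")
    ax_km.annotate(name, (km, 0), textcoords="offset points", xytext=(0, 8))
    ax_min.plot(minutes, 0, "o")
    ax_min.annotate(name, (minutes, 0), textcoords="offset points", xytext=(0, 8))
ax_km.set_xlabel("Physical distance to the poll (km)")
ax_min.set_xlabel("Reported travel time to the poll (min)")
ax_km.set_yticks([])
ax_min.set_yticks([])
plt.tight_layout()
plt.show()
```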
 
  
Even if you know how a political process like a campaign works, it's hard to make some points if the data is not published. Example: the lack of a published ethnography of a campaign.

How can gathering oral history be useful in the process of creating data visualizations?

BUT: when collecting data we must ask ourselves what data should be recorded, and what sharing that data means for people's safety or security. What responsibilities do those who gather data have to the subjects of these explorations?
 
  
Complication: different parts of the community may want different things, or may be concerned about the impact of storing some data. Sometimes this can stall a project.

When datasets are released by community researchers (often not by academics), they tend to be viewed as problematic (not on the same tier as 'officially' gathered numbers, which can also be bad but get an assumed credibility).

Sometimes gathered datasets do not turn into something solid, but that's okay, because the process of gathering data in a community-centric way has its own benefits and outcomes.
 
  
=====

Balancing anonymity vs credit for ideas: it is okay to state that 'some of the data is anonymous, but parts come from (credited people or orgs)'.

What are we going to ask?
Who will be credited?
How will it be gathered?
How will it be stored?
Who owns it afterwards?

======
  
Eco org, ongoing discussion about the content of their outreach list. It's been gathered in ways that are not diverse, and using that list to ask questions can be difficult, since the answers come from a non-diverse community.

Are there ways to mitigate problematic answers that can restrict the organization from doing work that expands the debate? (Such as how to bring racial justice into discussions of environmental issues.)

It's possible to ask a set of questions where some are disguised as ways of exposing bias, in order to figure out how to interpret data results. (Proxy questions, or a push poll of sorts.)
  
Campaign to build a community of formerly incarcerated people to take action. Got people to come to a meeting and listen to each other, in order to bring people together and realize their issues are shared by others. But that can be difficult to quantify for funders ("we built community trust"). The org was able to show that even people who were not served by it can be active in the issue anyway.

Ideas for how to visualize data when these issues exist were discussed: analyse the text of interviews to pull out tendencies not being directly discussed. This might require tech we don't have easy access to, but it could be useful if you can parse this out of your source data.
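
A minimal sketch of that idea, assuming nothing about the actual tooling: plain word counts over transcripts (the snippets and stop-word list here are invented placeholders) can already surface recurring themes nobody raised directly.

```python
# Rough sketch: count recurring terms across interview transcripts to
# surface undiscussed tendencies. Transcripts are invented placeholders.
from collections import Counter
import re

transcripts = [
    "The bus stops running before my shift ends, so I walk home.",
    "I skip the evening meetings because the bus does not run that late.",
]

stop_words = {"the", "a", "i", "my", "so", "that", "does", "not", "before"}

counts = Counter(
    word
    for text in transcripts
    for word in re.findall(r"[a-z']+", text.lower())
    if word not in stop_words
)

# Terms recurring across interviews (here, transit at night) hint at a
# tendency worth following up on even though nobody named it directly.
print(counts.most_common(5))
```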
 
  
Color/hue/saturation can be used effectively for mapping data.
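
One way that can look in practice, sketched with invented points: hue carries a categorical dimension while saturation carries magnitude, so a single map shows two variables at once.

```python
# Sketch: encode category as hue and magnitude as saturation on one
# scatter "map". Coordinates, categories, and magnitudes are invented.
import colorsys
import matplotlib.pyplot as plt

points = [  # (x, y, category index, magnitude in [0, 1])
    (1, 2, 0, 0.9), (2, 1, 0, 0.3),
    (3, 3, 1, 0.7), (4, 2, 1, 0.2),
    (2, 4, 2, 1.0), (5, 5, 2, 0.5),
]
hues = [0.0, 0.33, 0.66]  # one hue per category

colors = [colorsys.hsv_to_rgb(hues[cat], mag, 0.9) for _, _, cat, mag in points]
plt.scatter([p[0] for p in points], [p[1] for p in points], c=colors, s=200)
plt.title("Hue = category, saturation = magnitude")
plt.show()
```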
Most important: you have a commitment to create as accurate a representation as possible, even if the conclusion is not what you want.

whatthefuckviz.com is a resource showing great examples of what you should not do.
 
  
Even seemingly objective data still reflects the bias of those that gathered it.


Collecting data

Facilitator: Aman

Aman consults with orgs about how they can make effective use of the data they collect. He runs a non-profit that helps orgs deal with the same questions.

Intro Questions

  • What data is the right data to collect?
  • Collaborative data-modelling with a standard data model (Lisa)
  • How to explain to on-the-ground organisers what data they do collect and should collect, and the impact of that
    • Do you really need that data?

Topics

  • Using data to make evaluations of long-term impact
  • Data minimisation
  • Data collection in general
  • Data-modelling (semantic data)
  • Security/Privacy of collected data (we won't spend time on this)
  • Programme evaluation

Long-term Impact

Participant example:

  • Using statistics to prove impact of student programmes in Haiti
    • Including comparative studies of people who did not enter the programme

The main challenge is in data collection, because you need to track participants over time. Control groups (i.e. people who do not get into the program) are even harder.
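
As a toy illustration of why tracking is the hard part (all names and records below are invented): any participant you lose contact with disappears from your later measurements.

```python
# Toy longitudinal records with gaps: every follow-up a participant
# misses weakens what you can say about long-term outcomes.
follow_ups = {
    "participant_a": {2015: "enrolled", 2017: "graduated", 2020: "employed"},
    "participant_b": {2015: "enrolled", 2017: "graduated"},  # lost after 2017
    "participant_c": {2015: "enrolled"},                     # lost after 2015
}

for year in (2015, 2017, 2020):
    reached = [p for p, record in follow_ups.items() if year in record]
    print(f"{year}: {len(reached)}/{len(follow_ups)} participants still tracked")
```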

They work with CiviCRM and Openflows: collecting data, ensuring the validity of the data, and also using that data for analysis/statistics. The data collection is working, but the analysis is not yet operational.

There are tools that allow program alumni to add data to the existing dataset.


Q: What is the purpose of collecting the data?

  • To evaluate and study why people complete or do not complete the program.
  • Also studying long-term outcomes, e.g. do alumni win awards, have careers, etc.
  • In general it is an attempt at proving the value of investing in higher education.

Challenge: How do you measure indirect effects, e.g. what other benefits do the constituents get?

Q: Are you also trying to inform decisions about program design?

A: We try to identify why people leave the program, e.g. undesirable outcomes, including feedback from employers.


Aman: When you want to map data to desired outcomes, the first question is obviously identifying the desired outcome; then collect data in service of that outcome. In the example case, the desired outcome is to prove that investment in higher education works.

The value of a control group is in helping to figure out the causality between the program and the outcome (a toy comparison is sketched below).

  • Re-using existing research about correlations between outcomes
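
A minimal sketch of that point, with invented numbers: without the control row there is no way to separate the program's effect from the kind of people who apply to it.

```python
# Toy comparison of program participants against a control group of
# applicants who did not get in. All numbers are invented.
participants = [1, 1, 0, 1, 1, 0, 1, 1]  # 1 = good outcome five years later
control = [1, 0, 0, 1, 0, 0, 1, 0]       # same measurement, no program

def rate(group):
    return sum(group) / len(group)

# The difference, not the raw participant rate, is what supports a
# causal claim about the program.
print(f"participants: {rate(participants):.0%}")
print(f"control:      {rate(control):.0%}")
print(f"difference:   {rate(participants) - rate(control):+.0%}")
```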

DataKind (Aman)

In conversations with organisations, they try to break data usage and analytics into 3 categories:

  1. Collecting data for funders
  2. Collecting data to prove something they already suspect/know
  3. Collecting data to make better operational decisions (this one is rare)

There are other reasons to collect data, but these are not focused on performing analytics on the data:

  1. Organisations that collect data as part of building institutional memory. Supporting the process of the work rather than the work itself.
  2. Organisations that collect data to increase the base set

Decision Theory

  • The belief that quantitative data is objective - NOT TRUE
  • The belief that quantitative data convinces people to make different decisions - NOT TRUE

The dichotomy: funders demand quantitative data, but do not actually let the data influence decisions.

In this case you need data to tell a specific story that is known in advance.

What's DataKind's process for helping?

It starts with asking questions about the theory of change behind the data collection.

There's a mapping exercise, part technical and part statistical, that helps connect the data to the objectives/purpose.

To minimise data you need to understand the Why.

To figure out what you need to collect, you also need to understand statistical bias.

Example: You ask people how satisfied they are with a product at the moment they quit. They will likely be dissatisfied. If you ask a broader group, including people who are not quitting, you will get a better understanding of why people are quitting.
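
A toy simulation of that sampling bias (the population and probabilities are invented): surveying only quitters pulls the satisfaction estimate well below what the broader user base would report.

```python
# Toy simulation of selection bias: satisfaction measured only among
# people who quit vs. among everyone. All probabilities are invented.
import random

random.seed(0)
users = []
for _ in range(10_000):
    satisfaction = random.random()                 # 0 = unhappy, 1 = happy
    quit_product = random.random() > satisfaction  # unhappy users quit more
    users.append((satisfaction, quit_product))

quitters = [s for s, did_quit in users if did_quit]
everyone = [s for s, _ in users]

# Quitters average roughly 0.33 while the full base averages about 0.50,
# so the quit-time survey alone badly misstates overall satisfaction.
print(f"avg satisfaction, quitters only: {sum(quitters) / len(quitters):.2f}")
print(f"avg satisfaction, everyone:      {sum(everyone) / len(everyone):.2f}")
```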

Life cycle length

The challenge is that if you're looking at very long-term outcomes, e.g. if your outcome is 10 years in the future, it is difficult to understand the impact of operational changes, because you have fewer life-cycles to compare with. Sometimes the answer is to measure shorter-term outcomes that can be shown to reliably proxy longer-term outcomes.

The idea of state progressions in statistics: break a life-cycle into individual states and look only at the correlation between adjacent states, not across the whole program. You can make meaningful statistical claims that correlation exists between individual state changes, but not through the entire chain.
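
A rough sketch of the state-progression idea, with invented stage names and counts: estimate each adjacent transition rate separately rather than one end-to-end rate.

```python
# Sketch: per-step transition rates through a program life-cycle,
# estimated independently per adjacent pair of states. Counts invented.
stages = ["enrolled", "graduated", "first_job", "career_5yr"]
counts = {"enrolled": 200, "graduated": 140, "first_job": 98, "career_5yr": 60}

for current, nxt in zip(stages, stages[1:]):
    rate = counts[nxt] / counts[current]
    print(f"{current} -> {nxt}: {rate:.0%}")

# Each printed rate is a defensible local claim; chaining them into one
# end-to-end figure assumes an independence the data may not support.
```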

This can be used effectively for story-telling, but may not have the necessary rigor to make effective operational decisions.

Summary

Make sure that the outcome you are looking for (the why?) drives the decisions about what you are collecting.

There is a clear distinction between orgs that collect data for operational reasons and orgs that collect data to tell stories and understand operational outcomes through analytics.