Using Public Data Sources

From DevSummit
Revision as of 22:11, 15 May 2015 by Vivian


combining public data from governments with other sources

when the source website makes it more difficult to access the data

federal data is somewhat easier to deal with; state data is awful: xls spreadsheets, intermingled types of data
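
One way to cope with spreadsheet columns that mix types is to coerce every cell to a number and mark everything else as missing. This is only a minimal sketch: the helper name `to_number` and the sample column are made up for illustration, and real state spreadsheets will need more rules than these.

```python
import re

def to_number(cell):
    """Return a float for numeric-looking cells, or None when no number is present."""
    if cell is None or isinstance(cell, bool):
        return None
    if isinstance(cell, (int, float)):
        return float(cell)
    # Drop currency signs, thousands separators, percent signs and whitespace.
    cleaned = re.sub(r"[$,%\s]", "", str(cell))
    # Accounting-style negatives: "(45)" means -45.
    if cleaned.startswith("(") and cleaned.endswith(")"):
        cleaned = "-" + cleaned[1:-1]
    try:
        return float(cleaned)
    except ValueError:
        # Notes such as "n/a" or empty strings are treated as gaps.
        return None

# Hypothetical column as it might come out of an .xls export.
raw_column = [1200, "1,350", "$980.50", "n/a", "", None, "(45)"]
clean_column = [to_number(c) for c in raw_column]
print(clean_column)
```

Keeping the gaps as `None` rather than zero makes it obvious later which values were never reported.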

working in government and transparency, political aspects of this work

New York MTA bus tracking info is still a struggle

do they follow documentation, do they follow the standards that they declare?

talking to actual engineers within the bureaucracy

large sets of language data, issues of granularity, good quality at the individual record level loses quality in integration, or the other way around

do you maintain the integrity of the data if it's wrong? how do you communicate that this is not your data

youth data. organizations don't like to share their data, lots of work cleaning it up

state-by-state data on US schools

integration, overlays, what people have used to do that

more government data should be public, how can we help that reality happen?

attempting to collect data from cities all over the world -> ibm

collect data to process, publish and create simulations

limitations of the data sets, inconsistencies, gaps


have we gotten in trouble by using public data?

you scrape the data and give it away, is it yours?

shaming agencies into giving you their data

agencies are reluctant to release it because they're embarrassed about the quality of the data

position it as an opportunity to make the data more accurate. if you do it right, you can make it look good

Chicago Tribune team that works with data. cleaning the data inspired the state to take enforcement actions.

Taking the philosophy of the open source world and bringing it to data. Build a feedback mechanism to allow people to feed back into the data, cleaning it in the process.

Legitimate reasons to keep control of the data. Sensitive, representing a complex reality.

8 principles of open government data

data.gov is just a compilation of agencies that offer databases; the metadata is not standardized

for now it is just linking. we've seen apps.gov trying to engage the technical community to post and share applications that allow agencies to share their data in a better way

there are encouraging signs at the assistant secretary level

there is not a lot of understanding of what you will do with the data; sell them on the benefits of releasing their data

bigger cities are more complicated to work with, less integrated, more fierce politics, older systems


federal vs state level campaign contributions for example. entity matching! everybody agrees we should develop interoperability. nobody's managed to pull it off.
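
A very rough sketch of what entity matching between two contributor lists can look like, using only Python's standard library. The function names, the suffix list and the 0.85 threshold are assumptions chosen for illustration; nothing here comes from a tool mentioned in the session, and real matching needs addresses, employers and manual review on top of names.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of tokens to ignore when comparing names.
SUFFIXES = {"inc", "llc", "corp", "co", "ltd", "jr", "sr"}

def normalize(name):
    """Lowercase, strip punctuation, drop suffixes, and sort the tokens."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(sorted(t for t in tokens if t not in SUFFIXES))

def similarity(a, b):
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def match(federal_names, state_names, threshold=0.85):
    """Pair each federal name with its closest state name, if close enough."""
    if not state_names:
        return []
    pairs = []
    for fed in federal_names:
        best = max(state_names, key=lambda s: similarity(fed, s))
        score = similarity(fed, best)
        if score >= threshold:
            pairs.append((fed, best, round(score, 2)))
    return pairs

federal = ["Smith, John A.", "Jane Doe"]
state = ["John A. Smith", "ACME WIDGETS INC"]
print(match(federal, state))
```

Sorting the tokens makes "Smith, John A." and "John A. Smith" compare as equal, at the cost of also merging different people who share the same name parts, which is one reason the problem stays hard.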

have we managed to collaborate? successful cases of interoperation.

ARIS example. United Way. At least they are at the point where agencies talk to each other, collaboration. It's a start, not perfect.

Fair housing enforcement agencies around the country. Created a common language, started submitting reports.

IDML. International development markup language. many years in the works.

Microformats. Foundations using microformats for all their grants? Breaks down really easily.

Different agencies want to share a set of indicators across the regions they're working with, 5 years working on it.

Census. Data web and data ferret. build your own tables from census data.

American Community Survey supposedly will release info every year after next year's data.


What are we using to overlay data? Being able to take basic geographic data and overlay things on it without designing something new.

Getting OpenGIS, GRASS, etc. to work together with your existing system.

GeoServer, OpenLayers, Google Maps. OpenGeo is the packaging of the stack.

Open Indicators project.

Policy map. Reinvestment Fund Philly.

There are people you can talk to on consulting, TOP.


Repackaging data, adding value to public data. How do we GPL it? I want people to use it, but how do you get people to contribute back to it? How do we license it?

If you build an API, you can license the acceptable uses of the API.

Can we copyright the data?

Creative Commons is not appropriate for data - article on the blog. Data commons project. There are people trying to figure it out.


Communicating that the data you have is not information or knowledge, you have to process for meaning.

Added value for you. A reporting function they didn't have before. You have to sell it and make it valuable to them.

Being a salesman.

While we all want to move these agencies towards transparency, this is going to be this way for 5, 10, 15 years. There are opportunities to do great work. Build organizations to process data, normalize it, be a clearinghouse. It provides a lot of value.


Other convenings of people thinking about data.

TOP open government data meetups.

Sunlight foundation.

There's a lot of hype.


See the value of it. Give people the tools to analyze and use the data. Because this is relatively new, people are not used to basing their actions on hard data. How do you integrate data into people's processes to make it valuable along the way?

For some organizations that have an enormous amount of data, transparency is a challenge.

Personal relationships are really important, finding the right people inside the agencies. Find the openings.

Once you get to the person who is actually running code on the database, things get much easier. They know the value and limitations of the data.


tools:

http, everyblock code base, text parsing looking for addresses, in Python. available in google apps.

scraping: Python, http

dumpster diving!
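
The "text parsing looking for addresses" idea from the tools list can be sketched with a regular expression. This is not the everyblock code, just a hypothetical illustration: the pattern, the suffix list and the function name `extract_addresses` are assumptions, and it only recognizes simple US-style street addresses.

```python
import re

# Hypothetical pattern: house number, optional direction, 1-3 capitalized
# street-name words, then a common street suffix.
ADDRESS_RE = re.compile(
    r"\b\d{1,5}\s+"
    r"(?:[NSEW]\.?\s+)?"
    r"(?:[A-Z][a-z]+\s+){1,3}"
    r"(?:Street|Avenue|Boulevard|Road|Drive|Lane|St|Ave|Blvd|Rd|Dr|Ln|Way)\b"
)

def extract_addresses(text):
    """Return every substring of text that looks like a street address."""
    return [m.group(0) for m in ADDRESS_RE.finditer(text)]

sample = "Police responded to 1200 W. Fullerton Ave. on Tuesday, then to 45 Main Street."
for address in extract_addresses(sample):
    print(address)
```

A pattern like this will miss numbered streets, intersections and anything unusual, so in practice the matches would still need to be checked against a geocoder or a street list.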


how much do you divulge about what you're showing people?

openness? if you're trying to build a revenue stream around it, it gets trickier


what is needed?

visualization

google charts, maps

project many eyes, ibm - visualization stuff. not just numbers but text

what types are we doing? for different levels of literacy and sophistication. the audience is huge.

what is the knowledge base needed to see information on a visualization?

there's always a tradeoff on choosing a design and a representation

relational data is a challenge, we don't really have 100 years of best practices (such as maps)


think of it as a system

what does the system look like:

people relationships with agencies to get the data

technical people to crunch, clean, process

what data to present, and how?

people who take this back to the community


who do we design for? planner or general user?

who is our audience? trying to get everyone is a nightmare

you have the same data and offer different options for visualizations

make them interactive, allow the user to decide what they want to see and how


if I want a general public, people like faces and people like stories.

that complicates being "impartial" with data representation