Using Public Data Sources
Combining public data from governments with other sources.
What to do when the source website makes the data harder to access.
Federal data is somewhat easier to deal with; state data is awful: XLS spreadsheets, intermingled types of data.
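A minimal sketch of one way to cope with intermingled types, assuming a spreadsheet column has already been read into strings (for example from a CSV export). The function name, the null markers, and the footnote-asterisk convention are hypothetical, not taken from any particular state's files: numbers are parsed, known "no value" markers become None, and everything else is set aside for a human to look at.

```python
import re

# Hypothetical markers that spreadsheets often use for "no value".
NULL_MARKERS = {"", "n/a", "na", "-", "--", "suppressed"}

# A number with optional thousands separators and decimals, optionally
# followed by footnote asterisks, e.g. "1,234", "56.7", "89*".
NUMBER_RE = re.compile(r"^(-?\d[\d,]*(?:\.\d+)?)\**$")


def clean_column(values):
    """Split raw cell strings into parsed numbers and cells needing review.

    Returns (parsed, rejects): parsed has one entry per input cell (a float,
    or None for a null marker or an unparseable cell); rejects lists
    (row_index, raw_value) for cells that were neither a number nor a marker.
    """
    parsed, rejects = [], []
    for i, raw in enumerate(values):
        cell = str(raw).strip()
        if cell.lower() in NULL_MARKERS:
            parsed.append(None)
            continue
        match = NUMBER_RE.match(cell)
        if match:
            parsed.append(float(match.group(1).replace(",", "")))
        else:
            parsed.append(None)
            rejects.append((i, cell))
    return parsed, rejects


parsed, rejects = clean_column(["1,234", "56.7", "N/A", "89*", "see note 3"])
# parsed  → [1234.0, 56.7, None, 89.0, None]
# rejects → [(4, 'see note 3')]
```

Keeping the rejects list, rather than silently dropping odd cells, leaves a record of what the cleaning could not handle.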
Working in government and transparency: the political aspects of this work.
New York MTA bus-tracking info is still a struggle.
Do they follow their documentation? Do they follow the standards they declare?
Talking to the actual engineers within the bureaucracy.
Large sets of language data: issues of granularity. Data of good quality at the individual-record level loses quality when integrated, or the other way around.
Do you maintain the integrity of the data if it's wrong? How do you communicate that this is not your data?
Youth data: organizations don't like to share their data, and cleaning it up takes a lot of work.
State-by-state data on US schools.
Integration and overlays: what people have used to do that.
More government data should be public. How can we help make that happen?
Attempting to collect data from cities all over the world (IBM).
Collect data to process, publish, and create simulations.
Limitations of the data sets: inconsistencies, gaps.
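One hedged way to make gaps and inconsistencies visible before publishing, assuming records arrive as hypothetical (entity, year, value) tuples: list the years each entity is missing within the period the data set claims to cover, and the entity-years that carry conflicting values. Both function names are illustrative.

```python
from collections import defaultdict


def find_gaps(records, first_year, last_year):
    """Return {entity: [missing years]} for entities missing any year.

    records is an iterable of (entity, year, value) tuples; first_year and
    last_year bound the period the data set claims to cover (inclusive).
    """
    seen = defaultdict(set)
    for entity, year, _value in records:
        seen[entity].add(year)
    expected = set(range(first_year, last_year + 1))
    gaps = {}
    for entity in sorted(seen):
        missing = sorted(expected - seen[entity])
        if missing:
            gaps[entity] = missing
    return gaps


def find_conflicts(records):
    """Return {(entity, year): [values]} where one entity-year has several values."""
    values = defaultdict(set)
    for entity, year, value in records:
        values[(entity, year)].add(value)
    return {key: sorted(v) for key, v in values.items() if len(v) > 1}


# Made-up sample records for illustration only.
records = [
    ("county_a", 2005, 10), ("county_a", 2007, 12),
    ("county_b", 2005, 8), ("county_b", 2006, 9),
    ("county_b", 2006, 11), ("county_b", 2007, 9),
]
# find_gaps(records, 2005, 2007) → {'county_a': [2006]}
# find_conflicts(records)        → {('county_b', 2006): [9, 11]}
```

A report like this can also be sent back to the agency as a concrete list of questions rather than a general complaint about quality.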
Have we gotten in trouble for using public data?
If you scrape the data and give it away, is it yours?
Shaming agencies into giving you their data.
Agencies are reluctant to release it because they're embarrassed about the quality of the data.
Position it as an opportunity to make the data more accurate. If you do it right, you can make it look good.
The Chicago Tribune has a team that works with data; their cleanup of the data inspired the state to take enforcement actions.
Taking the philosophy of the open source world and bringing it to data: build a feedback mechanism that lets people feed back into the data, cleaning it in the process.
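A minimal sketch of such a feedback mechanism, assuming records are held in a dictionary keyed by ID; the class and field names are hypothetical. The agency's original values are never overwritten: corrections live in a separate layer, each one must refer to the value actually in the source, and every corrected field can be attributed to whoever submitted it, which also helps communicate which parts are not the agency's data.

```python
from dataclasses import dataclass, field


@dataclass
class Correction:
    record_id: str
    field_name: str
    old_value: object      # value the submitter saw in the source
    new_value: object      # value the submitter proposes
    submitted_by: str
    note: str = ""


@dataclass
class CorrectableDataset:
    source: dict                                      # original records, never modified
    corrections: list = field(default_factory=list)   # applied in submission order

    def submit(self, correction):
        """Accept a correction only if it refers to the value in the source."""
        current = self.source[correction.record_id][correction.field_name]
        if current != correction.old_value:
            raise ValueError("correction does not match the source value")
        self.corrections.append(correction)

    def corrected_view(self):
        """Return corrected copies of the records plus per-field provenance.

        Later corrections to the same field override earlier ones.
        """
        merged = {rid: dict(rec) for rid, rec in self.source.items()}
        provenance = {}
        for c in self.corrections:
            merged[c.record_id][c.field_name] = c.new_value
            provenance[(c.record_id, c.field_name)] = c.submitted_by
        return merged, provenance
```

Publishing the source, the corrections, and the merged view side by side keeps the cleanup auditable.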
There are legitimate reasons to keep control of the data: it may be sensitive, or represent a complex reality.
The 8 Principles of Open Government Data.
Data.gov is just a compilation of the agencies that offer databases; the metadata is not standardized.
For now it is just linking. We've seen apps.gov trying to engage the technical community to contribute and share applications that allow agencies to share their data in a better way.
There are encouraging signs at the assistant secretary level.
There is not much understanding of what you will do with the data; sell them on the benefits of releasing it.
Bigger cities are more complicated to work with: less integrated, fiercer politics, older systems.
Federal vs. state-level campaign contributions, for example: entity matching! Everybody agrees we should develop interoperability. Nobody has managed to pull it off.
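A naive entity-matching sketch using only Python's standard library (`difflib.SequenceMatcher`), assuming a name string is the only field available. The stop-word list and the 0.9 threshold are arbitrary assumptions; real record linkage usually also compares addresses or employers and avoids comparing every pair of names.

```python
import re
from difflib import SequenceMatcher

# Hypothetical tokens that rarely help tell two entities apart.
STOPWORDS = {"inc", "llc", "corp", "co", "the", "mr", "mrs", "ms", "jr", "sr"}


def normalize(name):
    """Lowercase, drop punctuation and common suffixes, sort the tokens.

    Sorting makes "Smith, John" and "John Smith" normalize identically.
    """
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(sorted(t for t in tokens if t not in STOPWORDS))


def match_entities(left, right, threshold=0.9):
    """Return (left_name, right_name, score) for each left name whose best
    right-hand match scores at least `threshold` (compares every pair)."""
    normalized_right = [(normalize(r), r) for r in right]
    matches = []
    for name in left:
        key = normalize(name)
        best_score, best_name = 0.0, None
        for other_key, other in normalized_right:
            score = SequenceMatcher(None, key, other_key).ratio()
            if score > best_score:
                best_score, best_name = score, other
        if best_name is not None and best_score >= threshold:
            matches.append((name, best_name, round(best_score, 3)))
    return matches


federal = ["Smith, John Jr.", "Acme Widgets Inc.", "Bob Jones"]
state = ["JOHN SMITH", "ACME WIDGETS LLC", "Jane Doe"]
# match_entities(federal, state) →
#   [('Smith, John Jr.', 'JOHN SMITH', 1.0),
#    ('Acme Widgets Inc.', 'ACME WIDGETS LLC', 1.0)]
```

Dropping suffixes like "jr" trades precision for recall (a father and son would merge), which is exactly the kind of judgment call that makes shared interoperability hard.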
Have we managed to collaborate? Successful cases of interoperation:
The ARIS example (United Way). At least they are at the point where agencies talk to each other and collaborate. It's a start, not perfect.
Fair housing enforcement agencies around the country created a common language and started submitting reports.
IDML (International Development Markup Language): many years in the works.
Microformats: foundations using microformats for all their grants? It breaks down really easily.
Different agencies want to share a set of indicators across the regions they work in; they have been working on it for 5 years.
Census: DataWeb and DataFerrett let you build your own tables from census data.
The American Community Survey will supposedly release data every year, starting with next year's data.
What are we using to overlay data? The goal: take basic geographic data and overlay things on it without designing something new.
Getting OpenGIS, GRASS, etc. to work together with your existing system.
GeoServer, OpenLayers, Google Maps. OpenGeo packages the stack.
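Web mapping tools such as OpenLayers and the Google Maps JavaScript API can load GeoJSON, so one low-effort overlay path is to turn records that carry coordinates into a GeoJSON FeatureCollection. A sketch, assuming hypothetical `lat`/`lon` keys on each record; note that GeoJSON orders coordinates as longitude, latitude.

```python
import json


def to_geojson(records, lat_key="lat", lon_key="lon"):
    """Build a GeoJSON FeatureCollection of Point features.

    Records missing either coordinate are skipped rather than plotted
    at (0, 0); all other keys become the feature's properties.
    """
    features = []
    for rec in records:
        lat, lon = rec.get(lat_key), rec.get(lon_key)
        if lat is None or lon is None:
            continue
        properties = {k: v for k, v in rec.items() if k not in (lat_key, lon_key)}
        features.append({
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [float(lon), float(lat)]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


def to_geojson_text(records, **keys):
    """Serialize the collection, e.g. to a file a map layer can load."""
    return json.dumps(to_geojson(records, **keys))
```

Skipping records without coordinates, instead of defaulting them, avoids the familiar cluster of bad points off the coast of Africa; counting the skipped records is worth reporting alongside the map.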
Open Indicators project.
PolicyMap (The Reinvestment Fund, Philadelphia).
There are people you can talk to about consulting: TOP.
Repackaging data and adding value to public data. How do we GPL it? I want people to use it, but how do you get people to contribute back to it? How do we license it?
If you build an API, you can license the acceptable uses of the API.
Can we copyright the data?
Creative Commons is not appropriate for data (article on the blog). Data Commons project: there are people trying to figure it out.
Communicating that the data you have is not information or knowledge: you have to process it for meaning.
Added value for you. A reporting function they didn't have before. You have to sell it and make it valuable to them.
Being a salesman.
While we all want to move these agencies toward transparency, things are going to stay this way for 5, 10, 15 years. There are opportunities to do great work: build organizations that process data, normalize it, and serve as a clearinghouse. That provides a lot of value.
Other convenings of people thinking about data.
TOP open government data meetups.
There's a lot of hype.
See the value of it. Give people the tools to analyze and use the data. Because this is relatively new, people are not used to basing their actions on hard data. How do you integrate data into people's processes so it is valuable along the way?
For some organizations that have an enormous amount of data, transparency is a challenge.
Personal relationships are really important: find the right people inside the agencies. Find the openings.
Once you get to the person who is actually running code on the database, things get much easier. They know the value and limitations of the data.
HTTP; the EveryBlock code base: text parsing that looks for addresses, in Python. Available in Google Apps.
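This is not EveryBlock's code, only a minimal regex sketch of the idea of pulling addresses out of free text, assuming US-style addresses and a small, hypothetical set of street-type suffixes; real address parsing has to handle far more cases (unit numbers, ranges, misspellings).

```python
import re

ADDRESS_RE = re.compile(
    r"\b(\d{1,5})\s+"                    # house number
    r"(?:([NSEW])\.?\s+)?"               # optional directional, e.g. "N."
    r"((?:[A-Z][a-z]+\s+){1,3}?)"        # one to three capitalized words
    r"(St|Ave|Blvd|Rd|Dr|Pl|Ct|Ln|Street|Avenue|Boulevard|Road|Drive)\b\.?"
)


def find_addresses(text):
    """Return 'number [direction] street suffix' strings found in text."""
    results = []
    for m in ADDRESS_RE.finditer(text):
        number, direction, street, suffix = m.groups()
        parts = [number, direction, street.strip(), suffix]
        results.append(" ".join(p for p in parts if p))
    return results


text = "Police responded to 123 N. Clark St. and later to 4500 Lake Shore Drive on Tuesday."
# find_addresses(text) → ['123 N Clark St', '4500 Lake Shore Drive']
```

Extracted strings like these still need geocoding before they can be put on a map.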
Scraping: Python, HTTP.
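A scraping sketch using only the Python standard library (`urllib.request` and `html.parser`), assuming the data sits in plain HTML tables. The class and function names are hypothetical; the parser tolerates the unclosed `<td>`/`<tr>` tags that older agency pages often contain.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read
        self._cell = None   # text fragments of the cell being read

    def _end_cell(self):
        if self._cell is not None:
            # Collapse internal whitespace and trim the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()          # an open row ends when a new one starts
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()         # likewise for an unclosed cell
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def scrape_rows(html):
    """Return table rows as lists of cell strings."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch(url):
    """Download a page as text; the URL to use is left to the caller."""
    with urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage would be `scrape_rows(fetch(url))`; the rows that come back are still raw strings and need the same type cleaning as spreadsheet cells.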
How much do you divulge about what you're showing people?
Openness? If you're trying to build a revenue stream around it, it gets trickier.
What is needed?
Google Charts, Google Maps.
Many Eyes (IBM): visualization tools, not just for numbers but for text.
What types are we doing, for different levels of literacy and sophistication? The audience is huge.
What knowledge base does someone need to see the information in a visualization?
There's always a tradeoff in choosing a design and a representation.
Relational data is a challenge: we don't really have 100 years of best practices for it, as we do for maps.
Think of it as a system.
What does the system look like?
People with relationships with agencies, to get the data.
Technical people to crunch, clean, and process it.
What data to present, and how?
People who take this back to the community.
Who do we design for? A planner or a general user?
Who is our audience? Trying to reach everyone is a nightmare.
You can take the same data and offer different visualization options.
Make them interactive: let users decide what they want to see and how.
If I want to reach the general public: people like faces and people like stories.
That complicates being "impartial" in data representation.