Weaponizing Big Data

From DevSummit

Personal data for good or evil?

Cambridge Analytica – more information in Guardian articles

UK company using data on US citizens

Farming Facebook likes and other data to determine your preferences

Profiles of voters -> targeted/personalized advertising to specific groups

Targeted less well-informed people

Targeted people with an emotional rather than information-based decision-making style

Advertising to create rifts and conflict

CA involved in PsyOps, behavioral targeting

Personaldata.io – extract what personal data CA (or others) has on you

Links to explainer videos

Examples of what to write to CA to reclaim your data

Mercer – owner/funder of CA, also Breitbart

Our personal information can be used to shape our behavior

Topics in mainstream media can be manipulated to increase polarization

Already being done by both sides, but possibly more money/resources on the other side

Progressive side – lots of ethical concerns that inhibit use of this technology

What is allowed from an ethical perspective?

More data gives you better models, but should you collect that data?

Rights vs ethics

Rights determined by laws, ethics are more expansive

GDPR increasing awareness of rights/developing new rights

We might have a right to the underlying information, but not the model/results

What did Cambridge Analytica actually accomplish? Is it just better marketing?

FB like data was at least 2-3 years old, no longer available

Is there something unique in the machine learning they used?

Cambridge Analytica did a better job of leveraging the moment

Are their algorithms better or are they just using their models more aggressively/less ethically?

No patents on algorithms, others can develop comparable models

Without laws against it, why wouldn’t actors use data-heavy models to influence behavior?

Is the bigger/underlying problem Facebook’s ad targeting?

How accessible is machine learning for groups with fewer resources?

Cambridge Analytica might not have been using machine learning – we don’t know

Takes relatively little money to use microtargeted ads on FB

How much influence/reach do you need?

How disturbed are we by FB? Would we stop using it?

Analogy to being concerned about climate change but still driving a car

Some countries have very strict campaigning and messaging laws – elections in France

FB able to implement those rules

Current hearings, showing bipartisan anger against FB

Possibility of regulation

When there were more limits on media/distribution, fewer people had access to that media

FB democratized access to media/influence, but is currently less regulated

Transparency – what do the companies know?

Right to opt-in/opt-out of data use

Need stronger rights in that area, without requiring that you leave the platform

Without changing the tech companies or regulating, are there things we can do to use these tools for good?

Toolkit for nonprofits

Maintain a strong social media presence, don’t let your nonprofit get impersonated

How do we use technical tools without violating ethics?

Google gives ad money to organizations for good – may not be the best model

Other models for ethics

Pharma bioethics

Organic farming – growing success, considering not just the price


How to call someone out when they aren’t behaving

We don’t know what the ethical issues are because the tech is a black box

Machine learning

What is the application for ML in advocacy?

Big data on users/members

Theory of change is collective action not microtargeted

Fractured media is counter to building shared experience

Microtargeting to introduce people to new narratives

Find what they have in common to frame stories

Running campaigns targeting the opposition voters

Tools work better on transactional behaviors/relationships

Tools may not work as well on persuasion

Don’t limit ourselves to corporate tactics


Knowing your rights about data

Communicate what data is being collected – without being too scary

People unregistering to vote because they’re made aware of voter registration data

What can the companies do?

Transparency – Twitter showing who is behind political advertising

FB struggling to know what to do? Lost internally

Even if regulated, still don’t know what to do to fix it

FB may become unmanageable, start driving users away

Bots talking to bots talking to bots, with lower impact on humans

Internal advocacy at these tech companies

Personally – what can you do?

Checking opposition news, etc., both to get a broader view and to throw off the algorithms

Companies that scrub your data – any personal experience?

Ex: UC Davis chancellor – on pepper spray incident – got called out

There are proven cases where it works

Germany – you can remove them – Right to be Forgotten in EU

Most cases, paying to push results down to page 40 on Google

Technology exists to remove information

Hacking – what if people are deleting someone else’s data with identity theft

Digi.me – extract personal data from data silos

Mydata.org – Finnish website, conference recommended

If you erase things, that sets off another flag

Why can’t we find you? What are you hiding?

Hiring – recommend not Googling

Confusion bot – chrome extension to mess with your data on purpose

Problems data science can solve

Early detection

Resource allocation

Propensity models

More info: Problem Templates... kinds of problems that data science can be used to solve:

1. Early detection to trigger a preventative intervention

Example: infectious disease or lead poisoning

Question you need to ask: how early is possible and useful?

2. How to allocate a limited number of resources to a large number of entities

Example: EPA inspections of hazardous waste facilities

We don't know if a location that is flagged for an inspection has a problem unless we go out and find a violation or fraud.

Note: The goal is not to find violations, but to decrease fraud. Fewer violations are better if we're looking at the right entities and our inspection is effective.

3. Scheduling and routing mobile resources that respond to incidents

Example: where to place ambulances so they can respond to 911 calls as quickly as possible

4. Routing a large number of requests to the right place for a response

Example: Automatically triaging a 311 line

5. What procedures/policies should I implement to improve a situation?

Example: reduce maternal mortality

Process: identify a few possible "levers" and run a randomized controlled trial

6. How much impact is a policy/procedure having (after already implementing it)?

7. What data should I collect that will help me do something?

Example: when matching employees and employers, what information is helpful for doing that? (sometimes called datafication)
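
As a rough illustration of template 2 (allocating a limited number of inspections across many entities), here is a minimal Python sketch. It assumes each facility already has a risk score from some model. The `Facility` class, the `pick_inspections` function, the site names, and the scores are all hypothetical and were not part of the session.

```python
from dataclasses import dataclass


@dataclass
class Facility:
    name: str
    # Hypothetical model output: estimated probability of a violation.
    risk_score: float


def pick_inspections(facilities, budget):
    """Return up to `budget` facilities with the highest estimated risk."""
    ranked = sorted(facilities, key=lambda f: f.risk_score, reverse=True)
    return ranked[:budget]


# Made-up example data, for illustration only.
facilities = [
    Facility("Site A", 0.12),
    Facility("Site B", 0.71),
    Facility("Site C", 0.35),
    Facility("Site D", 0.58),
]

selected = pick_inspections(facilities, budget=2)
print([f.name for f in selected])  # ['Site B', 'Site D']
```

Ranking by score is only one possible allocation rule. As noted under template 2, outcomes are only observed at sites that actually get inspected, and the goal is to decrease fraud rather than to maximize the number of violations found, so a real system would probably need more than a simple top-k cut.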

We didn’t talk about actually using psyops / disinformation

Main points: 1. Rights and awareness 2. Regulations 3. Transparency 4. Media literacy