What are the implications of using hacked data for research?

A short thread inspired by the fact that, before AWS took it offline, #Parler was extensively hacked and user data was leaked.


The #Parler dataset seems crazy interesting for doing research, and my first reaction after the breach was to share it with other #CompSocSci people.

However, I started having second thoughts, so what follows is an attempt to organize my ideas and keep them somewhere I can look back on.

Generally speaking, as far as research ethics goes, good advice would be to handle hacked data with caution.

First of all, there's an issue of quality. Data might be altered or incomplete, and the source cannot be held accountable (assuming the source is anonymous).

Secondly and more importantly, a researcher using the data would probably be violating users’ consent and acting against the data collector's will.

Finally, users’ privacy is at stake, since researchers could see material that users never agreed to let other people see.

Sharing private information without consent might put people at risk of harm.

This is all the more true in cases such as the #ParlerHack, where the leaked information is of a particularly sensitive nature, and there’s a high risk of unintended consequences.

However, it can be argued that in many cases the milk is already spilled.

After all, the data is out there, users are already exposed, and using the leaked information for research (with some precautions) might not cause any additional harm.

Does this mean it's a free-for-all, then?

Short answer: I am not sure.

On practical grounds, there might be legal boundaries in place (depending on the context).

But more generally, from a deontological perspective, I think that (as long as the researcher is not responsible for the hack) the picture is blurred.

Sure, the issue of privacy becomes secondary when data is out in the open. Plus, data can be anonymized by the researcher, so that private information is not disseminated any further.
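To make the anonymization point a bit more concrete, here is a minimal sketch of what a researcher might do before touching the data. It is only an illustration: the field names (`user_id`, `username`, `email`, `phone`, `gps`) and the `pseudonymize` helper are hypothetical, not the schema of any actual leak. Strictly speaking this is pseudonymization (identifiers replaced by keyed hashes, direct identifiers dropped), which is weaker than full anonymization, since free text or metadata can still reveal who someone is.

```python
import hashlib
import hmac
import secrets

# Hypothetical direct identifiers; a real dataset would have its own schema.
DIRECT_IDENTIFIERS = {"username", "email", "phone", "gps"}


def pseudonymize(records, key, id_field="user_id"):
    """Return copies of the records with the ID replaced by a keyed hash
    and the direct identifier fields removed."""
    cleaned = []
    for record in records:
        # HMAC-SHA256 with a secret key: the same user always maps to the
        # same token, but the token cannot be reversed without the key.
        token = hmac.new(
            key, str(record[id_field]).encode("utf-8"), hashlib.sha256
        ).hexdigest()
        kept = {
            field: value
            for field, value in record.items()
            if field not in DIRECT_IDENTIFIERS and field != id_field
        }
        kept[id_field] = token
        cleaned.append(kept)
    return cleaned


# The key should be generated once, stored separately from the data,
# and never shared along with the cleaned records.
key = secrets.token_bytes(32)

# Made-up sample rows, for illustration only.
sample = [
    {"user_id": 1, "username": "alice", "email": "a@example.org", "n_posts": 12},
    {"user_id": 1, "username": "alice", "email": "a@example.org", "n_posts": 3},
    {"user_id": 2, "username": "bob", "email": "b@example.org", "n_posts": 7},
]
result = pseudonymize(sample, key)
```

Using a keyed hash rather than a plain hash is a design choice: with a plain hash, anyone who can guess the original IDs could recompute the tokens and re-identify users.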

On the other hand, I think the problem of users’ consent should not be bypassed as easily.

There's also another issue.

In fact, it can be argued that using illegally obtained data for research purposes might legitimize (or even encourage) illegal or unethical behavior.

Ultimately, the fact that data is publicly available doesn't necessarily mean that it is available for research, and some of the arguments against its use are hard to dismiss.

Do you know of any explicit guidelines in the political/social sciences that address this issue?


• • •

Thread by Giovanni Pagano