The #DataFeminism book also made me look inward and examine my own biases, which I am exceedingly grateful for.
Namely, it forced me to reckon with some of my fundamental operating assumptions as a statistician & data scientist.
Examples threaded below...
In chapter 3, the authors discuss the role of emotion in data visualization, specifically calling out giants in the field like Edward Tufte and Alberto Cairo (no snitch tagging, please) for what is presented as an anti-emotion stance.
On Tufte: "Any ink devoted to something other than the data themselves ... is a suspect and intruder to the graphic. Visual minimalism, according to this logic, appeals to reason first. ... Decorative elements ... are associated with messy feelings ... and emotional persuasion."
"The logic that sets up this false binary between emotion and reason is gendered, of course, because the belief that women are more emotional than men (and, by contrast, that men are more reasoned than women) is one of the most persistent stereotypes across many Western cultures"
In Chapter 5 (principle: embrace pluralism), the authors spend a good amount of time unpacking the idea of "tidy data" and discuss the tidyverse.
This section was really challenging for me, a certified tidyverse instructor and lover of the Tidy Data paper they cite, to read.
Again, the authors identify a giant of the field, Hadley Wickham, as promoting the concept of tidiness, cleanliness, & control of data, which, as they point out in the text, is language that "contains troubling traces of ... eugenics, the ... source of much of modern statistics"
(No snitch tagging please!)
Seeing this name was particularly hard, but the juxtaposition in the text of "tidiness" and "cleanliness" was really striking, and I feel, important.
The authors go on to make several important points about what is lost when data is "cleaned" and "tidied."
The authors say...
"To be clear: the point here is not that anyone who cleans their data is perpetuating eugenics. The point, rather, is that the ideas underlying the belief that data should always be clean and controlled have tainted historical roots."
Ultimately, data cleaning/tidying is "an act that irreversibly separates the data from their context."
The authors also take a critical look at the idea of "data for good," a concept which is imprecise and leaves a lot of questions unanswered. Whose good? Who pays? Who maintains the work?
Again, this forced me to look inward: I participated in a "data for good" program in 2016.
Instead of "Data for Good," the authors ask data scientists to embrace "Data for Co-Liberation," and present its principles in Table 5.1.
For #ThrowbackThursday I thought I'd highlight some of the amazing women who have been mentors (and friends) to me. Without support from an amazing community of women in mathematics & statistics I would not be where I am today! #WomenInSTEM
As we practice and teach Data Science, we continuously learn, unlearn and revise old and new concepts.
What are some freely available reading lists that help with this or give a great intro to Data Science?
Another great one which details specific vital segments like clustering and dimensionality is this book/course from University of Utah: cs.utah.edu/~jeffp/teachin…
For some #MondayMotivation, let's create a great resource of fellowships, workshops and communities in Data Science.
I'll start with some!
(1/n)
The Women in Data Science Conference (widsconference.org) is a great place to learn, network and grow.
2/n
The ACM SIGHPC Computational & Data Science Fellowships (sighpc.org/fellowships), with an upcoming deadline, foster diversity in Data Science and allied fields.
3/n
Happy Friday!! Today I'd like to describe two important approaches to data privacy research and applications: synthetic data and differential privacy. I hope to generate more interest in this area among researchers and practitioners!
1/n Data privacy and data confidentiality are important topics for statisticians, computer scientists, and really, anyone who offers their own data and consumes data!
2/n Statistical agencies, in particular, are under legal obligations to protect the privacy and confidentiality of survey and census respondents, e.g. U.S. Title 26.