For some #MondayMotivation, let's create a great resource of fellowships, workshops and communities in Data Science.
I'll start with some!
(1/n)
The Women in Data Science Conference (widsconference.org) is a great place to learn, network and grow.
2/n
The ACM SIGHPC Computational & Data Science Fellowships (sighpc.org/fellowships), with an upcoming deadline, foster diversity in Data Science and allied fields.
3/n
The IBM Social Good Fellowship (ibm.com/ibm/responsibi…) is a great venture that helps promote data science research for the benefit of humanity!
(4/n)
Another great fellowship for DS and society is the Data Science for Social Good (dssgfellowship.org) Fellowship, associated with #CarnegieMellonUniversity where you can get involved as a fellow, mentor or manager!
(5/n)
At the intersection of Data Science and national security is this unique 3-year IDA fellowship (ida.org/en/careers/stu…).
(6/n)
Sandia National Lab offers a postdoctoral fellowship (sandia.gov/careers/career…) for using data science in diverse challenges from security to energy systems.
(7/n)
A great community and resource is Women Who Code Data Science (womenwhocode.com/datascience), where a variety of talks, seminars, and opportunities are posted regularly.
(8/n)
And finally, the Women in Machine Learning or WiML org (wimlworkshop.org), which organizes workshops, socials, and other events that help womxn highlight their work and network.
(9/9)
As we practice and teach Data Science, we continuously learn, unlearn and revise old and new concepts.
What are some freely available reading lists that help with this or give a great intro to Data Science?
Another great one which details specific vital segments like clustering and dimensionality is this book/course from University of Utah: cs.utah.edu/~jeffp/teachin…
Happy Friday!! Today I'd like to describe two important approaches to data privacy research and applications: synthetic data and differential privacy. I hope to generate more interests in this area among researchers and practitioners!
1/n Data privacy and data confidentiality are important topics for statisticians, computer scientists, and really, anyone who offers their own data and consumes data!
2/n Statistical agencies, in particular, are under legal obligations to protect the privacy and confidentiality of survey and census respondents, e.g. U.S. Title 26.
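To make the differential privacy side concrete, here is a minimal sketch of the Laplace mechanism, one standard building block of differentially private releases. The counts, the epsilon value, and the `laplace_count` helper are illustrative, not from any agency's actual pipeline:

```python
import numpy as np

def laplace_count(true_count, epsilon, rng):
    # Laplace mechanism: a counting query has sensitivity 1 (one person
    # changes the count by at most 1), so adding Laplace(scale=1/epsilon)
    # noise yields an epsilon-differentially-private answer.
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

rng = np.random.default_rng(42)
true_count = 1_000  # e.g. respondents with some attribute (hypothetical)
noisy = laplace_count(true_count, epsilon=0.5, rng=rng)
print(noisy)  # close to 1000, but perturbed to protect individuals
```

Smaller epsilon means stronger privacy but noisier answers; that tradeoff is exactly what agencies have to tune.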
Happy Thursday! Today, I'd like to introduce and discuss various approaches, innovations, and resources for introducing Bayesian statistics to undergraduates! I am sure I will miss something good, so feel free to add yours or the ones you know.
First, a little bit of history. Bayesian methods became widely used thanks to the computational advances of the early 1990s, including the Gibbs sampler and Metropolis-Hastings algorithms (e.g. Gelfand and Smith (1990)).
However, even before that revolutionary advance, innovative educators had designed ways to introduce Bayes to students: e.g. emphasizing the intuition on specifying prior for a data analysis problem while relying on numerical integration, Franck et al. (1988).
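The numerical-integration approach is easy to show students in a few lines. This is a generic grid-approximation sketch for a coin-flip example I made up (it is not the specific example from Franck et al.): pick a prior for theta, multiply by the likelihood on a grid, and normalize with a sum instead of MCMC or conjugate algebra:

```python
import numpy as np

# Grid approximation of the posterior for a coin's heads probability theta,
# given 7 heads in 10 flips, with a Beta(2, 2) prior (illustrative numbers).
theta = np.linspace(0, 1, 1001)
dtheta = theta[1] - theta[0]

prior = theta * (1 - theta)               # Beta(2, 2) kernel (unnormalized)
likelihood = theta**7 * (1 - theta)**3    # binomial kernel for 7/10 heads
unnorm = prior * likelihood

# Normalize by numerical integration (a simple Riemann sum over the grid).
posterior = unnorm / (unnorm.sum() * dtheta)

post_mean = (theta * posterior).sum() * dtheta
print(post_mean)
```

Because Beta(2, 2) is conjugate here, the exact posterior is Beta(9, 5) with mean 9/14, so students can check the grid answer against the closed form.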
Let’s talk vectorization! You may have heard about or experienced how simple NumPy array ops (such as dot product) run significantly faster than for loops or list comprehension in Python. How? Why? Thread incoming.
Suppose we are doing a dot product on two n-dim vectors. In a Python for loop, scalars are individually loaded into registers, and operations are performed on the scalar level. Ignoring the sum, this gives us n multiplication operations.
NumPy makes this faster by employing vectorization, where you can load multiple scalars into registers and get many products for the price of one operation (SIMD). SIMD — single instruction, multiple data — is a backbone of NumPy vectorization.
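A quick way to feel the difference is to time both versions of the dot product yourself. This is a minimal sketch (array size and exact speedup will vary by machine):

```python
import time
import numpy as np

n = 1_000_000
rng = np.random.default_rng(0)
a = rng.random(n)
b = rng.random(n)

# Pure-Python loop: each multiply is dispatched one scalar at a time.
t0 = time.perf_counter()
loop_dot = 0.0
for x, y in zip(a, b):
    loop_dot += x * y
loop_time = time.perf_counter() - t0

# Vectorized: NumPy hands the whole operation to compiled, SIMD-friendly code.
t0 = time.perf_counter()
np_dot = a @ b
np_time = time.perf_counter() - t0

print(f"loop: {loop_time:.3f}s  numpy: {np_time:.5f}s")
print("results agree:", np.isclose(loop_dot, np_dot))
```

The two results agree (up to floating-point rounding), but the vectorized version is typically orders of magnitude faster.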
Today I will be talking about some of the data structures we use regularly when doing data science work. I will start with numpy's ndarray.
What is an ndarray? It's numpy's abstraction for describing an array, or a group of numbers. In math terms, "array" is a catch-all term for matrices and vectors. Behind the scenes, it essentially describes memory using several key attributes:
* pointer: the memory address of the first byte in the array
* type: the kind of elements in the array, such as floats or ints
* shape: the size of each dimension of the array (ex: 5 x 5 x 5)
* strides: number of bytes to skip to proceed to the next element
* flags: metadata about memory layout, such as whether the array is C-contiguous or writeable
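All of these attributes are inspectable from Python. A small sketch (the exact pointer value will differ on every run):

```python
import numpy as np

# A 2 x 3 x 4 array of 8-byte floats, laid out in C (row-major) order.
arr = np.arange(24, dtype=np.float64).reshape(2, 3, 4)

print(arr.ctypes.data)  # pointer: address of the first byte
print(arr.dtype)        # type: float64
print(arr.shape)        # shape: (2, 3, 4)
print(arr.strides)      # strides: (96, 32, 8) bytes to step along each axis
print(arr.flags)        # flags: C_CONTIGUOUS, WRITEABLE, etc.
```

The strides follow from the layout: the last axis steps 8 bytes (one float64), the middle axis steps 4 * 8 = 96 / 3 = 32 bytes, and the first axis steps a whole 3 x 4 block, 96 bytes.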