As we practice and teach Data Science, we continuously learn, unlearn and revise old and new concepts.
What are some freely available reading lists that help with this or give a great intro to Data Science?

(1/n)
This one is from the University of Washington and goes over some basic concepts: students.washington.edu/bxie/info370/

(2/n)
Another great one, which details specific vital topics like clustering and dimensionality reduction, is this book/course from the University of Utah: cs.utah.edu/~jeffp/teachin…

(3/n)
In addition to big data, this also goes into data visualization, which is a big part of understanding and communicating data: dan.bjorkegren.com/bigdata/

(4/n)
Data Science for Economists!
github.com/uo-ec607/lectu…

(5/n)
Data Science for language modeling!
Includes basics like an intro to Jupyter notebooks as well.
web.stanford.edu/class/cs124/

(6/n)
Data Science for graphs and social networks: web.stanford.edu/class/cs224w/

(7/n)
And finally, the blog Towards Data Science on Medium is probably the best resource when in need of a brief recap on almost any topic!
(towardsdatascience.com)

Tell me what resources you find helpful! :)

(8/n=8)

• • •

More from @WomenInStat

24 Mar
Let's talk data visualizations today! Best practices, ideas, tools, resources or even some really neat visualizations - what are your recommendations?
I found this visualization of at-risk workers in COVID times very good at expressing key points, though I did not like the scroll feature too much!

nytimes.com/interactive/20…
Quite unlike the wealth disparity visualization, where the scrolling was on point and made all the difference:

mkorostoff.github.io/1-pixel-wealth/
Read 4 tweets
22 Mar
For some #MondayMotivation, let's create a great resource of fellowships, workshops and communities in Data Science.

I'll start with some!
(1/n)
The Women in Data Science Conference (widsconference.org) is a great place to learn, network and grow.

2/n
The ACM SIGHPC Computational & Data Science Fellowships (sighpc.org/fellowships), with an upcoming deadline, foster diversity in Data Science and allied fields.

3/n
Read 9 tweets
5 Mar
Happy Friday!! Today I'd like to describe two important approaches to data privacy research and applications: synthetic data and differential privacy. I hope to generate more interest in this area among researchers and practitioners!
1/n Data privacy and data confidentiality are important topics for statisticians, computer scientists, and really, anyone who offers their own data or consumes data!
2/n Statistical agencies, in particular, are under legal obligations to protect the privacy and confidentiality of survey and census respondents, e.g. U.S. Title 26.
Read 39 tweets
4 Mar
Happy Thursday! Today, I'd like to introduce and discuss various approaches, innovations, and resources for introducing Bayesian statistics to undergraduates! I am sure I will miss something good, so feel free to add yours or the ones you know.
First, a little bit of history. Bayesian methods became widely used thanks to the computational advances of the early 1990s, including the Gibbs sampler and Metropolis-Hastings algorithms (e.g. Gelfand and Smith (1990)).
However, even before that revolutionary advance, innovative educators had designed ways to introduce Bayes to students: e.g. emphasizing the intuition behind specifying a prior for a data analysis problem while relying on numerical integration, Franck et al. (1988).
Read 42 tweets
13 Jan
Let’s talk vectorization! You may have heard about or experienced how simple NumPy array ops (such as a dot product) run significantly faster than for loops or list comprehensions in Python. How? Why? Thread incoming.
Suppose we are doing a dot product on two n-dim vectors. In a Python for loop, scalars are individually loaded into registers, and operations are performed on the scalar level. Ignoring the sum, this gives us n multiplication operations.
NumPy makes this faster by employing vectorization, where you can load multiple scalars into registers and get many products for the price of one operation (SIMD). SIMD — single instruction, multiple data — is a backbone of NumPy vectorization.
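To make the comparison concrete, here is a minimal sketch that times a pure-Python loop against `np.dot` on the same two vectors. The helper name `loop_dot`, the vector length, and the repeat count are assumptions chosen for illustration. The timings depend on your machine and on how your NumPy build was compiled, and part of the gap comes from avoiding Python interpreter overhead, not only from SIMD.

```python
import timeit

import numpy as np


def loop_dot(a, b):
    """Dot product with an explicit Python loop: one scalar multiply per step."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


# Hypothetical problem size, chosen only so the difference is easy to see.
n = 100_000
rng = np.random.default_rng(0)
a = rng.random(n)
b = rng.random(n)

# Both approaches compute the same quantity (up to floating-point rounding).
loop_result = loop_dot(a, b)
numpy_result = float(np.dot(a, b))

# Time each approach over a few repetitions.
loop_time = timeit.timeit(lambda: loop_dot(a, b), number=5)
numpy_time = timeit.timeit(lambda: np.dot(a, b), number=5)

print(f"loop:  {loop_time:.4f} s")
print(f"numpy: {numpy_time:.4f} s")
```

The two results should agree closely but not necessarily bit for bit, since the order of the additions can differ between the two implementations.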
Read 8 tweets
13 Jan
Today I will be talking about some of the data structures we use regularly when doing data science work. I will start with numpy's ndarray.
What is an ndarray? It's numpy's abstraction for describing an array, or a group of numbers. In math terms, "array" is a catch-all term for vectors and matrices. Behind the scenes, it essentially describes memory using several key attributes:
* pointer: the memory address of the first byte in the array
* type: the kind of elements in the array, such as floats or ints
* shape: the size of each dimension of the array (ex: 5 x 5 x 5)
* strides: number of bytes to skip to proceed to the next element along each dimension
* flags
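The attributes listed above can be inspected directly on any array. The sketch below is one way to look at them, assuming a 5 x 5 x 5 array of 64-bit floats in NumPy's default row-major layout; the variable names are illustrative only.

```python
import numpy as np

# A 5 x 5 x 5 array of 8-byte floats in the default C (row-major) order.
arr = np.zeros((5, 5, 5), dtype=np.float64)

pointer = arr.ctypes.data   # memory address of the first byte, as an integer
print(arr.dtype)            # element type: float64
print(arr.shape)            # (5, 5, 5)
print(arr.strides)          # (200, 40, 8): bytes to step along each dimension
print(arr.flags["C_CONTIGUOUS"])  # True: elements are laid out without gaps

# Slicing creates a view: same memory, different shape and strides.
view = arr[:, ::2, :]
print(view.shape)           # (5, 3, 5)
print(view.strides)         # (200, 80, 8): the middle step is doubled
print(view.flags["C_CONTIGUOUS"])  # False: the view skips over elements
print(view.base is arr)     # True: the view does not own its data
```

Because the view reuses the original buffer, only the shape, strides, and flags change; no numbers are copied.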
Read 6 tweets
