For some #MondayMotivation, let's create a great resource of fellowships, workshops and communities in Data Science.
I'll start with some!
(1/n)
The Women in Data Science Conference (widsconference.org) is a great place to learn, network and grow.
2/n
The ACM SIGHPC Computational & Data Science Fellowships (sighpc.org/fellowships), with an upcoming deadline, foster diversity in Data Science and allied fields.
3/n
The IBM Social Good Fellowship (ibm.com/ibm/responsibi…) is a great venture that helps promote data science research for the benefit of humanity!
(4/n)
Another great fellowship for DS and society is the Data Science for Social Good (dssgfellowship.org) Fellowship, associated with #CarnegieMellonUniversity where you can get involved as a fellow, mentor or manager!
(5/n)
At the intersection of Data Science and national security is this unique 3-year IDA fellowship (ida.org/en/careers/stu…).
(6/n)
Sandia National Lab offers a postdoctoral fellowship (sandia.gov/careers/career…) for using data science in diverse challenges from security to energy systems.
(7/n)
A great community and resource is Women Who Code Data Science (womenwhocode.com/datascience), where a variety of talks, seminars, and opportunities are posted regularly.
(8/n)
And finally, the Women in Machine Learning or WiML org (wimlworkshop.org), which organizes workshops, socials, and other events that help womxn highlight their work and network.
(9/9)
As we practice and teach Data Science, we continuously learn, unlearn and revise old and new concepts.
What are some freely available reading lists that help with this or give a great intro to Data Science?
Another great one which details specific vital segments like clustering and dimensionality is this book/course from University of Utah: cs.utah.edu/~jeffp/teachin…
Happy Friday!! Today I'd like to describe two important approaches to data privacy research and applications: synthetic data and differential privacy. I hope to generate more interests in this area among researchers and practitioners!
1/n Data privacy and data confidentiality are important topics for statisticians, computer scientists, and really, anyone who offers their own data and consumes data!
2/n Statistical agencies, in particular, are under legal obligations to protect the privacy and confidentiality of survey and census respondents, e.g. U.S. Title 26.
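To make the differential privacy side concrete, here is a minimal sketch of the Laplace mechanism, one standard building block of differentially private releases. The counts, the epsilon value, and the `laplace_count` helper are illustrative, not from any agency's actual pipeline:

```python
import numpy as np

def laplace_count(true_count, epsilon, rng):
    # Laplace mechanism: a counting query has sensitivity 1 (one person
    # changes the count by at most 1), so adding Laplace(scale=1/epsilon)
    # noise yields an epsilon-differentially-private answer.
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

rng = np.random.default_rng(42)
true_count = 1_000  # e.g. respondents with some attribute (hypothetical)
noisy = laplace_count(true_count, epsilon=0.5, rng=rng)
print(noisy)  # close to 1000, but perturbed to protect individuals
```

Smaller epsilon means stronger privacy but noisier answers; that tradeoff is exactly what agencies have to tune.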
Happy Thursday! Today, I'd like to introduce and discuss various approaches, innovations, and resources for introducing Bayesian statistics to undergraduates! I am sure I will miss something good, so feel free to add yours or the ones you know.
First, a little bit of history. Bayesian methods became widely used thanks to the computational advances of the early 1990s, including the Gibbs sampler and Metropolis-Hastings algorithms (e.g. Gelfand and Smith (1990)).
However, even before that revolutionary advance, innovative educators had designed ways to introduce Bayes to students: e.g. emphasizing the intuition on specifying prior for a data analysis problem while relying on numerical integration, Franck et al. (1988).
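The numerical-integration approach is easy to show students in a few lines. This is a generic grid-approximation sketch for a coin-flip example I made up (it is not the specific example from Franck et al.): pick a prior for theta, multiply by the likelihood on a grid, and normalize with a sum instead of MCMC or conjugate algebra:

```python
import numpy as np

# Grid approximation of the posterior for a coin's heads probability theta,
# given 7 heads in 10 flips, with a Beta(2, 2) prior (illustrative numbers).
theta = np.linspace(0, 1, 1001)
dtheta = theta[1] - theta[0]

prior = theta * (1 - theta)               # Beta(2, 2) kernel (unnormalized)
likelihood = theta**7 * (1 - theta)**3    # binomial kernel for 7/10 heads
unnorm = prior * likelihood

# Normalize by numerical integration (a simple Riemann sum over the grid).
posterior = unnorm / (unnorm.sum() * dtheta)

post_mean = (theta * posterior).sum() * dtheta
print(post_mean)
```

Because Beta(2, 2) is conjugate here, the exact posterior is Beta(9, 5) with mean 9/14, so students can check the grid answer against the closed form.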
Let’s talk vectorization! You may have heard about or experienced how simple NumPy array ops (such as dot product) run significantly faster than for loops or list comprehension in Python. How? Why? Thread incoming.
Suppose we are doing a dot product on two n-dim vectors. In a Python for loop, scalars are individually loaded into registers, and operations are performed on the scalar level. Ignoring the sum, this gives us n multiplication operations.
NumPy makes this faster by employing vectorization, where you can load multiple scalars into registers and get many products for the price of one operation (SIMD). SIMD — single instruction, multiple data — is a backbone of NumPy vectorization.
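A quick way to feel the difference is to time both versions of the dot product yourself. This is a minimal sketch (array size and exact speedup will vary by machine):

```python
import time
import numpy as np

n = 1_000_000
rng = np.random.default_rng(0)
a = rng.random(n)
b = rng.random(n)

# Pure-Python loop: each multiply is dispatched one scalar at a time.
t0 = time.perf_counter()
loop_dot = 0.0
for x, y in zip(a, b):
    loop_dot += x * y
loop_time = time.perf_counter() - t0

# Vectorized: NumPy hands the whole operation to compiled, SIMD-friendly code.
t0 = time.perf_counter()
np_dot = a @ b
np_time = time.perf_counter() - t0

print(f"loop: {loop_time:.3f}s  numpy: {np_time:.5f}s")
print("results agree:", np.isclose(loop_dot, np_dot))
```

The two results agree (up to floating-point rounding), but the vectorized version is typically orders of magnitude faster.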
Today I will be talking about some of the data structures we use regularly when doing data science work. I will start with numpy's ndarray.
What is an ndarray? It's numpy's abstraction for describing an array, or a group of numbers. In math terms, "array" is a catch-all term for matrices and vectors. Behind the scenes, it essentially describes memory using several key attributes:
* pointer: the memory address of the first byte in the array
* type: the kind of elements in the array, such as floats or ints
* shape: the size of each dimension of the array (ex: 5 x 5 x 5)
* strides: number of bytes to skip to proceed to the next element
* flags: metadata about memory layout, such as whether the array is C-contiguous or writeable
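All of these attributes are inspectable from Python. A small sketch (the exact pointer value will differ on every run):

```python
import numpy as np

# A 2 x 3 x 4 array of 8-byte floats, laid out in C (row-major) order.
arr = np.arange(24, dtype=np.float64).reshape(2, 3, 4)

print(arr.ctypes.data)  # pointer: address of the first byte
print(arr.dtype)        # type: float64
print(arr.shape)        # shape: (2, 3, 4)
print(arr.strides)      # strides: (96, 32, 8) bytes to step along each axis
print(arr.flags)        # flags: C_CONTIGUOUS, WRITEABLE, etc.
```

The strides follow from the layout: the last axis steps 8 bytes (one float64), the middle axis steps 4 * 8 = 96 / 3 = 32 bytes, and the first axis steps a whole 3 x 4 block, 96 bytes.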