Preparing for a technical interview for a #DataScience position? These are some of the questions that typically allow me, as an interviewer, to quickly distinguish between junior and mid-level candidates, including some quick tips 🧵. #Python #pythonprogramming #DataScientist #Jobs
All questions about SQL. Not the hardest thing to learn, but many #DataScientists only start to learn the value of SQL when they actually become part of a dev team. I’m not only talking about SELECT * FROM table, but also about joins, truncates, partitions and constraints.
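A minimal sketch of the SQL topics named above, using Python's built-in sqlite3 module so it runs without a database server. The tables, names, and values are hypothetical. Two assumptions to flag: "partitions" is read here as window-function PARTITION BY (it could also mean table partitioning, which SQLite does not have), and SQLite has no TRUNCATE statement, so an unfiltered DELETE stands in for it.

```python
import sqlite3


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
    conn.executescript(
        """
        CREATE TABLE teams (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE players (
            id      INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            team_id INTEGER NOT NULL REFERENCES teams(id),
            goals   INTEGER NOT NULL CHECK (goals >= 0)
        );
        INSERT INTO teams VALUES (1, 'Reds'), (2, 'Blues');
        INSERT INTO players VALUES (1, 'Ann', 1, 5), (2, 'Bob', 1, 3), (3, 'Cy', 2, 7);
        """
    )

    # Join: attach each player's team name.
    joined = conn.execute(
        "SELECT p.name, t.name FROM players p "
        "JOIN teams t ON t.id = p.team_id ORDER BY p.id"
    ).fetchall()

    # Partition: rank players by goals within their own team.
    ranked = conn.execute(
        "SELECT name, RANK() OVER (PARTITION BY team_id ORDER BY goals DESC) "
        "FROM players ORDER BY id"
    ).fetchall()

    # Constraint: the CHECK on goals rejects a negative value.
    try:
        conn.execute("INSERT INTO players VALUES (4, 'Dee', 2, -1)")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True

    # Truncate stand-in: remove every row but keep the table definition.
    conn.execute("DELETE FROM players")
    remaining = conn.execute("SELECT COUNT(*) FROM players").fetchone()[0]

    conn.close()
    return {"joined": joined, "ranked": ranked, "rejected": rejected, "remaining": remaining}


if __name__ == "__main__":
    print(demo())
```

In a server database such as PostgreSQL or MySQL the last step would normally be TRUNCATE TABLE players.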
Interacting with an API. Make sure you know your requests (GET, POST, PUT, DELETE, PATCH), as well as the #Python requests library.
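A rough sketch of the five request methods in practice. To keep it runnable offline with no installs, it uses only the standard library and talks to a hypothetical echo server started on localhost; the /items/1 path and the payloads are made up. With the requests library the calls would be requests.get, requests.post, requests.put, requests.patch and requests.delete.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical toy API: replies with the method and body it received."""

    def _reply(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode() if length else ""
        payload = json.dumps({"method": self.command, "body": body}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    # Route every verb to the same echo logic.
    do_GET = do_POST = do_PUT = do_PATCH = do_DELETE = _reply

    def log_message(self, *args):
        pass  # keep the demo output quiet


def call(opener, base_url, method, data=None):
    """Send one request with the given HTTP method and return the decoded JSON reply."""
    body = json.dumps(data).encode() if data is not None else None
    request = urllib.request.Request(
        base_url + "/items/1",
        data=body,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with opener.open(request) as response:
        return json.loads(response.read())


def demo():
    server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base_url = f"http://127.0.0.1:{server.server_port}"
    # Ignore any proxy settings so the request really goes to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    calls = [
        ("GET", None),                # read a resource
        ("POST", {"name": "new"}),    # create a resource
        ("PUT", {"name": "whole"}),   # replace a resource
        ("PATCH", {"name": "part"}),  # update part of a resource
        ("DELETE", None),             # remove a resource
    ]
    try:
        return [call(opener, base_url, method, data)["method"] for method, data in calls]
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```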
Yesterday I shared a small thread about getting into #DataScience. Today I’ll build on that and share a bit about my own journey into sports analytics, specifically as a #DataScientist in the #football industry. 🧵
My path began with an MSc in Sport & Movement Science @VU_FBW. It’s not computer science or anything, but it does involve quite some #Math, #Statistics and #Physics, as well as a course in programming. Mainly it taught me science, and gave me a lot of domain knowledge in sports.
I wasn’t planning to become a #DataScientist, but I wanted to work in sports. I did various stints as an embedded sports scientist, mostly internships/part-time, before joining @ZZLEIDENBASKETB. Those jobs involved data & science, but it wasn’t anything close to #DataScience.
Geographic Data Science with R! 🚀🚀🚀

The book R for Geographic Data Science, by @maps4thought, provides an introduction to data science with applications for geographic data. 🧵👇

Image credit: from the book
#rstats #DataScientists #MachineLearning #Stats
The book follows the syllabus of the "R for Data Science" course at the School of Geography, @uniofleicester. The book is still a work in progress, and a draft version is available online.
The book covers the core concepts of data science such as:
✅ Intro to R programming
✅ Data wrangling
✨ Data manipulation
✨ Table operations
✨ Reproducibility
R For Beginners! 🚀🚀🚀

If you are new to #R or planning to learn R, I highly recommend checking out Prof. @ajay_kolii's course "R For Beginners". The course was organized by Vishwakarma University in Pune, India. 🧵 👇🏼

#RStats #OpenSource #DataScientists #Datavisualization
The course covers the following topics:
✅ Basics of R & RStudio
✅ Dynamic Documents using R Markdown
✅ Data Visualisation using ggplot2
✅ Data Wrangling using dplyr
✅ Slide Crafting using xaringan
Resources 📚
Source code:…

Thanks to Prof. Ajay Koli for making and sharing this resource! 🙏🏼
H2O new release! 🚀🚀🚀

This week, H2O had a major release of their open-source ML library for #R and #Python, introducing two new algorithms, improvements, and bug fixes. ❤️👇🏼 🧵

#MachineLearning #ML #DeepLearning #rstats #DataScience #DataScientists
New algorithm (1/2):
✨ Distributed Uplift Random Forest (Uplift DRF) - The Uplift DRF is a tree-based algorithm that uses a Random Forest classifier to estimate a treatment's incremental impact. See demo on the notebook ⬇️…
#randomforest #ML #UpLift
New algorithm (2/2):
✨ Infogram & Admissible Machine Learning - is a new tool for machine learning interpretability. More details are available on the algorithm doc ⬇️…
#machinelearning #ML
🚨🚨🚨BREAKING - $20k Analysis Effort🚨🚨🚨

Since my tweet last night, a donor has now come forward looking to fund 👉👉👉$20,000👈👈👈 toward this effort for publication!!! 🤯🤯🤯

Reaching out to any #DataScientist who is serious about making this happen.

(See next 👇)
👉👉👉 Here's our newly released post on this project:…

If you're like me and feel this is waaaaay overdue, please help us find the Rockstar(s) who can make this a reality!
Hey #UnitedKingdom docs like @lowcarbGP and @DrScottMurray -- know any data scientists on #UKBiobank who'd be interested in this project? 👆👆👆
How Does a #YouTube Video Go #Viral? We've collected Top Youtubers data using @ZenRowsHQ to see how it grows over time. We used @chartjs for the charts.…
#DataScientists #DataVisualization
We also publish a GitHub repository with a demo and the #dataset…
We collected all this data over several days running a recurring Task in Zenrows every 30 minutes. If anyone is interested in trying it out, do not hesitate to contact me. We offer a free tier.
#LOTI is collaborating with @ONS @ONSdigital to identify a common problem that London boroughs face that could benefit from #collaboration. We'll work together to develop a solution that demonstrates the value of #DataScience within London’s #LocalGovernment.
In the first exercise, #DataScientists & #DataAnalysts in London boroughs have been invited to share what skills they'd like to improve through this training opportunity in collaboration with @ONS @ONSdigital

#DataScienceCampus #DataCollaboration #DataAnalysis #LocalGovDigital
This #LOTI & @ONS @ONSdigital #DataCollaboration programme pilot is aimed at developing the capability & skills of #LocalGov officers working in #data teams. It's important that we understand their context and priorities:

#DataScience #DataAnalysis #Collaboration #Digital
As an aspiring #DataScientist, one way to showcase your skills is to build interesting portfolio projects.

Here’s a guide on how to develop interesting data science project ideas & implement them.

Step 1: Choose a relevant topic that you are passionate about.

Step 2: Start scraping together your own #dataset.

Step 3: Clean your dataset (here’s where #datascientists spend about 60% of their time).

Step 4: Data Exploration and Analysis.

Step 5: Share your work on a blog or a popular forum/community.
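As a small illustration of Step 3, here is a minimal cleaning sketch using only the Python standard library. The column names and sample rows are hypothetical, and the rules (trim whitespace, normalise capitalisation, drop rows with missing or non-numeric values, drop duplicates) are just one reasonable choice, not the method from the article.

```python
import csv
import io

# Hypothetical scraped data with typical problems: stray spaces,
# a missing value, a duplicate row, and a non-numeric entry.
RAW = """name,views
 Alice ,100
Bob,
Alice,100
carol,abc
Dan,250
"""


def clean_rows(raw_text):
    """Return de-duplicated (name, views) tuples with valid integer views."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        name = (row["name"] or "").strip().title()
        views = (row["views"] or "").strip()
        if not name or not views.isdigit():
            continue  # drop rows with a missing name or unusable view count
        record = (name, int(views))
        if record in seen:
            continue  # drop exact duplicates after normalisation
        seen.add(record)
        cleaned.append(record)
    return cleaned


if __name__ == "__main__":
    print(clean_rows(RAW))
```

Whether to drop or repair a bad row depends on the project; dropping is simply the easiest rule to show.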
Read the full article by @FelixVemmer here.
Cc: @Websystemer

A step-by-step guide for creating an authentic data science portfolio project…
Please help us welcome our next curator, Darryl Takudzwa Griffiths. @BlaqNinja completed his Bachelor's degree in Computer Engineering at DUT, graduating in 2011. After struggling to find suitable employment, he went on to earn multiple certificates from bodies such as Microsoft.
He has certificates in N+ (Computer Networking), A+ (Computer Technician & Technical Support), Certified Ethical Hacking V7 (CEH v7), and Offensive Security Certified Professional (OSCP). Sadly, even with these, he could not secure his desired post, so in 2016 he moved to the USA.
Darryl was able to secure a job in a corporation that owns casinos as a system analyst & security architect. Within the same year he embarked on a Masters degree in Robotics & Artificial Intelligence Engineering. In 2017 he resigned from his post and started his own company...
"The #OFQUAL #algorithm is a gross misuse of statistics... there is a special place in hell for those who use maths to hide rather than reveal the truth, this algorithm is hiding the achievement of this generation of young people." my @ChronicleLive blog…
Most people now think the #AlevelResults should not stand, whether the Govt will recognise that or not is another matter. But we shouldn't blame #algorithms for this huge injustice - like most technology they are neither good nor bad in themselves
And I don't blame those who wrote the #algorithm. I do think #engineers (like me) & #datascientists should take a 'do no harm' oath like doctors, but currently there is no framework for that & they were presumably working to a spec
Come on, @rivm, please make a public API with the #corona figures! Insight through #developers @Rijksoverheid #OpenSource #crowd @WouterWelling @AstridOosenbrug
Example: by combining @rivm data with @Flitsmeister, it might be possible to build predictive models with "hot highways"? Give it to #datascientists and the #crowd.
Why mid-sized companies are behind the #datascience curve: A thread—
Mid-sized companies are behind startups and large entities in terms of being data-driven. There are several reasons for this, which together put them at a disadvantage.
1/ The average age of a mid-sized company is c years, X years older than the average startup and large entities such as Facebook or Apple. Overall, their workforce is older as well, which translates into greater domain knowledge but also less technical capability.
2/ There is often a large majority of non-technical workers, which places a heavy burden on the few technical workers, often leading to burnout, limitations and ultimately churn.
#Event201 begins. Check it out online to participate virtually! #pandemic #biosecurity
Global leaders in business and government have convened to address the pandemic situation. @T_Inglesby from @JHSPH_CHS moderates. See the #Event201 website for details on the participants.
Global distribution of cases and deaths from #Event201
Love the simplicity of #Serverless #FaaS (Function-as-a-Service) but hate the setup process? Look to these 7 open source projects to ease #AWS Lambda deployments:
#BigData #StreamingAnalytics #Cloud #DataScience #MachineLearning #IFTTT #EventDriven #AI
13 free tools for #API design, development, & testing — for example, Amazon API Gateway allows you to build front-end APIs for applications built on Amazon EC2, #AWS Lambda, or any web application:
#microservices #cloud #serverless #FaaS #coding #IoT
#IFTTT alternatives for developers of #EventDriven workflows:
#IoT #EdgeAnalytics #EdgeComputing #Microservices #DataScience #BigData #FaaS #ML
+See the book “AWS Lambda in Action: Event-driven #Serverless Applications” at
#Bigdata vs Machine Learning vs Artificial Intelligence
By Irene Aldridge
☝️author High-Frequency Trading: A Practical Guide to #Algorithmic Strategies & #Trading Systems
☝️co-author Real-Time Risk: What Investors Should Know About #Fintech, #highfrequencytrading and FlashCrashes
1. ☝️🧐➿🔢 In traditional #statistics or #econometrics, researchers make assumptions about #data distributions ahead of the analysis
2. 🥇💪#machinelearning = the 1st discipline to apply #efficiency to #problemsolving brought by #computers & their enhanced computational power
3. 🤗➿🔢#ML scientists try to reduce #assumptions about the data as much as possible & let the data (& computers) decide what fits best.
4. ♾➡️🔤🌐#Datascience identifies core characteristics of the data, summarized by what has been known as #characteristic values (#eigenvalues)
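To make the eigenvalue point concrete, here is a minimal sketch, assuming a tiny made-up two-variable dataset: it builds the sample covariance matrix and computes its two eigenvalues in closed form. The function names and data are hypothetical. The larger eigenvalue measures how much of the data's variance lies along its main direction.

```python
import math


def covariance_2d(points):
    """Sample covariance entries (sxx, sxy, syy) for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    return sxx, sxy, syy


def eigenvalues_symmetric_2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], largest first."""
    centre = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return centre + radius, centre - radius


if __name__ == "__main__":
    # Points lying exactly on the line y = x: all variance is in one direction.
    points = [(1, 1), (2, 2), (3, 3)]
    print(eigenvalues_symmetric_2x2(*covariance_2d(points)))
```

For points that lie exactly on a line, the second eigenvalue is zero: one "characteristic value" captures everything, which is the idea behind summarising data by its leading eigenvalues.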
Cal—a 30+ year CIA veteran—has made a career of discovering patterns in data.
#datastories
Cal’s current focus is on "training our data scientists & cultivating relationships with data experts outside CIA.”
Cal: “We have more data coming to us than ever before. We need to make sense of that data for sake of our national security.”
