Floris Goes-Smit, PhD
Mar 28, 2022 10 tweets 5 min read
Many young people ask me how they can become a #DataScientist, specifically in #football. Lately I have also seen a lot of posts on how to get into #DataScience in (1)50 days or so, which is a joke imo. Here is my realistic take on it. Warning: it will be closer to 1500 days. 🧵
#DataScience is an umbrella of roles & fields that require different competencies. But they all have two things in common: you have to know #Science and you have to be able to work with #data. The first requires learning to do research, the second learning to do #programming.
Go to uni and get a master’s degree that at least requires some #math skills. I’m not saying you need a #PhD and 5 publications before calling yourself a #DataScientist, nor that you can’t be one without an MSc, but it helps a lot in acquiring the right competencies.
Learn #programming. #python or #R are a good start, but learning any programming language is the first step of many. Becoming a programmer is easy: print("Hello World!"), but becoming a good one takes time and effort, and you won’t become one by just following a course online.
Practice, a lot and daily: follow courses, do coding challenges, work on your own projects, compete in competitions like @worlddataleague or @kaggle. Also: review code and have your code reviewed; it will greatly help your learning process. No one said it would be easy.
Find one or more domains that interest you, and become knowledgeable in them. #DataScience is about solving real-world problems through the use of data. It’s okay to rely on domain experts to help you out, but you should have a detailed understanding of the problems you help solve.
Do an #internship, preferably in a development team. It will teach you what being a #dataScientist really means. Spoiler alert: it’s about a lot more than programming in Jupyter notebooks. It will also allow you to learn about things like architecture, CI/CD, scalability etc.
Acquire technical skills that are essential in implementing #DataScience in practice, especially in a product. The essentials include SQL, Spark, Cloud technology (Azure / AWS) and the like. Most of it isn’t too hard to learn, and it will get you a long way.
Work on your communication skills. If you are capable of explaining complex things to non-technical people in a simple way, you can be extremely valuable as a data scientist. If you can talk to both the business and the development side of a company, you’ll be a key asset.
Done all that? Now you are ready to start a career as a data scientist, which probably means your learning path is just starting. The road is long and hard, and rewarding, as it should be.

More from @FlorisgoesF

Feb 7, 2023
In my PhD thesis defense, I will defend the following 10 propositions based on my research into tactical behavior in professional soccer:

#football #Analytics #research #DataScience #MachineLearning
Not every pass can be an assist - my research
Success is the product of interaction, not action - my research
Read 12 tweets
Aug 12, 2022
In #football #Analytics we often refer to the analysis of attacks as sequence analysis, implying an attack can be modeled as a cascading sequence of events. A thread on why the term “sequence analysis” can be misleading.

#DataScience #Research
Despite the popular analogy (often used to describe strategic moves), #football is actually nothing like a game of chess. In chess, one piece can be moved at a time, after which the opponent reacts. In #football, 22 pieces can be moved simultaneously.
The movements of the 22 pieces are strongly bound together, and #Science has shown that attacking play is not the result of individual, sequential actions, but rather the product of inter-player & inter-team interaction.
Read 15 tweets
May 30, 2022
“Data don’t lie”. But it typically requires a process of defining #research questions, hypotheses, methodology, interpreting and #dataviz that can introduce subjectivity and #bias. Scientific rigor and objectivity are key in #DataScience. Some #Tips for #DataScientists 🧵
Don’t dive straight into a dataset; domain knowledge is critical. Good #Science requires a theoretical understanding of a topic, while #ignorance introduces bias. Sound domain knowledge enables you to ask the right questions and give relevant answers with #DataScience.
Investigate the alternative hypothesis. Business questions asked to #DataScientists are often directive, as there already is a hypothesis. Don’t confirm this hypothesis without properly investigating the alternative.
Read 11 tweets
Apr 5, 2022
Tactical behavior in #Football has a spatial and a temporal component, and results from interaction with the opponent. It’s key to account for all these aspects in data-driven tactical analysis, as well as to respect the complexity of the temporal and spatial dimensions 🧵
Two years ago I published a systematic review in @EurJSportSci on using big data in #soccer for tactical performance analysis that illustrates the associated challenges and provides a data-driven scientific framework. #DataScience tinyurl.com/mrxky6ca
The most common analysis issue is that spatial and/or temporal complexity is not respected, for example by aggregating data over multiple minutes, or by constructing spatial features that aggregate 11 player positions into a single variable.
Read 9 tweets
Apr 4, 2022
Preparing for a technical interview for a #DataScience position? These are some of the questions that typically allow me as an interviewer to quickly distinguish between junior and mid-level candidates, including some quick tips 🧵. #Python #pythonprogramming #DataScientist #Jobs
All questions about SQL. Not the hardest thing to learn, but many #DataScientists only start to learn the value of SQL when they actually become part of a dev team. I’m not only talking about SELECT * FROM table, but also about joins, truncates, partitions and constraints.
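To make "joins and constraints" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables, columns and rows are hypothetical and only serve as illustration; a production database will differ in syntax details (for example, SQLite only enforces foreign keys once the pragma below is switched on).

```python
import sqlite3

# In-memory database; all table and column names are made up for this example.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
    CREATE TABLE teams (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE players (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        team_id INTEGER NOT NULL REFERENCES teams(id)
    );
""")

conn.executemany("INSERT INTO teams (id, name) VALUES (?, ?)",
                 [(1, "Team A"), (2, "Team B")])
conn.executemany("INSERT INTO players (name, team_id) VALUES (?, ?)",
                 [("Player 1", 1), ("Player 2", 1), ("Player 3", 2)])

# A join plus an aggregate: number of players per team.
rows = conn.execute("""
    SELECT t.name, COUNT(p.id) AS n_players
    FROM teams AS t
    LEFT JOIN players AS p ON p.team_id = t.id
    GROUP BY t.name
    ORDER BY t.name
""").fetchall()

# The foreign key constraint rejects a player that points at a non-existent team.
try:
    conn.execute("INSERT INTO players (name, team_id) VALUES ('Player 4', 99)")
    constraint_violated = False
except sqlite3.IntegrityError:
    constraint_violated = True

print(rows, constraint_violated)
```

The point of the constraint is that bad data is rejected by the database itself, rather than by whatever script happens to write to it.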
Interacting with an API. Make sure you know your HTTP request methods (GET, POST, PUT, DELETE, PATCH), as well as the #Python requests library.
Read 10 tweets
Apr 3, 2022
#DataScientist in a software dev team and #pythonprogramming code for production pipelines? You should think carefully about scalability and integration. One of the things to consider is data types; here are some helpful tips 🧵
#Python is a dynamically typed language, but that doesn't mean you shouldn't care about types. Know your dtypes, from "str" to "bool" to "int8" to "float64", and understand their memory footprint and restrictions. Especially when working with larger objects, choose wisely.
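A rough illustration of footprint and restrictions, using the standard-library array module as a stand-in for NumPy/pandas dtypes (an assumption made to keep the sketch dependency-free): typecode "b" is a signed 8-bit integer, comparable to int8, and "d" is a C double, comparable to float64.

```python
from array import array

n = 100_000

# Same number of values, stored with two different element types.
int8_values = array("b", [0]) * n       # signed 8-bit integers
float64_values = array("d", [0.0]) * n  # 64-bit floats (C double)

int8_bytes = int8_values.itemsize * len(int8_values)
float64_bytes = float64_values.itemsize * len(float64_values)

# The small type has a restricted range: values outside -128..127 do not fit.
try:
    array("b", [200])
    overflowed = False
except OverflowError:
    overflowed = True

print(int8_bytes, float64_bytes, overflowed)
```

The smaller type takes one eighth of the memory here, but only as long as every value actually fits in it.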
Lose the strings. 9/10 times strings can be replaced by categoricals (Pandas) or even better by Enums (docs.python.org/3/library/enum…). This can reduce the memory footprint of large dataframes by more than 30%, and improves performance.
Read 8 tweets
