#cricket #stats #geeky thread alert - don't bother reading this thread if you're not interested in either cricket or statistics.

A few people have asked me why I never talk about a batter's average in particular countries. There's a really simple reason for that.
There just isn't enough international cricket played - particularly not international Test cricket - to make most of those "averages in a country" statistics meaningful in any way.

There's so much randomness in results that looking at a sample size of (say) 12 innings is nuts.
What do I mean by "randomness"? Well, let's think about it this way. On each ball there's a chance that a batter will get runs, not get runs, or get out. For the best batters, that risk of getting out will generally be lower, and it will fluctuate throughout an innings.
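To make that concrete, here's a rough simulation sketch. The per-ball probabilities below are invented for illustration (they are not real data), and the risk of getting out is held constant through the innings, which is a simplification of what actually happens. The point is only to compare how much a batting average bounces around over 12 innings versus 120.

```python
import random
import statistics

# Hypothetical per-ball model (illustrative numbers, not real data).
P_OUT = 0.015                      # chance of dismissal on any ball
RUN_VALUES = [0, 1, 2, 4, 6]       # possible runs off a ball
RUN_WEIGHTS = [0.60, 0.27, 0.05, 0.07, 0.01]
MAX_BALLS = 300                    # cap on the length of an innings


def simulate_innings(rng):
    """Return (runs, dismissed) for one simulated innings."""
    balls_survived = 0
    dismissed = False
    while balls_survived < MAX_BALLS:
        if rng.random() < P_OUT:
            dismissed = True
            break
        balls_survived += 1
    # Draw the runs for every ball the batter survived.
    runs = sum(rng.choices(RUN_VALUES, weights=RUN_WEIGHTS, k=balls_survived))
    return runs, dismissed


def batting_average(n_innings, rng):
    """Runs per dismissal over n_innings (None if never dismissed)."""
    total_runs = 0
    dismissals = 0
    for _ in range(n_innings):
        runs, dismissed = simulate_innings(rng)
        total_runs += runs
        dismissals += dismissed
    return total_runs / dismissals if dismissals else None


def spread_of_averages(n_innings, n_samples=200, seed=1):
    """Standard deviation of the average across many samples of n_innings."""
    rng = random.Random(seed)
    averages = [batting_average(n_innings, rng) for _ in range(n_samples)]
    averages = [a for a in averages if a is not None]
    return statistics.stdev(averages)


print("spread of average over 12 innings: ", round(spread_of_averages(12), 1))
print("spread of average over 120 innings:", round(spread_of_averages(120), 1))
```

Same batter, same underlying ability in every sample. The only thing changing is how many innings you look at.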
If you're starting out in data science (or if you're wondering what you need to learn), don't believe everything you read. 🧵

Spot BS and focus on these 4 steps to grow your career.

#datascience #rstats #career
My friend Rafael Nicolas Fermin Cota (Nico) pointed me to this modified graphic from a Harvard Business Review Article on "Prioritizing Which Data Science Skills Your Company Needs".
With ChatGPT, AI, and the "trendiness" of buzzwords, this graphic becomes even more dangerous.…
1/7 🧵: What is wRC+ in Baseball? ⚾️

Ever heard of the #baseball stat wRC+? It stands for Weighted Runs Created Plus, and it's a great way to evaluate a hitter's overall offensive performance. Let's break it down in this thread!👇 #MLB #Stats
2/7 🧵: wRC Basics

Before understanding wRC+, we need to grasp wRC (Weighted Runs Created). wRC estimates the number of runs a player contributes to their team. It does so by weighting each offensive action (like hits, walks, etc.) based on their run-scoring potential.
3/7 🧵: Why wRC+?

wRC is a useful stat, but it's not perfect. wRC+ takes it a step further by accounting for factors like ballpark dimensions and league-wide scoring. This makes wRC+ a more accurate representation of a hitter's performance. #AdvancedStats
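Here's a minimal sketch of how those pieces fit together. The formulas follow the form commonly published for wOBA, wRC, and wRC+ as I understand them. The linear weights and every league constant below are placeholder values for illustration only (the real ones are recalculated each season), and the function names are my own.

```python
# Illustrative linear weights; real values change every season.
WOBA_WEIGHTS = {
    "ubb": 0.69,      # unintentional walk
    "hbp": 0.72,      # hit by pitch
    "single": 0.89,
    "double": 1.27,
    "triple": 1.62,
    "hr": 2.10,
}


def woba(ubb, hbp, singles, doubles, triples, hr, ab, sf):
    """Weighted on-base average from counting stats."""
    numerator = (
        WOBA_WEIGHTS["ubb"] * ubb
        + WOBA_WEIGHTS["hbp"] * hbp
        + WOBA_WEIGHTS["single"] * singles
        + WOBA_WEIGHTS["double"] * doubles
        + WOBA_WEIGHTS["triple"] * triples
        + WOBA_WEIGHTS["hr"] * hr
    )
    # AB + BB - IBB + SF + HBP, written with unintentional walks directly.
    denominator = ab + ubb + sf + hbp
    return numerator / denominator


def wrc(player_woba, pa, lg_woba, woba_scale, lg_runs_per_pa):
    """Weighted Runs Created: estimated runs contributed over pa plate appearances."""
    return ((player_woba - lg_woba) / woba_scale + lg_runs_per_pa) * pa


def wrc_plus(player_woba, lg_woba, woba_scale, lg_runs_per_pa,
             park_factor, lg_wrc_per_pa):
    """wRC+ where 100 is league average; park_factor of 1.0 is a neutral park."""
    runs_above_avg_per_pa = (player_woba - lg_woba) / woba_scale
    park_adjustment = lg_runs_per_pa - park_factor * lg_runs_per_pa
    return (runs_above_avg_per_pa + lg_runs_per_pa + park_adjustment) / lg_wrc_per_pa * 100


# Hypothetical league constants and a hypothetical hitter.
LG_WOBA, WOBA_SCALE, LG_R_PER_PA = 0.320, 1.20, 0.12
hitter_woba = woba(ubb=60, hbp=5, singles=100, doubles=30, triples=3, hr=25, ab=520, sf=5)
print(round(hitter_woba, 3))
print(round(wrc_plus(hitter_woba, LG_WOBA, WOBA_SCALE, LG_R_PER_PA, 1.0, LG_R_PER_PA)))
```

A league-average hitter in a neutral park comes out at exactly 100 here, which is the whole idea of the "+".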
If you're struggling to become a data scientist...

There's a 95% chance you are doing it wrong.

Master these 5 things to escape career boredom (and make 6-figures).🧵

#datascience #career
Would you believe me if I told you it took me 5 years to become a data scientist?

I've written 10+ R packages that have amassed 2.5 million downloads.

I've consulted for S&P Global, MRM McCann, and 3 more Fortune 500 companies.
But I made 5 critical mistakes that I want to teach you how to fix.

1. I was too focused on #DataScience

2. I couldn't relate it to the business

3. I struggled to communicate the value
🤯Missing data is everywhere!😜
👉It can invalidate the results of your study
👉Many functions use automatic methods that may not be optimal
👉The impact of missing data is a topic most people want to avoid, but not today
What to do with NAs:
🎯You need to identify the missing data and find out why and how it is missing:
- human error
- interruptions in the data flow (e.g. months)
- privacy issues
- bias (e.g. types of study participants who have more NAs)

This is key information for trying to fix it!
Explore the data with the packages:
✅ visdat
✅ naniar

An example with the 3:…
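As a rough illustration of that first exploration step, here is a small standard-library Python sketch. It is not visdat or naniar (those are R packages); the records, column names, and helper functions are all hypothetical. It only shows the two questions worth asking first: how much is missing in each column, and whether the missingness is concentrated in one group (a hint of bias).

```python
from collections import defaultdict

# Hypothetical records; None marks a missing value (NA).
records = [
    {"group": "A", "month": "Jan", "income": 2100, "age": 34},
    {"group": "A", "month": "Feb", "income": None, "age": 41},
    {"group": "B", "month": "Jan", "income": None, "age": None},
    {"group": "B", "month": "Feb", "income": None, "age": 29},
    {"group": "B", "month": "Mar", "income": 1800, "age": 52},
]


def missing_share_by_column(rows):
    """Fraction of missing values in each column."""
    columns = rows[0].keys()
    return {
        col: sum(row[col] is None for row in rows) / len(rows)
        for col in columns
    }


def missing_share_by_group(rows, group_col, value_col):
    """Fraction of missing values of value_col within each group."""
    missing = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        total[row[group_col]] += 1
        missing[row[group_col]] += row[value_col] is None
    return {group: missing[group] / total[group] for group in total}


print(missing_share_by_column(records))
print(missing_share_by_group(records, "group", "income"))
```

If one group is missing far more values than another, dropping the incomplete rows would quietly change who is in your study.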
💥14 secret #RStats-powered tools to save time and effort on your data projects (don't miss them!):👀
1️⃣ Edit your data interactively (and save the code)! 👀
📦 'editData' is an RStudio add-in for editing a data.frame or a tibble interactively

#DataScience #DataVisualization #dataviz #stats #analytics #RStats #Analytics
2️⃣ Create #ggplot charts interactively!🚀
📦esquisse is another of my favorite #rstudio addins
✅ export the chart or retrieve the code to reproduce it
#DataScience #DataVisualization #dataviz #stats #analytics #RStats #Analytics
🌎 I just found a series of incredible maps made with #RStats! From interactive maps to 3D designs, there's something for every spatial data lover
👇 8 accounts that are definitely worth a look!🧵
#dataviz #maps #geospatial #gis
✅ Tyler Morgan-Wall @tylermorganwall

Rotating 3D map with points of light (previous map)

Earth's submarine fiber-optic cable network.

Uses #rayshader #rayrender #rayverse
#dataviz #maps #geospatial #gis
✅ Milos Popovic @milos_agathon
Map of the % of employees in manufacturing, Eurostat data.

#dataviz #maps #geospatial #gis #rstats #DataVisualization #stats #DataScience
💥 Hey #RStudio users! Want to integrate #ChatGPT into your code?
😱 Download 📦 gpttools! (it extends gptstudio)
👉 4 add-ins (thread 🧵)

#chatgpt3 #GPTwitter #gptchat #RStats #datascience #stats #analytics #machinelearning #ML #IA #ArtificialIntelligence #dataviz @posit_pbc #AI
Comment code: uses OpenAI's code-davinci-edit-001 model to add comments to your code with the prompt: "add comments to each line of code, explaining what the code does"
#ChatGPTenRStudio #RStats #DataScience #IA #ML #data #dataviz #analytics #AI
Add roxygen: uses OpenAI's text-davinci-003 model to add and fill in a roxygen skeleton for your highlighted code (it must be a function) with the prompt: "insert roxygen skeleton to document this function"
#ChatGPTenRStudio #RStats #DataScience #IA #ML #data
😜Don't be a #DataScience fool!
⚠️Although #MachineLearning can be a powerful tool, it's always important to evaluate and validate your models before trusting them too much.
😱How do you evaluate and validate #ML models? 👉(Thread 🧵)

#RStats #analytics #stats #IA
✅ Split the available data into two (or more) sets. The model is trained on a training set and its performance is then measured on a test set. This gives you an estimate of the model's performance on data it has not seen before
#ML #IA #DataScience
✅ Use appropriate evaluation metrics: depending on the type of problem and model, there are different metrics that can be used to evaluate the model's performance.
E.g. precision or recall for classification, mean squared error or RMSE for regression
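A minimal sketch of both ideas in plain Python, assuming a toy dataset generated on the spot (the data, the simple straight-line model, and all function names are illustrative, not from any particular library): hold out a test set, fit on the training set only, then score with RMSE. A small precision/recall helper for the classification case is included too.

```python
import math
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(points, intercept, slope):
    """Root mean squared error of the line on the given points."""
    squared_errors = [(y - (intercept + slope * x)) ** 2 for x, y in points]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def precision_recall(actual, predicted):
    """Precision and recall for binary labels coded as 0/1."""
    tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
    fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy regression data: y = 1 + 2x plus noise (synthetic, for illustration).
noise = random.Random(42)
data = [(x, 1 + 2 * x + noise.gauss(0, 1)) for x in range(40)]

train, test = train_test_split(data)
intercept, slope = fit_line(train)          # the model only ever sees train
print("train RMSE:", round(rmse(train, intercept, slope), 2))
print("test RMSE: ", round(rmse(test, intercept, slope), 2))
```

The test RMSE is the number to report, because it comes from rows the model never saw while fitting.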
This used to be me when I was first learning linear regression. 😂

But here's what's changed for me. 🧵

#datascience #rstats #stats
I always thought it was funny how in the stats books, they'd always tell you of this magical P-Value level of 0.05.

You either reject the null hypothesis (p <= 0.05) or fail to reject it (p > 0.05).
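For anyone who wants to see where a p-value comes from, here's a small sketch using a permutation test for a difference in means. This is one simple way to get a p-value, not the method from any particular stats book, and the function names and sample numbers are made up. The decision rule at the end is the textbook one: reject the null when the p-value is at or below the threshold.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel the observations at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


def decision(p_value, alpha=0.05):
    """Textbook rule: reject the null when p is at or below alpha."""
    return "reject the null" if p_value <= alpha else "fail to reject the null"


p = permutation_p_value([10, 11, 12, 13, 14, 15, 16, 17], [1, 2, 3, 4, 5, 6, 7, 8])
print(p, decision(p))
```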

So naturally, I'd do whatever I could to get my p-values under 0.05.

So what changed?
I started learning machine learning and actually seeing that my model performance would improve with features that had p-values greater than 0.05.

Wait, what?!
Bold Prediction for 2023 - #ChatGPT has almost no impact on businesses...

Here's why 🧵

#datascience #stats
While everyone's playing around with ChatGPT, here's what's about to happen that will actually change 2023

It's the "Birth of the Business Scientist".
Listen, ChatGPT is great, and may make you more productive at work....

But, businesses actually need strategic thinkers (and that's really tough to automate).
Bold prediction for 2023: The Birth Of The Business Scientist.

Here's what is about to happen. 🧵

#datascience #python #rstats #stats
According to Glassdoor research, we already have:

1. Data Analyst, $71,298/yr
2. Business Analyst, $83,924/yr
3. Data Scientist, $124,680/yr

But noticeably missing is the "Business Scientist".
So what is a "Business Scientist"?

And why are we about to see the birth of this new role?
You may not be the best coder, but don’t lose sight of the main goal: results.

Clean code takes time.

You’ll get better.🧵

#rstats #datascience #stats #code
This meme was presented in John Paul Helveston’s undergrad R class... blank stares, because most students were born AFTER 2003 when Pirates of the Caribbean came out.

While John got blank stares from his students, his point is well taken over here.
My code wasn’t always good.

In fact, if I look back at my early code, it was terrible.

Stop learning data science.

Start learning how to solve problems. 🧵

#datascience #rstats #stats
Most people dive into learning data science by taking course after course on

computer science,
web apps,
SQL again (advanced this time),
Tableau (because I like dashboards),
PowerBI (because I realized my company doesn’t use tableau),

You get the idea.
This graph is exactly why I got into data science.

I was fed up with not using my brain at work.

I was bored.

And here’s what I did.🧵

#datascience #stats
It was 2015.

I had recently been promoted to manager of a new sales team.

One problem. 👇

I didn’t know Sales.
I was an engineering manager that was good at selling expensive technical products.

Totally qualified right?!

Haha. Wrong.🛑
🤦🏻‍♀️Many people use RStudio for years without knowing about this tool👀
🎯Addins: extensions for running advanced #RStats functions without code
👉Click the Addins button in the RStudio menu, and the corresponding code runs without you having to write it
👉RStudio addins are distributed as 📦#RStats packages
👉Once the R package is installed and activated, the addins are immediately available in RStudio
✅Example 📦addinexamples

#datascience #programming #dataviz #analytics
💡How to select a subset of a dataset interactively in R

#datascience #analytics #dataviz #data #RStats #RStudio #posit #programming #code #analisisdedatos #cienciadedatos #BI #Python #stats #RAddins #complementosR
I think the message in Data Science needs to be: Don't believe everything you read. 🧵

#stats #datascience
I'd like to thank Nico Cota for pointing me to this modified graphic from a recent Harvard Business Review article on "Prioritizing Which Data Skills Your Company Needs".
You hear Harvard Business Review, & you think this must be legit.

Well, in this case, they dropped the ball.

If you're coming up with an educational plan for your org in 2023, here are some tips:
I’m super excited for today: I’m revealing a secret about text analysis that 90% of data scientists are overlooking.

And, true story, it helped me double my business in 2022... 🧵

#stats #datascience #rstats
Text analysis is a gold mine for customer analytics.

Yet few organizations are harnessing its power.

In fact, I wasn't...

Until I put my money where my mouth is.
True story - Text analysis was part of a solution that helped me double my business in 2022.

Yes, that’s right.

In the middle of a recession, my company doubled its revenue, AND text analysis was a key part.
This technique helped me grow my company’s revenue from $900K to $2M in 2022 (during a “recession”)

Yet 90% of companies don’t know how to use it.

Here’s everything you need to know about text analysis. 🧵

#rstats #stats #datascience #nlp
Text is a treasure trove of information.

Seriously… a gold mine. $$$

But it’s not as easy to work with as other types of data like numerical.

Most data scientists just convert text to categorical…

Or even worse, simply don’t use text.
Why is text not being used?

Text requires special techniques like:

1. Tokenization
2. N-grams
3. Stop word removal
4. Stripping characters
5. Counting characters

And formatting text can be tough work.
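To show what those five techniques look like in practice, here's a bare-bones sketch in standard-library Python. The stop-word list is a tiny made-up sample, the review text is invented, and the function names are my own; real projects would normally lean on a dedicated text library.

```python
import re

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "is", "and", "of", "to", "it", "was"}


def strip_characters(text):
    """Lowercase the text and replace anything that is not a letter, digit, or whitespace."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())


def tokenize(text):
    """Split cleaned text into word tokens."""
    return strip_characters(text).split()


def remove_stop_words(tokens):
    """Drop very common words that carry little meaning."""
    return [token for token in tokens if token not in STOP_WORDS]


def ngrams(tokens, n=2):
    """Sliding windows of n consecutive tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_characters(text):
    """Length of the raw text, a simple numeric feature."""
    return len(text)


review = "The delivery was late, and the box was damaged!"
tokens = remove_stop_words(tokenize(review))
print(tokens)                    # ['delivery', 'late', 'box', 'damaged']
print(ngrams(tokens, n=2))       # ['delivery late', 'late box', 'box damaged']
print(count_characters(review))
```

Each step turns free text into something countable, which is what lets it sit next to your numerical features.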

But a lot of data scientists make 1 big mistake…
📊"A picture is worth a thousand words", or a thousand data points. Charts tell the story of the data; they help us guide, interpret, and communicate😉
Watch out for these #HorrorStats
#HappyHalloween #Halloween #FelizDomingo #HalloweenEnds
🚫1. Choosing the wrong chart💀

Every chart has its own use cases. Does it make sense to show a card's credit (€) with a pie chart? 🤌

#HorrorStats #HappyHalloween #trickortreat #DataScience #dataviz #data
Which chart should you use?👇
🚫2. Manipulating the chart's axes💀

👉Distorting the scale, truncating it, or omitting baselines is a mistake, whether intentional or not.🤦🏻‍♀️

Want more examples?👇

#HorrorStats #HappyHalloween #trickortreat #DataScience #dataviz #RStats #Python #DataVisualization #Stats #Analytics
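A quick numeric sketch of why a truncated axis misleads, using made-up values and hypothetical helper names. It compares how much taller one bar looks than another when the axis starts at zero versus when it starts just below the smaller value, in the spirit of Tufte's "lie factor" (effect shown in the chart divided by effect in the data).

```python
def apparent_ratio(value_a, value_b, axis_start=0.0):
    """Ratio of the drawn bar heights when the value axis starts at axis_start."""
    return (value_a - axis_start) / (value_b - axis_start)


def lie_factor(value_a, value_b, axis_start):
    """Relative difference shown in the chart divided by the one in the data."""
    shown_effect = apparent_ratio(value_a, value_b, axis_start) - 1
    true_effect = apparent_ratio(value_a, value_b) - 1
    return shown_effect / true_effect


# Two hypothetical values that differ by 4%.
print(apparent_ratio(52, 50))                # honest axis starting at 0
print(apparent_ratio(52, 50, axis_start=49)) # truncated axis starting at 49
print(lie_factor(52, 50, axis_start=49))
```

With the axis cut at 49, a 4% difference is drawn as one bar three times the height of the other.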
Do you love 3D maps, worlds & visualisations? Here are 24 world creators, mapmakers, or visuals I've come across recently. Brilliant and creative minds using many different tools! PART 1 #dataviz #GISchat #3dmaps #map #gis #3d 1/🧵
Steven Kay | @stevefaeembra creates many original and cool #3dmaps such as this great visualisation of wind turbines in the British Isles. Follow him! #Blender3D and #QGis are his tools. #SDGs #Wind #b3d 2/🧵
Neil Southall | @neilcfd1 creates fantastic #3dmaps such as this hypnotising animation of #LiDAR data of Copenhagen. He uses #rstats and #rayshader in wonderful ways. #rspatial 3/🧵
What could go wrong?

LOL. 😂

Plus the 3 #datascience books that helped me learn #stats the most. 🧵

#rstats
I’m not saying you need to be an expert in advanced calculus to do machine learning…

BUT, there is a big difference between someone that does vs someone that does NOT have a good foundation in stats when it comes to getting & explaining business results.
My thought process back in the day was to obtain a great foundation in stats and machine learning at the same time.

So here’s what helped me. I read a ton of books.

Here are the 3 books that helped me learn data science the most...
I think the message in #DataScience needs to be: Don't believe everything you read. 🧵

I'd like to thank Rafael Nicolas Fermin Cota for pointing me to this modified graphic from a recent Harvard Business Review article on "Prioritizing Which Data Skills Your Company Needs".
You hear Harvard Business Review, & you think this must be legit.

Well, in this case, they dropped the ball.

If you're coming up with an educational plan for your org in 2022, here are some tips...
