Just 3 days ago, I had the pleasure of watching the #rstudioconf2022 kick off.
I've been attending since 2018 and watching even longer than that.
And I was just a normal spectator in the audience until this happened.
@topepos and @juliasilge's keynote showcased all of the open source work their team has been doing to build #tidymodels, the best machine learning ecosystem in R.
And then they brought this slide up.
Max and Julia then proceeded to talk about how the community members have been working on expanding the ecosystem.
- textrecipes for text
- censored for survival modeling
- stacks for ensembles
And then they announced me and my work on Modeltime for Time Series!!!
I had no clue this was going to happen.
Just a spectator in the back.
My friends to both sides went nuts. Hugs, high-fives, and all.
My students in my slack channel went even more nuts.
Throughout the rest of the week, I was on cloud nine.
My students who were at the conf introduced themselves.
Much of our discussions centered around Max & Julia's keynote and the exposure that modeltime got.
And none of this would be possible without the support of this company: RStudio / Posit.
So, I'm honored to be part of something bigger than just a programming language.
And if you'd like to learn more about what I do, I'll share a few links.
The first is my modeltime package for #timeseries.
This has been a 2+ year passion project to build the premier time series forecasting system in R.
It now has multiple extensions including ensembles, resampling, deep learning, and more.
A new paper shows how you can predict real purchase intent without asking people.
It achieves ~90% of human test-retest reliability.
Here's what's inside the 28-page paper:
1. Problem with direct Likert from LLMs:
When you ask LLMs to output 1-5 ratings directly, the distributions are too narrow/skewed and don't look like human survey data, limiting usefulness for concept testing.
2. The fix (semantic matching):
Have the LLM write a short free-text purchase-intent statement, then map that text onto a 5-point Likert score using embedding cosine similarity to predefined anchor sentences (i.e., semantic matching instead of raw numbers).
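A minimal sketch of the semantic-matching idea above. A real system would embed the text with a sentence-embedding model; the bag-of-words vectors, anchor wordings, and function names here are all illustrative stand-ins, not the paper's actual implementation.

```python
import math
from collections import Counter

# Anchor sentences for each Likert point (wordings are assumed for illustration).
ANCHORS = {
    1: "i would definitely not buy this product",
    2: "i probably would not buy this product",
    3: "i might or might not buy this product",
    4: "i probably would buy this product",
    5: "i would definitely buy this product",
}

def embed(text):
    # Placeholder embedding: a simple word-count vector.
    # Swap in a real sentence-embedding model in practice.
    return Counter(text.lower().split())

def cosine(u, v):
    # Cosine similarity between two sparse count vectors.
    dot = sum(u[w] * v[w] for w in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def likert_from_text(statement):
    # Map free text to the Likert point whose anchor is most similar.
    vec = embed(statement)
    return max(ANCHORS, key=lambda k: cosine(vec, embed(ANCHORS[k])))

print(likert_from_text("I would definitely buy this product"))  # 5
```

The point of the trick: the model writes natural language (which it does well) and the score comes from similarity to the anchors, not from the model emitting a raw number.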
1️⃣ System Prompt: Define your agent's role, capabilities, and boundaries. This gives your agent the necessary context.
2️⃣ LLM (Large Language Model): Choose the engine. GPT-5, Claude, Mistral, or an open-source model. Pick based on reasoning needs, latency, and cost.
3️⃣ Tools: Equip your agent with tools: API access, code interpreters, database queries, web search, etc. More tools = more utility, but keep the set manageable (roughly 20 max).
4️⃣ Orchestration: Use frameworks (like LangChain, AutoGen, CrewAI) to manage reasoning, task decomposition, and multi-agent collaboration.
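The four components above can be sketched as a toy loop. This is not any specific framework's API: `fake_llm` is a stand-in for a real model call, and the tool names and prompt are made up for illustration.

```python
# 1) System prompt: the agent's role and boundaries.
SYSTEM_PROMPT = "You are a calculator assistant. Use tools when asked to compute."

# 3) Tools: plain functions the agent is allowed to invoke.
TOOLS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}

def fake_llm(system_prompt, user_msg):
    # 2) The LLM: a placeholder that "decides" which tool to call.
    # A real agent would send the prompt to an actual model here.
    if "times" in user_msg:
        return {"tool": "multiply", "args": (6, 7)}
    return {"tool": "add", "args": (1, 2)}

def run_agent(user_msg):
    # 4) Orchestration: route the model's decision to the right tool
    # and return the result (real frameworks also loop, retry, plan, etc.).
    decision = fake_llm(SYSTEM_PROMPT, user_msg)
    tool = TOOLS[decision["tool"]]
    return tool(*decision["args"])

print(run_agent("what is 6 times 7?"))  # 42
```

Frameworks like LangChain or CrewAI essentially wrap this loop with memory, planning, and multi-agent handoffs.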
Understanding P-Values is essential for improving regression models.
In 2 minutes, I'll crush your confusion.
1. The p-value:
A p-value in statistics is a measure used to assess the strength of the evidence against a null hypothesis.
2. Null Hypothesis (H₀):
The null hypothesis is the default position that there is no relationship between two measured phenomena or no association among groups. For example, under H₀, the regressor does not affect the outcome.
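One way to see a p-value concretely is a permutation test on a regression slope: shuffling y breaks any real x-y relationship, so the shuffled slopes show what "H₀ is true" looks like, and the p-value is the share of shuffled slopes at least as extreme as the observed one. The data below is made up for illustration; this is a sketch of the concept, not a replacement for the usual t-test output.

```python
import random

def slope(x, y):
    # Ordinary least-squares slope of y on x.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

def permutation_p_value(x, y, n_perm=2000, seed=42):
    # P-value for H0: x has no effect on y.
    # Each shuffle of y simulates a world where H0 holds.
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(slope(x, y_perm)) >= observed:
            hits += 1
    # +1 correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 20.2]  # strong trend
print(permutation_p_value(x, y))  # very small: the data are rare under H0
```

A tiny p-value says the observed slope would almost never arise if the regressor truly had no effect, which is evidence against H₀.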
Understanding probability is essential in data science.
In 4 minutes, I'll demolish your confusion.
Let's go!
1. Statistical Distributions:
There are hundreds of distributions to choose from when modeling data, and the choices can seem endless. Use this as a guide to simplify the decision.
2. Discrete Distributions:
Discrete distributions are used when the data can take on only specific, distinct values. These values are often integers, like the number of sales calls made or the number of customers that converted.