DuckDB is a fantastic tool currently seeing a rapid rise in the data world.
It is designed for fast analytical queries and works brilliantly with big data.
Best of all, you can use it directly in tools like #rstats, #python, #java and others.
Let's see how it works in R.
2.
You can install DuckDB straight from the console with install.packages("duckdb").
Unlike some other big data tools, it is entirely self-contained: it runs inside your R session, with no separate server to install or run.
This means no extra ongoing maintenance - your IT department will thank you for that.
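Here is a minimal sketch of getting connected once the package is installed. It assumes the DBI package is available (duckdb depends on it, so it should arrive with the install). The in-memory database and the test query are illustrative choices, not part of the original thread.

```r
# install.packages("duckdb")  # run once, as described above
library(DBI)

# Open an in-memory DuckDB database; nothing is written to disk
# and there is no server process to start.
con <- dbConnect(duckdb::duckdb(), dbdir = ":memory:")

# Run a trivial query to confirm the connection works.
res <- dbGetQuery(con, "SELECT 42 AS answer")

# Close the connection and shut the embedded database down.
dbDisconnect(con, shutdown = TRUE)
```

To keep data between sessions, pass a file path as `dbdir` instead of `":memory:"`.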
3.
Let's look at an example.
The data I'll use is NYC Taxi Trips Data from Google on Kaggle in CSV format. It contains 10 million rows and is 1.5 GB. Is it big data? Not quite. I can still open it in R on my laptop, but it's not far off.
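As a rough sketch of how a CSV like this can be loaded and queried, the example below uses a tiny stand-in file rather than the real download. The column names `passenger_count` and `fare_amount`, the table name `trips`, and the four rows of data are all assumptions made up for illustration; swap in the path to the actual file and its real column names.

```r
library(DBI)

# Stand-in for the real file: a tiny CSV with hypothetical columns.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    passenger_count = c(1, 2, 1, 3),
    fare_amount     = c(7.5, 12, 5.5, 20)
  ),
  csv_path,
  row.names = FALSE
)

con <- dbConnect(duckdb::duckdb(), dbdir = ":memory:")

# Load the CSV into a DuckDB table called "trips".
duckdb::duckdb_read_csv(con, "trips", csv_path)

# Aggregate inside DuckDB and bring back only the small result.
avg_fares <- dbGetQuery(con, "
  SELECT passenger_count, AVG(fare_amount) AS avg_fare
  FROM trips
  GROUP BY passenger_count
  ORDER BY passenger_count
")

dbDisconnect(con, shutdown = TRUE)
```

The point of this pattern is that the heavy lifting happens inside DuckDB, so only the summarised rows come back into R.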
- DuckDB is a brilliant data tool for writing fast analytical queries
- It handles big data with ease
- You can use it directly in #rstats (+ #python, #java & others)
- It uses syntax you will be familiar with if you know a little dplyr
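To illustrate the dplyr point, here is a minimal sketch that assumes the dplyr and dbplyr packages are installed alongside duckdb. The `trips` table and its columns are hypothetical stand-ins, not the real taxi data.

```r
library(DBI)
library(dplyr)  # dbplyr must also be installed for tbl() on a database

con <- dbConnect(duckdb::duckdb(), dbdir = ":memory:")

# Hypothetical stand-in data, copied into DuckDB as a table.
trips_df <- data.frame(
  passenger_count = c(1, 2, 1, 3),
  fare_amount     = c(7.5, 12, 5.5, 20)
)
dbWriteTable(con, "trips", trips_df)

# Ordinary dplyr verbs; dbplyr translates them to SQL that DuckDB runs.
fares <- tbl(con, "trips") %>%
  group_by(passenger_count) %>%
  summarise(avg_fare = mean(fare_amount, na.rm = TRUE)) %>%
  arrange(passenger_count) %>%
  collect()  # only now are the results pulled into R

dbDisconnect(con, shutdown = TRUE)
```

Nothing is computed until `collect()` is called, so the pipeline can be built up step by step without pulling the full table into memory.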
9.
I hope you've found this thread helpful. Follow me @neilgcurrie for more R and data tweets.