Jeroen Janssens
Dec 9, 2021 • 22 tweets
🎉 The second edition of Data Science at the Command Line is out! 🎉

You can read the entire book for free at datascienceatthecommandline.com

It took a good year to rewrite and expand the first edition, so I'd like to say a few things. 🧵
First of all, many thanks to @OReillyMedia for allowing me to make the book available for free under a CC BY-ND license. You can, of course, also buy a physical copy from your favorite bookstore.
I'm grateful for all the help I've received, most notably from my editors @JessHaberman @GreyEditing Kate Galloway and my reviewers Aaditya Maruthi @bde @beeonaposy @juliasilge @mikedewar @reustle. The acknowledgements list many others, without whom I couldn't have pulled this off.
If you're only going to read two pages of this book, let it be those that @timoreilly wrote. Tim, thank you for writing such an inspiring foreword.
Writing this book triggered a decent amount of imposter syndrome, so I'm very happy with all the positive reactions so far, including the generous praise by @dancow, @chrishwiggins, @JohnDCook, @DynamicWebPaige, @jaredlander, and @jakehofman.
Let me give you a rundown of the actual content!

Chapter 1 discusses the OSEMN model for data science by @hmason and @chrishwiggins and explains why I believe the command line can be helpful here.
Chapter 2 covers how to get started with the Docker image (which contains over 💯 tools!) and introduces some essential Unix concepts such as working with redirection and getting help.
Chapter 3 shows several ways of obtaining data, the first step of the OSEMN model. APIs, databases, Excel sheets, web pages: nothing is safe from the command line!
Chapter 4 explains how you can create your own command-line tools using #bash, #python, or #rstats.

True story: In 2014 I competed in the US beatboxing championship. My stage name was shebang (after #!). I thought it sounded cool until the MC introduced me as "she bang". 🥲🎤
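To make the Chapter 4 idea a bit more concrete, here is a minimal sketch of what a Python command-line tool can look like. It is not taken from the book: the tool and the `top_words` function are hypothetical. It only illustrates the general pattern of a shebang line (the `#!` mentioned above), reading from standard input, and writing to standard output so the script can sit in a pipeline.

```python
#!/usr/bin/env python3
"""Hypothetical filter: print the n most common words read from standard input."""
import sys
from collections import Counter


def top_words(lines, n=3):
    """Return the n most common lowercase words as (word, count) pairs."""
    counts = Counter(word.lower() for line in lines for word in line.split())
    return counts.most_common(n)


def main():
    # Optional first argument: how many words to show (defaults to 3).
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 3
    # Reading sys.stdin line by line lets the script act as a pipeline filter.
    for word, count in top_words(sys.stdin, n):
        print(f"{count}\t{word}")


if __name__ == "__main__":
    main()
```

Assuming the file is saved as `top-words` and made executable with `chmod +x top-words`, it could be used like any other tool, for example `cat book.txt | ./top-words 5` (both file names are placeholders).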
Chapter 5 is all about scrubbing (aka cleaning) data. I discuss classic tools such as grep and awk, and newer tools such as jq and pup.
Chapter 6 introduces the essentials of make, a command-line tool to formalize your data workflow steps in terms of input and output dependencies. Confession: I often use make as a glorified task runner for my projects.
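The core rule behind make's input and output dependencies can be sketched in a few lines of Python. This is a simplified illustration, not make's actual implementation and not code from the book: the function name `needs_rebuild` is hypothetical, and real make also checks recursively whether the dependencies themselves are out of date.

```python
import os


def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any of its dependencies."""
    if not os.path.exists(target):
        # A target that does not exist yet always has to be built.
        return True
    target_mtime = os.path.getmtime(target)
    # Rebuild when at least one input was modified after the output.
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)
```

For a workflow step that turns a hypothetical `data.csv` into `clean.csv`, `needs_rebuild("clean.csv", ["data.csv"])` would be true until the step has been run, and false afterwards until `data.csv` changes again.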
Chapter 7 talks about exploring data, mainly through visualization. I demonstrate rush, a tool that allows you to run R one-liners from the command line.
Chapter 8 demonstrates GNU parallel, a wonderful tool to parallelize and distribute your pipeline. Run cowsay on dozens of EC2 instances! If AWS is up, of course. 🤓
Chapter 9 covers modeling data, where I demonstrate how you can do dimensionality reduction, regression, and classification at the command line.
Chapter 10 is an entirely new chapter! It's about using multiple tools and programming languages together, including Jupyter, Python, RStudio, R, and Apache Spark.
Chapter 11 concludes the book with three pieces of advice and references to some excellent resources if you want to learn more.
And last but certainly not least, the appendix lists all the tools used in the book (and which are installed in the Docker container) together with citations and examples. Keep in mind that tools come and go. The command line itself, however, is here to stay.
While this book was challenging to write at times, I also had a lot of fun (I managed to sneak in a few Easter eggs and obscure jokes). In any case, I hope you'll find this book helpful. If you want to help me, please consider leaving a review on Amazon.
One more thing: I'm working on a brand new course! It's tentatively titled "Embracing the Command Line". First beta cohort expected to start in Q1 2022. You can learn more about this course and let me know what you think at datascienceatthecommandline.com/#course
If you can't wait to learn from me in person: I regularly give workshops about the command line, Python, R, etc. for a living (both on-site and online). In fact, it's the first edition from 2014 that eventually allowed me to start my own company, datascienceworkshops.com. Cheers!
The next two-day workshop "Data Science at the Command Line" will be on March 10-11, 2022 from 10am to 3pm EST.

Join me on Zoom for eight hours of interactive, hands-on sessions. Sign up via Gumroad.

datascienceworkshops.gumroad.com/l/data-science…
Data Science at the Command Line on Hacker News news.ycombinator.com/item?id=295893…

