Today is about version control and collaboration, and one of its most powerful tools: #git ✨
💡What is Git?
Using Git can be a lifesaver (and it has often been one for me in the past 🙏). It’s basically a mini time machine: it lets you keep versions of your work as it progresses.
But unlike Dropbox or similar tools, it does not automatically save the current state of your work. You have to do that actively with commits and pushes. A typical workflow looks like this 👇
RStudio has a nice GUI that allows you to do everything without writing code - but if you need to remember some commands, it’s most likely git add, git commit, git push, git pull, and git status (to check if you have uncommitted files) 😊
Here's what the typical workflow can look like in action:
You start with your local repository on your own machine, work on your code and make some changes. Now the #git workflow starts 💫
✨ git add: Once you have made some changes, this command adds them to the staging area. This is an essential step before committing: it tells git which files you want to include in your next commit 😊
✨ git commit: Once you have used "git add", this command commits the staged changes and puts them under version control. It takes a snapshot of your current project status in git, but your changes are still only versioned on your local machine.
I talked to many people and couldn’t find a single best practice on how often you should commit. I like to think of commits as a status report or a (small) milestone to which you may want to return. So I try to commit once a (thematic) step is reached.
✨ git push: If you run this command, you push one (or more) commits to the remote repository (for instance on GitHub). Now your changes are also versioned in the #cloud ☁️
✨ git pull: This command pulls changes from others and makes sure that you’re working on the most current version 😊
✨ git status: This command lets you check whether you still have uncommitted changes in your files 🕵🏼‍♀️
Whenever I close a project, I usually push my changes (or at least commit them) and whenever I open it, I usually pull first to get changes from others before starting my work.
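To try the five commands above without touching a real project, here is a minimal sketch that runs entirely in a temporary folder. Everything in it is an assumption for illustration: the file name analysis.R, the placeholder identity, and a local bare repository that stands in for GitHub. It also assumes Git 2.28 or newer for the "-b main" option.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; the bare repository stands in for a remote such as GitHub.
sandbox=$(mktemp -d)
git init -q --bare -b main "$sandbox/remote.git"

# The local repository "on your own machine".
git init -q -b main "$sandbox/project"
cd "$sandbox/project"
git config user.name "Demo User"            # placeholder identity
git config user.email "demo@example.com"
git remote add origin "$sandbox/remote.git"

# Work on your code and make some changes (hypothetical file).
echo 'print("hello")' > analysis.R

git status --short                            # shows analysis.R as untracked
git add analysis.R                            # stage it for the next commit
git commit -q -m "Add first analysis script"  # snapshot, still local only
git push -q -u origin main                    # publish the commit to the remote
git pull -q                                   # get changes from others (none here)
git status --short                            # prints nothing: all committed
```

With a real GitHub repository, only the remote URL changes; the add, commit, push, pull and status steps stay the same.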
But there are many more commands out there! When I get lost, I usually end up looking things up here: atlassian.com/git 👩🏼‍💻
You have probably also heard of branches and merges in Git. They are an excellent way to collaborate with others.
The GIF shows how you start working from the main branch (this is where all the changes should eventually end up and where your final product lives). Each dot shows a new commit that is pushed.
Once you want to make changes (like integrating a new function in your package) you start a new feature branch.
The feature branch eventually goes back into the main branch (this is what we call "merging"). The cool thing is that you can work somewhat independently from your colleagues or collaborators on individual tasks because they can start their own feature branches ✨
Merging back feature branches (in the best case) requires a code review - you can also do this on GitHub (github.com/features/code-…) and I'm a big fan of it because it makes you a better programmer step-by-step and allows sharing knowledge.
I learned it the hard way, but it's best if feature branches don't get too long and complicated because they quickly become hard to review 🤓
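The feature-branch flow described above can be sketched on the command line like this. It is a local-only illustration under a few assumptions: the branch name feature/new-function, the file names and the placeholder identity are made up, and the code review step (which would happen in a pull request on GitHub) is left out.

```shell
#!/bin/sh
set -eu

# Throwaway repository with one commit on main.
repo=$(mktemp -d)
git init -q -b main "$repo"
cd "$repo"
git config user.name "Demo User"            # placeholder identity
git config user.email "demo@example.com"

echo "stable code" > main.txt
git add main.txt
git commit -q -m "Initial commit on main"

# Start a feature branch for the new work (hypothetical branch name).
git checkout -q -b feature/new-function
echo "new function" > feature.txt
git add feature.txt
git commit -q -m "Add new function"

# Go back to main and merge the feature branch into it.
git checkout -q main
git merge -q --no-ff -m "Merge feature/new-function" feature/new-function

# Show the history as a small graph, then remove the merged branch.
git log --oneline --graph
git branch -q -d feature/new-function
```

The "--no-ff" flag is a choice made for this sketch: it forces a merge commit so the branch stays visible in the graph, much like in the GIF. Without it, git would simply fast-forward main here.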
If you want to visualize it yourself, here's a slide deck that explains workflows and more: bit.ly/how-git-works 👩🏼‍🏫
And, one thing that I should add: git is fantastic but it can be intimidating at first. It's a steady learning process (that can be steep at times). I remember when I received my first pull request on GitHub (for merging some changes into my repository) and how lost I felt 😊
The best advice I can give here is to ask (and I know it's hard to do it sometimes - I'm also working on it 😄) - but it's getting better and there are so many great people out there who were in the same situation and who are happy to help 🤗
• • •
If you connect your local repository with a remote repository (for instance on GitHub), you’ll be able to store it also in the cloud and access it from everywhere. Setting up this connection is easy in @rstudio - just follow these steps:
You can see one detailed use case in the GIF. It shows how I typically set up a project in #rstats with GitHub when working in academia.
I create a #GitHub repository first (depending on data privacy and other things, I go for either public or private), but I always add a README. READMEs are great because they allow you to write a short description of your repository in #markdown.
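Outside of RStudio, connecting a local repository to a remote one can be sketched with plain git commands. This is a rough illustration that starts from the local side, not the exact RStudio steps: a local bare repository stands in for an empty GitHub repository, and the folder names and identity are placeholders. With GitHub, the URL passed to "git remote add" would be the one shown on the repository page.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Stand-in for an empty repository on GitHub (assumes Git 2.28+ for -b).
git init -q --bare -b main "$work/remote.git"

# A local project with a README as its first commit.
git init -q -b main "$work/my-project"
cd "$work/my-project"
git config user.name "Demo User"            # placeholder identity
git config user.email "demo@example.com"
echo "# my-project" > README.md
git add README.md
git commit -q -m "Add README"

# Connect the local repository to the remote and publish main.
git remote add origin "$work/remote.git"
git remote -v                               # lists the fetch and push URLs
git push -q -u origin main

# "Access it from everywhere": a second copy obtained by cloning the remote.
git clone -q "$work/remote.git" "$work/second-copy"
```

The "-u" flag records origin/main as the upstream branch, so later calls can be just "git push" and "git pull".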
In the last thread, I described in brief how to set up your #rstats package - but as you have already seen, a package contains a bit more than just your function(s) ✨
When building your R package, you can luckily rely on the work of others who provide an excellent framework to get you started (and also take care of some of the things in the background).
I found it difficult to understand "what" I really need and "why" when I started writing my first package. So here's a short list with what I believe are among the most helpful tools out there:
Now that you know what a general package structure looks like, we can start building a package 👩🏼‍💻
#RStudio is great, just follow these steps: Select "File", "New Project...", "New Directory" and select "R Package". You can now give your R package a meaningful name, select a path and hit "Create project".
You're now ready to go! Once executed, you have a fully functional package structure in your #Rproject (that we already discussed) 😊 Now it's time to move your function to your "R/" folder and populate it!
Writing a package sounds big, and it certainly can be. But in its simplest form, it’s not much more than putting a function in a package structure. The #rstats community is great and has come up with multiple helpers that make your life easier!
💡 What’s in an R package?
Simply speaking, an R package allows you to put functions in a box and make them available for others to use.
Ideally, your R package also comes with unit tests that make sure that your package works (or, if it doesn't, throws meaningful errors that let you dive into the functions and explore why) and adheres to the common standards of package development.
Before we get into 📦 development, I wanted to share my favorite shortcuts in the RStudio IDE with you. There are so many out there (bit.ly/rstudio-shortc…) but these are the ones that I regularly use when changing something directly in my code 😊 #rstats
I love them because they usually make your life easier. The first one allows you to add a new R code chunk in your Rmd/Quarto file using "Option + Cmd + I" on a Mac (or "Ctrl + Alt + I" on Windows/Linux). And this is exactly what the GIF shows:
The next one makes writing the pipe operator so much simpler! At first, it feels a bit like hunting for the right keys, but once you have internalized it, you probably won't want to go back 😊 So instead of typing "%>%" you can now use "Cmd + Shift + M" on a Mac (or "Ctrl + Shift + M" on Windows/Linux)