I still see a lot of #RStats tweets about the base pipe vs. the {magrittr} pipe, often in favor of the former.
I have to admit: I still stick to the magrittr pipe and here is a Tweetorial on why.
🧵
%>%
The {magrittr} pipe supports at least seven advanced features.
Some of them are quite useful, and once you get used to them, they are hard to give up.
1. One feature that comes in handy quite often is that we can use the dot `.` as an unnamed placeholder.
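A minimal sketch of the unnamed placeholder, assuming {magrittr} is installed. `paste()` is just my stand-in for any function that takes `...`:

```r
library(magrittr)

# The dot marks where the left-hand side goes; no argument name is needed.
pasted <- "b" %>% paste("a", .)
pasted  # "a b"
```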
%>%
Although the base pipe has supported a placeholder `_` since R 4.2, the downside is that it has to be passed as a named argument. So when using the base pipe with functions that take the ellipsis `...` as an argument, this leaves us with two choices:
%>%
i) wrap the expression in an anonymous function (that’s waaay too many parentheses)
ii) use the placeholder `_` with random unused arg names (not really a great idea either).
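Both workarounds, sketched with `paste()` as an assumed example of a function taking `...`. Option ii) needs R >= 4.2, and the argument name `x` is arbitrary; `paste()` simply ignores it:

```r
# i) wrap the call in an anonymous function and call it right away
via_lambda <- "b" |> (\(x) paste("a", x))()

# ii) pass the placeholder under a made-up argument name that ends up in `...`
via_named <- "b" |> paste("a", x = _)

via_lambda  # "a b"
via_named   # "a b"
```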
%>%
2. With the {magrittr} pipe we can use the same placeholder `.` several times. I don’t use this feature that often, but in some cases it definitely helps to save some typing - especially with long variable names.
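A small sketch of reusing the dot, assuming {magrittr} is installed. The variable name is made up to show the typing it saves:

```r
library(magrittr)

some_rather_long_variable_name <- c("a", "b")

# Both dots refer to the same left-hand side value.
pairs <- some_rather_long_variable_name %>% paste(., toupper(.))
pairs  # "a A" "b B"
```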
%>%
3. We can further use the placeholder inside nested functions.
This is one of the biggest features for me, as I regularly use this with `bind_rows()` as in the example below.
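Here is roughly what that looks like. To keep the sketch free of extra dependencies I use base `rbind()` as a stand-in for `bind_rows()`; the `sales` data frame is made up. Because the dot only appears inside a nested call, {magrittr} still inserts the left-hand side as the first argument:

```r
library(magrittr)

sales <- data.frame(region = c("north", "south"), units = c(10, 5))

# Append a total row; `.` inside data.frame() refers to `sales`.
with_total <- sales %>%
  rbind(data.frame(region = "total", units = sum(.$units)))

with_total
```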
%>%
4. We can use the placeholder on the left-hand side …
… and …
%>%
5. … the right-hand side of `[[` and similar functions.
I don’t make use of these features very often, but they come in handy from time to time.
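A quick sketch of features 4 and 5 with the built-in `mtcars` data, assuming {magrittr} is installed:

```r
library(magrittr)

# 4. placeholder on the left-hand side of `[[`
mpg_lhs <- mtcars %>% .[["mpg"]]

# 5. placeholder on the right-hand side of `[[`
mpg_rhs <- "mpg" %>% mtcars[[.]]

identical(mpg_lhs, mpg_rhs)  # TRUE
```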
6. The {magrittr} pipe doesn’t require parentheses after the function name, so …
`mtcars %>% glimpse`
… works.
I know it’s not good style to pipe into a bare function name, but in interactive analysis I use this ALL THE TIME.
%>%
Especially when using a shortcut for the pipe, it’s much faster than wrapping an object in a function call with parentheses, like `glimpse(mtcars)`.
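The same idea with a base function, so the sketch only assumes {magrittr}. I use `nrow` here instead of `glimpse`:

```r
library(magrittr)

# Bare function name on the right-hand side, no parentheses
n_bare <- mtcars %>% nrow

# The base pipe needs a call on the right-hand side
n_base <- mtcars |> nrow()

# mtcars |> nrow   # not run: the base pipe rejects a bare name at parse time

c(n_bare, n_base)  # 32 32
```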
%>%
7. With the {magrittr} pipe we can define functions by starting with the `.`
To be honest, I haven’t used this feature once. Especially since R introduced the shorthand syntax for anonymous functions, I don’t think that this feature is a big deal.
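For completeness, a sketch of both ways to build the same small function. The function names are made up, and the shorthand `\(x)` needs R >= 4.1:

```r
library(magrittr)

# A pipeline that starts with the dot becomes a function
root_of_sum <- . %>% sum() %>% sqrt()

# The same thing with the shorthand anonymous function syntax and the base pipe
root_of_sum_base <- \(x) x |> sum() |> sqrt()

root_of_sum(c(9, 16))       # 5
root_of_sum_base(c(9, 16))  # 5
```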
%>%
Apart from those technical reasons, I also stick to the {magrittr} pipe when answering questions on SO.
Not everyone uses R >= 4.1 yet, while the {magrittr} package has 1.2 million downloads a month on CRAN, and the pipe itself is reexported by many other packages.
%>%
What are the downsides of *not* using the base pipe?
I see three benefits in using the base pipe:
|>
1. It has no dependencies (apart from R >= 4.1)
As already mentioned above, the {magrittr} pipe is widespread, so for now this is not such a huge benefit.
However, this will change with more people using the base pipe and with more users being on R >= 4.1.
|>
2. Unlike the {magrittr} pipe, the base pipe *is not* a function call. The parser converts
`mtcars |> glimpse()`
to
`glimpse(mtcars)`
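We can check the parser's rewrite ourselves with `quote()`, which captures the expression without evaluating it (so neither {magrittr} nor {dplyr} has to be loaded for this sketch):

```r
# The base pipe is gone after parsing; only the plain call remains
base_call <- quote(mtcars |> glimpse())
base_call  # glimpse(mtcars)

# The {magrittr} pipe is still there as a call to the function `%>%`
magrittr_call <- quote(mtcars %>% glimpse())
as.character(magrittr_call[[1]])  # "%>%"
```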
This in turn has two main benefits.
First, the base pipe has no overhead, so it is always faster than the {magrittr} pipe.
|>
Second, we don’t see it in the call stack, which makes debugging easier.
These are two great arguments for using the base pipe, but for me, they don’t outweigh the benefit gained by the extra functionality.
%>%
Regarding speed: Yes, being a function call, the {magrittr} pipe has some overhead.
But since v 2.0 the pipe is implemented in C and has massively improved in terms of speed.
For a more in-depth analysis see also this blog post by @MyKo101AB
Also: If speed is an issue there are probably better ways to speed up code than replacing the {magrittr} with the base pipe.
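If you want to see the overhead on your own machine, a rough timing sketch could look like this. The repetition count is arbitrary, and I deliberately give no numbers, since they depend on the hardware and the {magrittr} version:

```r
library(magrittr)

x <- c(1, 4, 9)
n <- 10000  # arbitrary number of repetitions

# Time the same two-step pipeline with each pipe
time_magrittr <- system.time(for (i in seq_len(n)) x %>% sqrt() %>% sum())
time_base     <- system.time(for (i in seq_len(n)) x |> sqrt() |> sum())

elapsed <- c(magrittr = time_magrittr[["elapsed"]], base = time_base[["elapsed"]])
elapsed
```

For serious comparisons a dedicated benchmarking package would be the better tool; `system.time()` is only used here to keep the sketch dependency-free.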
%>%
Regarding debugging:
This, too, was an issue with older {magrittr} versions. Since version 2.0 the only thing that shows up on the call stack is the call to `%>%` - that’s it.
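One way to look at this is to count the calls on the stack from inside a piped function. `stack_depth()` is a hypothetical helper I made up for this sketch, and the exact difference depends on the installed {magrittr} version:

```r
library(magrittr)

# Hypothetical helper: number of calls on the stack at the time it runs
stack_depth <- function(x) length(sys.calls())

depth_base     <- 1 |> stack_depth()
depth_magrittr <- 1 %>% stack_depth()

# Extra frames added by the {magrittr} pipe compared to the base pipe
depth_magrittr - depth_base
```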
The {magrittr} pipe comes with seven cool features.
We can:
1. use `.` with unnamed args
2. use several `.`
3. use `.` in nested functions
4. use `.` on the lhs and …
5. … rhs of `[[` etc.
6. omit `()` of function calls
7. use `.` to define functions
%>%
The base pipe has three advantages:
1. no dependency
2. speed
3. not part of the call stack
But for me that’s not enough to outweigh the benefits of the {magrittr} pipe.
Do you have something to add?
Just pipe it in this Tweet 😉
%>%
Big data is a no-brainer: up to a certain point, like one million rows, speed is not really an issue. But once we are dealing with several million rows, things might get slow depending on the framework / code base and the computing power.
Simulations: Normally I don’t care if an operation takes 20 or 2 seconds, unless I’m running it several thousand times. This can make the difference between running something over lunch (1 hour) or overnight (10 hours).