
Rémi Thériault

@RemPsyc

Jun 6 • 3 tweets • 2 min read

@easystats4u

The #rstats datawizard package (from the @easystats4u ecosystem) has two very useful functions to deal with duplicates.

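* data_duplicated: extracts all duplicated rows, including the first occurrence. By contrast, base duplicated() flags only the second and later occurrences, and dplyr::distinct() keeps a single copy of each row.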

* data_unique: keeps one row from each set of duplicates and, by default, selects the "best" one

data_duplicated() also adds a column reporting the number of missing values in each row, to help you decide which duplicates to keep.
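The thread includes no code. The following is a minimal Python sketch of the behaviour described for data_duplicated(). It is not datawizard's R implementation or API. Rows are dicts, None stands in for NA, and the names find_all_duplicates, key and count_na are hypothetical.

```python
from collections import Counter


def find_all_duplicates(rows, key):
    """Hypothetical sketch: return every row whose key values occur more than once.

    Unlike a "flag later occurrences" approach, the first occurrence is kept too.
    Each returned row gets an extra hypothetical 'count_na' field holding the number
    of missing (None) values in that row.
    """
    # Count how often each combination of key values appears
    counts = Counter(tuple(row[col] for col in key) for row in rows)
    result = []
    for row in rows:
        if counts[tuple(row[col] for col in key)] > 1:
            n_missing = sum(value is None for value in row.values())
            # Copy the row so the input data is left unchanged
            result.append({**row, "count_na": n_missing})
    return result


rows = [
    {"id": 1, "score": 10, "age": None},
    {"id": 2, "score": 7, "age": 30},
    {"id": 1, "score": 12, "age": 25},
    {"id": 3, "score": None, "age": 40},
]
for row in find_all_duplicates(rows, key=["id"]):
    print(row)
```

Both rows with id 1 are returned, each with its own missing-value count. A function that only flags later occurrences would return just the second one.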

data_unique() can keep the first, the last, or the "best" duplicate. The "best" duplicate (the default) is the row with the fewest missing values. In case of ties, it picks the first one, since that row is the most likely to be valid and authentic: a participant's first attempt is not yet affected by practice effects from having done the task before.
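In the same spirit, here is a minimal Python sketch of the keep rules described above. It is not datawizard's implementation. keep_one_per_key and its key/keep parameters are hypothetical names, and missing values are represented as None.

```python
def keep_one_per_key(rows, key, keep="best"):
    """Hypothetical sketch: keep one row per set of duplicates.

    keep="first" or "last" picks by position. keep="best" picks the row with the
    fewest missing (None) values, breaking ties in favour of the earliest row.
    """
    if keep not in ("first", "last", "best"):
        raise ValueError("keep must be 'first', 'last' or 'best'")

    # Group rows by their key values, remembering the order keys first appear in
    groups = {}
    for row in rows:
        groups.setdefault(tuple(row[col] for col in key), []).append(row)

    result = []
    for group in groups.values():
        if keep == "first":
            chosen = group[0]
        elif keep == "last":
            chosen = group[-1]
        else:
            # min() returns the first of several equal minima, so ties go to the earliest row
            chosen = min(group, key=lambda r: sum(v is None for v in r.values()))
        result.append(chosen)
    return result


rows = [
    {"id": 1, "score": 10, "age": None},
    {"id": 2, "score": 7, "age": 30},
    {"id": 1, "score": 12, "age": 25},
    {"id": 2, "score": 8, "age": 31},
]
print(keep_one_per_key(rows, key=["id"]))
```

For id 1, "best" keeps the complete second row rather than the first. For id 2, both rows are complete, so the tie goes to the earlier one. The sketch relies on Python's min() returning the first of several equal minima. That mirrors the "first one wins ties" rule from the thread.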
