Pablo Gomez
Nov 28 · 14 tweets · 8 min read
Inspired by @PaoloCrosetto's blog on #MDPI, this weekend I've been on a side-quest exploring their journals' publications using R and the web scraping package #rvest. A thread 🧵1/11 👇
#Rstats
Basically, I use two loops to download public editorial information. My example code is for "Plants #MDPI", but it is easy to adapt to other journals. It takes about 2 hours to run (>8000 papers to explore).
The first loop extracts the URL of each individual paper from MDPI's search website.
2/11
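The author's two loops are written in R with rvest (see the repository linked at the end of the thread). To illustrate what the first loop does, here is a minimal Python sketch using only the standard library. It assumes, without confirmation from the thread, that article titles on a results page are <a class="title-link"> links; that class name, the sample HTML and the example paths are hypothetical. In a full loop you would download each results page in turn and append the URLs it yields.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.mdpi.com"
# ASSUMPTION: article titles in the search results are <a> tags with this
# class. Inspect the live page and adjust before relying on it.
TITLE_CLASS = "title-link"


class ArticleLinkParser(HTMLParser):
    """Collect the href of every <a> tag that carries TITLE_CLASS."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if TITLE_CLASS in classes and href:
            # Relative paths such as "/example/article-1" become absolute URLs.
            self.links.append(urljoin(BASE_URL, href))


def article_urls(page_html):
    """Return the unique article URLs on one search-results page, in page order."""
    parser = ArticleLinkParser()
    parser.feed(page_html)
    parser.close()
    return list(dict.fromkeys(parser.links))  # de-duplicate, keep first-seen order


# Made-up snippet standing in for one downloaded results page.
sample_page = """
<div>
  <a class="title-link" href="/example/article-1">First paper</a>
  <a class="title-link" href="/example/article-1">First paper (duplicate link)</a>
  <a class="nav" href="/about">About</a>
  <a class="title-link" href="/example/article-2">Second paper</a>
</div>
"""
print(article_urls(sample_page))
```

In rvest the equivalent step would select the same links with a CSS selector and read their href attribute; the selector is the part most likely to need adjusting.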
Sys.sleep(1) adds a one-second pause to slow down the extraction and avoid being kicked out by the search server. You can probably speed things up by reducing the value. 3/11
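A minimal Python sketch of the same throttling idea, assuming a plain urllib download. The names fetch_html and polite_map are hypothetical helpers, and the User-Agent string is a placeholder. The pause sits between requests, so lowering delay speeds the crawl up at the cost of a higher chance of being blocked.

```python
import time
from urllib.request import Request, urlopen


def fetch_html(url, timeout=30):
    """Download one page as text (plain urllib, no retries)."""
    # Placeholder User-Agent; identify your scraper honestly.
    req = Request(url, headers={"User-Agent": "journal-scraper-example/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def polite_map(urls, fetch=fetch_html, delay=1.0):
    """Apply fetch() to each URL, sleeping `delay` seconds between requests.

    Same idea as Sys.sleep(1) in the R loop: a fixed pause keeps the request
    rate low so the server is less likely to block the scraper.
    """
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay)  # no pause needed before the very first request
        results.append(fetch(url))
    return results
```

Usage would be pages = polite_map(urls), where urls comes from the first loop; passing a different fetch function makes the loop easy to test offline.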
The second loop uses the URLs from the first loop to look up the editorial details of each published paper. Easy peasy. The more papers a journal has published, the longer the process takes. 4/11
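The editorial details boil down to a handful of dates per paper. A minimal Python sketch of extracting them, assuming the article page prints its history as a single line such as "Received: 3 March 2021 / Revised: 10 April 2021 / Accepted: 12 April 2021 / Published: 15 April 2021". That format, the regular expression and the sample line are assumptions to check against the live page, not details given in the thread.

```python
import re
from datetime import datetime

# ASSUMPTION: the editorial history appears as "Label: D Month YYYY" pairs
# separated by slashes. Verify the labels and date format on a real page.
EVENT_RE = re.compile(
    r"(Received|Revised|Accepted|Published):\s*(\d{1,2} [A-Za-z]+ \d{4})"
)


def parse_history(text):
    """Return {event: datetime.date} for every editorial event found in text.

    If an event appears more than once (e.g. two 'Revised' dates),
    the last occurrence wins.
    """
    return {
        event: datetime.strptime(day, "%d %B %Y").date()
        for event, day in EVENT_RE.findall(text)
    }


# Made-up example line, not taken from a real paper.
line = ("Received: 3 March 2021 / Revised: 10 April 2021 / "
        "Accepted: 12 April 2021 / Published: 15 April 2021")
history = parse_history(line)
print(history["Published"] - history["Received"])  # 43 days, 0:00:00
```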
And now, some numbers and graphs. #MDPI Plants has about 8900 papers. About 200 of these were accepted straight away (not even a minor revision)... and I decided not to use these in the analysis to save time. This is how the dataset looks over time. 5/11
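Assuming each paper's history has been parsed into a dict of dates keyed by "Received", "Revised" and "Published", a minimal Python sketch of dropping the papers accepted without revision and counting the rest by publication month could look like this. It assumes such papers simply have no "Revised" entry; the records are made up.

```python
from collections import Counter
from datetime import date


def drop_unrevised(records):
    """Keep papers whose history has a 'Revised' date.

    ASSUMPTION: papers accepted without any revision have no such entry.
    """
    return [r for r in records if "Revised" in r]


def papers_per_month(records):
    """Count papers by (year, month) of their 'Published' date."""
    return Counter((r["Published"].year, r["Published"].month) for r in records)


# Made-up records for illustration only.
records = [
    {"Received": date(2021, 1, 4), "Revised": date(2021, 1, 20), "Published": date(2021, 2, 1)},
    {"Received": date(2021, 1, 9), "Published": date(2021, 2, 3)},  # no revision
    {"Received": date(2021, 2, 2), "Revised": date(2021, 2, 25), "Published": date(2021, 3, 8)},
    {"Received": date(2021, 2, 5), "Revised": date(2021, 3, 1), "Published": date(2021, 3, 10)},
]
kept = drop_unrevised(records)
print(sorted(papers_per_month(kept).items()))  # [((2021, 2), 1), ((2021, 3), 2)]
```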
I'm curious about #MDPI's time between submission and publication... so I had to take a peek at how they are doing. Since 2017, the monthly average time between submission and publication has not gone over 50 days. Bear in mind I have not included error bars! 6/11
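A minimal Python sketch of the monthly average, assuming records with "Received" and "Published" dates and grouping by publication month (the thread does not say which month the author grouped by). It also returns the sample standard deviation, which is one possible basis for the missing error bars; the records are made up.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, stdev


def monthly_turnaround(records):
    """Mean and SD of days from 'Received' to 'Published', by publication month.

    Returns {(year, month): (n, mean_days, sd_days)}; sd_days is None when a
    month has a single paper, since a sample SD needs at least two values.
    """
    days_by_month = defaultdict(list)
    for r in records:
        key = (r["Published"].year, r["Published"].month)
        days_by_month[key].append((r["Published"] - r["Received"]).days)
    summary = {}
    for key, days in sorted(days_by_month.items()):
        sd = stdev(days) if len(days) > 1 else None
        summary[key] = (len(days), mean(days), sd)
    return summary


# Made-up records for illustration only.
records = [
    {"Received": date(2021, 1, 4), "Published": date(2021, 2, 1)},    # 28 days
    {"Received": date(2021, 1, 10), "Published": date(2021, 2, 21)},  # 42 days
    {"Received": date(2021, 2, 1), "Published": date(2021, 3, 3)},    # 30 days
]
for month, (n, avg, sd) in monthly_turnaround(records).items():
    print(month, n, avg, sd)
```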
This paper (mdpi.com/2223-7747/10/1…) was submitted on 18 December, resubmitted with revisions on 21 December and published on 24 December. A Christmas miracle! 7/11
At the other extreme, this one took almost 500 days (an anomaly in the dataset)! mdpi.com/2223-7747/10/8… 8/11
I also had to look at the proportion of papers published in special issues versus regular issues. These days it seems unusual for an #MDPI Plants paper to be published outside a special issue. 9/11
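A minimal Python sketch of the special-issue proportion per publication year. It assumes a hypothetical boolean special_issue field per paper (for example, set when the article page says it belongs to a Special Issue) alongside a "Published" date; the records are made up.

```python
from collections import defaultdict
from datetime import date


def special_issue_share(records):
    """Fraction of papers in a special issue, per publication year.

    ASSUMPTION: each record has a 'Published' date and a boolean
    'special_issue' flag scraped from the article page.
    """
    counts = defaultdict(lambda: [0, 0])  # year -> [in special issue, total]
    for r in records:
        tally = counts[r["Published"].year]
        tally[0] += int(r["special_issue"])
        tally[1] += 1
    return {year: si / total for year, (si, total) in sorted(counts.items())}


# Made-up records for illustration only.
records = [
    {"Published": date(2019, 5, 2), "special_issue": True},
    {"Published": date(2019, 9, 14), "special_issue": False},
    {"Published": date(2021, 1, 7), "special_issue": True},
    {"Published": date(2021, 3, 22), "special_issue": True},
    {"Published": date(2021, 6, 30), "special_issue": True},
    {"Published": date(2021, 11, 5), "special_issue": False},
]
print(special_issue_share(records))  # {2019: 0.5, 2021: 0.75}
```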
There is a lot more that can be done with R's #rvest: finding which authors publish the most in the journal, which special issues have fewer/more papers, which institutions use this journal most frequently, word clouds of abstracts, etc. 10/11
The code, dataset and other outputs can be found here: github.com/pgomba/MDPI_ex… I plan to cover more journals and build a tutorial, but if in the meantime you have any problems running it, I'm happy to help! 11/11
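Most of these ideas reduce to counting. A minimal Python sketch for the first one, assuming each record carries a hypothetical list-valued authors field (the same function works for a list of affiliations); the names are placeholders.

```python
from collections import Counter


def top_contributors(records, field="authors", n=3):
    """Most frequent names in a list-valued field (authors, affiliations...).

    Each name is counted at most once per paper, so a name repeated within
    one record does not inflate its total.
    """
    counts = Counter()
    for r in records:
        counts.update(set(r[field]))
    return counts.most_common(n)


# Made-up records for illustration only.
records = [
    {"authors": ["Author A", "Author B"]},
    {"authors": ["Author A", "Author C", "Author A"]},  # duplicate counted once
    {"authors": ["Author A", "Author B"]},
]
print(top_contributors(records, n=2))  # [('Author A', 3), ('Author B', 2)]
```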
Bonus tweet 1: Diversity
Bonus tweet 2: Link to @PaoloCrosetto's blog (referenced in tweet 1): "Is MDPI a predatory publisher?". It is the best analysis out there of MDPI's business model. paolocrosetto.wordpress.com/2021/04/12/is-…
Bonus tweet 3, by request (@macerwan): Microorganisms.
Dataset: github.com/pgomba/MDPI_ex…