Inspired by @PaoloCrosetto's blog on #MDPI, this weekend I've been on a side-quest exploring their journals' publications using R and the web scraping package #rvest. A thread 🧵1/11 👇
#Rstats
Basically, I use two loops to download public editorial information. My example code is for "Plants #MDPI", but it is easy to adapt to other journals. It takes about 2 hours to run (>8000 papers to explore).
The first loop extracts the individual article URLs from MDPI's search website (sketch below).
2/11
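A minimal sketch of what loop one can look like, assuming a search URL pattern and a CSS selector for the article links that are placeholders rather than the exact ones in my script:

```r
library(rvest)

# Assumed search URL pattern; the real one used in the script may differ
base_url <- "https://www.mdpi.com/search?journal=plants&page_no="
n_pages  <- 10  # placeholder: number of search-result pages to walk through

paper_urls <- c()

for (i in seq_len(n_pages)) {
  search_page <- read_html(paste0(base_url, i))

  # Assumed CSS selector for the links to individual articles
  links <- search_page |>
    html_elements("a.title-link") |>
    html_attr("href")

  paper_urls <- c(paper_urls, paste0("https://www.mdpi.com", links))

  Sys.sleep(1)  # pause between requests (see next tweet)
}
```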
Sys.sleep(1) is there to slow down the extraction and avoid being kicked out by the search server. It can probably be sped up by reducing the value. 3/11
The second loop uses the URLs from loop one and searches for the editorial details of each published paper (sketch below). Easy peasy. The more publications the journal has, the longer the process takes. 4/11
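Roughly, loop two looks like this. The selector for the publication-history block is an illustrative guess, not necessarily what the article pages actually use:

```r
library(rvest)

# Uses paper_urls built in loop one
editorial_info <- data.frame(url = paper_urls, history = NA_character_)

for (j in seq_along(paper_urls)) {
  paper_page <- read_html(paper_urls[j])

  # Assumed selector for the block with received / revised / accepted / published dates
  editorial_info$history[j] <- paper_page |>
    html_element(".pubhistory") |>
    html_text2()

  Sys.sleep(1)
}
```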
And now, some numbers and graphs. #MDPI Plants has about 8900 papers. About 200ish of these have been accepted straight away (not even minor revision)... and I decided against using these in the analysis to save time. This is how the dataset looks over time. 5/11
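For the record, dropping the straight-accepted papers is a one-liner, assuming a hypothetical `revised` date column that is NA when there was no revision round:

```r
library(dplyr)

# Hypothetical column name: keep only papers that went through at least one revision
plants_data <- plants_data |>
  filter(!is.na(revised))
```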
I'm curious about #MDPI's time between submission and publication... hence, I had to take a peek at how they are doing. From 2017 onwards, the monthly average time between submission and publication does not go over 50 days. Bear in mind I have not included error bars! 6/11
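The monthly averages come from something along these lines. The data frame and column names (`plants_data`, `submitted`, `published` as Date columns) are placeholders for whatever the scraped fields end up being called:

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Placeholder names: submitted and published are Date columns in plants_data
turnaround <- plants_data |>
  mutate(days_to_publication = as.numeric(published - submitted),
         month = floor_date(published, "month")) |>
  group_by(month) |>
  summarise(mean_days = mean(days_to_publication, na.rm = TRUE))

ggplot(turnaround, aes(month, mean_days)) +
  geom_line() +
  labs(x = NULL, y = "Mean days from submission to publication")
```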
This paper: mdpi.com/2223-7747/10/1…, was submitted on the 18th of December... resubmitted with revisions on the 21st of December and published on the 24th of December. A Christmas miracle! 7/11
At the opposite extreme, this one took almost 500 days (an anomaly in the dataset): mdpi.com/2223-7747/10/8… 8/11
I also had to look at the proportion of papers published in special issues versus regular issues. It seems unusual these days for an #MDPI Plants paper not to be in a special issue. 9/11
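The special-issue split is just a count over a flag column, here a hypothetical logical `special_issue`:

```r
library(dplyr)

# Hypothetical logical column: TRUE if the paper appeared in a special issue
plants_data |>
  count(special_issue) |>
  mutate(proportion = n / sum(n))
```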
There are a lot of things that can be done with R's #rvest: find which authors publish most in the journal, which special issues have fewer/more papers, which institutions use the journal most frequently, abstract word clouds, etc... 10/11
The code, dataset and other outputs can be found here: github.com/pgomba/MDPI_ex… I plan to do more journals and build a tutorial, but if in the meantime you have any problems running it, I'm happy to help! 11/11
Bonus tweet 1: Diversity
Bonus tweet 2: Link to @PaoloCrosetto's blog (referenced in tweet 1): "Is MDPI a predatory publisher?". It is the best analysis out there on MDPI's business model. paolocrosetto.wordpress.com/2021/04/12/is-…
Bonus tweet 3, per request (@macerwan): Microorganisms.
Dataset: github.com/pgomba/MDPI_ex…