
Pablo Gomez

Nov 28 • 14 tweets • 8 min read


Inspired by @PaoloCrosetto's blog on #MDPI, this weekend I've been on a side-quest exploring their journals' publications using R and the web scraping package #rvest. A thread 🧵1/11 👇
#Rstats

Basically, I use two loops to download public editorial information. My example code is for "Plants #MDPI" but is easy to adapt to other journals. It takes about 2 hours to run (>8000 papers to explore).
The first loop extracts the URL of each paper from MDPI's search website.
2/11
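For anyone curious what the first loop does, here is a rough Python sketch of the link-extraction step (the actual script is in R with #rvest). It assumes, hypothetically, that links to individual Plants papers contain the journal's "/2223-7747/" path segment, as the paper links later in this thread do; ARTICLE_PREFIX, BASE_URL and extract_article_urls are illustrative names, not part of the original code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical assumption: paper links contain the Plants ISSN path segment.
ARTICLE_PREFIX = "/2223-7747/"
BASE_URL = "https://www.mdpi.com"


class ArticleLinkParser(HTMLParser):
    """Collects unique paper URLs from <a href="..."> tags on one results page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.startswith(ARTICLE_PREFIX):
            url = urljoin(BASE_URL, href)
            if url not in self.urls:  # keep the first occurrence, drop repeats
                self.urls.append(url)


def extract_article_urls(page_html):
    parser = ArticleLinkParser()
    parser.feed(page_html)
    return parser.urls


# Invented HTML snippet standing in for one page of search results.
sample = """
<a href="/2223-7747/10/1/1">Paper A</a>
<a href="/2223-7747/10/1/1">Paper A again</a>
<a href="/about">About</a>
<a href="/2223-7747/10/8/2">Paper B</a>
"""
print(extract_article_urls(sample))
```

Deduplicating matters because a results page can link the same paper more than once.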

Sys.sleep(1) is there to slow down the extraction and avoid being kicked out by the search server. It can probably be sped up by reducing the value. 3/11
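The same throttling idea, sketched in Python under the assumption that page fetching is wrapped in a function passed in as `fetch` (a hypothetical hook, so the example runs offline). `delay_seconds` plays the role of the value in Sys.sleep(1).

```python
import time


def scrape_pages(fetch, page_numbers, delay_seconds=1.0):
    """Fetch each results page in turn, pausing between consecutive requests.

    `fetch` is any callable mapping a page number to that page's HTML
    (hypothetical; in practice it would wrap an HTTP request).
    """
    pages = []
    for i, page in enumerate(page_numbers):
        if i > 0:
            time.sleep(delay_seconds)  # space out requests to be polite to the server
        pages.append(fetch(page))
    return pages


# Usage with a fake fetcher instead of real network calls.
fake_fetch = lambda n: f"<html>results page {n}</html>"
start = time.monotonic()
pages = scrape_pages(fake_fetch, [1, 2, 3], delay_seconds=0.1)
elapsed = time.monotonic() - start
print(len(pages), round(elapsed, 2))
```

The delay is a direct trade-off against run time: if the same 1 s pause were applied before each of the >8000 paper pages, the pauses alone would add about 8000 s, a little over 2 hours.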

The second loop uses the URLs from the first loop to scrape the editorial details of each published paper. Easy peasy. The more publications the journal has, the longer the process takes. 4/11
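A Python sketch of what "editorial details" parsing might look like. It assumes each paper page carries a history line of the form "Received: … / Revised: … / Accepted: … / Published: …"; that format, the regex and `parse_history` are assumptions for illustration, and the sample line below is invented, not taken from a real paper.

```python
import re
from datetime import datetime

# Assumed format of the editorial-history line on a paper page.
HISTORY_RE = re.compile(
    r"(Received|Revised|Accepted|Published):\s*(\d{1,2} [A-Za-z]+ \d{4})"
)


def parse_history(text):
    """Map each history label found in `text` to a date object."""
    dates = {}
    for label, raw in HISTORY_RE.findall(text):
        dates[label] = datetime.strptime(raw, "%d %B %Y").date()
    return dates


# Invented example line.
line = ("Received: 3 March 2021 / Revised: 20 March 2021 / "
        "Accepted: 25 March 2021 / Published: 29 March 2021")
history = parse_history(line)
print((history["Published"] - history["Received"]).days)
```

Labels that are missing from a page simply don't appear in the returned dictionary, which makes papers without a revision round easy to spot later.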

And now, some numbers and graphs. #MDPI Plants has about 8900 papers. About 200 of these were accepted straight away (not even a minor revision), and I decided not to use them in the analysis to save time. This is how the dataset looks over time. 5/11
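One way to drop the papers accepted straight away, sketched in Python. The assumption here is that such papers are the ones whose editorial history has no "Revised" date; the records below are hypothetical sample data.

```python
from datetime import date

# Hypothetical records mapping history labels to dates.
papers = [
    {"Received": date(2021, 1, 4), "Revised": date(2021, 1, 20), "Published": date(2021, 2, 1)},
    {"Received": date(2021, 1, 10), "Published": date(2021, 1, 25)},  # accepted as submitted
    {"Received": date(2021, 2, 2), "Revised": date(2021, 2, 28), "Published": date(2021, 3, 8)},
]

# Keep only papers that went through at least one revision round.
revised = [p for p in papers if "Revised" in p]
print(f"kept {len(revised)} of {len(papers)}; "
      f"dropped {len(papers) - len(revised)} accepted without revision")
```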

I'm curious about #MDPI's time between submission and publication, so I took a peek at how they are doing. Since 2017, the monthly average time between submission and publication has not gone over 50 days. Bear in mind I have not included error bars! 6/11
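A minimal Python sketch of a monthly average like this one. It assumes papers are grouped by the month they were published (the thread doesn't say which month is used) and runs on hypothetical (received, published) pairs.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (received, published) pairs.
papers = [
    (date(2021, 1, 4), date(2021, 2, 1)),
    (date(2021, 1, 20), date(2021, 2, 25)),
    (date(2021, 2, 2), date(2021, 3, 8)),
]


def monthly_mean_days(pairs):
    """Mean submission-to-publication time in days, grouped by publication month."""
    by_month = defaultdict(list)
    for received, published in pairs:
        by_month[(published.year, published.month)].append((published - received).days)
    return {month: sum(days) / len(days) for month, days in sorted(by_month.items())}


print(monthly_mean_days(papers))  # {(2021, 2): 32.0, (2021, 3): 34.0}
```

For error bars, `statistics.stdev` over each month's list of day counts would give a spread to plot alongside the mean (it needs at least two papers per month).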

This paper: mdpi.com/2223-7747/10/1…, was submitted on 18 December, resubmitted with revisions on 21 December, and published on 24 December. A Christmas miracle! 7/11

At the other extreme, this one took almost 500 days (an anomaly in the dataset): mdpi.com/2223-7747/10/8… 8/11
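Finding the slowest paper and flagging unusually long turnarounds for a manual check can be sketched like this in Python. The paper ids and dates are hypothetical, and the one-year cut-off is an arbitrary choice for illustration.

```python
from datetime import date

# Hypothetical records: (paper id, received, published).
papers = [
    ("paper-A", date(2021, 1, 4), date(2021, 2, 1)),
    ("paper-B", date(2019, 6, 1), date(2020, 10, 1)),
    ("paper-C", date(2021, 2, 2), date(2021, 3, 8)),
]


def turnaround_days(record):
    """Days between submission and publication for one record."""
    _, received, published = record
    return (published - received).days


slowest = max(papers, key=turnaround_days)
# Flag anything above an arbitrary one-year cut-off for manual inspection.
outliers = [p[0] for p in papers if turnaround_days(p) > 365]
print(slowest[0], turnaround_days(slowest), outliers)
```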

I also had to look at the proportion of papers published inside and outside special issues. These days it seems unusual for a #MDPI Plants paper to be published outside a special issue. 9/11
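A Python sketch of the special-issue share per year. It assumes each scraped paper has already been tagged with its publication year and a yes/no flag for special-issue membership; how that flag is scraped is not shown in the thread, and the records below are hypothetical.

```python
from collections import Counter

# Hypothetical records: (publication year, paper is in a special issue).
papers = [
    (2019, True), (2019, False),
    (2020, True), (2020, True),
    (2021, True), (2021, True), (2021, False), (2021, True),
]


def special_issue_share(records):
    """Fraction of each year's papers that appeared in a special issue."""
    totals = Counter(year for year, _ in records)
    in_special = Counter(year for year, is_special in records if is_special)
    return {year: in_special[year] / totals[year] for year in sorted(totals)}


print(special_issue_share(papers))  # {2019: 0.5, 2020: 1.0, 2021: 0.75}
```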

There is a lot more that can be done with R's #rvest: finding which authors publish most in the journal, which special issues have fewer or more papers, which institutions use this journal most frequently, abstract word clouds, etc. 10/11
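As an example of the first idea, counting which authors appear most often is a one-liner once author lists have been scraped. A Python sketch on hypothetical author lists:

```python
from collections import Counter

# Hypothetical author lists, one per scraped paper.
author_lists = [
    ["A. Author", "B. Author"],
    ["B. Author", "C. Author"],
    ["B. Author", "A. Author", "D. Author"],
]

# set() counts each author at most once per paper.
counts = Counter(name for authors in author_lists for name in set(authors))
print(counts.most_common(2))  # [('B. Author', 3), ('A. Author', 2)]
```

With real data, author names would likely need some normalisation first (initials, accents, name order), otherwise the same person can be split across several entries.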

The code, dataset and other outputs can be found here: github.com/pgomba/MDPI_ex… I plan to do more journals and build a tutorial, but if in the meantime you have any problems running it, I'm happy to help! 11/11

Bonus tweet 1: Diversity


Bonus tweet 2: Link to @PaoloCrosetto's blog (referenced in tweet 1): "Is MDPI a predatory publisher?". It is the best analysis out there of MDPI's business model. paolocrosetto.wordpress.com/2021/04/12/is-…


Bonus tweet 3, per request (@macerwan): Microorganisms.
Dataset: github.com/pgomba/MDPI_ex…
