1) It generates a set of up to 100 URLs, each corresponding to a page of visible search results
2) It then downloads each of these pages of results as an HTML file
3) It then parses the HTML, identifies each unique record, and pulls whatever information is visible into a dataframe of citations (sketched below)
4) There are a few bugs to iron out, and I still need to convert this dataframe into .ris format so incomplete records can be repaired from other sources, but it's an OK start, I think
5) Other tools like Anne-Wil Harzing's 'Publish or Perish' obviously do this already, but they don't extract DOIs and they don't work in R, hence this project
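Roughly, the workflow looks like the sketch below. This is not the actual script: the rvest calls, the Scholar URL pattern, and the CSS selectors are illustrative assumptions, Scholar's markup changes often, and scraping it is rate-limited, so treat this as a rough outline only.

```r
# Minimal sketch of the workflow above, not the actual script. The rvest
# functions are real, but the Google Scholar URL pattern and the CSS selectors
# (".gs_ri", ".gs_rt", ".gs_a", ".gs_rs") are assumptions and may break.
library(rvest)

search_term <- "example topic"        # hypothetical query
pages <- 0:99                         # up to 100 pages of visible results

# 1) One URL per page of results (Scholar steps through records 10 at a time)
urls <- paste0(
  "https://scholar.google.com/scholar?q=", URLencode(search_term),
  "&start=", pages * 10
)

# 2) Download each page of results as HTML
html_pages <- lapply(urls, function(u) {
  Sys.sleep(2)                        # pause between requests
  tryCatch(read_html(u), error = function(e) NULL)
})

# 3) Parse each page: find each unique record and keep the visible fields
parse_page <- function(page) {
  if (is.null(page)) return(NULL)
  records <- html_elements(page, ".gs_ri")   # one node per record (assumed selector)
  do.call(rbind, lapply(records, function(rec) {
    data.frame(
      title   = html_text2(html_element(rec, ".gs_rt")),
      authors = html_text2(html_element(rec, ".gs_a")),
      snippet = html_text2(html_element(rec, ".gs_rs")),
      stringsAsFactors = FALSE
    )
  }))
}

citations <- do.call(rbind, lapply(html_pages, parse_page))

# 4) Next step (not shown): write each row out as an .ris record so incomplete
#    citations can be repaired from other sources, e.g. with a package such as
#    synthesisr.
```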
• • •
We agree with others that we now face an 'information crisis' (#infodemic). There is SO much published research we need to find and digest.
Doing this reliably requires systematic review approaches, but even then, it's hugely challenging to find all relevant research.
(2/20)
Academic searching/information retrieval is an art form, and there is no 'perfect' search strategy - it takes careful planning and requires substantial skill and training.
These searches are highly complex and must be run in fit-for-purpose bibliographic databases. (3/20)
Problem 1: Their search string is flawed. The authors say they used a 'systematic' search for articles on Google Scholar. However, their search will not work as intended because:
a) GS doesn't support Boolean operators (AND, OR, NOT);
b) you cannot nest more than one substring (bracketed set of synonyms) in a search (see the illustration below);
c) they have not nested geographical synonyms within brackets...
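To make (b) concrete, a fully nested search string combines more than one bracketed set of synonyms, e.g. a set of topic terms AND a set of geographical terms. The terms below are placeholders, not the authors' actual query.

```r
# Illustrative only (placeholder terms, not the authors' actual search string):
# a nested Boolean string joins several bracketed synonym sets, one for the
# topic and one for geography. The point above is that Google Scholar will
# not parse a string like this as intended.
query <- '("topic term A" OR "topic term B") AND ("country X" OR "country Y" OR "country Z")'
```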