I've built an R package to scrape search results from up to 100 pages of hits from Google Scholar: github.com/nealhaddaway/G…

It's probably a bit buggy, but feedback greatly appreciated!

Here's a thread on how it works...

#AcademicTwitter #InformationRetrieval #MedLibs
1) It generates a set of up to 100 URLs, each corresponding to a page of visible search results
2) It then downloads each of these pages of results as an HTML file
3) It then parses the HTML code, identifies each unique record, and extracts whatever information is visible into a dataframe of citations
4) There are a few bugs to iron out, and I need to convert this dataframe into .ris format to facilitate repair from other sources, but it's an OK start, I think
5) Other tools like Anne-Wil Harzing's 'Publish or Perish' obviously do this already, but they don't extract DOIs, and they don't run in R, hence this project
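
To make steps 1 and 4 concrete, here is a rough sketch in Python (not R, and not the package's actual code). It assumes Google Scholar pages results in blocks of 10 via a `start` query parameter, and that each extracted record is a dict with hypothetical `title`, `authors`, `year` and `doi` keys. The function names are invented for illustration. The download and HTML-parsing steps are left out.

```python
from urllib.parse import urlencode

BASE_URL = "https://scholar.google.com/scholar"  # assumed search endpoint
RESULTS_PER_PAGE = 10  # assumption: 10 visible hits per results page


def build_page_urls(query, pages=100):
    """Return one URL per results page, capped at 100 pages."""
    pages = max(0, min(pages, 100))
    return [
        BASE_URL + "?" + urlencode({"q": query, "start": page * RESULTS_PER_PAGE})
        for page in range(pages)
    ]


def records_to_ris(records):
    """Serialise a list of citation dicts as RIS text."""
    lines = []
    for rec in records:
        # Assumption: every hit is treated as a journal article.
        lines.append("TY  - JOUR")
        for author in rec.get("authors", []):
            lines.append("AU  - " + author)
        if rec.get("title"):
            lines.append("TI  - " + rec["title"])
        if rec.get("year"):
            lines.append("PY  - " + str(rec["year"]))
        if rec.get("doi"):
            lines.append("DO  - " + rec["doi"])
        lines.append("ER  - ")  # end-of-record marker
    return "\n".join(lines) + "\n"
```

Calling `build_page_urls("systematic review", pages=3)` gives three URLs whose `start` values are 0, 10 and 20. Fields missing from a record are simply skipped in the RIS output, which suits scraped data where only some details are visible.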

More from @nealhaddaway

9 Oct
Check out our new paper on the need for better understanding of academic searching:
onlinelibrary.wiley.com/doi/10.1002/jr…

#InformationRetrieval #MedLibs #EvidenceSynthesis

A thread with our key points... (1/20)
We agree with others that we now face an 'information crisis' (#infodemic). There is SO much published research we need to find and digest.

Doing this reliably requires systematic review approaches, but even then, it's hugely challenging to find all relevant research.
(2/20)
Academic searching/information retrieval is an art form, and there is no 'perfect' search strategy - it takes careful planning and requires substantial skill and training.

These searches are highly complex and must be used in fit-for-purpose bibliographic databases. (3/20)
24 Apr 19
Here's a thread on what's very wrong about using only Google Scholar for a literature review, as done by Buribalova et al. in this recent review: onlinelibrary.wiley.com/doi/full/10.11… #SystematicReview #EvidenceSynthesis
Problem 1: Their search string is flawed. The authors say they used a 'systematic' search for articles on Google Scholar. However, their search will not work as intended because:
a) GS doesn't support Boolean operators (AND, OR, NOT); b) you cannot nest more than one substring (bracketed set of synonyms) in a search; c) they have not nested geographical synonyms within brackets...