1) It generates a set of up to 100 URLs, each corresponding to a page of visible search results
2) It then downloads each of these pages of results as an HTML file
3) It then parses the HTML, identifies each unique record, and pulls whatever information is visible into a dataframe of citations (sketched below)
4) There are a few bugs to iron out, and I still need to convert this dataframe into .ris format so incomplete records can be repaired from other sources, but it's an OK start, I think
5) Other tools like Anne-Wil Harzing's 'Publish or Perish' obviously do this already, but they don't extract DOIs and they don't work in R, hence this project
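Roughly, the workflow looks like the sketch below. This is not the actual script: the rvest calls, the Scholar URL pattern, and the CSS selectors are illustrative assumptions, Scholar's markup changes often, and scraping it is rate-limited, so treat this as a rough outline only.

```r
# Minimal sketch of the workflow above, not the actual script. The rvest
# functions are real, but the Google Scholar URL pattern and the CSS selectors
# (".gs_ri", ".gs_rt", ".gs_a", ".gs_rs") are assumptions and may break.
library(rvest)

search_term <- "example topic"        # hypothetical query
pages <- 0:99                         # up to 100 pages of visible results

# 1) One URL per page of results (Scholar steps through records 10 at a time)
urls <- paste0(
  "https://scholar.google.com/scholar?q=", URLencode(search_term),
  "&start=", pages * 10
)

# 2) Download each page of results as HTML
html_pages <- lapply(urls, function(u) {
  Sys.sleep(2)                        # pause between requests
  tryCatch(read_html(u), error = function(e) NULL)
})

# 3) Parse each page: find each unique record and keep the visible fields
parse_page <- function(page) {
  if (is.null(page)) return(NULL)
  records <- html_elements(page, ".gs_ri")   # one node per record (assumed selector)
  do.call(rbind, lapply(records, function(rec) {
    data.frame(
      title   = html_text2(html_element(rec, ".gs_rt")),
      authors = html_text2(html_element(rec, ".gs_a")),
      snippet = html_text2(html_element(rec, ".gs_rs")),
      stringsAsFactors = FALSE
    )
  }))
}

citations <- do.call(rbind, lapply(html_pages, parse_page))

# 4) Next step (not shown): write each row out as an .ris record so incomplete
#    citations can be repaired from other sources, e.g. with a package such as
#    synthesisr.
```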
• • •
We agree with others that we now face an 'information crisis' (#infodemic). There is SO much published research we need to find and digest.
Doing this reliably requires systematic review approaches, but even then, it's hugely challenging to find all relevant research.
(2/20)
Academic searching/information retrieval is an art form, and there is no 'perfect' search strategy - it takes careful planning and requires substantial skill and training.
These searches are highly complex and must be run in fit-for-purpose bibliographic databases. (3/20)
Problem 1: Their search string is flawed. The authors say they used a 'systematic' search for articles on Google Scholar. However, their search will not work as intended because:
a) GS doesn't support Boolean operators (AND, OR, NOT);
b) you cannot nest more than one substring (bracketed set of synonyms) in a search (see the illustration below);
c) they have not nested geographical synonyms within brackets...
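To make (b) concrete, a fully nested search string combines more than one bracketed set of synonyms, e.g. a set of topic terms AND a set of geographical terms. The terms below are placeholders, not the authors' actual query.

```r
# Illustrative only (placeholder terms, not the authors' actual search string):
# a nested Boolean string joins several bracketed synonym sets, one for the
# topic and one for geography. The point above is that Google Scholar will
# not parse a string like this as intended.
query <- '("topic term A" OR "topic term B") AND ("country X" OR "country Y" OR "country Z")'
```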