The article is also a practical #guide to help researchers select an appropriate #method and degree of preprocessing for their own data. Our #corpus consisted of 6,041 summaries of reviews of contemporary German #literature acquired from @perlentaucher00 (2/9)
The linguistic #complexity of #literature reviews sets them apart from other texts: their #language is full of ambiguity, irony, and metaphors, which are comparatively difficult to capture with #computational approaches. So how did our different methods fare? (3/9)
We began with some "off-the-shelf" dictionaries, built our own context-specific dictionaries with #WordEmbedding models, trained a #Wordscores algorithm with human-coded reviews, and ran a fully computational #Wordfish model (4/9)
Prefabricated #dictionaries, which simply count #positive and #negative words, are computationally efficient and easy to implement. Yet they yielded only a low-to-medium #correlation with the #sentiment identified by human coders (5/9)
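For readers who want to try this themselves, here is a minimal Python sketch of the dictionary approach. The word lists and the scoring rule are illustrative placeholders, not the off-the-shelf dictionaries we actually used:

```python
# Minimal sketch of dictionary-based sentiment scoring.
# The word lists are illustrative placeholders, not the prefabricated
# German dictionaries used in the article.

POSITIVE = {"gelungen", "brillant", "überzeugend", "spannend"}
NEGATIVE = {"misslungen", "langweilig", "enttäuschend", "schwach"}

def dictionary_sentiment(text: str) -> float:
    """Return (positive hits - negative hits) / token count as a polarity score."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / max(len(tokens), 1)

print(dictionary_sentiment("Ein brillant erzählter, aber etwas langweiliger Roman."))
```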
We therefore created our own dictionaries based on #WordEmbeddings, including terms that appear in contexts similar to those of known #positive and #negative words. However, despite their #computational complexity, the self-created dictionaries performed poorly with our corpus ☹️ (6/9)
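A rough sketch of the embedding-based dictionary expansion, using gensim's Word2Vec on a toy corpus. The corpus, seed words, and hyperparameters here are assumptions for illustration, not our actual setup:

```python
# Sketch: building a context-specific sentiment dictionary from word embeddings.
# Toy corpus and seed words stand in for the tokenised review summaries and
# the seed terms used in the article.
from gensim.models import Word2Vec

sentences = [
    ["ein", "gelungener", "und", "überzeugender", "roman"],
    ["eine", "enttäuschende", "und", "misslungene", "erzählung"],
    ["der", "roman", "ist", "überzeugend", "und", "gelungen"],
    ["die", "erzählung", "wirkt", "enttäuschend", "und", "misslungen"],
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, seed=1)

# Nearest neighbours of the seed words become dictionary candidates,
# which would then be filtered by hand before scoring any texts.
pos_candidates = model.wv.most_similar(positive=["gelungen", "überzeugend"], topn=5)
neg_candidates = model.wv.most_similar(positive=["enttäuschend", "misslungen"], topn=5)
print(pos_candidates)
```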
Other #MachineLearning methods yielded mixed results: while the fully automated scaling method #Wordfish also performed poorly (likely because our corpus also comprises non-fiction texts), semi-automated scaling with #Wordscores performed really well 😃 with an r of around 0.6 (7/9)
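To give an idea of how #Wordscores works (Laver, Benoit & Garry 2003): reference texts receive human-assigned scores, each word inherits the expected score of the reference texts it appears in, and new texts are scored as a frequency-weighted mean of their words' scores. A toy Python sketch with made-up texts and scores, not our coded reviews:

```python
# Toy sketch of the Wordscores logic in plain Python.
from collections import Counter

def rel_freq(tokens):
    counts = Counter(tokens)
    n = len(tokens)
    return {w: f / n for w, f in counts.items()}

# Reference texts with human-assigned sentiment scores (here -1 to +1).
reference = [
    (["gelungen", "überzeugend", "spannend"], 1.0),
    (["enttäuschend", "langweilig", "schwach"], -1.0),
]

# Step 1: a word's score is the expected reference score given that word.
freqs = [(rel_freq(toks), score) for toks, score in reference]
vocab = {w for f, _ in freqs for w in f}
wordscores = {}
for w in vocab:
    pairs = [(f.get(w, 0.0), score) for f, score in freqs]
    total = sum(f for f, _ in pairs)
    wordscores[w] = sum(f / total * score for f, score in pairs)

# Step 2: a new ("virgin") text is scored as the frequency-weighted
# mean of the word scores it contains.
def score_text(tokens):
    f = rel_freq(tokens)
    scored = {w: p for w, p in f.items() if w in wordscores}
    norm = sum(scored.values())
    return sum(p / norm * wordscores[w] for w, p in scored.items())

print(score_text(["spannend", "aber", "etwas", "langweilig"]))
```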
But the good performance of #Wordscores comes at the high #cost of manually coding training texts. We therefore conclude: choose your method carefully; #computational methods can work, even with #complex texts (see: #QualiFiction) (8/9)