Lin (🌊, 🏛️)
Independent Researcher, PKU CS & Blockchain Alumni. Decoding Market Structure & Logic. Not financial advice. Views are my own. 📝Deep Dives

Jul 25, 2022, 12 tweets

#TwitterAPI #爬虫

【How to automatically get Twitter data with Python】

(Sharing my experience of automatically scraping Twitter data with Python; originally posted in both Chinese and English)

This may be helpful for those who know a little Python but are not professional programmers.

Twitter is the social media platform Web3 users use most. With enough Twitter data collected automatically, we can do many interesting things, such as automatically tracking a KOL's new followings, or analyzing a project's early followers and how word of it spread. Here I share my first-hand experience of collecting Twitter data automatically with Python:
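As a rough illustration of the "track a KOL's new followings" idea, the sketch below compares the current list of followed account IDs with a snapshot saved on the previous run. The function names and the JSON snapshot file are hypothetical choices made for this example, not part of any library, and how the ID list is obtained is left to either of the two ways described in this thread.

```python
import json
from pathlib import Path


def new_followings(previous_ids, current_ids):
    """Return the IDs in current_ids that were not in previous_ids, keeping order."""
    seen = set(previous_ids)
    return [uid for uid in current_ids if uid not in seen]


def update_snapshot(path, current_ids):
    """Compare current_ids with the snapshot stored at path, then overwrite it.

    On the very first run there is no snapshot yet, so every ID is reported as new.
    """
    path = Path(path)
    previous = json.loads(path.read_text()) if path.exists() else []
    path.write_text(json.dumps(list(current_ids)))
    return new_followings(previous, current_ids)
```

Calling update_snapshot on a schedule (for example once a day) with a freshly collected ID list would return only the accounts followed since the previous call.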

【Way 1: Twitter API】

Description: Twitter's official data interface. You can apply for API access at (developer.twitter.com/en/docs/twitte…) and use the Tweepy library (tweepy.org) to call the API from Python.
(Users in mainland China may want to buy a temporary UK or US phone number on Taobao and register a temporary Twitter account with it, because mainland China phone numbers rarely pass the application.)

Pros: Simple, fast, stable

Cons: (Potentially fatal!) The request rate is quite limited (for example, 900 requests per 15-minute window; the exact limit varies by endpoint), which makes almost any research or automation task that needs a large amount of data hard to carry out.
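A minimal sketch of Way 1, assuming Tweepy v4 with a valid bearer token and an access level that includes the "following" endpoint. The helper names following_ids, minutes_of_waiting and make_client are made up for this example. The last helper shows why the rate limit hurts: it estimates how long a job has to sit idle when it needs more requests than one window allows, using the 900 per 15 minutes figure quoted above as a default.

```python
import math


def following_ids(client, username, page_size=1000):
    """Collect the IDs of all accounts that `username` follows.

    `client` is expected to behave like tweepy.Client: get_user() and
    get_users_following() return objects with .data and .meta attributes.
    """
    user = client.get_user(username=username).data
    ids, token = [], None
    while True:
        resp = client.get_users_following(
            id=user.id, max_results=page_size, pagination_token=token
        )
        ids.extend(u.id for u in (resp.data or []))
        # The API returns a next_token in meta while more pages remain.
        token = (resp.meta or {}).get("next_token")
        if not token:
            return ids


def minutes_of_waiting(total_requests, limit=900, window_minutes=15):
    """Estimate idle minutes for a job under a fixed-window rate limit.

    The first window starts immediately; every further window costs one wait.
    """
    windows = math.ceil(total_requests / limit)
    return max(windows - 1, 0) * window_minutes


def make_client(bearer_token):
    """Build a real Tweepy client (imported lazily so the rest runs without it)."""
    import tweepy

    # wait_on_rate_limit=True makes Tweepy sleep instead of raising on HTTP 429.
    return tweepy.Client(bearer_token=bearer_token, wait_on_rate_limit=True)
```

Under these assumptions, following_ids(make_client(token), "some_username") would page through the whole list, and minutes_of_waiting(10000) gives 165, i.e. nearly three hours of waiting for ten thousand requests.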

【Way 2: Twitter Scraper】

Description: Access the data by automatically controlling a browser to mimic the repetitive actions of a real person, then read the data from the page HTML. However, because Twitter keeps updating its anti-scraping measures, most public GitHub repositories for scraping Twitter no longer work.

Scweet (https://github.com/Altimis/Scweet) is a repository I have personally tested and found usable, but it does not run out of the box and needs quite a few adjustments to work on your computer.

Pros: No rate limits, customizable

Cons: Complex, comparatively slow, unstable (depends on your network connection), against Twitter's Terms of Service
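The two steps of Way 2 (drive a browser, then read the HTML) can be sketched as follows. This is not how Scweet is implemented; it is a minimal illustration that assumes Selenium with a working Chrome driver for the first step, and it assumes tweet text sits inside elements marked data-testid="tweetText". That marker is an assumption about the page markup and may change at any time, which is exactly why scrapers of this kind break so often. The names fetch_page_html and TweetTextParser are hypothetical.

```python
import time
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def fetch_page_html(url, wait_seconds=5):
    """Open url in Chrome via Selenium and return the rendered HTML."""
    from selenium import webdriver  # lazy import: only needed for live pages

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        time.sleep(wait_seconds)  # crude wait for JavaScript to render the page
        return driver.page_source
    finally:
        driver.quit()


class TweetTextParser(HTMLParser):
    """Collect the text of every element carrying the assumed marker attribute."""

    def __init__(self, marker=("data-testid", "tweetText")):
        super().__init__()
        self.marker = marker
        self.depth = 0    # greater than 0 while inside a marked element
        self.parts = []   # text fragments of the element being read
        self.tweets = []  # finished tweet texts

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif self.marker in attrs:
            self.depth = 1
            self.parts = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.tweets.append("".join(self.parts).strip())

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_tweets(html):
    """Return the list of tweet texts found in an HTML string."""
    parser = TweetTextParser()
    parser.feed(html)
    return parser.tweets
```

The parsing step can be tried on saved HTML without a browser; for a live page, extract_tweets(fetch_page_html(url)) would chain the two steps.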

If you find this experience sharing helpful, please like and retweet it! I will share more related experience if there are enough retweets~
