Discover and read the best of Twitter Threads about #scraping

Most recent (10)

💥🔥 The power of data for #OSINT: How exposed is our #HuellaDigital (digital footprint) on the internet? Is it easy to collect and link our #RastroDigital (digital trail) across different digital services? Of course it is, and in this 🧵MEGATHREAD🧵 I'll show you how...

#Ciberinteligencia #Socmint
1⃣ We'll start this cyber-investigation from a #Spam or #Phishing message, the kind that circulates daily on messaging apps such as WhatsApp.

💥 The sender's number began with the country code "+62", which appeared to be Indonesia's...

⬇️
2⃣ A reverse lookup of the phone number turns up records on the following digital services:
💥 WhatsApp
💥 Instagram
💥 Google
And most importantly... a link to an account on:
💥 Telegram
(This will help us continue our investigation)

⬇️
Read 17 tweets
🚀Degen Strategy🚀

💰How I turned $2048 into $100,000 in 1 month without losing it all back.

👉 No bullshit: at the end of the thread there is a link so you can verify the trade record on @Myfxbook

🆘This is NOT AN EASY STRATEGY.🆘
#forex #crypto
1⃣ A High Conviction View.
$DXY (USD Index) was trading around 105, a hard support level. If it held, I had high conviction it would bounce to 110 in a short time, i.e. a bullish continuation bias.

With that view, I chose to short #EURUSD and go long #USDJPY
2⃣ Money Management (MM) is 🔑

I deposited $2048 on 08.08.2022. With those funds I could trade up to 100 lots of #EURUSD, but my trade size was only 0.1 to 0.5 lots.

The first trade was a loss. Never mind: I tried again the next day and started making some small wins.
Read 8 tweets
No matter how benevolent a dictatorship is, it's still a dictatorship, and subject to the dictator's whims. We must demand that the owners and leaders of tech platforms be fair and good - but we must also be prepared for them to fail at this, sometimes catastrophically. 1/
If you'd like an essay-formatted version of this thread to read or share, here's a link to it on pluralistic.net, my surveillance-free, ad-free, tracker-free blog:

pluralistic.net/2022/12/23/sem… 2/
Maybe you trust #TimCook to decide what apps you are and aren't allowed to install - including whether you are allowed to install apps that block #Apple's own extensive, nonconsensual, continuous commercial surveillance of its customers. 3/
Read 118 tweets
Data analysts should be able to efficiently build datasets using the content on the web.

Here I will show you a use case for a small script I created that allows the user to build corpora of textual information from online blogs.

People working in #seo can definitely use this👇
It is a #Python script that efficiently organizes data in a #pandas dataframe for ease of use and readability.

It leverages Trafilatura, an open source library that is able to read and follow links from the website's sitemap and identify the main content of a page.
This is particularly important because each website has its own structure and it is often difficult to grab the main content as opposed, for instance, to the sidebar.

Trafilatura uses a series of heuristics to find the main text of the page, and it works pretty damn well.
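To make the idea concrete, here is a minimal sketch of that kind of script. It is not the author's code: the function names `build_corpus` and `corpus_from_sitemap` and the `limit` parameter are made up for illustration. It assumes the `trafilatura` and `pandas` packages are installed, and imports them lazily so the row-building logic can be tried without either.

```python
from typing import Callable, Iterable, Optional


def build_corpus(
    urls: Iterable[str],
    fetch: Callable[[str], Optional[str]],
    extract: Callable[[str], Optional[str]],
) -> list:
    """Return one {"url", "text"} row per page whose main text was extracted."""
    rows = []
    for url in urls:
        html = fetch(url)  # expected to return None when the download fails
        if html is None:
            continue
        text = extract(html)  # expected to return None when no main content is found
        if text:
            rows.append({"url": url, "text": text})
    return rows


def corpus_from_sitemap(homepage: str, limit: int = 20):
    """Hypothetical wrapper: sitemap links -> main text -> pandas DataFrame."""
    # Imported here on purpose, so build_corpus works without these packages.
    import pandas as pd
    from trafilatura import extract, fetch_url
    from trafilatura.sitemaps import sitemap_search

    urls = sitemap_search(homepage)[:limit]  # links discovered via the sitemap
    return pd.DataFrame(build_corpus(urls, fetch_url, extract))
```

Calling `corpus_from_sitemap("https://example.com")` on a real blog would give a dataframe with `url` and `text` columns. Pages that fail to download or have no detectable main content are skipped rather than stored as empty rows.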
Read 11 tweets
Twitter Mining & Web Scraping Projects using Pytho🐍

Thread: 🧵

#Python #pythonprojects #Scraping #Mining
Mining Twitter Data with Python

1: Collecting Data (this article)
2: Text Pre-processing
3: Term Frequencies
4: Rugby and Term Co-Occurrences
5: Data Visualisation Basics
6: Sentiment Analysis Basics
7: Geolocation and Interactive Maps

🔗
marcobonzanini.com/2015/03/02/min…
Web Scraping with Scrapy and MongoDB

Python program to scrape data from Stack Overflow to grab new questions (question title and URL).
Scraped data should then be stored in MongoDB.

🔗
realpython.com/web-scraping-w…
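For the "stored in MongoDB" part, a rough sketch of a Scrapy item pipeline is below. It is not the code from the linked article: the class name, database name, collection name and connection URI are all assumptions, and it expects each scraped item to carry `title` and `url` fields. `pymongo` is imported only when the spider opens. The pipeline would be enabled through Scrapy's `ITEM_PIPELINES` setting.

```python
class MongoQuestionPipeline:
    """Scrapy-style item pipeline that upserts questions into MongoDB."""

    def __init__(
        self,
        mongo_uri="mongodb://localhost:27017",  # assumed local instance
        db_name="stackoverflow",                # hypothetical database name
        collection_name="questions",            # hypothetical collection name
    ):
        self.mongo_uri = mongo_uri
        self.db_name = db_name
        self.collection_name = collection_name
        self.client = None
        self.collection = None

    def open_spider(self, spider):
        # Called by Scrapy when the spider starts; connect once per crawl.
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.collection = self.client[self.db_name][self.collection_name]

    def close_spider(self, spider):
        # Called by Scrapy when the spider finishes.
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        # Keep only the two fields the thread mentions.
        doc = {"title": item["title"], "url": item["url"]}
        # Upsert on the URL so re-running the spider does not duplicate questions.
        self.collection.update_one({"url": doc["url"]}, {"$set": doc}, upsert=True)
        return item
```

Upserting on the URL is a design choice for this sketch. A plain `insert_one` would also work if duplicates are acceptable.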
Read 5 tweets
3 days to finish the year, and I decided to do a countdown with the TOP 3 blog posts I've written in 2021. And some context. 🧵

Direct to #3: DOs and DON'Ts of Web #Scraping

zenrows.com/blog/dos-and-d…
Published December 21, the last one of the year, and straight to #3! No way we could have seen it coming.

SEO wasn't relevant here; there was no time for it to work. #GoogleDiscover drove the traffic to our official blog, and the post also has great numbers on the other sites where we publish.
Coming in at #2 is Web Scraping with #Javascript and #NodeJS

Probably the longest post, and the one with the most code of the lot. Published on September 1; its primary traffic source is Google, through SEO. Many interesting keywords rank in high positions.

zenrows.com/blog/web-scrap…
Read 7 tweets
New post! DOs and DON'Ts of Web #Scraping

Learn how to create better web scrapers by following best practices and avoiding common mistakes. Choose the right approach for the job thanks to these tips.

zenrows.com/blog/dos-and-d…
1. DO Rotate IPs

The most common anti-scraping solution is to ban by IP. By using proxies, you'll avoid showing the same IP in every request and thus increase your chances of success. You can code your own or use a rotating proxy like zenrows.com.
2. DO Use Custom User-Agent

Overwrite your client's User-Agent, or you'll risk sending something like "curl/7.74.0".

But always sending the same UA might be suspicious, too, so you need a large, up-to-date list.
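The first two tips can be sketched together using only the Python standard library. The proxy addresses and User-Agent strings below are placeholders rather than working values, and `next_proxy` and `build_request` are names made up for this example. A real setup would load both lists from a maintained source.

```python
import itertools
import random
import urllib.request

# Placeholder values for illustration only; substitute proxies you control.
PROXIES = [
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
    "http://10.0.0.3:8080",
]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]

_proxy_cycle = itertools.cycle(PROXIES)


def next_proxy() -> str:
    """Return the next proxy in round-robin order."""
    return next(_proxy_cycle)


def build_request(url: str):
    """Pair a rotated proxy with a randomly chosen User-Agent for one request."""
    proxy = next_proxy()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    request = urllib.request.Request(
        url, headers={"User-Agent": random.choice(USER_AGENTS)}
    )
    return opener, request, proxy
```

A request would then be sent with `opener.open(request, timeout=10)`. Round-robin is the simplest rotation policy; picking proxies at random or retiring the ones that get banned are common variations.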
Read 10 tweets
New post! Web #Scraping: Intercepting XHR Requests.

Take advantage of XHR requests and scrape website content without any effort. No need for fickle HTML or CSS selectors; API endpoints tend to remain stable.

zenrows.com/blog/web-scrap…
@playwrightweb and other headless browsers allow response/request interception. We can take advantage and inspect them easily.

Those responses usually come already formatted and structured.
We covered auction.com, twitter.com, and nseindia.com as examples, but the opportunities are infinite.

And not just for the first load, but from any subsequent browsing. The same rules apply.
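A small sketch of response interception with Playwright's Python sync API is below. It is not the code from the linked post: `is_json_api_response` and `capture_json_responses` are hypothetical helper names, and the filter (XHR or fetch requests with a JSON content type) is one assumption about what counts as an interesting response. It needs the `playwright` package and an installed Chromium build.

```python
def is_json_api_response(resource_type: str, content_type: str) -> bool:
    """True for XHR/fetch responses that declare a JSON body."""
    return (
        resource_type in {"xhr", "fetch"}
        and "application/json" in content_type.lower()
    )


def capture_json_responses(url: str) -> dict:
    """Load a page in headless Chromium and return {response_url: parsed_json}."""
    # Imported lazily so the filter above can be used without Playwright.
    from playwright.sync_api import sync_playwright

    responses = []
    captured = {}
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Record every response the page receives while it loads.
        page.on("response", responses.append)
        page.goto(url, wait_until="networkidle")
        for response in responses:
            content_type = response.headers.get("content-type", "")
            if is_json_api_response(response.request.resource_type, content_type):
                try:
                    captured[response.url] = response.json()
                except Exception:
                    # Body missing or not valid JSON; skip it.
                    continue
        browser.close()
    return captured
```

The returned dictionary maps each API endpoint the page called to its already structured payload, so no HTML or CSS selectors are involved.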
Read 4 tweets
It's hard to understand the #Foglio revelations without access to the documents. From what I can gather, the data collected was simply extracted from social networks through #scraping and perhaps from the #deepweb: collecting public data is simple doxing
If, however, it should emerge that the data the Chinese company collected on figures in politics, business, etc. includes confidential information, then it's a different matter. Hard to tell with so little information
I'm glad to see voices in politics asking for clarity about the Chinese company's data collection. Let's remember the shameful silence of politicians on our revelations about the #NSA interceptions. And the disturbing burying of the #Roma prosecutor's office investigation
Read 3 tweets
