Huda Nassar

23 Oct, 7 tweets, 4 min read

https://twitter.com/nassarhuda/status/1316809555154698242

The #julialang twitter data network was supposed to be part of this lecture but unfortunately I didn't have enough time -- so here's a thread about it.

How I built it: (1) take the #julialang tweets w >5 likes, (2) get the usernames, (3) find who they follow and build a network.
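A minimal Python sketch of those three steps, on made-up data. It is not the code from the thread (that is linked in the references at the end). The `tweets` and `following` inputs, the function name, and the choice to keep every followed account as a node are all assumptions made for illustration.

```python
def build_follow_network(tweets, following, min_likes=5):
    """Return (nodes, edges) for a directed 'who follows whom' graph.

    tweets    : list of {"user": str, "likes": int} records (hypothetical format)
    following : dict mapping a username to the accounts it follows (hypothetical)
    """
    # Steps (1) and (2): authors of tweets with more than `min_likes` likes.
    authors = {t["user"] for t in tweets if t["likes"] > min_likes}

    # Step (3): one directed edge per (author, followed account) pair.
    edges = set()
    for user in authors:
        for followed in following.get(user, ()):
            if followed != user:
                edges.add((user, followed))

    # Assumption: followed accounts become nodes even if they never tweeted.
    nodes = authors | {v for _, v in edges}
    return nodes, edges


# Toy data, invented for this example.
tweets = [
    {"user": "ana", "likes": 12},
    {"user": "bo", "likes": 3},
    {"user": "cy", "likes": 6},
    {"user": "ana", "likes": 7},
]
following = {"ana": ["cy", "dee"], "bo": ["ana"], "cy": ["ana"]}

nodes, edges = build_follow_network(tweets, following)
```

Here `bo` is dropped because the only tweet from that account has 3 likes, while `dee` still appears as a node because `ana` follows that account.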

When I first visualized the network, I noticed an apparent separation into clusters, so the next thing I did was color-code all the nodes based on whether the words "julia", "python", or "rlang"/"rstats" appear in their bios. The resulting figure is pretty amazing!
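The bio-based coloring could be sketched in Python roughly as below. The case-insensitive substring match, the fallback label "other", and the priority order when a bio mentions several keywords are assumptions; the thread does not say how those cases were handled.

```python
# Keyword groups taken from the thread; the label names are made up here.
KEYWORDS = {
    "julia": ("julia",),
    "python": ("python",),
    "r": ("rlang", "rstats"),
}


def label_from_bio(bio):
    """Return the first label whose keyword occurs in the bio, else 'other'."""
    text = bio.lower()
    # Dicts keep insertion order, so "julia" wins when several keywords match.
    for label, words in KEYWORDS.items():
        if any(word in text for word in words):
            return label
    return "other"


# Invented bios, for illustration only.
bios = {
    "ana": "I write #JuliaLang packages",
    "cy": "PhD student, #rstats and coffee",
    "dee": "Python developer",
    "eve": "Hiking and photography",
}
labels = {user: label_from_bio(bio) for user, bio in bios.items()}
```

A plain substring match will also tag anyone who is simply named Julia, so a real run would probably need some manual inspection.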

Now if you're like me, you'll probably wonder what's going on with the "two arms" branching out from the julia cluster. So here is an annotated figure w high degree nodes... Fun observation: Everyone I manually inspected in the first group (top in the figure) has a Japanese bio.

So far, I haven't really given any *numbers*... one thing I was very curious about was the clustering coefficients. Here's what I found: the global CC was 0.43, but when I extracted the julia subgraph, the CC jumped to 0.7! Figure: marker size is bigger if local CC > 0.5
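For readers who want to see what is being measured, here is a small Python sketch of local clustering coefficients on a toy graph. It treats the graph as undirected and uses the average of the local values as the "global" number; both are assumptions, since the thread does not say which definition was used (the triangle-based transitivity is another common choice).

```python
from itertools import combinations


def local_clustering(adj):
    """Local clustering coefficient per node; adj maps node -> set of neighbors."""
    cc = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            cc[node] = 0.0  # convention: no neighbor pairs, so no triangles
            continue
        # Count the pairs of neighbors that are themselves connected.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        cc[node] = 2 * links / (k * (k - 1))
    return cc


def average_clustering(adj):
    cc = local_clustering(adj)
    return sum(cc.values()) / len(cc)


def subgraph(adj, keep):
    """Induced subgraph on the node set `keep`."""
    return {u: adj[u] & keep for u in keep}


# Toy graph: a triangle a-b-c with one extra node d hanging off c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
cc = local_clustering(adj)
whole = average_clustering(adj)
triangle = average_clustering(subgraph(adj, {"a", "b", "c"}))

# Two-level marker sizes as in the figure: bigger if local CC > 0.5
# (the sizes 12 and 4 are arbitrary).
marker_size = {node: 12 if value > 0.5 else 4 for node, value in cc.items()}
```

In this toy graph the average rises when only the tightly knit triangle is kept, which is the same kind of jump described above for the julia subgraph.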

Here is another local clustering coefficient figure where the marker size is proportional to the local clustering coefficient value.

And last but not least... PageRank! Of course, I had to run PageRank on this network. Here is the PageRank visualization with node sizes proportional to the PageRank value... Not so surprisingly, I guess, a bunch of the bigger circles were purple. 💜
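As a rough illustration of the computation, here is PageRank by power iteration in plain Python. The damping factor 0.85, the uniform handling of accounts that follow nobody, the edge direction (follower to followed), and the toy graph are all assumptions; none of them come from the thread.

```python
def pagerank(out_links, alpha=0.85, tol=1e-10, max_iter=200):
    """PageRank via power iteration; out_links maps node -> list of followed nodes."""
    nodes = sorted(set(out_links) | {v for vs in out_links.values() for v in vs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}

    for _ in range(max_iter):
        # Rank held by nodes with no outgoing edges is spread over all nodes.
        dangling = sum(rank[u] for u in nodes if not out_links.get(u))
        new = {u: (1 - alpha) / n + alpha * dangling / n for u in nodes}

        # Every node splits its rank evenly among the accounts it follows.
        for u in nodes:
            targets = out_links.get(u, ())
            for v in targets:
                new[v] += alpha * rank[u] / len(targets)

        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Invented follow graph: three accounts follow "cy", and "cy" follows "ana".
follows = {"ana": ["cy"], "bo": ["cy"], "cy": ["ana"], "dee": ["cy"]}
rank = pagerank(follows)

# Node sizes proportional to PageRank (the scale factor is arbitrary).
node_size = {u: 300 * r for u, r in rank.items()}
```

With edges pointing from follower to followed, rank accumulates on accounts that many others follow, so `cy` ends up with the largest circle here.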

Finally, just some references:
- Code available here: github.com/nassarhuda/MIT…

- Visualization method used: GLANCE -- honestly I was very proud of the visualizations our method, GLANCE, produced (cc: @austinbenson @dgleich). Here's a link to the paper: cs.cornell.edu/~arb/papers/GL…


More from @nassarhuda

Huda Nassar

@nassarhuda

27 May

https://twitter.com/JuliaLanguage/status/1265278348005122049

I had so much fun working on this data science course!

One aspect of the fun I had was learning interesting information about the data I used. I share my learnings here and look forward to hearing about yours.

#julialang #datascience

The next time you visit Yellowstone National Park to check out the Old Faithful geyser, know that if you wait too long for the geyser to go off... you are likely to witness a longer eruption.

We use a cars dataset of car models with features such as horsepower and cylinders (& 5 more). We perform dimensionality reduction on this data & find out that European/Japanese cars cluster together whereas American cars form their own two clusters. But why? I'd love to find out.
