Eiso Kant @eisokant, 11 tweets, 6 min read
For my first tweet storm, I wanted to share a bit about the @sourcedtech open-source stack and how recent releases and announcements fit into the bigger picture.

In the coming years we believe #MLonCode will start to drastically change developer tooling in the following areas:
* Security & Compliance
* QA & Testing
* API Understanding
* Code Review/Quality

For #MLonCode to make an impact in these areas we believe the following ingredients are needed:
1. Large datasets of millions of repositories (thank you, @github)

To make this accessible to others, we released Public Git Archive.

2. A language-agnostic representation of code
* Universal Abstract Syntax Trees
* Abstractions for high-level concepts (functions, imports, etc.)
* Ability to resolve cross-references

Which is why we're working on Babelfish. Check it out here: dashboard.bblf.sh
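To make the idea of a language-agnostic tree a little more concrete, here is a toy sketch in Python. It is not Babelfish and does not use its API: the `ROLES` mapping, the dict layout, and the helper names are all assumptions made up for illustration. It only shows how language-specific node types could be tagged with generic roles such as "Function" or "Import" so that one query works regardless of the source language.

```python
import ast

# Hypothetical mapping from Python-specific node types to language-neutral roles.
# A real universal AST would define such a mapping for every supported language.
ROLES = {
    ast.FunctionDef: "Function",
    ast.Import: "Import",
    ast.ImportFrom: "Import",
    ast.Call: "Call",
}

def to_generic(node):
    """Convert a Python AST node into a plain dict tagged with a generic role."""
    return {
        "type": type(node).__name__,          # language-specific node type
        "role": ROLES.get(type(node)),        # language-neutral role, or None
        "name": getattr(node, "name", None),  # present on some nodes, e.g. functions
        "children": [to_generic(c) for c in ast.iter_child_nodes(node)],
    }

def find_by_role(tree, role):
    """Yield every node in the generic tree that carries the given role."""
    if tree["role"] == role:
        yield tree
    for child in tree["children"]:
        yield from find_by_role(child, role)

SOURCE = "import os\n\ndef greet(name):\n    return os.path.join('hi', name)\n"
generic_tree = to_generic(ast.parse(SOURCE))

function_names = [n["name"] for n in find_by_role(generic_tree, "Function")]
import_count = len(list(find_by_role(generic_tree, "Import")))
print(function_names, import_count)
```

The point of the design is that `find_by_role` never mentions a Python-specific node type, so the same query could run over trees produced from any language that has a role mapping.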
3. Ability to query the history of millions of repositories, their source code, and the language-agnostic representations of it

Which is why we're working on Gitbase as a SQL layer on top of Git: github.com/src-d/gitbase & github.com/src-d/go-git
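As a rough illustration of what "SQL on top of Git" means, the sketch below loads a few made-up commit records into an in-memory SQLite table and queries them. This is not Gitbase and makes no claim about its schema: the table name, column names, and sample rows are hypothetical, and a real SQL layer would read this data from Git objects rather than from a Python list.

```python
import sqlite3

# Hypothetical commit records standing in for data read from Git repositories.
COMMITS = [
    ("repo-a", "a1", "alice@example.com", "Fix parser bug"),
    ("repo-a", "a2", "bob@example.com", "Add tests"),
    ("repo-b", "b1", "alice@example.com", "Initial commit"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE commits ("
    "repository_id TEXT, commit_hash TEXT, author_email TEXT, message TEXT)"
)
conn.executemany("INSERT INTO commits VALUES (?, ?, ?, ?)", COMMITS)

# Once history is exposed as tables, questions about it become ordinary SQL.
rows = conn.execute(
    "SELECT repository_id, COUNT(*) FROM commits "
    "GROUP BY repository_id ORDER BY repository_id"
).fetchall()
print(rows)
```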
4. Ability to learn from source code, which means fast, scalable, distributed processing of billions of Universal ASTs & their diffs

Which is why we're working on the engine (extending Apache Spark for #MLonCode): github.com/src-d/engine

5. An understanding of natural language in code, since language is intent (i.e. the naturalness of code)

Which is why we are training large-scale identifier embedding models on top of tens of millions of repositories: github.com/src-d/models
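For a feel of what the raw material for identifier embeddings might look like, here is a small sketch that splits identifiers into natural-language subtokens and counts which subtokens appear together. It is an assumption-laden toy, not the pipeline behind src-d/models: the splitting regex, the function names, and the choice to count co-occurrence within a single identifier are all illustrative, and a real system would likely use a wider context such as a scope or file.

```python
import re
from collections import Counter
from itertools import combinations

def split_identifier(identifier):
    """Split snake_case and camelCase identifiers into lowercase subtokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [p.lower() for p in parts]

def cooccurrences(identifiers):
    """Count how often two subtokens appear inside the same identifier."""
    counts = Counter()
    for ident in identifiers:
        tokens = sorted(set(split_identifier(ident)))
        counts.update(combinations(tokens, 2))
    return counts

counts = cooccurrences(["getUserName", "user_name", "setUserName"])
print(split_identifier("HTTPServer"), counts[("name", "user")])
```

Counts like these are the kind of signal an embedding model can be trained on, so that subtokens used in similar contexts end up close together.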
6. An understanding of structure in code at the UAST level, the project level, and the global dependency level

So far we have tackled structural embeddings on top of UASTs: github.com/src-d/models (combined with identifier embeddings, these are very powerful).

7. Shareable, versionable datasets & models for the community to use and improve upon

Which is why we have github.com/src-d/modelfor… & github.com/src-d/datasets but also tools like code annotation: github.com/src-d/code-ann…

8. And the most important: an #MLonCode community of ML researchers, PL enthusiasts, dev tooling engineers, data engineers, dev advocates, PMs, designers, and many other profiles who believe in the future of a language-agnostic, ML-powered development experience.

We believe these are the fundamental components for building #MLonCode applications (and we'll have some exciting announcements on that front later this summer).