In the coming years we believe #MLonCode will start to drastically change developer tooling in the following areas:
* QA & Testing
* API Understanding
* Code Review/Quality
For #MLonCode to make an impact in these areas we believe the following ingredients are needed:
To make this accessible to others, we released Public Git Archive.
* Universal Abstract Syntax Trees
* Abstractions for high-level concepts (functions, imports etc.)
* Ability to resolve cross-references
Which is why we're working on Babelfish. Check it out here: dashboard.bblf.sh
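To make the three requirements above a bit more concrete, here is a minimal sketch using Python's built-in `ast` module. It is not Babelfish and not a UAST: it only understands Python, and the names `SOURCE` and `summarize` are made up for illustration. It shows the kind of high-level view (functions, imports, and calls resolved back to their definitions) that a universal tree would need to offer for every language.

```python
import ast

# Hypothetical sample input, used only for this illustration.
SOURCE = '''
import os
from collections import OrderedDict

def load(path):
    return os.path.exists(path)

def main():
    return load("x")
'''


def summarize(source):
    """Extract functions, imports, and same-file call references from Python code."""
    tree = ast.parse(source)
    functions, imports, calls = [], [], []

    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            # High-level concept: a function definition.
            functions.append(node.name)
        elif isinstance(node, ast.Import):
            # High-level concept: "import x" style imports.
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # "from x import y"; module is None for bare relative imports.
            imports.append(node.module or ".")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only plain-name calls such as load(...); attribute calls are skipped.
            calls.append(node.func.id)

    # Naive cross-reference resolution: a call resolves if a function
    # with the same name is defined in this file.
    resolved = sorted({name for name in calls if name in functions})

    return {
        "functions": sorted(functions),
        "imports": sorted(imports),
        "resolved_calls": resolved,
    }


if __name__ == "__main__":
    print(summarize(SOURCE))
```

The hard part, and the reason a shared representation matters, is getting this same view without writing a separate walker like this one for every language.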
Which is why we're working on Gitbase as a SQL layer on top of Git: github.com/src-d/gitbase & github.com/src-d/go-git
Which is why we're working on the engine (extending Apache Spark for #MLonCode): github.com/src-d/engine
Which is why we are training large-scale identifier embedding models on top of tens of millions of repositories: github.com/src-d/models
So far we have tackled structural embeddings on top of UASTs: github.com/src-d/models (combined with identifier embeddings, these are very powerful).
Which is why we have github.com/src-d/modelfor… & github.com/src-d/datasets but also tools like code annotation: github.com/src-d/code-ann…