Latest Twitter Threads by @philipvollet on Thread Reader App

Jul 27, 2021 • 9 tweets • 5 min read

Tools that make every data scientist and machine learning engineer better!

The universal Swiss Army knife in the infrastructure and scheduling collection: Redis, a data structure server for caching and queue handling

Redis provides access to mutable data structures via a set of commands!
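To make "mutable data structures via a set of commands" concrete, here is a minimal in-process sketch in Python. It does not talk to a Redis server: `ToyStore` and its methods are hypothetical names that only mimic the semantics of the Redis commands SET, GET, LPUSH and RPOP for the caching and queue handling use cases, and the keys and values are made up.

```python
from collections import deque


class ToyStore:
    """Hypothetical in-memory stand-in that mimics a few Redis commands."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # SET: store a string value under a key (a cache write)
        self._data[key] = value

    def get(self, key):
        # GET: return the value, or None on a cache miss
        return self._data.get(key)

    def lpush(self, key, *values):
        # LPUSH: push values onto the head of a list, creating it if needed
        queue = self._data.setdefault(key, deque())
        for value in values:
            queue.appendleft(value)
        return len(queue)

    def rpop(self, key):
        # RPOP: pop from the tail; combined with LPUSH this gives a FIFO queue
        queue = self._data.get(key)
        if not queue:
            return None
        return queue.pop()


store = ToyStore()

# Caching: write once, read many times
store.set("prediction:42", "0.87")
cached = store.get("prediction:42")

# Queue handling: producers LPUSH, workers RPOP
store.lpush("jobs", "train", "evaluate")
first_job = store.rpop("jobs")
```

With a real server, the same pattern would go through a Redis client library instead of this toy class.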

github.com/redis/redis

Jun 11, 2021 • 8 tweets • 4 min read

Insights from an open source influencer

I'm often asked how I get my content. Over the years I've built an unusual technology stack for it.

philipvollet.co

Some insights:

I use Feedly for most content inputs because I can access the content through a single API endpoint and scraping is often pure pain.

@feedly

Feedly saves me a lot of time and manual work.

feedly.com

Jun 11, 2021 • 4 tweets • 2 min read

XBNet: An Extremely Boosted Neural Network for Tabular Data with a novel architecture combining tree-based models with neural networks

arxiv.org/abs/2106.05239

github.com/tusharsarkar3/…

It is trained using a novel optimization technique, Boosted Gradient Descent for Tabular Data, which increases its interpretability and performance.

Jun 11, 2021 • 4 tweets • 2 min read

Reconstructing Implicit Knowledge with Language Models.

Generating statements that explicate implicit knowledge connecting sentences in text.

aclweb.org/anthology/2021…

github.com/Heidelberg-NLP…

They use pre-trained language models, which they refine by fine-tuning them on specially prepared corpora that they enriched with implicit information, and by constraining them with relevant concepts and connecting commonsense knowledge paths.

Jun 10, 2021 • 4 tweets • 2 min read

Deepface is a lightweight face recognition and facial attribute analysis framework in Python

$ pip install deepface

@serengil

Don't forget to show the repository some star love!

github.com/serengil/deepf…

It is a hybrid face recognition framework wrapping state-of-the-art models: VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace, DeepID, ArcFace and Dlib

The library is mainly based on Keras & TensorFlow

Jun 10, 2021 • 4 tweets • 2 min read

Quant UX is a research, usability and prototyping tool to quickly test your designs & get data driven insights

@quant_ux

Quant-UX makes it simple to validate your ideas. Create a prototype, share a link & learn through user feedback and analytics

quant-ux.com

github.com/KlausSchaefers…

Jun 9, 2021 • 4 tweets • 2 min read

Code for: Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning.

github.com/OATML/Non-Para…

arxiv.org/abs/2106.02584

Jun 9, 2021 • 4 tweets • 2 min read

Figma-Low-Code: use Figma designs directly in Vue.js applications.

The low-code approach drastically reduces the hand-off time between designers and developers, reduces front-end code, and ensures that the Figma design stays the single source of truth.

github.com/KlausSchaefers…

Don't forget to show the repository some star love!

Jun 3, 2021 • 4 tweets • 2 min read

Coqui: a library for advanced Text-to-Speech generation.

New paper: Efficient zero-shot multi-speaker text-to-speech model that improves similarity and speech quality for new speakers unseen in training.

@coqui_ai

$ pip install TTS

github.com/coqui-ai/TTS

The model reaches state-of-the-art results for similarity with new speakers and speech quality with only 11 speakers in training.

SC-GlowTTS: An Efficient Zero-Shot Multi-Speaker Text-To-Speech Model

edresson.github.io/SC-GlowTTS

Jun 2, 2021 • 14 tweets • 6 min read

Do you need social media data for your machine learning project?

- Twitter data?
- Reddit data?
- Facebook data?

Where to get it?

Reddit: Pushshift

Pushshift is a big-data storage and analytics project.

Most people know it for its copy of reddit comments and submissions.

reddit.com/r/pushshift/co…

May 22, 2021 • 5 tweets • 2 min read

Where to find trending machine learning papers?

3 tools to find what's trending: Find trending arXiv papers on arxiv-sanity.com, where you can sort by category and save papers for later reading

May 22, 2021 • 9 tweets • 3 min read

Why are graphs the future of biomedical research and what is the value of NLP here?

A small case study about:

How to speed up drug discovery with knowledge graphs and discover potential cures for diseases

In this case text mining is used to contextualize knowledge about:

- Genes
- Compounds
- Diseases
- Adverse drug effects
- Receptor bindings
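As a rough illustration of how text-mined facts like these could be linked in a knowledge graph, the sketch below stores them as (subject, relation, object) triples and searches for a chain connecting a compound to a disease. Every entity and relation name is a made-up placeholder, not a real biomedical fact, and the breadth-first search is just one simple way to query such a graph.

```python
from collections import defaultdict, deque

# Hypothetical triples, as a text-mining step might emit them (placeholders only)
triples = [
    ("compound_A", "binds", "receptor_R1"),
    ("receptor_R1", "encoded_by", "gene_G1"),
    ("gene_G1", "associated_with", "disease_D1"),
    ("compound_A", "causes", "adverse_effect_E1"),
]


def build_graph(triples):
    """Index triples by subject so outgoing edges can be followed quickly."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns the shortest chain of triples or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None


graph = build_graph(triples)
path = find_path(graph, "compound_A", "disease_D1")
for subject, relation, obj in path:
    print(f"{subject} --{relation}--> {obj}")
```

A path like this is only a hypothesis to investigate further; it says that the mined statements connect, not that the compound treats the disease.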

May 20, 2021 • 4 tweets • 2 min read

Image Cropping on Twitter: Fairness Metrics, their Limitations, and the Importance of Representation, Design, and Agency

github.com/twitter-resear…

In fall 2020, Twitter users raised concerns that the automated image cropping system on Twitter favored light-skinned over dark-skinned individuals, as well as concerns that the system favored cropping women's bodies instead of their heads.

arxiv.org/abs/2105.08667

May 20, 2021 • 10 tweets • 4 min read

Did you think bringing your machine learning model to production was the hard part?

What about model drift?

Now MLOps comes into play but how does it work and what are good tools?

What is:
- Continuous integration (CI)
- Continuous deployment (CD)
- Continuous training (CT)

The full MLOps life cycle

- Data Engineering: Getting and cleaning the data, on a recurring basis if necessary
- Model Engineering: Model training, evaluation, testing, and packaging
- Model Deployment: Integrating the trained model, model serving, and performance monitoring
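One way the monitoring step can feed back into continuous training is a drift check on input features. The sketch below is an assumption, not something the thread prescribes: it uses the Population Stability Index (PSI) with equal-width bins, and the 0.2 threshold is a common rule of thumb rather than a fixed standard. The function names `psi` and `needs_retraining` are hypothetical.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index between a reference and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the range
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def needs_retraining(train_feature, live_feature, threshold=0.2):
    """Flag the model for retraining when the feature distribution has shifted."""
    return psi(train_feature, live_feature) > threshold


# Toy data: the live feature is the training feature shifted by 0.5
train = [i / 100 for i in range(100)]
live_same = list(train)
live_shifted = [x + 0.5 for x in train]

print(needs_retraining(train, live_same))
print(needs_retraining(train, live_shifted))
```

In a pipeline, a check like this would typically run on a schedule and trigger the training job rather than print a result.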

May 16, 2021 • 4 tweets • 2 min read

Note taking apps are like muscle training - you have to do it every day.

How many times I have changed ...

From Evernote to OneNote to Google Keep to Notion and from Roam now to Obsidian

@obsdmd

Why?

Where the big ones like OneNote, Google Keep and Evernote fail is that the brain does not work like an index. Thoughts are linked and associative, and this is where the next generation of note-taking apps shows its strength.

Roam and Obsidian

roamresearch.com

May 15, 2021 • 14 tweets • 7 min read

Your open source project is ready for deployment? Documentation is still missing?

Good documentation and its presentation is an art!

A case study with 4 examples of awesome documentation

What makes good documentation?

- No prosaic texts! Choose a practical approach with code snippets
- Good structure and overview, with a quick entry point first and in-depth material after
- Good search is everything
- Good code examples

May 13, 2021 • 11 tweets • 6 min read

Where to get data for your next machine learning project?

An overview of 8 amazing resources to accelerate your next project with data!

- Google Datasets
- Big Bad NLP Datasets
- Hugging Face Datasets
- Papers with Code Datasets
- Open Data on AWS
- Awesome Public Datasets

Hugging Face Datasets

Mainly for NLP, but the good news is that Hugging Face is expanding, and we can be sure that they will add datasets for visual machine learning soon!

@huggingface

huggingface.co/datasets

May 11, 2021 • 6 tweets • 1 min read

How to get your dream job in Data Science if you are a career changer?

First you have to sneak around HR and their antiquated methods. This is only possible through contacts or unusual ways.

But what are good ways? The middleman

Someone who can hand over your application and has a connection to the company, or someone who works there.

May 10, 2021 • 4 tweets • 2 min read

Does BERT Pretrained on Clinical Notes Reveal Sensitive Data? • Large Transformers pretrained over clinical notes from Electronic Health Records (EHR) have afforded substantial gains in performance on predictive clinical tasks.

Paper: arxiv.org/abs/2104.07762
GitHub: github.com/elehman16/expo…

↓ 1/4

The cost of training such models (and the necessity of data access to do so), coupled with their utility, motivates parameter sharing, i.e., the release of pretrained models such as ClinicalBERT.

↓ 2/4

May 8, 2021 • 11 tweets • 2 min read

To build a chatbot you need data for your intent classification.

But what if you have too little training data?

Paraphrasing is one option for augmentation

But what is a good paraphrase?

Almost all conditioned text generation models are validated on 2 factors:

1. Whether the generated text conveys the same meaning as the original context (Adequacy)

2. Whether the text is fluent / grammatically correct English (Fluency)
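The two factors can be turned into a filter for generated paraphrases. The sketch below is deliberately crude and uses only the standard library: token overlap stands in for adequacy and a repeated-word check stands in for fluency. Both proxies, the function names, the 0.3 threshold and the example sentences are assumptions for illustration; a real setup would more likely use a semantic similarity model and a language model or grammar checker.

```python
import re


def tokens(text):
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def adequacy_proxy(source, candidate):
    """Jaccard token overlap in [0, 1]; a rough stand-in for meaning preservation."""
    a, b = tokens(source), tokens(candidate)
    return len(a & b) / len(a | b) if a | b else 0.0


def fluency_proxy(candidate):
    """Placeholder check: non-empty and no immediately repeated word."""
    words = candidate.lower().split()
    return bool(words) and all(w1 != w2 for w1, w2 in zip(words, words[1:]))


def filter_paraphrases(source, candidates, min_adequacy=0.3):
    kept = []
    for candidate in candidates:
        if candidate.strip().lower() == source.strip().lower():
            continue  # an exact copy adds no diversity to the training data
        if adequacy_proxy(source, candidate) >= min_adequacy and fluency_proxy(candidate):
            kept.append(candidate)
    return kept


source = "How do I reset my password?"
candidates = [
    "How do I reset my password?",       # exact copy
    "How can I reset my password?",      # plausible paraphrase
    "reset reset my password password",  # overlaps but is not fluent
    "What is the weather today?",        # fluent but a different meaning
]
kept = filter_paraphrases(source, candidates)
print(kept)
```

Only candidates that pass both checks are kept as augmentation data for the intent classifier.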

May 7, 2021 • 8 tweets • 5 min read

How do you create a beautiful interface for your machine learning or data science project?

Handmade from scratch?
Any good tools?

Sure there are incredible tools: Beautiful ML & DS interfaces

Gradio
Quickly create customizable UI components around your ML models. Try a model by dragging and dropping in your own images, pasting your own text, or recording your own voice, and see what it outputs.

@GradioML

github.com/gradio-app/gra…
