Philip Vollet
Head of Developer Growth @weaviate_io & Open source lover tweeting about machine learning and data science projects.
Jul 27, 2021 9 tweets 5 min read
Tools that make every data scientist and machine learning engineer better!

The universal Swiss Army knife collection for infrastructure and scheduling:

Redis as a data structure server for caching and queue handling

Redis provides access to mutable data structures via a set of commands!

github.com/redis/redis
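
To make the caching and queue handling use cases concrete, here is a minimal sketch. It assumes the redis-py client, whose `get`, `setex`, `lpush` and `rpop` methods map to the Redis commands of the same names. The `InMemoryClient` class and the helpers `cached`, `enqueue` and `dequeue` are hypothetical names introduced only for illustration; the stand-in client lets the sketch run without a server.

```python
import json
from collections import defaultdict, deque


class InMemoryClient:
    """Hypothetical stand-in that mimics four redis-py methods so the sketch
    runs without a server. The expiry time is accepted but not enforced."""

    def __init__(self):
        self._strings = {}
        self._lists = defaultdict(deque)

    def get(self, name):
        return self._strings.get(name)      # None when the key is absent

    def setex(self, name, time, value):
        self._strings[name] = value         # a real server would expire the key
        return True

    def lpush(self, name, *values):
        for value in values:
            self._lists[name].appendleft(value)
        return len(self._lists[name])

    def rpop(self, name):
        items = self._lists[name]
        return items.pop() if items else None


def cached(client, key, ttl_seconds, compute):
    """Cache-aside: return the stored value, or compute and store it on a miss."""
    raw = client.get(key)
    if raw is not None:
        return json.loads(raw)
    value = compute()
    client.setex(key, ttl_seconds, json.dumps(value))
    return value


def enqueue(client, queue, job):
    """Push a JSON-encoded job onto the head of a list."""
    client.lpush(queue, json.dumps(job))


def dequeue(client, queue):
    """Pop from the tail, so jobs leave in the order they arrived (FIFO)."""
    raw = client.rpop(queue)
    return None if raw is None else json.loads(raw)


client = InMemoryClient()
features = cached(client, "features:42", 60, lambda: {"score": 0.93})
enqueue(client, "jobs", {"id": 1})
next_job = dequeue(client, "jobs")
```

With a Redis server running, replacing `InMemoryClient()` with `redis.Redis(decode_responses=True)` from redis-py should work with the same helper functions, since they only rely on those four methods.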
Jun 11, 2021 8 tweets 4 min read
Insights from an open source influencer

I'm often asked how I get my content. Over the years I've built an unusual technology stack for it.

philipvollet.co

Some insights: I use Feedly for most content inputs because I can access the content through a single API endpoint and scraping is often pure pain.

@feedly

Feedly saves me a lot of time and manual work.

feedly.com
Jun 11, 2021 4 tweets 2 min read
XBNet: An Extremely Boosted Neural Network for Tabular Data with a novel architecture combining tree-based models with neural networks

arxiv.org/abs/2106.05239

github.com/tusharsarkar3/…

It is trained using a novel optimization technique, Boosted Gradient Descent for Tabular Data, which increases its interpretability and performance.
Jun 11, 2021 4 tweets 2 min read
Reconstructing Implicit Knowledge with Language Models.

Generating statements that explicate implicit knowledge connecting sentences in text.

aclweb.org/anthology/2021…

github.com/Heidelberg-NLP…

They make use of pre-trained language models, which they refine by fine-tuning them on specifically prepared corpora that they enriched with implicit information, and by constraining them with relevant concepts and connecting commonsense knowledge paths.
Jun 10, 2021 4 tweets 2 min read
Deepface is a lightweight face recognition and facial attribute analysis framework in Python

$ pip install deepface

@serengil

Don't forget to spend some star love for the repository!

github.com/serengil/deepf…

It is a hybrid face recognition framework wrapping state-of-the-art models: VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace, DeepID, ArcFace and Dlib.

The library is mainly based on Keras & TensorFlow
Jun 10, 2021 4 tweets 2 min read
Quant UX is a research, usability and prototyping tool to quickly test your designs & get data driven insights

@quant_ux

Quant-UX makes it simple to validate your ideas. Create a prototype, share a link & learn through user feedback and analytics

quant-ux.com

github.com/KlausSchaefers…
Jun 9, 2021 4 tweets 2 min read
Code for: Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning.

github.com/OATML/Non-Para…

arxiv.org/abs/2106.02584
Jun 9, 2021 4 tweets 2 min read
Figma-Low-Code: use Figma designs directly in Vue.js applications.

The low-code approach drastically reduces the hand-off time between designers and developers, reduces front-end code, and ensures that the Figma design stays the single source of truth.

github.com/KlausSchaefers…

Don't forget to spend some star love for the repository!
Jun 3, 2021 4 tweets 2 min read
coqui: a library for advanced Text-to-Speech generation.

New paper: Efficient zero-shot multi-speaker text-to-speech model that improves similarity and speech quality for new speakers unseen in training.

@coqui_ai

$ pip install TTS

github.com/coqui-ai/TTS

The model reaches state-of-the-art results for similarity with new speakers and speech quality with only 11 speakers in training.

SC-GlowTTS: An Efficient Zero-Shot Multi-Speaker Text-To-Speech Model

edresson.github.io/SC-GlowTTS
Jun 2, 2021 14 tweets 6 min read
Do you need social media data for your machine learning project?

- Twitter data?
- Reddit data?
- Facebook data?

Where to get it?

Reddit: Pushshift

Pushshift is a big-data storage and analytics project.

Most people know it for its copy of Reddit comments and submissions.

reddit.com/r/pushshift/co…
May 22, 2021 5 tweets 2 min read
Where to find trending machine learning papers?

3 tools to find what's trending:

Find trending arXiv papers on arxiv-sanity.com. You can sort by category and save papers for later reading.
May 22, 2021 9 tweets 3 min read
Why are graphs the future of biomedical research and what is the value of NLP here?

A small case study about:

How to speed up drug discovery with knowledge graphs and discover potential cures for diseases

In this case, text mining is used to contextualize knowledge about:

- Genes
- Compounds
- Diseases
- Adverse drug effects
- Receptor bindings
May 20, 2021 4 tweets 2 min read
Image Cropping on Twitter: Fairness Metrics, their Limitations, and the Importance of Representation, Design, and Agency

github.com/twitter-resear…

In fall 2020, Twitter users raised concerns that the automated image cropping system on Twitter favored light-skinned over dark-skinned individuals, as well as concerns that the system favored cropping women's bodies instead of their heads.

arxiv.org/abs/2105.08667
May 20, 2021 10 tweets 4 min read
Did you think bringing your machine learning model to production was the hard part?

What about model drift?

Now MLOps comes into play, but how does it work and what are good tools?

What is:
- Continuous integration (CI)
- Continuous deployment (CD)
- Continuous training (CT)

The full MLOps life cycle

- Data Engineering: Get and clean the data, repeatedly if necessary
- Model Engineering: Model training, evaluation, testing, and packaging
- Model Deployment: Integrating the trained model, model serving, and performance monitoring
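
As an illustration of how performance monitoring can feed continuous training, the following sketch checks one feature for drift with the Population Stability Index (PSI). PSI is a technique swapped in here for illustration and is not named in the thread. The function names, the bin count, and the 0.2 retraining threshold (a commonly quoted rule of thumb) are all assumptions.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """Compare two samples of one feature; 0 means identical binned shares.

    `expected` is the training-time sample, `actual` the production sample.
    Bin edges come from the range of `expected`; out-of-range production
    values are clipped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0      # guard against a constant feature

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def needs_retraining(expected, actual, threshold=0.2):
    """Assumed policy: trigger continuous training once PSI exceeds the threshold."""
    return population_stability_index(expected, actual) > threshold


training_sample = [i / 100 for i in range(100)]
production_sample = [v + 0.5 for v in training_sample]   # simulated shift
drifted = needs_retraining(training_sample, production_sample)
```

In a pipeline, a check like this would typically run on a schedule against fresh production data, and a positive result would start the training job rather than just set a flag.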
May 16, 2021 4 tweets 2 min read
Note taking apps are like muscle training - you have to do it every day.

How many times I have changed ...

From Evernote to OneNote to Google Keep to Notion and from Roam now to Obsidian

@obsdmd

Why?

Where the big ones like OneNote, Google Keep and Evernote fail is that the brain does not work like an index. Thoughts are linked associatively, and this is where the next generation of note-taking apps shows its strength.

Roam and Obsidian

roamresearch.com
May 15, 2021 14 tweets 7 min read
Your open source project is ready for deployment? Documentation is still missing?

Good documentation and its presentation is an art!

A case study with 4 examples of awesome documentation

What makes good documentation?

- No prosaic texts! Choose a practical approach with code snippets
- Good structure and overview, with a quick entry point first and depth afterwards
- Good search is everything
- Good code examples
May 13, 2021 11 tweets 6 min read
Where to get data for your next machine learning project?

An overview of 8 amazing resources to accelerate your next project with data!

- Google Datasets
- Big Bad NLP Datasets
- Hugging Face Datasets
- Papers with Code Datasets
- Open Data on AWS
- Awesome Public Datasets

Hugging Face Datasets

Mainly for NLP, but the good news is that Hugging Face is expanding, and we can be sure that they will add datasets for visual machine learning soon!

@huggingface

huggingface.co/datasets
May 11, 2021 6 tweets 1 min read
How to get your dream job in Data Science if you are a career changer?

First you have to sneak around HR and their antiquated methods. This is only possible through contacts or unusual ways.

But what are good ways?

The middleman

Someone who can hand over your application: a person with a connection to the company, or someone who works there.
May 10, 2021 4 tweets 2 min read
Does BERT Pretrained on Clinical Notes Reveal Sensitive Data?

Large Transformers pretrained over clinical notes from Electronic Health Records (EHR) have afforded substantial gains in performance on predictive clinical tasks.

Paper: arxiv.org/abs/2104.07762
GitHub: github.com/elehman16/expo…

↓ 1/4

The cost of training such models (and the necessity of data access to do so), coupled with their utility, motivates parameter sharing, i.e., the release of pretrained models such as ClinicalBERT.

2/4
May 8, 2021 11 tweets 2 min read
To build a chatbot you need data for your intent classification.

But what if you have too little training data?

Paraphrasing is one option for augmentation

But what is a good paraphrase?

Almost all conditioned text generation models are validated on 2 factors:

1. If the generated text conveys the same meaning as the original context (Adequacy)

2. If the text is fluent / grammatically correct English (Fluency)
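
For augmentation, candidates can be filtered before they are added to the intent training set. The sketch below is a deliberately crude stand-in: it approximates adequacy with token overlap (Jaccard similarity) and also drops near-copies that would add no variety. It does not measure fluency at all. The function names and both thresholds are assumptions; a real setup would more likely score adequacy with a trained model.

```python
def tokens(text):
    """Lowercased whitespace tokens as a set."""
    return set(text.lower().split())


def jaccard(a, b):
    """Share of tokens common to both sentences, between 0.0 and 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def keep_paraphrases(source, candidates, min_overlap=0.2, max_overlap=0.8):
    """Keep candidates that overlap enough to plausibly preserve the meaning
    (crude adequacy proxy) but not so much that they are near-copies."""
    return [
        c for c in candidates
        if min_overlap <= jaccard(source, c) <= max_overlap
    ]


source = "how do i reset my password"
candidates = [
    "how do i reset my password",       # identical, adds nothing
    "how can i change my password",     # plausible paraphrase
    "what is the weather today",        # unrelated
]
kept = keep_paraphrases(source, candidates)
```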
May 7, 2021 8 tweets 5 min read
How do you create a beautiful interface for your machine learning or data science project?

Handmade from scratch?
Any good tools?

Sure, there are incredible tools:

Beautiful ML & DS interfaces

Gradio
Quickly create customizable UI components around your ML models. Interact with them by dragging and dropping your own images, pasting your own text, or recording your own voice, and see what the model outputs.

@GradioML

github.com/gradio-app/gra…
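
A minimal sketch of wrapping a model in a Gradio interface could look like the following. It assumes the `gradio` package is installed and uses its `Interface` class with the `"text"` shortcut for input and output components. The `classify_sentiment` function is a hypothetical keyword rule standing in for a real model, and `build_demo` is an illustrative helper name.

```python
def classify_sentiment(text: str) -> str:
    """Hypothetical stand-in for a real model: a tiny keyword rule."""
    positive = {"good", "great", "love", "awesome"}
    negative = {"bad", "awful", "hate", "terrible"}
    words = set(text.lower().split())
    score = len(words & positive) - len(words & negative)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def build_demo():
    """Wrap the function in a text-in, text-out web UI."""
    # Imported lazily so the rest of the sketch runs without gradio installed.
    import gradio as gr

    return gr.Interface(fn=classify_sentiment, inputs="text", outputs="text")
```

Calling `build_demo().launch()` should start a local web server with a text box for input and the predicted label as output.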