Andres Segura-Tinoco
CS PhD student in the IRG at @UAM_Madrid | XAI, ML, RecSys, NLP, Argument Mining & Information Retrieval | Tweets in En & Es
Sep 13, 2022
A variety of tutorials will be given at @ACMRecSys 2022.

Let me share some information about the 3 I will be attending (website, paper, and repository).

Then, during the tutorials (Sunday 09/18), I will share more info about them.

🧵1/5

1⃣ Neural Re-ranking for Multi-stage Recommender Systems.

▪️ Website: librerank-community.github.io

▪️ Paper: librerank-community.github.io

▪️ GitHub repo: github.com/LibRerank-Comm…

2/5
Sep 8, 2022
How easy and cool it is to use pre-trained diffusion🧨 models (available from the @huggingface Hub) to generate images from input text.

Here's an example of a dancing blue monkey (maybe break-dancing).

🧵1/4

In this case, I am using the Stable Diffusion v1-4 model.

Here is the model card for more information: huggingface.co/CompVis/stable…

2/4
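For anyone who wants to try it, here is a minimal sketch using the diffusers library (the prompt and output file name are illustrative; it assumes a CUDA GPU and that you have accepted the model license on the Hub):

```python
# A minimal sketch, assuming the diffusers and torch packages are installed.
import torch
from diffusers import StableDiffusionPipeline

# Load the pre-trained Stable Diffusion v1-4 pipeline from the Hugging Face Hub
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,  # half precision to save GPU memory
)
pipe = pipe.to("cuda")  # move the pipeline to the GPU

# Generate an image from input text
prompt = "a blue monkey break-dancing"  # illustrative prompt
image = pipe(prompt).images[0]
image.save("dancing_monkey.png")
```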
Apr 28, 2022
Have you heard about argument-based recommender systems?

Do you like to receive explained and argued recommendations?

🚨SPOILER ALERT🚨: that is exactly what my research is about.

@RedDivulga @CrueUniversidad @UAM_Madrid @UCCUAM @EDoctorado_UAM

Opening my #HiloTesis thread 👇

In my doctoral dissertation, I research argument-based recommender systems, specifically the generation and explanation of personalized and interactive recommendations from arguments extracted from Spanish-language textual content.

2/20
Sep 21, 2021
Earlier today I asked the Machine Learning community what methodology they use to solve regression problems, and I got a lot of great answers.

Now, I'm going to share the approach I use when faced with these types of problems.

🧵1/9

1⃣ If there are only one or a few input variables, and after decomposing the series I find that it has no seasonal behavior, then I use (multiple) linear regression and establish a baseline for my problem. Easy and concise! (A sketch of this baseline follows below.)

2/9
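A minimal sketch of that baseline, with synthetic data standing in for a real problem with a few input variables:

```python
# Minimal linear-regression baseline sketch (synthetic data for illustration).
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a problem with a few input variables
X, y = make_regression(n_samples=500, n_features=3, noise=10.0, random_state=42)

# Hold out part of the data to measure the baseline honestly
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
print("Baseline MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```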
Sep 17, 2021
I want to share with you 5⃣ tips and insights I learned working on a project where I had to create and deploy more than 750 Machine Learning models (regressors, specifically).

🧵Let's get started!

1/9
1⃣ At the beginning of the project, spend time comparing different algorithms to solve your task. Then, select one of them and focus on model creation and validation (a quick comparison sketch follows below).

We rarely have to change the selected algorithm; it is more common to have to clean or improve the data.

2/9
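In practice, that up-front comparison can be as simple as cross-validating a few candidates on the same data. A minimal sketch (the candidate models and synthetic data are illustrative, not from the original project):

```python
# Minimal sketch of comparing candidate regressors up front.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=1000, n_features=10, noise=15.0, random_state=0)

candidates = {
    "ridge": Ridge(),
    "random_forest": RandomForestRegressor(random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# 5-fold cross-validated MAE for each candidate; pick one and move on
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    print(f"{name}: MAE = {-scores.mean():.2f}")
```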
Aug 29, 2021
Machine Learning Community:

When you have to solve a classification problem, what is usually your criterion for selecting the algorithm to use?

Below I share what I do; however, I would love to read about your methodology.

1/6
Like everything in life, it depends: above all, on the data.

If the data has probability associated with it (like data coming from natural language), I like to try Naive Bayes or Maximum Entropy first to get a baseline (see the sketch below). Many times, it ends up being the solution.

2/6
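A minimal sketch of such a baseline with scikit-learn, where Maximum Entropy corresponds to LogisticRegression (the toy corpus and labels are illustrative):

```python
# Minimal Naive Bayes text-classification baseline (toy corpus for illustration).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["great movie", "awful plot", "loved the acting", "boring and slow"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Bag-of-words counts feeding a multinomial Naive Bayes model
baseline = make_pipeline(CountVectorizer(), MultinomialNB())
baseline.fit(texts, labels)

print(baseline.predict(["slow but great acting"]))
```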
Aug 9, 2021
How cool is @TensorFlow Embedding Projector!

It allows you to:
▪️ Load your own data (word embeddings and metadata)
▪️ Select the dimensionality reduction algorithm
▪️ Visualize the data in 3D
▪️ Filter vectors by similarity

1/4

I recommend checking it out and playing with it (a sketch of how to prepare your own data for it follows below).

projector.tensorflow.org

2/4
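To load your own embeddings, the projector accepts tab-separated files: one for the vectors and one for the metadata labels. A minimal sketch of preparing them (random vectors stand in for real embeddings):

```python
# Minimal sketch of preparing data files for the Embedding Projector.
import numpy as np

words = ["king", "queen", "apple", "banana"]
vectors = np.random.rand(len(words), 50)  # stand-ins for real embeddings

# vectors.tsv: one embedding per line, dimensions separated by tabs
with open("vectors.tsv", "w") as f:
    for vec in vectors:
        f.write("\t".join(str(x) for x in vec) + "\n")

# metadata.tsv: one label per line, in the same order as the vectors
with open("metadata.tsv", "w") as f:
    for word in words:
        f.write(word + "\n")
```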
Aug 4, 2021
A common practice when we create NLP models is to use one-hot encoded vectors to feed classical ML models (Bayes, SVM, RandomForest), but to use embeddings (dense vectors) to feed Neural Networks.

But do you know the reason behind this?

Let me talk about it🧵

1/13

First of all, one-hot encoded vectors are discrete and sparse, which makes them suitable for representing words.

However, they do not retain similarity (or dissimilarity) information between words/contexts, and their size depends on the size of the vocabulary (see the sketch below).

2/13
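A tiny sketch contrasting the two representations (toy vocabulary, with a random embedding table standing in for learned weights):

```python
# Minimal one-hot vs. embedding sketch (toy vocabulary for illustration).
import numpy as np

vocab = ["cat", "dog", "car"]
index = {w: i for i, w in enumerate(vocab)}

# One-hot: sparse and discrete; size grows with the vocabulary
one_hot = np.eye(len(vocab))[index["cat"]]
print(one_hot)  # [1. 0. 0.] -- every pair of words is equally (dis)similar

# Embedding: dense and low-dimensional; learned during training
embedding_dim = 4
embedding_table = np.random.rand(len(vocab), embedding_dim)  # stand-in for learned weights
print(embedding_table[index["cat"]])  # dense vector that can encode similarity
```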
Aug 2, 2021
If you want to understand something, the best way to do it is to analyze it on your own.

Therefore, if you want to know and understand how your own language is built, you can start with a descriptive analysis of its words.

Let me tell you about this example🧵

1/7
Use case:

I took all the words contained in the Dictionary of the Spanish language (DSL) and separated them into 27 plain text files (1 for each letter).

I then performed 8 different descriptive analyses on those words, ranging from the simplest to some more complex ones (a sketch of the simplest follows below).

2/7
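As an example, a minimal sketch of the simplest of those analyses; it assumes plain-text files with one word per line in a folder named dsl_words (the file layout is illustrative):

```python
# Minimal descriptive word analysis (assumed file layout: one word per line).
from collections import Counter
from pathlib import Path

words = []
for path in sorted(Path("dsl_words").glob("*.txt")):
    words += path.read_text(encoding="utf-8").splitlines()

print("Total words:", len(words))
print("Average word length:", sum(map(len, words)) / len(words))

# Distribution of word lengths
lengths = Counter(len(w) for w in words)
for length, count in sorted(lengths.items())[:10]:
    print(f"{length:>2} letters: {count} words")
```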
Jul 23, 2021
It is amazing how much information can be obtained about the performance and behavior of a classifier using a Confusion Matrix.

In the following thread🧵, let me tell you a bit about what you can learn from your model using this technique.

1/10

To begin with, don't you find it strange that this technique is called Confusion Matrix and not Accuracy Matrix?

This is because its main purpose is to let you analyze the errors, or confusions, that the model makes (see the sketch below).

2/10
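A minimal sketch of computing one with scikit-learn (the labels are illustrative):

```python
# Minimal confusion-matrix sketch (toy labels for illustration).
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]  # classifier output

# Rows are true classes, columns are predicted classes; the off-diagonal
# cells are exactly the "confusions" the matrix is named after.
print(confusion_matrix(y_true, y_pred))
```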
Jul 7, 2021
I have been working on 3 free, hands-on Python courses on the following fields:

1⃣ Algorithms
2⃣ Machine Learning
3⃣ Natural Language Processing

Below are the contents and links for each of them. I really hope they will be useful for you to learn more about these topics.

1/5
Content of the Algorithms course:

1. Algorithm's Basics
2. Divide and Conquer
3. Graphs
4. Dynamic Programming
5. Probabilistic Algorithms
6. Similarity Functions
7. Chaotic Systems
8. Cellular Automata

github.com/ansegura7/Algo…

2/5
Jul 2, 2021
Feature Scaling is one of the most useful and necessary transformations to perform on a training dataset since, with very few exceptions, ML algorithms do not fit well on datasets whose attributes have very different scales.

Let's talk about it 🧵

1/7

There are 2 very effective techniques to transform all the attributes of a dataset to the same scale:
▪️ Normalization
▪️ Standardization

The 2 techniques perform the same task, but in different ways, and each one has its strengths and weaknesses (a quick sketch of both follows below).

2/7
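A quick sketch of both techniques applied to the same toy attribute:

```python
# Minimal feature-scaling sketch (toy attribute for illustration).
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [5.0], [10.0], [100.0]])  # attribute with a wide range

# Normalization (min-max): rescales values into the [0, 1] range
print(MinMaxScaler().fit_transform(X).ravel())

# Standardization (z-score): zero mean and unit variance, unbounded range
print(StandardScaler().fit_transform(X).ravel())
```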
Apr 15, 2021
Why is there such a difference in the results of clustering the same data using the following 3 algorithms?

🔹K-means
🔹Hierarchical Agglomerative
🔹DBSCAN

Let me explain briefly 🧵 (a small comparison sketch follows below)

1/5

K-Means:

▪️ Partition-based approach
▪️ Reproducibility is low because it is sensitive to the initial random selection of centroids
▪️ It is sensitive to outliers
▪️ The result depends on the chosen number of clusters, k
▪️ Cannot recover non-globular cluster shapes

2/5
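A minimal sketch that makes the differences visible, run on two interleaving half-moons (a classic non-globular dataset):

```python
# Minimal sketch comparing the 3 algorithms on non-globular data.
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)
agglo = AgglomerativeClustering(n_clusters=2).fit_predict(X)
dbscan = DBSCAN(eps=0.3).fit_predict(X)

# K-Means splits the moons with a straight boundary (globular bias),
# while DBSCAN follows the density and recovers each moon as one cluster.
print("K-Means labels:  ", kmeans[:10])
print("Agglomerative:   ", agglo[:10])
print("DBSCAN labels:   ", dbscan[:10])
```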
Apr 13, 2021
Clustering is one of the most common applications of Unsupervised Learning.

Let me share this thread🧵 with the most common classical and modern clustering algorithms, as well as the most commonly used similarity functions for quantitative and qualitative data (a small similarity-function sketch is at the end).

1/6

Some of the traditional clustering approaches are:

🔹Centroid-based clustering
🔹Hierarchical clustering
🔹Density-based clustering
🔹Distribution-based clustering
🔹Fuzzy clustering
🔹Model-based clustering

2/6
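As a small illustration of the similarity side, here is a sketch of two commonly used functions: Euclidean distance for quantitative data and Jaccard similarity for qualitative, set-like data (the vectors are illustrative):

```python
# Minimal similarity-function sketch for the two kinds of data.
from scipy.spatial.distance import euclidean, jaccard

# Quantitative: two numeric feature vectors
print(euclidean([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))

# Qualitative: two binary attribute vectors (1 = has the attribute).
# scipy's jaccard returns the dissimilarity; similarity is 1 minus it.
print(1 - jaccard([1, 0, 1, 1], [1, 1, 1, 0]))
```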
Mar 24, 2021
Difference between Batch Size and Epoch in ML/DL:

- Batch Size: number of samples processed before updating the model parameters.

- Epoch: number of complete passes over the training dataset.

A model can be updated multiple times during 1 epoch.

#MachineLearning

Difference between Parameters and Hyperparameters in ML/DL:

- Hyperparameters: define how the model is structured and how it will be trained, e.g. learning rate, batch size.

- Parameters: the internal variables of the model that are updated during training, e.g. weights, biases.

(A small sketch making the batch size / epoch counts concrete follows below.)
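A minimal sketch of the batch size / epoch relationship in Keras (the tiny model and random data are just for illustration):

```python
# Minimal batch-size/epoch sketch (toy model and synthetic data).
import numpy as np
from tensorflow import keras

X = np.random.rand(1000, 8)  # 1,000 training samples
y = np.random.rand(1000, 1)

model = keras.Sequential([keras.Input(shape=(8,)), keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

# batch_size=50 -> 1000 / 50 = 20 parameter updates per epoch;
# epochs=5 -> 5 complete passes over the data, so 100 updates in total.
model.fit(X, y, batch_size=50, epochs=5, verbose=0)
```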