@otis_reid Columns (and rows) of real life matrices are typically not independent (example: matrix row = person, column = movie, entry = whether person liked movie or not). If you like Terminator 1, you probably like Terminator 2. 1/n
@otis_reid This means that each row i has a vector vec_i and each column j has a vector vec_j, where vec_i dot vec_j = matrix_{ij} (at least approximately). The dimensionality of vec_i and vec_j is much smaller than the number of rows or columns of the original matrix, typically by orders of magnitude 2/n
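To make the row-vector / column-vector picture concrete, here is a minimal NumPy sketch. Everything in it is an illustrative assumption, not something from the thread: the sizes (50 people, 40 movies), the latent dimension k = 3, the synthetic matrix, and the choice of SVD as the way to get the factors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 50 people, 40 movies, 3 latent factors.
n_people, n_movies, k = 50, 40, 3

# Synthetic "taste" matrix built to have exactly k latent factors.
M = rng.normal(size=(n_people, k)) @ rng.normal(size=(k, n_movies))

# The SVD gives one valid choice of row and column vectors.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
row_vecs = U[:, :k] * np.sqrt(s[:k])      # vec_i for each person, shape (50, 3)
col_vecs = Vt[:k, :].T * np.sqrt(s[:k])   # vec_j for each movie, shape (40, 3)

# vec_i dot vec_j reproduces matrix_{ij} up to floating-point error.
max_err = np.abs(row_vecs @ col_vecs.T - M).max()

# Numbers stored: full matrix vs. the two sets of vectors.
n_params_full = n_people * n_movies            # 2000
n_params_factored = k * (n_people + n_movies)  # 270
```

The gap between the two counts grows with the matrix: the full matrix scales with rows times columns, the factors only with rows plus columns.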
@otis_reid This means that if you observe some fraction of the elements of the matrix, you can infer the others. Since many real life structures (e.g. graphs) can be represented as matrices, this kind of decomposition and trick applies to them too 3/n
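A rough sketch of "observe some entries, get the others", assuming a noise-free synthetic matrix and alternating least squares as the fitting method. The thread does not name an algorithm, so ALS is swapped in here for illustration; the sizes, the 60% observation rate, and the small ridge term `lam` are likewise assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 30x30 matrix with 2 latent factors, ~60% of entries seen.
n, m, k = 30, 30, 2
M = rng.normal(size=(n, k)) @ rng.normal(size=(k, m))
mask = rng.random((n, m)) < 0.6          # True = observed entry

# Small random starting vectors for rows (R) and columns (C).
R = rng.normal(scale=0.1, size=(n, k))
C = rng.normal(scale=0.1, size=(m, k))
lam = 1e-3                               # tiny ridge term keeps each solve well-posed

for _ in range(100):
    # Fix column vectors, solve a least-squares problem per row on observed entries.
    for i in range(n):
        obs = mask[i]
        Cj = C[obs]
        R[i] = np.linalg.solve(Cj.T @ Cj + lam * np.eye(k), Cj.T @ M[i, obs])
    # Fix row vectors, do the same per column.
    for j in range(m):
        obs = mask[:, j]
        Ri = R[obs]
        C[j] = np.linalg.solve(Ri.T @ Ri + lam * np.eye(k), Ri.T @ M[obs, j])

pred = R @ C.T

# Error on entries the fit never saw, vs. the baseline of guessing 0 everywhere.
rmse_heldout = np.sqrt(np.mean((pred - M)[~mask] ** 2))
rmse_baseline = np.sqrt(np.mean(M[~mask] ** 2))
```

Only `M[mask]` is ever used during fitting; the held-out comparison is what tells you whether the missing entries were actually recovered.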
@otis_reid Nothing I said requires a particular assumption about functional form. For example, if the matrix entries are 0/1, nothing stops you from saying p(i has edge to j) = sigmoid(vec_i dot vec_j). 4/n
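The sigmoid version can be sketched the same way. This is a minimal example under stated assumptions: a synthetic 0/1 matrix, a fully observed fit, plain gradient descent on the logistic loss, and made-up values for the sizes, step size `lr`, and iteration count.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(Y, R, C):
    # Mean logistic loss, written with logaddexp for numerical stability.
    Z = R @ C.T
    return np.mean(np.logaddexp(0.0, Z) - Y * Z)

# Hypothetical graph: 40 source nodes, 40 target nodes, 2 latent factors.
n, m, k = 40, 40, 2
true_R = rng.normal(size=(n, k))
true_C = rng.normal(size=(m, k))
# Edge i -> j is present with probability sigmoid(vec_i dot vec_j).
Y = (rng.random((n, m)) < sigmoid(true_R @ true_C.T)).astype(float)

# Fit vectors by gradient descent on the logistic loss.
R = rng.normal(scale=0.1, size=(n, k))
C = rng.normal(scale=0.1, size=(m, k))
initial_loss = log_loss(Y, R, C)

lr = 0.2
for _ in range(2000):
    P = sigmoid(R @ C.T)          # current edge probabilities
    G = P - Y                     # gradient of the loss w.r.t. the logits
    grad_R = G @ C / m
    grad_C = G.T @ R / n
    R -= lr * grad_R
    C -= lr * grad_C

final_loss = log_loss(Y, R, C)
edge_probs = sigmoid(R @ C.T)     # fitted p(i has edge to j)
```

The only change from the real-valued case is the link function and the loss; the row and column vectors play exactly the same role.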
@otis_reid There's lots of other technical stuff to be talked about, but basically the idea is that most problems don't have nearly as many parameters as you think they do, since most important things in life are correlated. Matrix completion is a general way of taking advantage of that 5/n
@otis_reid "There is a low dimensional latent space behind everything" is basically the reason that modern machine learning works. All deep learning does is project high dimensional things (pixels) into a low dimensional space and then run a linear classifier on top of the projection. n/n
@otis_reid Note all of this was actually invented by psychologists. What is personality psych if not saying "the matrix where m_{ij} = does person i do behavior j? can be described by 5 latent factors"? IQ is just saying, "answering q1 correctly is correlated with answering q2 correctly."
@otis_reid Also, the fact that this is not taught to economists in Metrics 101 is baffling and crazy.

