@vgr All machine learning algorithms are biased. The only question is: do we know what the biases are, and do we care?
All statistical machine learning algorithms are essentially devices for interpolating between data points already seen. They can’t generalize to novel situations.
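(A toy sketch of my own, not from the thread, using numpy and scikit-learn: train inside a range, query outside it, and watch the interpolator flail.)

```python
# Sketch: statistical learners interpolate; outside the training
# support, all bets are off. Toy example, assumes numpy + scikit-learn.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 3, size=(200, 1))
y_train = np.sin(2 * x_train).ravel() + rng.normal(0, 0.05, 200)

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
model.fit(x_train, y_train)

print("inside  f(1.5):", model.predict([[1.5]])[0], "true:", np.sin(3.0))
print("outside f(6.0):", model.predict([[6.0]])[0], "true:", np.sin(12.0))
# Near the data the fit is fine; at x = 6 the model has seen nothing
# like it and just continues its last local trend.
```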
85% of the progress in machine learning over the past 15 years is due to increases in hardware performance and availability.
The true science of machine learning is the science of figuring out which biases are needed to learn for which tasks.
Nobody knows how to build systems with common sense. We don’t even have the equivalent of alchemical theory about this yet.
Knowledge = justified true belief.
But justification requires communication, truth can be a matter of degree, and belief is a matter of the causal structure of an agent, not sentences in its head.
#MachineLearning on one foot: Use an appropriate representation and bias to update your priors from the data. All else is commentary. Now go and learn.
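(The one-foot version, executed: a minimal conjugate-update sketch of my own - Beta prior on a coin, Bernoulli data. Nothing here is from the original thread.)

```python
# Sketch: "update your priors from the data," conjugate case.
# Beta(a, b) prior on a coin's heads-probability; Bernoulli observations.
a, b = 2.0, 2.0                   # the prior is the bias: weakly expects fair
flips = [1, 1, 0, 1, 1, 1, 0, 1]  # observed data (1 = heads)

heads = sum(flips)
a += heads                        # posterior: Beta(a + heads, b + tails)
b += len(flips) - heads
print("posterior mean:", a / (a + b))  # ~0.67: the data moved the prior
# Representation (Beta), bias (prior), update rule (Bayes). Commentary omitted.
```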
12. All good theoretical frameworks for machine learning are equivalent to Bayes, or approximately so.
13. Connectionist methods will need to either incorporate or simulate symbolic processing to get beyond perception/response tasks.
14. We will need fundamentally new conceptual breakthroughs to get beyond current "interpolative" #ArtificialIntelligence. These will be very different from anything currently conceived of, and will not achieve #SOTA on any known tasks for some time.
15. SOTA- and publication-chasing are bad for science and bad for scientists.
16. Knowledge (such as it is) inheres in the system as a whole, not any particular representations or algorithms.
17. A "brain in a vat" knows nothing, as it is causally connected to nothing.
18. Imagine what we could accomplish using GOFAI with the knowledge-base equivalent of the computing power we can now devote to backpropagation learning! Incredible!
21. Replace the phrases "artificial intelligence" and "machine learning" in any news article by the phrase "computer program", and remove the phrase "learns like a person/baby/brain", and see if the achievement seems as cool or impressive.
22. Then read the original research article to see what was actually accomplished.
23. Language learning is not just machine learning applied to sequences. Nor is automated genomics. Nor is time series analysis. Ad infinitum.
24. You can get 80-plus percent of the possible accuracy by applying advanced machine learning to a problem without knowing anything about the domain. You can also do that using logistic regression.
25. Does deep learning solve problem X? First, compare it to logistic regression and naïve Bayes. Then you might get a clue.
(If the authors didn’t, be suspicious, very suspicious.)
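(What that check looks like in practice - a sketch of mine with scikit-learn; the dataset is a stand-in, not from any particular paper.)

```python
# Sketch: run the dumb baselines before crediting the fancy model.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
for name, clf in [("logistic regression", LogisticRegression(max_iter=5000)),
                  ("naive Bayes        ", GaussianNB())]:
    print(name, cross_val_score(clf, X, y, cv=5).mean().round(3))
# If the deep model beats these by half a point, say so out loud.
```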
26. As a field, #MachineLearning is stuck in a local minimum. Not all learning is function approximation, and I warrant most of the really interesting kinds of learning are not. Let’s go back to exploring the full space of learning tasks and methods.
Bonus.
STOP CHASING SOTA!
</rant>
27. #MachineLearning is the science of finding the right bias for the problem. So… KNOW YOUR DOMAIN!
28. If your fancy #MachineLearning system gets 99.5% accuracy (wow!!), either:
a. You have a bug in your evaluation procedure, or
b. Your problem is easy enough that a simple, interpretable model could solve it too.
Thus, if you create a black box model with perfect prediction accuracy, you have FAILED.
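(The most common flavor of bug (a) is leakage. A made-up sketch of my own: a feature that is secretly the label.)

```python
# Sketch: "perfect" accuracy via label leakage -- a feature that is
# really the target in disguise. Synthetic data, made-up setup.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + rng.normal(0, 2, 1000) > 0).astype(int)   # noisy target

X_leaky = np.column_stack([X, y + rng.normal(0, 1e-3, 1000)])  # oops
X_tr, X_te, y_tr, y_te = train_test_split(X_leaky, y, random_state=0)
clf = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))  # ~1.0: too good to be true
# Honest features alone top out far lower (around 65% here).
# Suspiciously high accuracy is a smell; audit the pipeline first.
```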
30. For any #MachineLearning or #ArtificialIntelligence application, the “system” includes the people that use it, and the organizational matrix they are embedded in. Any analysis that does not account for that is woefully inadequate.
31. "Neural networks" are not brain-like. Unless you take your glasses off and squint. After a couple of beers.
32. The problems of "bias" and "fairness" in #MachineLearning are mainly problems of specifying implicit assumptions, and have no purely technical solutions. The real issue is a version of the old "is/ought" conundrum.
34. Don’t be afraid of the algorithms getting too smart, be afraid of people giving them too much power before they do.
35. You know something if you can do something with it. That might involve action or communication, but beware: It is very easy to convince insufficiently critical observers that you know something, even inadvertently. Even yourself.
The first principle is that you must not fool yourself – and you are the easiest person to fool.
--Richard Feynman
36. Your #MachineLearning algorithm works - congratulations! Excellent accuracy on out-of-sample data - wonderful!
But do you know if it learned what you wanted it to learn? Does it recognize stop signs, or large-enough-red-regions-with-certain-specific-other-colors-nearby? Hm.
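(One cheap probe, hypothetical model and all: feed the model the suspected shortcut by itself and see what it says. `model` below is a placeholder for your own trained classifier, not a real API.)

```python
# Sketch: does the "stop sign" detector fire on stop signs, or on any
# large red region? Probe it with a featureless red square.
# `model` is a stand-in: anything with a predict() over image batches.
import numpy as np

def probe_with_red_square(model, size=64):
    img = np.zeros((1, size, size, 3), dtype=np.float32)
    img[:, :, :, 0] = 1.0          # pure red: no octagon, no lettering
    score = model.predict(img)
    print("score(stop sign | plain red square):", score)
    # A high score here means it learned "red blob", not "stop sign".
```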
37. What statistical assumptions does your method make? More importantly, what assumptions does your evaluation procedure make? And do they match reality? (Answer: No.)
The proof of the model is in the eating.
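(Concretely - a sketch of my own: random k-fold assumes exchangeable rows; on autocorrelated data it grades the model on near-copies of its training set.)

```python
# Sketch: shuffled CV vs. time-ordered evaluation on autocorrelated data.
# Random splits assume i.i.d. rows; a slow drift violates that badly.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score, KFold, TimeSeriesSplit

rng = np.random.default_rng(0)
n = 500
t = np.arange(n)
X = (t / n + rng.normal(0, 0.01, n)).reshape(-1, 1)  # slow drift: time in disguise
y = np.cumsum(rng.normal(0, 1, n))                   # random-walk target

model = KNeighborsRegressor(n_neighbors=5)
shuffled = cross_val_score(model, X, y, cv=KFold(5, shuffle=True, random_state=0)).mean()
ordered = cross_val_score(model, X, y, cv=TimeSeriesSplit(n_splits=5)).mean()
print("shuffled CV R^2:    ", round(shuffled, 3))  # flattering -- leakage
print("time-ordered CV R^2:", round(ordered, 3))   # reality check: poor
```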
38. Statistical #MachineLearning is algorithmic demagoguery - it is winner-take-all for (often subtle) patterns with a slight majority (plurality).
That is its power, and its danger.
39. As a general rule on #MachineLearning, representations matter more than algorithms.
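(The oldest demo of this, my own sketch: concentric rings are hopeless for a linear model in (x, y) and trivial in radius-squared.)

```python
# Sketch: same algorithm, different representation.
# Two concentric rings: inseparable in (x, y), trivial in radius^2.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_circles(n_samples=500, factor=0.4, noise=0.05, random_state=0)
raw = cross_val_score(LogisticRegression(), X, y, cv=5).mean()

R = (X ** 2).sum(axis=1, keepdims=True)  # the right representation
rep = cross_val_score(LogisticRegression(), R, y, cv=5).mean()
print("raw (x, y) accuracy:", round(raw, 3))  # ~0.5, a coin flip
print("radius^2 accuracy:  ", round(rep, 3))  # ~1.0, same algorithm
```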
40. The proper function to optimize in #MachineLearning is task-dependent utility, even though this is almost never done.
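(What that looks like in code - my sketch, with made-up costs: choose the decision threshold by expected utility, not by accuracy.)

```python
# Sketch: optimize task utility, not accuracy. Made-up costs: a missed
# positive costs 50, a false alarm costs 1, correct decisions cost 0.
import numpy as np

def utility(scores, y, threshold, cost_fn=50.0, cost_fp=1.0):
    yhat = (scores >= threshold).astype(int)
    fn = np.sum((yhat == 0) & (y == 1))
    fp = np.sum((yhat == 1) & (y == 0))
    return -(cost_fn * fn + cost_fp * fp)

rng = np.random.default_rng(0)
y = rng.binomial(1, 0.1, 2000)                               # rare positives
scores = np.clip(0.7 * y + rng.normal(0, 0.25, 2000), 0, 1)  # model outputs

best = max(np.linspace(0.01, 0.99, 99), key=lambda t: utility(scores, y, t))
print("utility-optimal threshold:", round(best, 2))  # well below 0.5
# Accuracy says "threshold at 0.5"; the task says "catch the rare positives".
```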
41. The results of a single study, however rigorous, never generalize on their own. Validity depends upon statistical assumptions, and until you replicate, you don’t know whether those assumptions match reality.
This paper, entitled "On Classifying Facial Races with Partial Occlusions and Pose Variations," appeared in the proceedings of the 2017 @IEEEorg ICMLA conference in Cancun. researchgate.net/publication/32…
As stated in the abstract, the goal of the work is to apply a face classification model "trained on four major human races, Caucasian, Indian, Mongolian, and Negroid." Needless to say, these categories have no empirical or scientific basis.
In the body of the paper, we see this table characterizing the supposed "four major human races" in terms redolent of the height of 19th century racist phrenology:
Regulations, arguably, should not be based on detailed understanding of how AI systems work (which the regulators can't have in any depth). However, AI systems need to be able to explain decisions in terms that humans can understand, if we are to consider them trustworthy. 1/
Not explanations involving specifics of the algorithms, weights in a neural network, etc., but explanations that engage people's theories of mind, explanations at the level of Dennett's intentional stance - in terms of values, goals, plans, and intentions. 2/
Previous computer systems, to be comprehensible - and, yes, trustworthy - needed to consistently present behavior that fit people's natural mappings to physical models (e.g., the "desktop"). Anyone old enough to remember programming VCRs? Nerdview is a failure of explanation. 3/