Ida Momennejad
Sep 3 · 17 tweets · 11 min read
A thread on the history of RL/ML based on Andy Barto's talk at #RLC2024, the Reinforcement Learning Conference.
Beyond seeing friends & giving talks/panel, talking to @RichardSSutton & hearing Andy Barto revived a need for attention to historical psych/neuro influences on AI.
1/n🧵
Andy Barto started off the talk defining RL in terms of Search (trial & error, generate & test, variation & selection) + Memory (caching past solutions),
leading to RL as "General contextual search".
He then took us through a tour of historical intellectual influences on RL
2/n
First, RL work with @RichardSSutton & contemporary intellectual influences:
The logic of computers group @ Michigan (Burks, Holland, Zeigler: cellular automata, modeling, simulation)
The systems neuroscience center @ Amherst (Arbib, Kilmer, Spinelli: Adaptive Intelligence).
3/n
Learning by Trial & Error in RL is à la Klopf's law of effect for synaptic plasticity: Hedonistic Neurons maximize a local analog of pleasure & minimize a local analog of pain. Synapses active during an action potential become eligible for change: increase the weight if rewarded, decrease it if punished
4/n

Andy Barto distinguished 2 kinds of Eligibility for weight change.
Contingent Eligibility depends on pre- and post-synaptic activity, leading to 3-factor learning rule.
Non-contingent Eligibility is triggered by only pre-synaptic activity, leading to a 2-factor learning rule.
5/n
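The two kinds of eligibility can be sketched in a few lines. This is a minimal illustration, not the formulation from the talk: the multiplicative form, the decaying trace, and the names `decay` and `lr` are all assumptions made for the example.

```python
def contingent_eligibility(trace, pre, post, decay=0.9):
    """Eligibility that needs BOTH pre- and post-synaptic activity.

    Combined with a reinforcement signal this gives a three-factor rule
    (pre * post * reinforcement). `decay` is an assumed trace parameter.
    """
    return decay * trace + pre * post


def noncontingent_eligibility(trace, pre, decay=0.9):
    """Eligibility triggered by pre-synaptic activity alone.

    Combined with a reinforcement signal this gives a two-factor rule
    (pre * reinforcement).
    """
    return decay * trace + pre


def apply_reinforcement(weight, trace, reinforcement, lr=0.1):
    """Change an eligible synapse: up if rewarded (+), down if punished (-)."""
    return weight + lr * reinforcement * trace


# The synapse fired its input, but the neuron itself did not fire:
e3 = contingent_eligibility(0.0, pre=1.0, post=0.0)   # not eligible
e2 = noncontingent_eligibility(0.0, pre=1.0)          # eligible anyway
print(apply_reinforcement(0.5, e3, reinforcement=1.0))  # weight unchanged
print(apply_reinforcement(0.5, e2, reinforcement=1.0))  # weight increases
```

The only difference between the two rules is whether post-synaptic activity gates the trace; the reinforcement step is shared.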
Barto & @RichardSSutton's seminal first paper (1981): Toward a modern theory of adaptive networks: expectation & prediction.
Influences include
-Klopf: learning by trial & error (an idea dating back to the 1800s)
-adaptive intelligence
-inspiration from synaptic plasticity
-Sutton's BA in psych!
6/n
Barto's acknowledgment of the many disciplines, interactions, & lineages of ideas that shaped RL was refreshing.
-TD error & the algorithm for temporal credit assignment were inspired by interactions with psychology
-Pole balancing paper in collaboration with Chuck Anderson
Next: actor-critic
7/n


The Actor-Critic architecture:
The actor is responsible for learning the policy, a mapping from states to actions, to decide which action to take in a given state.
An adaptive critic uses a value function to evaluate the actor's policy & translate reward into a TD error for learning.
8/n
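A rough tabular sketch of the actor-critic loop described above. Everything concrete here is an assumption for illustration: the toy corridor (actions "stay" and "advance", reward 1 on reaching the last state), the softmax actor, and the step sizes. It is not the architecture from the pole-balancing paper.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_actor_critic(episodes=500, alpha_actor=0.1, alpha_critic=0.1,
                     gamma=0.9, seed=0):
    rng = random.Random(seed)
    n_states, n_actions = 4, 2          # states 0..3; state 3 is the goal
    values = [0.0] * n_states           # critic: state-value estimates
    prefs = [[0.0] * n_actions for _ in range(n_states)]  # actor
    for _ in range(episodes):
        s = 0
        for _ in range(100):            # step cap per episode
            probs = softmax(prefs[s])
            a = rng.choices(range(n_actions), weights=probs)[0]
            s_next = s + 1 if a == 1 else s   # 1 = advance, 0 = stay
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0
            # Critic: translate reward into a TD error.
            bootstrap = 0.0 if done else gamma * values[s_next]
            td_error = r + bootstrap - values[s]
            values[s] += alpha_critic * td_error
            # Actor: the same TD error reinforces or weakens the action taken.
            for b in range(n_actions):
                grad = (1.0 if b == a else 0.0) - probs[b]
                prefs[s][b] += alpha_actor * td_error * grad
            if done:
                break
            s = s_next
    return values, prefs


values, prefs = run_actor_critic()
print("critic values:", [round(v, 2) for v in values])
print("P(advance) in state 2:", round(softmax(prefs[2])[1], 2))
```

Note that the actor never sees the reward directly; it learns only from the critic's TD error.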
Second part: Barto dove into the early history of machine learning
- Thomas Ross 1933 Thinking machine
- Stevenson Smith 1935 (psychology) Robot rats
- Grey Walter 1948 (neuroscience) Machina Speculatrix
- Alan Turing 1948: Pleasure-Pain system, the earliest call to implement RL?
9/n


- Farley & Clark 1954 first simulation of ANN learning on a digital computer
- Minsky 1954 "Neural Nets and the brain-model problem", SNARCs (stochastic neural-analog reinforcement calculators), "Steps toward AI" (1961).
Challenge: Structural & Temporal credit assignment
10/n


- Farley & Clark 1955 generalization of pattern recognition in a self-organizing system
- Frank Rosenblatt 1958 Perceptron "Foundation of AI"
- Arthur Samuel 1959-67 Checkers player (was RL)
- Widrow & Hoff 1960 Adaptive Linear Neuron, Widrow-Hoff algorithm, LMS
11/n


- Widrow et al 1973 Selective Bootstrap Adaptation. Rewarded: treat the committed action as the target, do LMS. Punished: treat the alternative action as the target, then do LMS.

- Michael Tsetlin 1960s Learning Automata (& modeling biological systems 1973), teams & games
12/n
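The selective bootstrap rule above can be sketched with a single linear unit. This is a loose illustration under assumptions: a two-action unit with outputs +1/-1, a hypothetical `reward_fn` that says whether the committed action was rewarded, and an arbitrary learning rate. It is not a reproduction of the 1973 setup.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def act(w, x):
    """Committed action of the unit: +1 or -1."""
    return 1.0 if dot(w, x) >= 0 else -1.0


def selective_bootstrap_step(w, x, reward_fn, lr=0.1):
    """One update: pick the LMS target from the reward, then do LMS."""
    action = act(w, x)
    rewarded = reward_fn(x, action)
    # Rewarded: the committed action becomes the target.
    # Punished: the alternative action becomes the target.
    target = action if rewarded else -action
    error = target - dot(w, x)          # Widrow-Hoff (LMS) error
    return [wi + lr * error * xi for wi, xi in zip(w, x)]


def train(inputs, reward_fn, passes=50, lr=0.1):
    w = [0.0] * len(inputs[0])
    for _ in range(passes):
        for x in inputs:
            w = selective_bootstrap_step(w, x, reward_fn, lr)
    return w


# Toy task (assumed): reward the unit when its action matches x[0].
# The second input component is a constant bias term.
inputs = [[1.0, 1.0], [-1.0, 1.0]]
w = train(inputs, reward_fn=lambda x, action: action == x[0])
print([act(w, x) for x in inputs])
```

The unit is never told the correct action, only whether the action it committed to was rewarded; the target for LMS is bootstrapped from its own output.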
- Schultz, Dayan, & Montague 1997 Reward-Prediction-Error in brains: Phasic activity of dopamine neurons signals the error between an old & a new estimate of future reward
Barto noted the critic in RL can take virtually any unmodeled influence & turn that into a learning signal
13/n
Dopamine-inspired Collective Learning: reinforcement is broadcast to a team of RL units. A potential alternative to backprop, but RL broadcast didn't scale because of the structural credit assignment problem (getting the signal to the right place).
cf Barto 1985 Learning by statistical cooperation
14/n

In the end, a call for critical thinking. Andy Barto shared stories to caution against the challenges of designing reward signals.
Quoting Norbert Wiener's example of The Monkey's paw:
"... it grants what you asked for, not what you should have asked for or what you intend"

15/n
Andy Barto then thanked his former students, said RL is not a cult, & found himself facing a standing ovation by the audience.
I appreciated the history of ML from the POV of developing a learning framework, & how interdisciplinary interactions of ideas shaped it.
TY #RLC2024
n/n
Many thanks to all organizers, esp @MarlosCMachado & @robertarail for heroic paper awards & for making RLC a warm & friendly experience.
Special thanks to @RichardSSutton for continual support & generous discussions on both specific & big picture ideas on learning & intelligence.

