Government Office for Science: Future Risks of Frontier AI
Definition and object:
Frontier AI: "highly capable general-purpose AI models that can perform a wide variety of tasks and match or exceed the capabilities present in today’s most advanced models" assets.publishing.service.gov.uk/media/653bc393…
"As of October 2023, this primarily encompasses foundation models consisting of very large neural networks using transformer architectures"
(e.g., LLMs, "generative" AI)
Risks:
"Frontier AI will include the risks we see today, but with potential for larger impact and scale. These include enhancing mass mis- and disinformation, enabling cyber-attacks or fraud, reducing barriers to access harmful information, and harmful or biased decisions."
AGI:
"debate has intensified on whether or when [] AGI might be realised. However, the risks and opportunities posed by a given model derive from its capabilities, and how it is used, not the breadth of tasks at which it can match human performance. "
"Frontier models could be disruptive, beneficial, powerful, or risky without being an AGI."
Indeed!
"Risk and opportunity will be shaped by uncertain factors including geopolitics, access, ownership, safety measures and public attitudes"
Back to humans, good!
"Given the significant uncertainty, there is insufficient evidence to rule out that future Frontier AI, if misaligned, misused or inadequately controlled, could pose an existential threat. However, many experts see this as highly unlikely."
"AI safety is a socio-technical challenge that cannot be resolved with technical interventions alone. Industry, academia, civil society, governments and the public all have an important role to play."
Current capabilities highlighted as relevant to Frontier AI risk:
Content Creation
Computer vision
Planning and reasoning
Theory of mind
Memory
Mathematics
Accurately predicting the physical world
Robotics
Autonomous Agents
Trustworthy AI
Future risk scenarios:
1 Unpredictable Advanced AI
2 Disrupts the Workforce
3 AI ‘Wild West’
4 Advanced AI on a knife edge
5 AI Disappoints
Categories of future risks based on risks evident today:
"
a. Providing new capabilities to a malicious actor.
b. Misapplication by a non-malicious actor.
c. Poor performance of a model used for its intended purpose, for example leading to biased decisions.
>>
d. Unintended outcomes from interactions with other AI systems.
e. Impacts resulting from interactions with external societal, political, and economic systems.
f. Loss of human control and oversight, with an autonomous model then taking harmful actions.
>>
g. Overreliance on AI systems, which cannot subsequently be unpicked.
h. Societal concerns around AI reduce the realisation of potential benefits."
"Beyond technical measures, the decisions and actions of humans can shape risks posed by future Frontier AI. These decisions will include how AI is designed, regulated, monitored and used."
*Need to understand the potential for consequential risk.
Potential non-technical mitigations:
"a. Requiring reporting of training of large-scale models.
b. Requiring safety assessments such as containment, alignment, or shut-down systems within risk management process.
>>
c. Transparency measures from development through to deployment and use that enable oversight and interventions where necessary."
Overall, it is a thoughtful document, necessarily vague, but a good attempt to highlight real AI problems and uncertainties.
• • •
A Sunday late apéritif:
Dreyfus and Dreyfus (1990). Making a mind versus modelling the brain: Artificial Intelligence back at a branch-point. In M. Boden (Ed.), The Philosophy of Artificial Intelligence.
“In the early 1950s, as calculating machines were coming into their own…
“At that point two opposed visions of what computers could be, each with its correlated research programme, emerged and struggled for recognition. One faction saw computers as a system for manipulating mental symbols; the other, as a medium for modelling the brain.
"One sought to use computers to instantiate a formal representation of the world; the other, to simulate the interactions of neurones. One took problem-solving as its paradigm of intelligence; the other, learning. One utilized logic; the other, statistics.
It's been a long time since I added to the 'back to the sources (or classics)' series. This is another favourite for you.
A Framework for Misrepresenting Knowledge. H.L. Dreyfus (1979). In M. Ringle (Ed.), Philosophical Perspectives in Artificial Intelligence.
"... an interesting change has, indeed, taken place at the MIT AI Laboratory. In previous works (Minsky, 1968) Minsky and his co-workers sharply distinguished themselves from workers in cognitive simulation who presented their programs as psychological theories,
>>
insisting that the MIT programs were 'an attempt to build intelligent machines without any prejudice toward making the system ... humanoid'. "
>>
This time I am going to excerpt one of my own old papers.
There is much confusion about what constitutes a cognitive computational model and the underlying psychological theory.
mondragon.cal-r.org/home/Papers/Al… (https://t.co/tGMznxEhVw)
We focussed our analysis on conditioning due to the early identification of ANNs with associative learning.
The critique, however, can be easily extended to other cognitive phenomena.
"It is worth noting though that the benefits derived from using implementations do not spring exclusively from the formal specification of the psychological models in equations and algorithms. ➡️
A year ago, I summarised our DDA model. Afterwards, I presented it three or four times to different audiences, and in none of them was I satisfied with the way I explained the problems that motivated the model and the solution we offered.
Today, I was preparing some slides introducing complex AL models (with fully connected networks) and decided to give it a new go. My tactic this time has been to focus only on the model’s relevance in accounting for retrospective revaluation in the conditioning literature.
Although there are other proposals, including those purely based on performance (e.g., Miller’s comparator hypothesis), the debate has revolved around Wagner’s SOP memory system and the distinct and opposing learning rules proposed to operate at SOP’s dynamic states of activation.
According to Wagner (2008) one critical result in favour of Pearce’s configural approach that could potentially be solved by new, more advanced elemental developments is that obtained when reversing a conditioned inhibitor.
Following A+, AB- training, B becomes a conditioned inhibitor, able, e.g., to reduce responding to a different excitatory CS. According to the RW model, the discrimination is learned with A becoming excitatory and AB neutral, as a result of B becoming as negative as A is positive.
Pearce also assumes that AB becomes neutral; here, AB’s direct strength becomes as inhibitory as the excitation that generalizes to it from A. B alone acts as an inhibitor due to its similarity to AB.
Two different predictions can be made if B is subsequently reinforced alone.
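To make the RW account above concrete, here is a minimal simulation sketch of the A+/AB- design. The update rule is standard Rescorla-Wagner, but the learning rate, lambda values and trial numbers are illustrative assumptions of mine, not taken from any of the papers discussed.

```python
# Minimal Rescorla-Wagner sketch of the A+ / AB- conditioned-inhibition design
# described above. Parameter values are illustrative assumptions.

def rw_trial(V, present, lam, alpha=0.2):
    """One RW trial: each present cue changes by alpha * (lambda - summed prediction)."""
    error = lam - sum(V[cue] for cue in present)
    for cue in present:
        V[cue] += alpha * error

V = {"A": 0.0, "B": 0.0}
for _ in range(400):                    # interleaved training trials
    rw_trial(V, {"A"}, lam=1.0)         # A+  (A reinforced alone)
    rw_trial(V, {"A", "B"}, lam=0.0)    # AB- (compound non-reinforced)

print(V)  # V["A"] approaches +1 and V["B"] approaches -1, so the AB compound is ~neutral
```

Running it reproduces the RW description quoted above: A ends up excitatory, B as negative as A is positive, and the AB compound close to zero.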
I firmly advocate for our right to be lazy; thus, for my dear lazy (otherwise too busy to read a 140 pp. paper) fellows, I'm going to summarise the DDA Model (updated preprint, 2nd round).
The DDA is a “real-time” formal model of associative learning which incorporates representational and computational mechanisms able to make accurate predictions of a variety of phenomena that so far have eluded a unified account.
The model instantiates a connectionist network consisting of elements, which belong to individual stimuli or are shared between pairs of stimuli, and that are temporally clustered. There are two sources of cluster activation, direct (sensory) activation and associative (cued).
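Purely as an illustration of that representational scheme (not the DDA model itself), here is a schematic sketch; the class name, element labels and the additive combination of the two activation sources are my own assumptions for exposition.

```python
# Schematic sketch of the structure described above: stimuli are made of elements,
# some unique and some shared between pairs of stimuli, grouped into temporal
# clusters whose activation has two sources, direct (sensory) and associative (cued).
# Names and the additive combination rule are assumptions for illustration only;
# this is not the DDA model's actual formalism.

from dataclasses import dataclass

@dataclass
class Cluster:
    elements: list[str]        # element labels, unique (e.g. "a1") or shared (e.g. "ab_shared")
    direct: float = 0.0        # sensory activation while the stimulus is physically present
    associative: float = 0.0   # cued (retrieved) activation via learned associations

    def activation(self) -> float:
        # How the two sources combine is itself an assumption made here for illustration.
        return self.direct + self.associative

# Stimulus A is presented: its unique elements plus the elements it shares with B
A = Cluster(elements=["a1", "a2", "ab_shared"], direct=1.0)
# Stimulus B is absent but associatively cued by A (e.g., after AB pairings)
B = Cluster(elements=["b1", "b2", "ab_shared"], associative=0.4)

print(A.activation(), B.activation())
```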