Announcement:

Sign up and learn more about the AgentX Competition here!

Prospective Students

Course Staff

Instructor: Dawn Song, Professor, UC Berkeley
(Guest) Co-instructor: Xinyun Chen, Research Scientist, Google DeepMind
(Guest) Co-instructor: Kaiyu Yang, Research Scientist, Meta FAIR

Guest Speakers

Jason Weston Yu Su Hanna Hajishirzi
Charles Sutton Ruslan Salakhutdinov Caiming Xiong
Thomas Hubert Sean Welleck Swarat Chaudhuri

Course Description

Large language model (LLM) agents have been an important frontier in AI; however, they still lack critical skills, such as complex reasoning and planning, that are needed to solve hard problems and enable end-to-end applications in real-world scenarios. Building on our previous course, this course dives deeper into advanced topics in LLM agents, focusing on reasoning, AI for mathematics, code generation, and program verification. We begin by introducing advanced inference and post-training techniques for building LLM agents that can search and plan. Then, we focus on two application domains: mathematics and programming. We study how LLMs can be used to prove mathematical theorems, as well as to generate and reason about computer programs. Specifically, we will cover the following topics:

Syllabus

Date / Guest Lecture (4:00PM-6:00PM PT) / Supplemental Readings
Jan 27th Inference-Time Techniques for LLM Reasoning
Xinyun Chen, Google DeepMind
Livestream Intro Slides Quiz 1
- Large Language Models as Optimizers
- Large Language Models Cannot Self-Correct Reasoning Yet
- Teaching Large Language Models to Self-Debug
Feb 3rd Learning to reason with LLMs
Jason Weston, Meta
Livestream Slides Quiz 2
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- Iterative Reasoning Preference Optimization
- Chain-of-Verification Reduces Hallucination in Large Language Models
Feb 10th On Reasoning, Memory, and Planning of Language Agents
Yu Su, Ohio State University
Livestream Slides Quiz 3
- Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
- HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
- Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
Feb 17th No Class - Presidents’ Day  
Feb 24th Open Training Recipes for Reasoning in Language Models
Hanna Hajishirzi, University of Washington
Livestream Slides Quiz 4
- Tulu 3: Pushing Frontiers in Open Language Model Post-Training
- Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
- OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs
Mar 3rd Coding Agents and AI for Vulnerability Detection
Charles Sutton, Google DeepMind
Livestream Slides Quiz 5
- Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities
- From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code
Mar 10th Multimodal Autonomous AI Agents
Ruslan Salakhutdinov, CMU/Meta
Livestream Slides Quiz 6
- Mind2Web: Towards a Generalist Agent for the Web
- WebArena: A Realistic Web Environment for Building Autonomous Agents
- VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
- Tree Search for Language Model Agents
Mar 17th Multimodal Agents – From Perception to Action
Caiming Xiong, Salesforce AI Research
Livestream Slides Quiz 7
- OSWORLD: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
- AGUVIS: Unified Pure Vision Agents For Autonomous GUI Interaction
Mar 24th No Class - Spring Recess  
Mar 31st AlphaProof: when reinforcement learning meets formal mathematics
Thomas Hubert, Google DeepMind
10am-noon PT
Livestream Slides
 
Apr 7th Language models for autoformalization and theorem proving
Kaiyu Yang, Meta FAIR
 
Apr 14th Advanced Topics in Neural Theorem Proving
Sean Welleck, CMU
 
Apr 21st Program verification & generating verified code
Swarat Chaudhuri, UT Austin
10am-noon PT
 
Apr 28th Agent safety & security
Dawn Song, UC Berkeley
 

Completion Certificate

All of the instructions below are also provided in this Google Doc for your convenience.

LLM Agent course completion certificates will be awarded to students according to the tiers below. All assignments are due at the end of May (exact date/time TBA). You should receive a Google Forms confirmation email after each successful submission.

All students will need to complete a Certificate Declaration Form by the end of May (exact date/time TBA). This form will be released in late April.

Trailblazer Tier:

Mastery Tier:

Ninja Tier:

Legendary Tier:

Honorary Tier:

NOTE: completing the assignments associated with this course in order to earn a Completion Certificate is completely optional. You are more than welcome to just watch the lectures and audit the course!

Coursework

IMPORTANT: Please use the same email address to submit all coursework, the certificate declaration form, and the initial signup form as this is how we track your progress throughout the course!

Quizzes

All quizzes are released shortly after the corresponding lecture. Please remember to complete the quiz each week. Although each quiz is graded on completion, we encourage you to do your best. There are five multiple-choice questions per quiz.

The quizzes are posted in the Syllabus section. Answers will be shared when we release the next quiz. Click on previous quiz links to access the “view score” button.

Written Article

Create a social media post (X/LinkedIn/etc.) of roughly 500 words. Include the link to our MOOC website in the article and tweet.

The written article is an effort-based assignment that will be graded as pass or no pass (P/NP).

Submission Form

Labs

Our staff are still designing the lab(s). Stay tuned! We currently plan to release the labs in late March or early April. The link to submit the lab assignments will be posted here when the labs are released.

Project

Check out our AgentX competition website. Every member of the team should sign up individually here. There is no limit on team size.

Two Tracks:

Selected students will receive mentorship from Berkeley postdocs/mentors on an AgentX Research Track project. Apply here. Applications are due March 26th at 11:59pm PDT. NOTE: Mentorship is not required to join or succeed in AgentX.

Submissions will be due at the end of May. Please ask any questions and find potential team members in our LLM Agents Discord.