Advanced Large Language Model Agents

Prospective Students

To sign up for the course, please fill in this form.
For course discussion and questions, please join our LLM Agents Discord.
This course is built upon the fundamentals from the Fall 2024 LLM Agents MOOC.

Course Staff

Instructor	(Guest) Co-instructor	(Guest) Co-instructor

Dawn Song	Xinyun Chen	Kaiyu Yang
Professor, UC Berkeley	Research Scientist, Google DeepMind	Research Scientist, Meta FAIR

Guest Speakers


Jason Weston	Yu Su	Hanna Hajishirzi


Charles Sutton	Ruslan Salakhutdinov	Caiming Xiong


Thomas Hubert	Sean Welleck	Swarat Chaudhuri

Course Description

Large language model (LLM) agents have been an important frontier in AI; however, they still fall short on critical skills, such as complex reasoning and planning, that are needed to solve hard problems and enable end-to-end applications in real-world scenarios. Building on our previous course, this course dives deeper into advanced topics in LLM agents, focusing on reasoning, AI for mathematics, code generation, and program verification. We begin by introducing advanced inference and post-training techniques for building LLM agents that can search and plan. Then, we focus on two application domains: mathematics and programming. We study how LLMs can be used to prove mathematical theorems, as well as to generate and reason about computer programs. Specifically, we will cover the following topics:

Inference-time techniques for reasoning
Post-training methods for reasoning
Search and planning
Agentic workflows, tool use, and function calling
LLMs for code generation and verification
LLMs for mathematics: data curation, continual pretraining, and finetuning
LLM agents for theorem proving and autoformalization

Syllabus

Date	Guest Lecture (4:00PM-6:00PM PT)	Supplemental Readings
Jan 27th	Inference-Time Techniques for LLM Reasoning (Xinyun Chen, Google DeepMind); Livestream, Intro, Slides, Quiz 1	Large Language Models as Optimizers; Large Language Models Cannot Self-Correct Reasoning Yet; Teaching Large Language Models to Self-Debug
Feb 3rd	Learning to reason with LLMs (Jason Weston, Meta); Livestream, Slides, Quiz 2	Direct Preference Optimization: Your Language Model is Secretly a Reward Model; Iterative Reasoning Preference Optimization; Chain-of-Verification Reduces Hallucination in Large Language Models
Feb 10th	On Reasoning, Memory, and Planning of Language Agents (Yu Su, Ohio State University); Livestream, Slides, Quiz 3	Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization; HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models; Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
Feb 17th	No Class - Presidents’ Day
Feb 24th	Open Training Recipes for Reasoning in Language Models (Hanna Hajishirzi, University of Washington); Livestream, Slides, Quiz 4	Tulu 3: Pushing Frontiers in Open Language Model Post-Training; Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback; OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs
Mar 3rd	Coding Agents and AI for Vulnerability Detection (Charles Sutton, Google DeepMind); Livestream, Slides, Quiz 5	Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities; From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code
Mar 10th	Multimodal Autonomous AI Agents (Ruslan Salakhutdinov, CMU/Meta); Livestream, Slides, Quiz 6	Mind2Web: Towards a Generalist Agent for the Web; WebArena: A Realistic Web Environment for Building Autonomous Agents; VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks; Tree Search for Language Model Agents
Mar 17th	Multimodal Agents – From Perception to Action (Caiming Xiong, Salesforce AI Research); Livestream, Slides, Quiz 7	OSWORLD: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments; AGUVIS: Unified Pure Vision Agents For Autonomous GUI Interaction
Mar 24th	No Class - Spring Recess
Mar 31st	AlphaProof: when reinforcement learning meets formal mathematics (Thomas Hubert, Google DeepMind); 10am-noon PT; Livestream, Slides, Quiz 8	AI achieves silver-medal standard solving International Mathematical Olympiad problems; Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm; The Future of Mathematics?; Building the Mathematical Library of the Future
Apr 7th	Language models for autoformalization and theorem proving (Kaiyu Yang, Meta FAIR); Livestream, Slides, Quiz 9	LeanDojo: Theorem Proving with Retrieval-Augmented Language Models; Autoformalization with Large Language Models; Autoformalizing Euclidean Geometry
Apr 14th	Bridging Informal and Formal Mathematical Reasoning (Sean Welleck, CMU); Livestream, Slides, Quiz 10	Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs; miniCTX: Neural Theorem Proving with Long-Contexts; Lean-STaR: Learning to Interleave Thinking and Proving; ImProver: Agent-Based Automated Proof Optimization
Apr 21st	Abstraction and Discovery with Large Language Model Agents (Swarat Chaudhuri, UT Austin); 10am-noon PT; Livestream, Slides, Quiz 11	An In-Context Learning Agent for Formal Theorem-Proving; Symbolic Regression with a Learned Concept Library
Apr 28th	Towards building safe and secure agentic AI (Dawn Song, UC Berkeley); Livestream, Slides, Quiz 12	Privtrans: Automatically Partitioning Programs for Privilege Separation; DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks; AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases; Progent: Programmable Privilege Control for LLM Agents

Completion Certificate

All of the instructions below are also provided in this Google Doc for your convenience.

LLM Agents course completion certificates will be awarded to students according to the tiers below. All assignments are due on May 31st at 11:59pm PDT. You should receive a Google Forms confirmation email after each successful submission.

All students will need to complete a Certificate Declaration Form by May 31st at 11:59pm PDT.

Trailblazer Tier:

Complete all 12 quizzes (one per lecture)
Pass the written article assignment

Mastery Tier:

Complete all 12 quizzes (one per lecture)
Pass the written article assignment
Pass all lab assignments

Ninja Tier:

Complete all 12 quizzes (one per lecture)
Pass the written article assignment
Submit a project to the AgentX competition

Legendary Tier:

Complete all 12 quizzes (one per lecture)
Pass the written article assignment
Become a prize winner or finalist at the AgentX competition

Honorary Tier:

Awarded to the most helpful and supportive students in our Discord!
Recipients must also meet the coursework requirements of the Ninja or Mastery Tier

NOTE: completing the assignments associated with this course in order to earn a Completion Certificate is completely optional. You are more than welcome to just watch the lectures and audit the course!

Coursework

IMPORTANT: Please use the same email address to submit all coursework, the certificate declaration form, and the initial signup form as this is how we track your progress throughout the course!

Quizzes

All quizzes are released shortly after the corresponding lecture. Please remember to complete the quiz each week. Although quizzes are graded on completion, we encourage you to do your best. Each quiz has 5 multiple-choice questions.

The quizzes are posted in the Syllabus section. Answers will be shared when we release the next quiz. Click on previous quiz links to access the “view score” button.

An archive of all quizzes can be found here.

Written Article

Create a social media post (X, LinkedIn, etc.) of roughly 500 words. Include the link to our MOOC website in the article and tweet.

Students in the Trailblazer or Mastery Tier should either summarize information from one or more of the lectures or write a postmortem on their learning experience during our MOOC
Students in the Ninja or Legendary Tier should write about their AgentX submission

The written article is an effort-based assignment that will be graded as pass or no pass (P/NP).

Submission Form

Labs

There are two labs that give students experience with verifiable code generation agents using Lean. Students must pass both labs to receive credit towards the Mastery Tier Certificate. Please read the Lab FAQs before asking any questions in our LLM Agents Discord.

Instructions & Starter Code

Labs Submission Form

Project

Check out our AgentX competition website. Every member of the team should sign up individually here. There is no limit on team size.

Two Tracks:

Entrepreneurship: Build agent-powered products & startups
Research: Explore the frontiers of LLM Agents technology

Selected students will be mentored by Berkeley postdocs/mentors on an AgentX Research Track project. Apply here. Applications are DUE March 26th at 11:59pm PDT. NOTE: Mentorship is not required to join or succeed in AgentX.

Final submissions are due May 31st. Please ask any questions and find potential team members in our LLM Agents Discord.