Andy Zhou
I am a graduate student at the University of Illinois at Urbana-Champaign advised by Bo Li. I also work closely with Yuxiong Wang and Haohan Wang.
My research interests are in LM agents and trustworthy machine learning. My work focuses on improving the capabilities and reliability of large language models, particularly for autonomous decision making.
I am also the founder and Head of Research at Lapis Labs, a student-led research group.
Email / GitHub / Google Scholar / Twitter
|
|
News
September 2024. 3 papers accepted at NeurIPS 2024. RPO is accepted as a Spotlight!
September 2024. I gave an invited talk on LLM agents at the AutoGen event
July 2024. 1 paper is accepted at ACM CCS 2024
May 2024. LATS is accepted at ICML 2024
March 2024. 1 paper is accepted at LLM Agents @ ICLR 2024
March 2024. 3 papers accepted at SeT LLM @ ICLR 2024
March 2024. LATS reaches 600 stars on GitHub and is implemented in LangChain and LlamaIndex
February 2024. 1 paper is accepted at CVPR 2024
September 2023. 2 papers accepted at NeurIPS 2023
June 2023. 1 paper is accepted at FL @ ICML 2023
June 2023. 1 paper is accepted at ICCV 2023
|
|
|
Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
Andy Zhou, Bo Li, Haohan Wang
NeurIPS Spotlight (Top 3%), 2024
arxiv / code / paper
We propose a defense objective for defending LLMs against jailbreaking and an algorithm to generate trigger tokens that enforce harmless behavior, improving robustness across jailbreaks and models.
|
|
AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies
Yi Zeng*, Yu Yang*, Andy Zhou*, Jeffrey Tan*, Yuheng Tu*, Yifan Mai*, Kevin Klyman, Minzhou Pan, Ruoxi Jia, Dawn Song, Percy Liang, Bo Li
arXiv, 2024
arxiv
We present a safety benchmark of 5,694 diverse prompts covering 314 risk categories, which are derived from 8 government policies and 16 company policies.
|
|
KnowGraph: Knowledge-Enabled Anomaly Detection via Logical Reasoning on Graph Data
Andy Zhou, Xiaojun Xu, Ramesh Raghunathan, Alok Lal, Xinze Guan, Bin Yu, Bo Li
ACM Conference on Computer and Communications Security (CCS), 2024
We propose a logical reasoning framework for anomaly detection on graph data that uses domain knowledge to organize the predictions of a collection of specialized models.
|
|
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, Yuxiong Wang
ICML, 2024
arxiv / code / website / paper
We propose the first search algorithm for LM agents, which draws on aspects of reasoning and acting prompting methods to improve decision-making. We achieve SOTA on HumanEval with a Pass@1 rate of 94.4%.
|
|
Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models
Andy Zhou, Jindong Wang, Haohan Wang, Yuxiong Wang
NeurIPS, 2023
arxiv / code / paper
We propose a data augmentation and knowledge distillation objective that uses teacher gradients to generate diverse samples, improving out-of-distribution robustness. We distill from CLIP to train the most robust ResNet34 and ResNet50 on OOD generalization.
|
|
RedCode: Risky Code Execution and Generation Benchmark for Code Agents
Chengquan Guo, Xun Liu, Chulin Xie, Andy Zhou, Yi Zeng, Zinan Lin, Dawn Song, Bo Li
NeurIPS Datasets and Benchmarks Track, 2024
We propose a benchmark for evaluating the safety and reliability of code agents on executing and generating malicious code.
|
|
Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters
Haibo Jin, Andy Zhou, Joe D. Menke, Haohan Wang
NeurIPS, 2024
arxiv
We propose a jailbreak attack that uses cipher characters to bypass moderation guardrails designed to detect harmful content, along with a benchmark for evaluating LLM guardrails.
|
|
Tamper-Resistant Safeguards for Open-Weight LLMs
Rishub Tamirisa*, Bhrugu Bharathi*, Long Phan, Andy Zhou, Alice Gatti, Tarun Suresh, Maxwell Lin, Justin Wang, Rowan Wang, Ron Arel, Andy Zou, Dawn Song, Bo Li, Dan Hendrycks, Mantas Mazeika
arXiv preprint, 2024
arxiv
We present a method for improving safeguard robustness against finetuning attacks.
|
|
AI Risk Categorization Decoded: From Corporate Policies to Government Regulations
Yi Zeng*, Kevin Klyman*, Andy Zhou, Yu Yang, Minzhou Pan, Ruoxi Jia, Dawn Song, Percy Liang, Bo Li
ICML Workshop on Generative AI and Law, 2024
arxiv / paper
We present a comprehensive AI risk taxonomy derived from eight government policies from the European Union, United States, and China and 16 company policies worldwide, making a significant step towards establishing a unified language for generative AI safety evaluation.
|
|
Towards Robust Unlearning in LLMs
Rishub Tamirisa, Bhrugu Bharathi, Andy Zhou, Bo Li, Mantas Mazeika
Secure and Trustworthy LLMs @ ICLR, 2024
We outline the setting of robust machine unlearning in LLMs for reliably removing unwanted knowledge.
|
|
FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning
Rishub Tamirisa, Chulin Xie, Wenxuan Bao, Andy Zhou, Ron Arel, Aviv Shamsian
CVPR, 2024
We propose a federated-learning algorithm based on selecting which parameters to fine-tune and which to update globally.
|
|
GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models
Haibo Jin*, Ruoxi Chen*, Andy Zhou, Jinyin Chen, Yang Zhang, Haohan Wang
Secure and Trustworthy LLMs @ ICLR, 2024
arxiv / paper
We propose a framework that generates semantic jailbreaks from human safety guidelines, using syntactic parsing organized into knowledge graphs and LM optimization. The generated jailbreaks achieve SOTA success rates and also work on VLMs.
|
|
YouTubePD: A Multimodal Benchmark for Parkinson’s Disease Analysis
Andy Zhou*, Samuel Li*, Pranav Sriram*, Xiang Li*, Jiahua Dong*, Ansh Sharma, Yuanyi Zhong, Shirui Luo, Volodymyr Kindratenko, George Heintz, Christopher Zallek, Yuxiong Wang
NeurIPS Datasets and Benchmarks, 2023
arxiv / paper
We propose the first public benchmark for automated Parkinson’s disease analysis. We explore three tasks (facial-expression-based PD classification, multimodal PD classification, and PD progression synthesis) and show that models trained on YouTubePD generalize to real clinical data.
|
|
FedSelect: Customized Selection of Parameters for Fine-Tuning during Personalized Federated Learning
Rishub Tamirisa, John Won, Chengjun Lu, Ron Arel, Andy Zhou
Federated Learning @ ICML, 2023
arxiv / paper
We propose a federated-learning algorithm based on selecting which parameters to fine-tune and which to update globally.
|
|
A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance
Zeyi Huang, Andy Zhou, Zijian Lin, Mu Cai, Haohan Wang, Yong Jae Lee
ICCV, 2023
arxiv / code / paper
We propose a distillation objective based on CLIP text representations to improve domain generalization.