About Me

I'm Andy Zhou and I'm the founder of Intology, a lab studying the science of intelligence. Check out our work on Zochi, the world's first Artificial Scientist with peer-reviewed publications.

I previously founded Lapis Labs, a student-led research group. We published more than 20 papers in top AI conferences and collaborated with many groups, including DeepMind, the Center for AI Safety, and IBM Research.

Email / GitHub / Google Scholar / Twitter

Research

My research focuses on advancing artificial intelligence with an emphasis on:

AI agents and autonomous systems
AI safety and alignment
Scientific discovery through AI

I'm particularly interested in developing AI systems that can reason, learn, and discover in ways that augment human capabilities and advance scientific knowledge. Check out my papers here.

News

November 2024. I joined Meta AI

September 2024. 3 papers accepted at NeurIPS 2024. RPO is accepted as a Spotlight!

September 2024. I gave an invited talk on LLM agents at the AutoGen event

September 2024. Our work on AIR and AIR-Bench was featured in Wired

July 2024. 1 paper is accepted at ACM CCS 2024

May 2024. LATS is accepted at ICML 2024

March 2024. 1 paper is accepted at LLM Agents @ ICLR 2024

March 2024. 3 papers accepted at SeT LLM @ ICLR 2024

March 2024. LATS reaches 600 stars on GitHub and is implemented in LangChain and LLamaIndex

February 2024. 1 paper is accepted at CVPR 2024

September 2023. 2 papers accepted at NeurIPS 2023

June 2023. 1 paper is accepted at FL @ ICML 2023

June 2023. 1 paper is accepted at ICCV 2023

Selected Research

	Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks Andy Zhou, Bo Li, Haohan Wang NeurIPS 2024 Spotlight (Top 3%), 2024 arxiv / code / paper We propose a defense objective for defending LLMs against jailbreaking and an algorithm to generate trigger tokens that enforce harmless behavior, improving robustness across jailbreaks and models.
	AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies Yi Zeng, Yu Yang, Andy Zhou, Jeffrey Tan, Yuheng Tu, Yifan Mai, Kevin Klyman, Minzhou Pan, Ruoxi Jia, Dawn Song, Percy Liang, Bo Li arXiv, 2024 arxiv / [Wired Article] We present a safety benchmark based on risk categories from regulations and policies with 5,694 diverse prompts based on 16 company policies and 8 government policies mapped into 314 risk categories.
	KnowGraph: Knowledge-Enabled Anomaly Detection via Logical Reasoning on Graph Data Andy Zhou, Xiaojun Xu, Ramesh Raghunathan, Alok Lal, Xinze Guan, Bin Yu, Bo Li ACM Conference on Computer and Communications Security (CCS), 2024 arxiv / We propose a logical reasoning framework for anomaly detection on graph data that uses domain knowledge to organize the predictions of a collection of specialized models.
	Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models Andy Zhou, Kai Yan, Michal Shlapentokh Rothman, Haohan Wang, Yuxiong Wang ICML, 2024 arxiv / code / website / paper We propose the first search algorithm for LM agents which draws upon aspects of reasoning and acting prompting methods to improve decision-making. We achieve SOTA on HumanEval with a Pass@1 rate of 94.4%
	Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models Andy Zhou, Jindong Wang, Haohan Wang, Yuxiong Wang NeurIPS, 2023 arxiv / code / paper We propose a data augmentation and knowledge distillation objective that uses teacher gradients to generate diverse samples, improving out-of-distribution robustness. We distill from CLIP to train the most robust ResNet34 and ResNet50 on OOD generalization.

Research

	RedCode: Risky Code Execution and Generation Benchmark for Code Agents Chengquan Guo, Xun Liu, Chulin Xie, Andy Zhou, Yi Zeng, Zinan Lin, Dawn Song, Bo Li NeurIPS Datasets and Benchmarks Track, 2024 We propose a benchmark for evaluating the safety and reliability of code agents on executing and generating malicious code.
	Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters Haibo Jin, Andy Zhou, Joe D. Menke, Haohan Wang NeurIPS, 2024 arxiv / We propose an attack on moderation guardrails that use cipher characters to detect harmful content and a benchmark for evaluating LLM guardrails.
	Tamper-Resistant Safeguards for Open-Weight LLMs Rishub Tamirisa, Bhrugu Bharathi, Long Phan, Andy Zhou, Alice Gatti, Tarun Suresh, Maxwell Lin, Justin Wang, Rowan Wang, Ron Arel, Andy Zou, Dawn Song, Bo Li, Dan Hendrycks, Mantas Mazeika arXiv preprint, 2024 arxiv / We present a method for improving safeguard robustness against finetuning attacks.
	AI Risk Categorization Decoded: From Corporate Policies to Government Regulations Yi Zeng, Kevin Klyman, Andy Zhou, Yu Yang, Minzhou Pan, Ruoxi Jia, Dawn Song, Percy Liang, Bo Li ICML Workshop on Generative AI and Law, 2024 arxiv / paper / [Wired] We present a comprehensive AI risk taxonomy derived from eight government policies from the European Union, United States, and China and 16 company policies worldwide, making a significant step towards establishing a unified language for generative AI safety evaluation.
	Towards Robust Unlearning in LLMs Rishub Tamirisa, Bhrugu Bharathi, Andy Zhou, Bo Li, Mantas Mazeika Secure and Trustworthy LLMs @ ICLR, 2024 We outline the setting of robust machine unlearning in LLMs for reliably removing unwanted knowledge.
	FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning Rishub Tamirisa, Chulin Xie, Wenxuan Bao, Andy Zhou, Ron Arel, Aviv Shamsian CVPR, 2024 We propose a federated-learning algorithm based on selecting which parameters to use for fine-tuning and which to make global updates.
	GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models Haibo Jin, Ruoxi Chen, Andy Zhou, Jinyin Chen, Yang Zhang, Haohan Wang Secure and Trustworthy LLMs @ ICLR, 2024 arxiv / paper / We propose a framework to generate semantic jailbreaks from human safety guidelines using syntatic parsing organized into knowledge graphs and LM optimization. Jailbreaks are SOTA for success rate and work on VLMS.
	YouTubePD: A Multimodal Benchmark for Parkinson’s Disease Analysis Andy Zhou, Samuel Li, Pranav Sriram,Xiang Li, Jiahua Dong, Ansh Sharma, Yuanyi Zhong, Shirui Luo, Volodymyr Kindratenko, George Heintz, Christopher Zallek, Yuxiong Wang NeurIPS Datasets and Benchmarks*, 2023 arxiv / paper / We propose the first public benchmark for automated Parkinson’s disease analysis. We explore three tasks–facial-expression-based PD classification, multimodal PD classification, and PD progression synthesis–and show models trained on YouTubePD generalize to real clinical data.
	FedSelect: Customized Selection of Parameters for Fine-Tuning during Personalized Federated Learning Rishub Tamirisa, John Won, Chengjun Lu, Ron Arel, Andy Zhou Federated Learning @ ICML, 2023 arxiv / paper / We propose a federated-learning algorithm based on selecting which parameters to use for fine-tuning and which to make global updates.
	A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance Zeyi Huang, Andy Zhou, Zijian Lin, Mu Cai, Haohan Wang, Yong Jae Lee ICCV, 2023 arxiv / code / paper / We propose a distillation objective based on CLIP text representations to improve domain generalization.

Misc

Design and source code from Jon Barron's website

About Me

Research

News

Selected Research

Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks

AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies

KnowGraph: Knowledge-Enabled Anomaly Detection via Logical Reasoning on Graph Data

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models

Research

RedCode: Risky Code Execution and Generation Benchmark for Code Agents

Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters

Tamper-Resistant Safeguards for Open-Weight LLMs

AI Risk Categorization Decoded: From Corporate Policies to Government Regulations

Towards Robust Unlearning in LLMs

FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning

GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models

YouTubePD: A Multimodal Benchmark for Parkinson’s Disease Analysis

FedSelect: Customized Selection of Parameters for Fine-Tuning during Personalized Federated Learning

A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance

Misc