| 
          
            | 
                About Me
              
                I'm Andy Zhou and I'm the founder of Intology, a lab studying the science of intelligence. Check out our work on Zochi, the world's first Artificial Scientist with peer-reviewed publications.
               
                I previously founded Lapis Labs, a student-led research group. We published more than 20 papers in top AI conferences and collaborated with many groups, including DeepMind, the Center for AI Safety, and IBM Research.
               
                 Email  / 
                GitHub  / 
                Google Scholar  / 
                 Twitter 
               |   |  
          
            | Research
                My research focuses on advancing artificial intelligence with an emphasis on:
               
                AI agents and autonomous systemsAI safety and alignmentScientific discovery through AI 
                I'm particularly interested in developing AI systems that can reason, learn, and discover in ways that augment human capabilities and advance scientific knowledge. Check out my papers here.
               |  
          
            | News
                
                  
                    | November 2024. I joined Meta AI September 2024. 3 papers accepted at NeurIPS 2024. RPO is accepted as a Spotlight! September 2024. I gave an invited talk on LLM agents at the AutoGen event September 2024. Our work on AIR and AIR-Bench was featured in Wired July 2024. 1 paper is accepted at ACM CCS 2024 May 2024. LATS is accepted at ICML 2024 March 2024. 1 paper is accepted at LLM Agents @ ICLR 2024 March 2024. 3 papers accepted at SeT LLM @ ICLR 2024 March 2024. LATS reaches 600 stars on GitHub and is implemented in LangChain and LLamaIndex February 2024. 1 paper is accepted at CVPR 2024 September 2023. 2 papers accepted at NeurIPS 2023 June 2023. 1 paper is accepted at FL @ ICML 2023 June 2023. 1 paper is accepted at ICCV 2023 |  |  
 
          
          
          
          
            |   | Robust Prompt Optimization for Defending Language Models Against Jailbreaking AttacksAndy Zhou, Bo Li, Haohan Wang
 NeurIPS 2024 Spotlight (Top 3%), 2024
 arxiv /
              
              
              
              code /
              
              
              
              
              
              paper
 We propose a defense objective for defending LLMs against jailbreaking and an algorithm to generate trigger tokens that enforce harmless behavior, improving robustness across jailbreaks and models. |  
            |   | AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and PoliciesYi Zeng*, Yu Yang*, Andy Zhou*, Jeffrey Tan*, Yuheng Tu*, Yifan Mai*, Kevin Klyman, Minzhou Pan, Ruoxi Jia, Dawn Song, Percy Liang, Bo Li
 arXiv, 2024
 arxiv /
              
              
              
              
              
              
              
              
              [Wired Article]
 We present a safety benchmark based on risk categories from regulations and policies with 5,694 diverse prompts based on 16 company policies and 8 government policies mapped into 314 risk categories. |  
            |   | KnowGraph: Knowledge-Enabled Anomaly Detection via Logical Reasoning on Graph DataAndy Zhou, Xiaojun Xu, Ramesh Raghunathan, Alok Lal, Xinze Guan, Bin Yu, Bo Li
 ACM Conference on Computer and Communications Security (CCS), 2024
 arxiv /
 We propose a logical reasoning framework for anomaly detection on graph data that uses domain knowledge to organize the predictions of a collection of specialized models. |  
            |   | Language Agent Tree Search Unifies Reasoning Acting and Planning in Language ModelsAndy Zhou, Kai Yan, Michal Shlapentokh Rothman, Haohan Wang, Yuxiong Wang
 ICML, 2024
 arxiv /
              
              
              
              code /
              
              
              
              website /
              
              
              
              paper
 We propose the first search algorithm for LM agents which draws upon aspects of reasoning and acting prompting methods to improve decision-making. We achieve SOTA on HumanEval with a Pass@1 rate of 94.4% |  
            |   | Distilling Out-of-Distribution Robustness from Vision-Language Foundation ModelsAndy Zhou, Jindong Wang, Haohan Wang, Yuxiong Wang
 NeurIPS, 2023
 arxiv /
              
              
              
              code /
              
              
              
              
              
              paper
 We propose a data augmentation and knowledge distillation objective that uses teacher gradients to generate diverse samples, improving out-of-distribution robustness. We distill from CLIP to train the most robust ResNet34 and ResNet50 on OOD generalization. |  
 
          
          
          
          
          
          
          
          
            |   | RedCode: Risky Code Execution and Generation Benchmark for Code AgentsChengquan Guo, Xun Liu, Chulin Xie, Andy Zhou, Yi Zeng, Zinan Lin, Dawn Song, Bo Li
 NeurIPS Datasets and Benchmarks Track, 2024
 
 We propose a benchmark for evaluating the safety and reliability of code agents on executing and generating malicious code. |  
            |   | Jailbreaking Large Language Models Against Moderation Guardrails via Cipher CharactersHaibo Jin, Andy Zhou, Joe D. Menke, Haohan Wang
 NeurIPS, 2024
 arxiv /
 We propose an attack on moderation guardrails that use cipher characters to detect harmful content and a benchmark for evaluating LLM guardrails. |  
            |   | Tamper-Resistant Safeguards for Open-Weight LLMsRishub Tamirisa*, Bhrugu Bharathi*, Long Phan, Andy Zhou, Alice Gatti, Tarun Suresh, Maxwell Lin, Justin Wang, Rowan Wang, Ron Arel, Andy Zou, Dawn Song, Bo Li, Dan Hendrycks, Mantas Mazeika
 arXiv preprint, 2024
 arxiv /
 We present a method for improving safeguard robustness against finetuning attacks. |  
            |   | AI Risk Categorization Decoded: From Corporate Policies to Government RegulationsYi Zeng*, Kevin Klyman*, Andy Zhou, Yu Yang, Minzhou Pan, Ruoxi Jia, Dawn Song, Percy Liang, Bo Li
 ICML Workshop on Generative AI and Law, 2024
 arxiv /
              
              
              
              
              
              
              
              paper /
              
              
              [Wired]
 We present a comprehensive AI risk taxonomy derived from eight government policies from the European Union, United States, and China and 16 company policies worldwide, making a significant step towards establishing a unified language for generative AI safety evaluation. |  
            |   | Towards Robust Unlearning in LLMsRishub Tamirisa, Bhrugu Bharathi, Andy Zhou, Bo Li, Mantas Mazeika
 Secure and Trustworthy LLMs @ ICLR, 2024
 
 We outline the setting of robust machine unlearning in LLMs for reliably removing unwanted knowledge. |  
            |   | FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-TuningRishub Tamirisa, Chulin Xie, Wenxuan Bao, Andy Zhou, Ron Arel, Aviv Shamsian
 CVPR, 2024
 
 We propose a federated-learning algorithm based on selecting which parameters to use for fine-tuning and which to make global updates. |  
            |   | GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language ModelsHaibo Jin*, Ruoxi Chen*, Andy Zhou, Jinyin Chen, Yang Zhang, Haohan Wang
 Secure and Trustworthy LLMs @ ICLR, 2024
 arxiv /
              
              
              
              
              
              
              
              paper /
 We propose a framework to generate semantic jailbreaks from human safety guidelines using syntatic parsing organized into knowledge graphs and LM optimization. Jailbreaks are SOTA for success rate and work on VLMS. |  
            |   | YouTubePD: A Multimodal Benchmark for Parkinson’s Disease AnalysisAndy Zhou*, Samuel Li*, Pranav Sriram*,Xiang Li*, Jiahua Dong*, Ansh Sharma, Yuanyi Zhong, Shirui Luo, Volodymyr Kindratenko, George Heintz, Christopher Zallek, Yuxiong Wang
 NeurIPS Datasets and Benchmarks, 2023
 arxiv /
              
              
              
              
              
              
              
              paper /
 We propose the first public benchmark for automated Parkinson’s disease analysis. We explore three tasks–facial-expression-based
PD classification, multimodal PD classification, and PD progression synthesis–and show models trained on YouTubePD generalize to real clinical data. |  
            |   | FedSelect: Customized Selection of Parameters for Fine-Tuning during Personalized Federated LearningRishub Tamirisa, John Won, Chengjun Lu, Ron Arel,    Andy Zhou
 Federated Learning @ ICML, 2023
 arxiv /
              
              
              
              
              
              
              
              paper /
 We propose a federated-learning algorithm based on selecting which parameters to use for fine-tuning and which to make global updates. |  
            |   | A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language GuidanceZeyi Huang, Andy Zhou, Zijian Lin, Mu Cai, Haohan Wang, Yong Jae Lee
 ICCV, 2023
 arxiv /
              
              
              
              code /
              
              
              
              
              
              paper /
 We propose a distillation objective based on CLIP text representations to improve domain generalization. |  
 
 
 
 
 
 |