publications

2025

  1. ICML 2025
    Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
    Yik Siu Chan, Narutatsu Ri, Yuxin Xiao, and Marzyeh Ghassemi
  2. ICML 2025
    MIB: A Mechanistic Interpretability Benchmark
    Aaron Mueller, Atticus Geiger, Sarah Wiegreffe, Dana Arad, Iván Arcuschin, Adam Belfki, Yik Siu Chan, Jaden Fiotto-Kaufman, Tal Haklay, Michael Hanna, Jing Huang, Rohan Gupta, Yaniv Nikankin, Hadas Orgad, Nikhil Prakash, Anja Reusch, Aruna Sankaranarayanan, Shun Shao, Alessandro Stolfo, Martin Tutek, Amir Zur, David Bau, and Yonatan Belinkov

2024

  1. NeurIPS 2024
    MDAgents: An Adaptive Collaboration of LLMs for Medical Decision Making
    Yubin Kim, Chanwoo Park, Hyewon Jeong, Yik Siu Chan, Xuhai Xu, Daniel McDuff, Hyeonhoon Lee, Marzyeh Ghassemi, Cynthia Breazeal, and Hae Won Park