[1] Chakraborty, Trishna, et al. "Cross-Modal Safety Alignment: Is textual unlearning all you need?" arXiv preprint arXiv:2406.02575 (2024).
[2] Wang, Pengyu, et al. "InferAligner: Inference-time alignment for harmlessness through cross-model guidance." arXiv preprint arXiv:2401.11206 (2024).
[3] Gou, Yunhao, et al. "Eyes closed, safety on: Protecting multimodal LLMs via image-to-text transformation." European Conference on Computer Vision. Springer, Cham, 2025.
[4] Gong, Yichen, et al. "FigStep: Jailbreaking large vision-language models via typographic visual prompts." arXiv preprint arXiv:2311.05608 (2023).
[5] Luo, Weidi, et al. "JailBreakV-28K: A benchmark for assessing the robustness of multimodal large language models against jailbreak attacks." arXiv preprint arXiv:2404.03027 (2024).
[6] Mazeika, Mantas, et al. "HarmBench: A standardized evaluation framework for automated red teaming and robust refusal." arXiv preprint arXiv:2402.04249 (2024).
[7] Chen, Yangyi, et al. "DRESS: Instructing large vision-language models to align and interact with humans via natural language feedback." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.
[8] Zong, Yongshuo, et al. "Safety fine-tuning at (almost) no cost: A baseline for vision large language models." arXiv preprint arXiv:2402.02207 (2024).
[9] Zhang, Yongting, et al. "SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model." arXiv preprint arXiv:2406.12030 (2024).