微调是增强预训练模型的重要手段,不同任务往往需要不同的微调方式。例如Luo et al.[1]提出RLEIF通过Evove-instruction来增强模型数学推理能力;Wei et al.[2]利用Code snnipet合成高质量的指令数据来增加模型的代码能力。然而,这些方法通常依赖高质量数据,并需要精心设计的策略才能实现显著的效果。
[1]Yu, L., Jiang, W., Shi, H., Jincheng, Y., Liu, Z., Zhang, Y., Kwok, J., Li, Z., Weller, A., and Liu, W.Metamath: Bootstrap your own mathematical questions for large language models. In The Twelfth International Conference on Learning Representations, 2023.
[2] Luo, Z., Xu, C., Zhao, P., Sun, Q., Geng, X., Hu, W., Tao, C., Ma, J., Lin, Q., and Jiang, D. Wizardcoder: Empowering code large language models with evol-instruct. arXiv preprint arXiv:2306.08568, 2023b
[3] Liu, J., Xiao, G., Li, K., Lee, J. D., Han, S., Dao, T., and Cai, T. Bitdelta: Your fine-tune may only be worth one bit. arXiv preprint arXiv:2402.10193, 2024b.