Inference code bug report #585
Hi, after fine-tuning I'd like to write a .py file for quick inference, following the code you provide on ModelScope. After fine-tuning, how do I merge the checkpoint and the model into a single folder that can be loaded like model_path = 'zjunlp/OneKE'?
export_model.py merges the base model's weights with the LoRA parameters into a new, standalone model.
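The merge described above can be sketched with the `peft` library's `merge_and_unload`. This is an illustration rather than the repository's actual script, and the adapter path `output/checkpoint-1000` is a hypothetical placeholder:

```python
# Sketch: fold LoRA adapter weights into the base model and save the result
# (assumes the `peft` and `transformers` libraries are installed; the
# checkpoint path is a hypothetical placeholder for your LoRA output dir).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("zjunlp/OneKE", torch_dtype="auto")
model = PeftModel.from_pretrained(base, "output/checkpoint-1000")
merged = model.merge_and_unload()  # merge LoRA deltas into the base weights
merged.save_pretrained("output/oneke-merged")
AutoTokenizer.from_pretrained("zjunlp/OneKE").save_pretrained("output/oneke-merged")
```

The resulting `output/oneke-merged` folder can then be loaded the same way as `model_path = 'zjunlp/OneKE'`.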
So after merging, can I use the quick-inference code directly? In my tests, quick inference is noticeably slower than inference.py, and on the same test data the output of inference.py differs from the output of the merged model with the simple inference code. How can I resolve this?
Check whether 4-bit quantization is enabled; if quick inference runs quantized, it will be slower.
I set bits to 8 during fine-tuning. After training, inference with bits set to 16 is much faster than with 8 or 4, yet accuracy is essentially the same. Shouldn't quantization normally be faster? Would the quick-inference code be faster without quantization? Also, if I fine-tune with LoRA without quantization, should bits be set to 16 or 32? One training epoch already takes very long, so I'd appreciate your advice.
Is post-training quantization supported? Is there code for it?
Quantized inference is slower. For LoRA fine-tuning without quantization, set bits to 16 or 32.
Whether training or inference is quantized is controlled solely by the --bits parameter: 16 and 32 mean no quantization, 8 and 4 mean quantized.
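The mapping just described can be sketched as a small helper. The function name is hypothetical, and real loading code would typically pass a `transformers` `BitsAndBytesConfig` rather than raw kwargs:

```python
def quant_kwargs(bits: int) -> dict:
    """Map the --bits flag to model-loading kwargs (illustrative sketch).

    16/32 -> no quantization (plain fp16/fp32 load);
    8/4   -> quantized load via bitsandbytes.
    """
    if bits == 8:
        return {"load_in_8bit": True}
    if bits == 4:
        return {"load_in_4bit": True}
    if bits in (16, 32):
        return {}  # unquantized; the compute dtype is set separately
    raise ValueError(f"unsupported --bits value: {bits}")
```

Note that bitsandbytes quantization mainly saves memory; its dequantize-on-the-fly kernels often make inference slower than plain fp16, which is consistent with the timings reported above.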
Running inference with the un-fine-tuned OneKE model and bits set to 8 raises an error in loader.py at:
model = model.to(model_args.compute_dtype) if model_args.bits >= 8 else model
Error: ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
bits 16 and 32 work fine.
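The condition `model_args.bits >= 8` also applies the cast in the 8-bit case, which is exactly what bitsandbytes forbids. One possible guard (a sketch with a hypothetical helper name, not the repository's actual fix) is to skip the cast whenever the model is quantized:

```python
def maybe_cast(model, bits, compute_dtype):
    """Cast the model only when it is NOT bitsandbytes-quantized.

    4-bit and 8-bit models are already placed on the correct device and
    dtype by bitsandbytes, so calling .to() on them raises ValueError.
    """
    if bits in (4, 8):
        return model  # quantized: leave the model as-is
    return model.to(compute_dtype)  # 16/32: a plain dtype cast is safe
```

In loader.py this would replace the `model_args.bits >= 8` test, which wrongly includes the 8-bit quantized case.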
After fine-tuning: with bits 8, inference is surprisingly slow even though the model is quantized; why is it this slow? With bits set to 16 it behaves normally. I see the code uses the bitsandbytes library for quantization; is there a bug in the code?
One last small question: what exactly does export_model.py do? Does it merge the fine-tuned OneKE model into the same format as the original OneKE model?