
seanses/LLM_fine_tuning

Fine tune your own private Copilot


Introduction

The integration between GitHub and Colab has been annoyingly difficult. While you can open a notebook in Colab from a GitHub link, none of the rest of the repository is brought into the Colab runtime. This makes it cumbersome to use other materials saved in your repo, such as dataset preprocessing scripts, structured training code, and maybe even the dataset itself. People have compromised and resorted to workarounds to complete a fine-tuning lifecycle:

  1. First, create a dataset and put it in GDrive or a Hugging Face dataset repo.
  2. Put the code in a notebook and run it in Colab, loading models from a Hugging Face model repo.
  3. Save the fine-tuned model back into a Hugging Face model repo.
  4. Evaluate the fine-tuned model. If it's not good enough, go back to step 1.

This breaks one project into three pieces stored in different places: a dataset repo, a source code (notebook) repo, and a model repo, with no good way to cross-reference their individual versions. For example, if one fine-tuning run makes the model worse, you have to search manually through three parallel histories, let alone the difficulty of reverting to a known-good state.

In this guide we demonstrate that you can:

  1. Version all three pieces together in one GitHub repo managed by the XetData GitHub app.
  2. Clone only what you need for training into the Colab runtime using the lazy clone feature.

This fine-tuning example uses a LoRA approach on top of Code Llama: it quantizes the base model to int8, freezes its weights, and trains only an adapter. Please accept the Code Llama license at https://ai.meta.com/resources/models-and-libraries/llama-downloads/. Much of the code is refactored from [1], [2], [3].
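To make the LoRA idea concrete, here is a toy, dependency-free sketch of a single linear layer. This is not the notebook's code, which presumably relies on dedicated libraries. The names `quantize_int8`, `matvec`, and `LoRALinear` are hypothetical, and the absmax quantization scheme is an assumption chosen for simplicity. It shows the three ingredients: a base weight stored as int8 and never updated, a small trainable adapter `B @ A`, and an output that starts out identical to the base model because `B` is initialized to zero.

```python
import random


def quantize_int8(row):
    """Absmax-quantize a list of floats to int8-range integers plus one scale."""
    scale = max(abs(x) for x in row) / 127 or 1.0  # avoid a zero scale
    return [round(x / scale) for x in row], scale


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Toy linear layer: frozen int8 base weight plus a trainable low-rank adapter."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        # Frozen base: each row is kept as (int8 values, scale) and never trained.
        self.q_rows = [quantize_int8(row) for row in weight]
        # Trainable adapter: A gets small random values, B starts at zero.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scaling = alpha / rank

    def base_forward(self, x):
        """Output of the quantized base weight alone (dequantized on the fly)."""
        return [scale * sum(q * xi for q, xi in zip(q_row, x))
                for q_row, scale in self.q_rows]

    def forward(self, x):
        """Base output plus the scaled low-rank update B @ (A @ x)."""
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scaling * d for b, d in zip(self.base_forward(x), delta)]

    def trainable_params(self):
        """Only A and B would receive gradients; the base weight is excluded."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# A 4x6 layer has 24 base weights; a rank-2 adapter adds 2*6 + 4*2 trainable ones.
demo_rng = random.Random(1)
demo_weight = [[demo_rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(4)]
demo_layer = LoRALinear(demo_weight, rank=2, alpha=4)
print(demo_layer.trainable_params())  # → 20
```

On a layer this small the adapter is nearly as large as the base weight; the savings only appear at realistic sizes, where the adapter's parameter count grows with `rank * (d_in + d_out)` instead of `d_in * d_out`.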

How to use this repository?

This repository already contains a copy of Code Llama in Hugging Face format. You can fork this repository and open fine-tune-code-llama.ipynb in Colab. Follow the instructions in the notebook to fine-tune your private Copilot and save it back to your repo!

About

Fine tune an LLM with everything tracked together
