Skip to content

add miniPile dataset for pretraining, 1M entries (solves the 'out of data' at 40 iters issue) #20

add miniPile dataset for pretraining, 1M entries (solves the 'out of data' at 40 iters issue)

add miniPile dataset for pretraining, 1M entries (solves the 'out of data' at 40 iters issue) #20

Workflow file for this run

name: 4 GPU Unit Test
on:
push:
branches: [ main ]
pull_request:
concurrency:
group: unit-test${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_number || github.ref }}
cancel-in-progress: true
defaults:
run:
shell: bash -l -eo pipefail {0}
jobs:
unit_tests_4gpu:
runs-on: linux.g5.12xlarge.nvidia.gpu
strategy:
matrix:
python-version: ['3.10']
steps:
- name: Check out repo
uses: actions/checkout@v3
- name: Setup conda env
uses: conda-incubator/setup-miniconda@v2
with:
auto-update-conda: true
miniconda-version: "latest"
activate-environment: test
python-version: ${{ matrix.python-version }}
- name: Update pip
run: python -m pip install --upgrade pip
- name: Install dependencies
run: |
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
python -m pip install -r requirements.txt
python -m pip install -r dev-requirements.txt
python -m pip install -e .
- name: Run NGPU=4 ./run_llama_train.sh
run: NGPU=4 ./run_llama_train.sh
- name: Upload Coverage to Codecov
uses: codecov/codecov-action@v3