[Feature] Implement of RAM with a gradio interface. (#1802)
* [CodeCamp2023-584]Support DINO self-supervised learning in project (#1756)

* feat: implement DINO

* chore: delete debug code

* chore: implement pre-commit

* fix: fix imported package

* chore: pre-commit check

* [CodeCamp2023-340] New Version of config Adapting MobileNet Algorithm (#1774)

* add new config adapting MobileNetV2,V3

* add base model config for mobile net v3, modified all training configs of mobile net v3 inherit from the base model config

* removed directory _base_/models/mobilenet_v3

* [Feature] Implement of Zero-Shot CLIP Classifier (#1737)

* zero-shot CLIP

* modify zero-shot clip config

* add in1k_sub_prompt(8 prompts) for improvement

* add some annotations doc

* clip base class & clip_zs sub-class

* some modifications of details after review

* convert into and use mmpretrain-vit

* modify names of some files and directories

* ram init commit

* [Fix] Fix pipeline bug in image retrieval inferencer

* [CodeCamp2023-341] Multimodal dataset documentation supplement: COCO Retrieval

* Update OFA to be compatible with the latest huggingface.

* Update train.py to be compatible with the new config

* Bump version to v1.1.0

* Update __init__.py

---------

Co-authored-by: LALBJ <40877073+LALBJ@users.noreply.github.com>
Co-authored-by: DE009 <57087096+DE009@users.noreply.github.com>
Co-authored-by: mzr1996 <mzr1996@163.com>
Co-authored-by: 飞飞 <102729089+ASHORE1225@users.noreply.github.com>
5 people authored Oct 25, 2023
1 parent c076651 commit ed5924b
Showing 69 changed files with 4,618 additions and 26 deletions.
9 changes: 3 additions & 6 deletions README.md
@@ -86,13 +86,10 @@ https://github.com/open-mmlab/mmpretrain/assets/26739999/e4dcd3a2-f895-4d1b-a351

## What's new

-🌟 v1.0.2 was released in 15/08/2023
+🌟 v1.1.0 was released in 12/10/2023

-Support [MFF](./configs/mff/) self-supervised algorithm and enhance the codebase. More details can be found in the [changelog](https://mmpretrain.readthedocs.io/en/latest/notes/changelog.html).
-
-🌟 v1.0.1 was released in 28/07/2023
-
-Fix some bugs and enhance the codebase. Please refer to [changelog](https://mmpretrain.readthedocs.io/en/latest/notes/changelog.html) for more details.
+- Support Mini-GPT4 training and provide a Chinese model (based on Baichuan-7B)
+- Support zero-shot classification based on CLIP.

🌟 v1.0.0 was released in 04/07/2023

9 changes: 3 additions & 6 deletions README_zh-CN.md
@@ -84,13 +84,10 @@ https://github.com/open-mmlab/mmpretrain/assets/26739999/e4dcd3a2-f895-4d1b-a351

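## Changelog

-🌟 v1.0.2 was released on 2023/8/15
+🌟 v1.1.0 was released on 2023/10/12

-Supported the [MFF](./configs/mff/) self-supervised algorithm and enhanced the codebase. For details, please refer to the [changelog](https://mmpretrain.readthedocs.io/zh_CN/latest/notes/changelog.html)
-
-🌟 v1.0.1 was released on 2023/7/28
-
-Fixed some bugs and enhanced the codebase. For details, please refer to the [changelog](https://mmpretrain.readthedocs.io/zh_CN/latest/notes/changelog.html)
+- Support Mini-GPT4 training and provide a Chinese model based on Baichuan-7B
+- Support zero-shot classification based on CLIP.

🌟 v1.0.0 was released on 2023/7/4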

68 changes: 68 additions & 0 deletions configs/clip/clip_vit-base-p16_zeroshot-cls_cifar100.py
@@ -0,0 +1,68 @@
_base_ = '../_base_/default_runtime.py'

# data settings
data_preprocessor = dict(
    type='MultiModalDataPreprocessor',
    mean=[0.48145466 * 255, 0.4578275 * 255, 0.40821073 * 255],
    std=[0.26862954 * 255, 0.26130258 * 255, 0.27577711 * 255],
    to_rgb=False,
)

test_pipeline = [
    dict(type='Resize', scale=(224, 224), interpolation='bicubic'),
    dict(
        type='PackInputs',
        algorithm_keys=['text'],
        meta_keys=['image_id', 'scale_factor'],
    ),
]

train_dataloader = None
test_dataloader = dict(
    batch_size=32,
    num_workers=8,
    dataset=dict(
        type='CIFAR100',
        data_root='data/cifar100',
        split='test',
        pipeline=test_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=False),
)
test_evaluator = dict(type='Accuracy', topk=(1, 5))

# schedule settings
train_cfg = None
val_cfg = None
test_cfg = dict()

# model settings
model = dict(
    type='CLIPZeroShot',
    vision_backbone=dict(
        type='VisionTransformer',
        arch='base',
        img_size=224,
        patch_size=16,
        drop_rate=0.,
        layer_cfgs=dict(act_cfg=dict(type='QuickGELU')),
        pre_norm=True,
    ),
    projection=dict(type='CLIPProjection', in_channels=768, out_channels=512),
    text_backbone=dict(
        type='CLIPTransformer',
        width=512,
        layers=12,
        heads=8,
        attn_mask=True,
    ),
    tokenizer=dict(
        type='AutoTokenizer',
        name_or_path='openai/clip-vit-base-patch16',
        use_fast=False),
    vocab_size=49408,
    transformer_width=512,
    proj_dim=512,
    text_prototype='cifar100',
    text_prompt='openai_cifar100',
    context_length=77,
)
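
Note: the config above wires an image encoder and a text encoder into a shared 512-dimensional space (`proj_dim=512`). The sketch below shows how zero-shot classification is commonly done with such a pair: compare each image embedding against one text embedding per class by cosine similarity. It is a minimal NumPy illustration, not the `CLIPZeroShot` implementation; the function names, the random stand-in embeddings, and the `logit_scale=100.0` value are all assumptions.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def zero_shot_classify(image_feats, class_text_feats, logit_scale=100.0):
    """Return per-class probabilities for each image.

    image_feats:      (N, D) image embeddings (hypothetical encoder output).
    class_text_feats: (C, D) one text embedding per class.
    logit_scale:      temperature; 100.0 is an assumed value, not read from the config.
    """
    img = l2_normalize(np.asarray(image_feats, dtype=np.float64))
    txt = l2_normalize(np.asarray(class_text_feats, dtype=np.float64))
    logits = logit_scale * img @ txt.T             # (N, C) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # stabilise the softmax
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


# Stand-in data: 100 class embeddings and two images that sit close to classes 3 and 42.
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(100, 512))
image_feats = text_feats[[3, 42]] + 0.01 * rng.normal(size=(2, 512))

probs = zero_shot_classify(image_feats, text_feats)
pred = probs.argmax(axis=1)
print(pred)
```

No training is involved: changing the label set only means recomputing the class text embeddings.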
69 changes: 69 additions & 0 deletions configs/clip/clip_vit-base-p16_zeroshot-cls_in1k.py
@@ -0,0 +1,69 @@
_base_ = '../_base_/default_runtime.py'

# data settings
data_preprocessor = dict(
    type='MultiModalDataPreprocessor',
    mean=[0.48145466 * 255, 0.4578275 * 255, 0.40821073 * 255],
    std=[0.26862954 * 255, 0.26130258 * 255, 0.27577711 * 255],
    to_rgb=True,
)

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=(224, 224), interpolation='bicubic'),
    dict(
        type='PackInputs',
        algorithm_keys=['text'],
        meta_keys=['image_id', 'scale_factor'],
    ),
]

train_dataloader = None
test_dataloader = dict(
    batch_size=32,
    num_workers=8,
    dataset=dict(
        type='ImageNet',
        data_root='data/imagenet',
        split='val',
        pipeline=test_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=False),
)
test_evaluator = dict(type='Accuracy', topk=(1, 5))

# schedule settings
train_cfg = None
val_cfg = None
test_cfg = dict()

# model settings
model = dict(
    type='CLIPZeroShot',
    vision_backbone=dict(
        type='VisionTransformer',
        arch='base',
        img_size=224,
        patch_size=16,
        drop_rate=0.,
        layer_cfgs=dict(act_cfg=dict(type='QuickGELU')),
        pre_norm=True,
    ),
    projection=dict(type='CLIPProjection', in_channels=768, out_channels=512),
    text_backbone=dict(
        type='CLIPTransformer',
        width=512,
        layers=12,
        heads=8,
        attn_mask=True,
    ),
    tokenizer=dict(
        type='AutoTokenizer',
        name_or_path='openai/clip-vit-base-patch16',
        use_fast=False),
    vocab_size=49408,
    transformer_width=512,
    proj_dim=512,
    text_prototype='imagenet',
    text_prompt='openai_imagenet_sub',  # openai_imagenet, openai_imagenet_sub
    context_length=77,
)
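
Note: `text_prompt` selects a named set of prompt templates (the commit message mentions an 8-prompt subset for ImageNet). A common way to use several templates is to embed every filled-in template for a class, average the normalized embeddings, and renormalize. The sketch below illustrates that idea only; whether `CLIPZeroShot` does exactly this is an assumption. The two templates, the class names, and `toy_embed` (a deterministic stand-in for a real text encoder) are hypothetical.

```python
import zlib

import numpy as np

# Illustrative templates only; the real prompt sets live in the repository.
TEMPLATES = ["a photo of a {}.", "a blurry photo of a {}."]


def toy_embed(text: str, dim: int = 512) -> np.ndarray:
    """Hypothetical stand-in for a text encoder: a deterministic vector per string."""
    seed = zlib.crc32(text.encode("utf-8"))
    return np.random.default_rng(seed).normal(size=dim)


def build_class_prototypes(class_names, templates, embed=toy_embed):
    """Average the unit-length embeddings of all templates for each class."""
    prototypes = []
    for name in class_names:
        feats = np.stack([embed(t.format(name)) for t in templates])
        feats /= np.linalg.norm(feats, axis=1, keepdims=True)
        mean = feats.mean(axis=0)
        prototypes.append(mean / np.linalg.norm(mean))  # renormalize the average
    return np.stack(prototypes)


prototypes = build_class_prototypes(["apple", "bicycle", "castle"], TEMPLATES)
print(prototypes.shape)
```

The resulting matrix has one row per class and can be used directly as the class text embeddings in a cosine-similarity classifier.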
68 changes: 68 additions & 0 deletions configs/clip/clip_vit-large-p14_zeroshot-cls_cifar100.py
@@ -0,0 +1,68 @@
_base_ = '../_base_/default_runtime.py'

# data settings
data_preprocessor = dict(
    type='MultiModalDataPreprocessor',
    mean=[0.48145466 * 255, 0.4578275 * 255, 0.40821073 * 255],
    std=[0.26862954 * 255, 0.26130258 * 255, 0.27577711 * 255],
    to_rgb=False,
)

test_pipeline = [
    dict(type='Resize', scale=(224, 224), interpolation='bicubic'),
    dict(
        type='PackInputs',
        algorithm_keys=['text'],
        meta_keys=['image_id', 'scale_factor'],
    ),
]

train_dataloader = None
test_dataloader = dict(
    batch_size=32,
    num_workers=8,
    dataset=dict(
        type='CIFAR100',
        data_root='data/cifar100',
        split='test',
        pipeline=test_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=False),
)
test_evaluator = dict(type='Accuracy', topk=(1, 5))

# schedule settings
train_cfg = None
val_cfg = None
test_cfg = dict()

# model settings
model = dict(
    type='CLIPZeroShot',
    vision_backbone=dict(
        type='VisionTransformer',
        arch='large',
        img_size=224,
        patch_size=14,
        drop_rate=0.,
        layer_cfgs=dict(act_cfg=dict(type='QuickGELU')),
        pre_norm=True,
    ),
    projection=dict(type='CLIPProjection', in_channels=1024, out_channels=768),
    text_backbone=dict(
        type='CLIPTransformer',
        width=768,
        layers=12,
        heads=12,
        attn_mask=True,
    ),
    tokenizer=dict(
        type='AutoTokenizer',
        name_or_path='openai/clip-vit-large-patch14',
        use_fast=False),
    vocab_size=49408,
    transformer_width=768,
    proj_dim=768,
    text_prototype='cifar100',
    text_prompt='openai_cifar100',
    context_length=77,
)
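
Note: the `data_preprocessor` blocks express the CLIP mean and std on the 0-255 pixel scale and differ only in `to_rgb` (False for the CIFAR100 configs, True for the ImageNet configs that use `LoadImageFromFile`). The sketch below shows what such a step plausibly computes on a single HWC image. It is a simplified illustration and not the `MultiModalDataPreprocessor` code; the `preprocess` function and the reading that `to_rgb=True` means a BGR-to-RGB channel flip are assumptions.

```python
import numpy as np

# Per-channel statistics from the config, scaled to the 0-255 pixel range.
MEAN = np.array([0.48145466, 0.4578275, 0.40821073]) * 255
STD = np.array([0.26862954, 0.26130258, 0.27577711]) * 255


def preprocess(img_hwc: np.ndarray, to_rgb: bool) -> np.ndarray:
    """Normalize an (H, W, 3) image and return it as (3, H, W)."""
    img = img_hwc.astype(np.float32)
    if to_rgb:
        img = img[..., ::-1]        # assumed meaning of to_rgb: flip BGR to RGB
    img = (img - MEAN) / STD        # broadcast over the trailing channel axis
    return img.transpose(2, 0, 1)   # HWC -> CHW


# An image whose every pixel equals the mean colour, stored in BGR order.
sample_bgr = np.broadcast_to(MEAN[::-1], (4, 4, 3))
out = preprocess(sample_bgr, to_rgb=True)

# The same colour already in RGB order needs no flip.
sample_rgb = np.broadcast_to(MEAN, (4, 4, 3))
out_rgb = preprocess(sample_rgb, to_rgb=False)
print(out.shape)
```

Both calls map the mean colour to approximately zero, which is a quick way to check that the channel order and the statistics agree.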
69 changes: 69 additions & 0 deletions configs/clip/clip_vit-large-p14_zeroshot-cls_in1k.py
@@ -0,0 +1,69 @@
_base_ = '../_base_/default_runtime.py'

# data settings
data_preprocessor = dict(
    type='MultiModalDataPreprocessor',
    mean=[0.48145466 * 255, 0.4578275 * 255, 0.40821073 * 255],
    std=[0.26862954 * 255, 0.26130258 * 255, 0.27577711 * 255],
    to_rgb=True,
)

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=(224, 224), interpolation='bicubic'),
    dict(
        type='PackInputs',
        algorithm_keys=['text'],
        meta_keys=['image_id', 'scale_factor'],
    ),
]

train_dataloader = None
test_dataloader = dict(
    batch_size=32,
    num_workers=8,
    dataset=dict(
        type='ImageNet',
        data_root='data/imagenet',
        split='val',
        pipeline=test_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=False),
)
test_evaluator = dict(type='Accuracy', topk=(1, 5))

# schedule settings
train_cfg = None
val_cfg = None
test_cfg = dict()

# model settings
model = dict(
    type='CLIPZeroShot',
    vision_backbone=dict(
        type='VisionTransformer',
        arch='large',
        img_size=224,
        patch_size=14,
        drop_rate=0.,
        layer_cfgs=dict(act_cfg=dict(type='QuickGELU')),
        pre_norm=True,
    ),
    projection=dict(type='CLIPProjection', in_channels=1024, out_channels=768),
    text_backbone=dict(
        type='CLIPTransformer',
        width=768,
        layers=12,
        heads=12,
        attn_mask=True,
    ),
    tokenizer=dict(
        type='AutoTokenizer',
        name_or_path='openai/clip-vit-large-patch14',
        use_fast=False),
    vocab_size=49408,
    transformer_width=768,
    proj_dim=768,
    text_prototype='imagenet',
    text_prompt='openai_imagenet_sub',  # openai_imagenet, openai_imagenet_sub
    context_length=77,
)
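
Note: all four configs evaluate with `dict(type='Accuracy', topk=(1, 5))`. As a reminder of what a top-k metric measures, the sketch below counts a sample as correct when its true label is among the k highest-scoring classes. It is a small NumPy illustration, not the repository's `Accuracy` metric; the `topk_accuracy` function and the choice to report percentages are assumptions.

```python
import numpy as np


def topk_accuracy(scores, labels, topk=(1, 5)):
    """Return {k: accuracy in percent} for each k in topk."""
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    max_k = max(topk)
    # Class indices of the max_k highest scores per sample, best first.
    top = np.argsort(-scores, axis=1)[:, :max_k]
    hits = top == labels[:, None]
    return {k: float(hits[:, :k].any(axis=1).mean() * 100) for k in topk}


scores = np.array([
    [0.1, 0.6, 0.3],
    [0.5, 0.2, 0.3],
    [0.2, 0.3, 0.5],
    [0.7, 0.2, 0.1],
])
labels = np.array([1, 2, 2, 1])

acc = topk_accuracy(scores, labels, topk=(1, 2))
print(acc)  # {1: 50.0, 2: 100.0}
```

Two of the four samples rank the true class first, and all four rank it within the top two.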
6 changes: 3 additions & 3 deletions docker/serve/Dockerfile
@@ -1,9 +1,9 @@
-ARG PYTORCH="1.12.1"
-ARG CUDA="11.3"
+ARG PYTORCH="2.0.1"
+ARG CUDA="11.7"
ARG CUDNN="8"
FROM pytorch/torchserve:latest-gpu

-ARG MMPRE="1.0.2"
+ARG MMPRE="1.1.0"

ENV PYTHONUNBUFFERED TRUE

22 changes: 22 additions & 0 deletions docs/en/notes/changelog.md
@@ -1,5 +1,27 @@
# Changelog (MMPreTrain)

## v1.1.0(12/10/2023)

### New Features

- [Feature] Implement of Zero-Shot CLIP Classifier ([#1737](https://github.com/open-mmlab/mmpretrain/pull/1737))
- [Feature] Add minigpt4 gradio demo and training script. ([#1758](https://github.com/open-mmlab/mmpretrain/pull/1758))

### Improvements

- [Config] New Version of config Adapting MobileNet Algorithm ([#1774](https://github.com/open-mmlab/mmpretrain/pull/1774))
- [Config] Support DINO self-supervised learning in project ([#1756](https://github.com/open-mmlab/mmpretrain/pull/1756))
- [Config] New Version of config Adapting Swin Transformer Algorithm ([#1780](https://github.com/open-mmlab/mmpretrain/pull/1780))
- [Enhance] Add iTPN Supports for Non-three channel image ([#1735](https://github.com/open-mmlab/mmpretrain/pull/1735))
- [Docs] Update dataset download script from opendatalab to openXlab ([#1765](https://github.com/open-mmlab/mmpretrain/pull/1765))
- [Docs] Update COCO-Retrieval dataset docs. ([#1806](https://github.com/open-mmlab/mmpretrain/pull/1806))

### Bug Fix

- Update `train.py` to compat with new config.
- Update OFA module to compat with the latest huggingface.
- Fix pipeline bug in ImageRetrievalInferencer.

## v1.0.2(15/08/2023)

### New Features
2 changes: 1 addition & 1 deletion docs/en/notes/faq.md
@@ -16,7 +16,7 @@ and make sure you fill in all required information in the template.

| MMPretrain version | MMEngine version | MMCV version |
| :----------------: | :---------------: | :--------------: |
-| 1.0.2 (main)       | mmengine >= 0.8.3 | mmcv >= 2.0.0    |
+| 1.1.0 (main)       | mmengine >= 0.8.3 | mmcv >= 2.0.0    |
| 1.0.0 | mmengine >= 0.8.0 | mmcv >= 2.0.0 |
| 1.0.0rc8 | mmengine >= 0.7.1 | mmcv >= 2.0.0rc4 |
| 1.0.0rc7 | mmengine >= 0.5.0 | mmcv >= 2.0.0rc4 |
2 changes: 1 addition & 1 deletion docs/zh_CN/notes/faq.md
@@ -13,7 +13,7 @@

| MMPretrain version | MMEngine version | MMCV version |
| :-------------: | :---------------: | :--------------: |
-| 1.0.2 (main) | mmengine >= 0.8.3 | mmcv >= 2.0.0 |
+| 1.1.0 (main) | mmengine >= 0.8.3 | mmcv >= 2.0.0 |
| 1.0.0 | mmengine >= 0.8.0 | mmcv >= 2.0.0 |
| 1.0.0rc8 | mmengine >= 0.7.1 | mmcv >= 2.0.0rc4 |
| 1.0.0rc7 | mmengine >= 0.5.0 | mmcv >= 2.0.0rc4 |
2 changes: 1 addition & 1 deletion mmpretrain/__init__.py
@@ -7,7 +7,7 @@
from .version import __version__

mmcv_minimum_version = '2.0.0'
-mmcv_maximum_version = '2.1.0'
+mmcv_maximum_version = '2.2.0'
mmcv_version = digit_version(mmcv.__version__)

mmengine_minimum_version = '0.8.3'
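
Note: this hunk raises the MMCV upper bound from `2.1.0` to `2.2.0`. The sketch below shows why that matters if the bound is treated as exclusive, which is an assumption about how `mmpretrain/__init__.py` compares versions. `parse_version` is a hypothetical, simplified stand-in for `digit_version` that handles only plain `X.Y.Z` strings (no `rc` suffixes).

```python
def parse_version(version: str) -> tuple:
    """Hypothetical simplified parser: '2.1.0' -> (2, 1, 0)."""
    return tuple(int(part) for part in version.split('.'))


def is_supported(installed: str, minimum: str, maximum: str) -> bool:
    """True when minimum <= installed < maximum (upper bound assumed exclusive)."""
    return parse_version(minimum) <= parse_version(installed) < parse_version(maximum)


# With the old bound, MMCV 2.1.0 falls outside the accepted range.
old_ok = is_supported('2.1.0', minimum='2.0.0', maximum='2.1.0')
# With the new bound, the same installation is accepted.
new_ok = is_supported('2.1.0', minimum='2.0.0', maximum='2.2.0')
print(old_ok, new_ok)  # False True
```

Comparing tuples of integers avoids the string-comparison trap where `'2.10.0' < '2.2.0'`.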
1 change: 1 addition & 0 deletions mmpretrain/apis/image_retrieval.py
@@ -108,6 +108,7 @@ def build_dataloader(dataset):
    # A config of dataset
    from mmpretrain.registry import DATASETS
    test_pipeline = [dict(type='LoadImageFromFile'), self.pipeline]
    prototype.setdefault('pipeline', test_pipeline)
    dataset = DATASETS.build(prototype)
    dataloader = build_dataloader(dataset)
elif isinstance(prototype, DataLoader):
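
Note: the hunk above adds a single line, presumably the `prototype.setdefault('pipeline', test_pipeline)` call, so that a dataset config without its own pipeline receives the inferencer's test pipeline. The sketch below isolates the `dict.setdefault` behaviour this relies on: the default is applied only when the key is absent. The `with_default_pipeline` helper and the example configs are hypothetical, and unlike the original code the helper works on a copy rather than mutating its argument.

```python
def with_default_pipeline(prototype: dict, default_pipeline: list) -> dict:
    """Return a copy of the dataset config with 'pipeline' filled in if missing."""
    cfg = dict(prototype)                       # shallow copy; caller's dict is untouched
    cfg.setdefault('pipeline', default_pipeline)
    return cfg


default = [dict(type='LoadImageFromFile'), dict(type='PackInputs')]

# No pipeline given: the default is used.
a = with_default_pipeline(dict(type='ImageNet', data_root='data/imagenet'), default)

# Pipeline given by the user: it is kept as-is.
custom = [dict(type='Resize', scale=224)]
b = with_default_pipeline(dict(type='ImageNet', pipeline=custom), default)

print(a['pipeline'] is default, b['pipeline'] is custom)  # True True
```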