Commit

reorganize models (#35)

* Add new models and update imports

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

* update

* fixes

* update

* update

* Add new FastServe models and documentation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
aniketmaurya and pre-commit-ci[bot] committed Mar 19, 2024
1 parent b58ec8b commit a3946b2
Showing 19 changed files with 170 additions and 160 deletions.
162 changes: 14 additions & 148 deletions README.md
@@ -1,10 +1,15 @@
# [FastServe](https://github.com/gradsflow/fastserve-ai)
<p align="center">
<img width="250" alt="logo" src="https://ik.imagekit.io/gradsflow/logo/v2/Gradsflow-gradient_TPwd2H3s4.png?updatedAt=1710283252606"/>
<br>
<strong>Machine Learning Serving focused on GenAI & LLMs with simplicity as the top priority.</strong>
</p>
<p align="center">
<a href="https://fastserve.gradsflow.com">Docs</a> |
<a href="https://github.com/gradsflow/fastserve-ai/tree/main/examples">Examples</a>
</p>

Machine Learning Serving focused on GenAI & LLMs with simplicity as the top priority.
---

> [![img_tag](https://img.youtube.com/vi/GfcmyfPB9qY/0.jpg)](https://www.youtube.com/watch?v=GfcmyfPB9qY)
>
> YouTube: How to serve your own GPT like LLM in 1 minute with FastServe

## Installation

@@ -18,129 +23,16 @@ pip install FastServeAI
pip install git+https://github.com/gradsflow/fastserve-ai.git@main
```

## Run locally

```bash
python -m fastserve
```

## Usage/Examples

<a href="https://www.youtube.com/watch?v=GfcmyfPB9qY">
<img src="https://img.youtube.com/vi/GfcmyfPB9qY/0.jpg" width=350px>
</a>

### Serve LLMs with Llama-cpp

```python
from fastserve.models import ServeLlamaCpp

model_path = "openhermes-2-mistral-7b.Q5_K_M.gguf"
serve = ServeLlamaCpp(model_path=model_path)
serve.run_server()
```

or run `python -m fastserve.models --model llama-cpp --model_path openhermes-2-mistral-7b.Q5_K_M.gguf` from the terminal.


### Serve vLLM

```python
from fastserve.models import ServeVLLM

app = ServeVLLM("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
app.run_server()
```

You can use the FastServe client, which will automatically apply the chat template for you:

```python
from fastserve.client import vLLMClient
from rich import print

client = vLLMClient("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
response = client.chat("Write a python function to resize image to 224x224", keep_context=True)
# print(client.context)
print(response["outputs"][0]["text"])
```
> YouTube: How to serve your own GPT like LLM in 1 minute with FastServe.

### Serve SDXL Turbo

```python
from fastserve.models import ServeSDXLTurbo

serve = ServeSDXLTurbo(device="cuda", batch_size=2, timeout=1)
serve.run_server()
```

or run `python -m fastserve.models --model sdxl-turbo --batch_size 2 --timeout 1` from the terminal.

This application comes with a UI. You can access it at [http://localhost:8000/ui](http://localhost:8000/ui).


<img src="https://raw.githubusercontent.com/gradsflow/fastserve-ai/main/assets/sdxl.jpg" width=400 style="border: 1px solid #F2F3F5;">


### Face Detection

```python
from fastserve.models import FaceDetection

serve = FaceDetection(batch_size=2, timeout=1)
serve.run_server()
```

or run `python -m fastserve.models --model face-detection --batch_size 2 --timeout 1` from the terminal.

### Image Classification

```python
from fastserve.models import ServeImageClassification

app = ServeImageClassification("resnet18", timeout=1, batch_size=4)
app.run_server()
```

or run `python -m fastserve.models --model image-classification --model_name resnet18 --batch_size 4 --timeout 1` from the terminal.

### Serve HuggingFace Models

Leveraging FastServe, you can seamlessly serve any HuggingFace Transformer model, enabling flexible deployment across various computing environments, from CPU-based systems to powerful GPU and multi-GPU setups.

Some models require a HuggingFace API token to be set in your environment before they can be downloaded from the HuggingFace Hub.
This is not needed for every model, but gated models (for example, those that require accepting terms of use) will ask for it. Check your model's page for its specific requirements.
```bash
export HUGGINGFACE_TOKEN=<your hf token>
```

The server can be initiated with a specific model. The example below uses `gpt2`; replace it with your model of choice. The `model_name` parameter is optional: if it is not provided, the class attempts to read the model name from the `HUGGINGFACE_MODEL_NAME` environment variable. You can also enable GPU acceleration with the `device` parameter, which defaults to `cpu`.

```python
from fastserve.models import ServeHuggingFace

# Initialize with GPU support if desired by setting `device="cuda"`.
# For CPU usage, you can omit `device` or set it to `cpu`.
app = ServeHuggingFace(model_name="gpt2", device="cuda")
app.run_server()
```

or run `python -m fastserve.models --model huggingface --model_name bigcode/starcoder --batch_size 4 --timeout 1 --device cuda` from the terminal.

To make a request to the server, send a JSON payload with the prompt you want the model to generate text for. Here's an example using requests in Python:
```python
import requests

response = requests.post(
    "http://localhost:8000/endpoint",
    json={"prompt": "Once upon a time", "temperature": 0.7, "max_tokens": 100},
)
print(response.json())
```
This setup allows you to easily deploy and interact with any Transformer model from HuggingFace's model hub, providing a convenient way to integrate AI capabilities into your applications.


Remember, for deploying specific models, ensure that you have the necessary dependencies installed and the model files accessible if they are not directly available from HuggingFace's model hub.


### Serve Custom Model

@@ -179,32 +71,6 @@ python fastserve.deploy.lightning --filename main.py \
--machine "CPU" # T4, A10G or A10G_X_4
```

## Containerization

To containerize your FastServe application, a Docker example is provided in the [examples/docker-compose-example](https://github.com/gradsflow/fastserve-ai/tree/main/examples/docker-compose-example) directory. The example is about face recognition and includes a `Dockerfile` for creating a Docker image and a `docker-compose.yml` for easy deployment. Here's a quick overview:

- **Dockerfile**: Defines the environment, installs dependencies from `requirements.txt`, and specifies the command to run your FastServe application.
- **docker-compose.yml**: Simplifies the deployment of your FastServe application by defining services, networks, and volumes.

To use the example, navigate to the `examples/docker-compose-example` directory and run:

```shell
docker-compose up --build
```

This will build the Docker image and start your FastServe application in a container, making it accessible on the specified port.

> **Note:** We provide an example using face recognition. If you need to serve other models, you will likely need to change `requirements.txt` or the `Dockerfile`. Don't worry; this example is intended as a quick start. Feel free to modify it as needed.

## Passing Arguments to Uvicorn in `run_server()`

FastServe leverages Uvicorn, a lightning-fast ASGI server, to serve machine learning models, making FastServe highly efficient and scalable.
The `run_server()` method supports passing additional arguments to Uvicorn via `*args` and `**kwargs`. This lets you customize the server's behavior without modifying the source code. For example:

```python
app.run_server(host='0.0.0.0', port=8000, log_level='info')
```

In this example, `host`, `port`, and `log_level` are passed directly to `uvicorn.run()` to specify the server's IP address, port, and logging level. You can pass any argument supported by `uvicorn.run()` to `run_server()` in this manner.

## Contribute

1 change: 1 addition & 0 deletions docs/404.md
@@ -0,0 +1 @@
# Oops! The page you are looking for does not exist.
28 changes: 28 additions & 0 deletions docs/fastserve/containerization.md
@@ -0,0 +1,28 @@
# Run and deploy with Docker container 🐳

## Containerization

To containerize your FastServe application, a Docker example is provided in the [examples/docker-compose-example](https://github.com/gradsflow/fastserve-ai/tree/main/examples/docker-compose-example) directory. The example is about face recognition and includes a `Dockerfile` for creating a Docker image and a `docker-compose.yml` for easy deployment. Here's a quick overview:

- **Dockerfile**: Defines the environment, installs dependencies from `requirements.txt`, and specifies the command to run your FastServe application.
- **docker-compose.yml**: Simplifies the deployment of your FastServe application by defining services, networks, and volumes.

To use the example, navigate to the `examples/docker-compose-example` directory and run:

```shell
docker-compose up --build
```

This will build the Docker image and start your FastServe application in a container, making it accessible on the specified port.
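
Once the container is running, a quick liveness check from Python can confirm the service is reachable. This is only a sketch: it assumes the compose file maps the app to `localhost:8000` and that FastAPI's default interactive docs route is enabled; adjust the host and port to your setup.

```python
import requests

# Sketch: a quick liveness check against the containerized service.
# Assumes the compose file maps the app to localhost:8000 and that FastAPI's
# default interactive docs route is enabled; adjust host/port to your setup.
resp = requests.get("http://localhost:8000/docs", timeout=5)
print(resp.status_code)  # 200 means the server is up and responding
```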

> **Note:** We provide an example using face recognition. If you need to serve other models, you will likely need to change `requirements.txt` or the `Dockerfile`. Don't worry; this example is intended as a quick start. Feel free to modify it as needed.

## Passing Arguments to Uvicorn in `run_server()`

FastServe leverages Uvicorn, a lightning-fast ASGI server, to serve machine learning models, making FastServe highly efficient and scalable.
The `run_server()` method supports passing additional arguments to Uvicorn via `*args` and `**kwargs`. This lets you customize the server's behavior without modifying the source code. For example:

```python
app.run_server(host='0.0.0.0', port=8000, log_level='info')
```

In this example, `host`, `port`, and `log_level` are passed directly to `uvicorn.run()` to specify the server's IP address, port, and logging level. You can pass any argument supported by `uvicorn.run()` to `run_server()` in this manner.
10 changes: 10 additions & 0 deletions docs/fastserve/models/face_detection.md
@@ -0,0 +1,10 @@
# Serve Face Detection

```python
from fastserve.models import FaceDetection

serve = FaceDetection(batch_size=2, timeout=1)
serve.run_server()
```

or run `python -m fastserve.models --model face-detection --batch_size 2 --timeout 1` from the terminal.
14 changes: 14 additions & 0 deletions docs/fastserve/models/image_classification.md
@@ -0,0 +1,14 @@
# Serve Image Classification models with FastServe

## Image Classification

```python
from fastserve.models import ServeImageClassification

app = ServeImageClassification("resnet18", timeout=1, batch_size=4)
app.run_server()
```

or run `python -m fastserve.models --model image-classification --model_name resnet18 --batch_size 4 --timeout 1` from the terminal.

17 changes: 17 additions & 0 deletions docs/fastserve/models/image_gen.md
@@ -0,0 +1,17 @@
# Serve GenAI - Image Generation Models

## Serve SDXL Turbo

```python
from fastserve.models import ServeSDXLTurbo

serve = ServeSDXLTurbo(device="cuda", batch_size=2, timeout=1)
serve.run_server()
```

or run `python -m fastserve.models --model sdxl-turbo --batch_size 2 --timeout 1` from the terminal.

This application comes with a UI. You can access it at [http://localhost:8000/ui](http://localhost:8000/ui).


<img src="https://raw.githubusercontent.com/gradsflow/fastserve-ai/main/assets/sdxl.jpg" width=400 style="border: 1px solid #F2F3F5;">
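
For programmatic access (instead of the bundled UI), the sketch below shows a hypothetical client request. The `/endpoint` route, the JSON `prompt` field, and the raw-bytes response are assumptions based on the other FastServe examples, not the documented SDXL Turbo API; check your running server's interactive API docs for the exact schema.

```python
import requests

# Hypothetical sketch: the endpoint path, payload shape, and response format
# are assumptions, not the documented SDXL Turbo API; verify them against the
# running server's interactive API docs.
response = requests.post(
    "http://localhost:8000/endpoint",
    json={"prompt": "a watercolor painting of a lighthouse at dawn"},
)
response.raise_for_status()

# Assuming the generated image is streamed back as raw bytes, save it to disk.
with open("output.jpg", "wb") as f:
    f.write(response.content)
```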
40 changes: 40 additions & 0 deletions docs/fastserve/models/llms/hf.md
@@ -0,0 +1,40 @@
# 🤗 Hugging Face

## Serve HuggingFace Models

Leveraging FastServe, you can seamlessly serve any HuggingFace Transformer model, enabling flexible deployment across various computing environments, from CPU-based systems to powerful GPU and multi-GPU setups.

Some models require a HuggingFace API token to be set in your environment before they can be downloaded from the HuggingFace Hub.
This is not needed for every model, but gated models (for example, those that require accepting terms of use) will ask for it. Check your model's page for its specific requirements.
```bash
export HUGGINGFACE_TOKEN=<your hf token>
```

The server can be initiated with a specific model. The example below uses `gpt2`; replace it with your model of choice. The `model_name` parameter is optional: if it is not provided, the class attempts to read the model name from the `HUGGINGFACE_MODEL_NAME` environment variable. You can also enable GPU acceleration with the `device` parameter, which defaults to `cpu`.

```python
from fastserve.models import ServeHuggingFace

# Initialize with GPU support if desired by setting `device="cuda"`.
# For CPU usage, you can omit `device` or set it to `cpu`.
app = ServeHuggingFace(model_name="gpt2", device="cuda")
app.run_server()
```

or run `python -m fastserve.models --model huggingface --model_name bigcode/starcoder --batch_size 4 --timeout 1 --device cuda` from the terminal.
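
Since `model_name` can also come from the environment, the following is a minimal sketch of an environment-driven setup, assuming the class reads `HUGGINGFACE_MODEL_NAME` at initialization time as described above:

```python
import os

from fastserve.models import ServeHuggingFace

# Sketch: select the model via the environment instead of passing model_name,
# assuming the class reads HUGGINGFACE_MODEL_NAME at initialization as
# described above. HUGGINGFACE_TOKEN is only needed for gated models.
os.environ["HUGGINGFACE_MODEL_NAME"] = "gpt2"
# os.environ["HUGGINGFACE_TOKEN"] = "<your hf token>"

app = ServeHuggingFace(device="cpu")  # defaults to CPU; set device="cuda" for GPU
app.run_server()
```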

To make a request to the server, send a JSON payload with the prompt you want the model to generate text for. Here's an example using requests in Python:
```python
import requests

response = requests.post(
    "http://localhost:8000/endpoint",
    json={"prompt": "Once upon a time", "temperature": 0.7, "max_tokens": 100},
)
print(response.json())
```
This setup allows you to easily deploy and interact with any Transformer model from HuggingFace's model hub, providing a convenient way to integrate AI capabilities into your applications.


Remember, for deploying specific models, ensure that you have the necessary dependencies installed and the model files accessible if they are not directly available from HuggingFace's model hub.
13 changes: 13 additions & 0 deletions docs/fastserve/models/llms/local_llm.md
@@ -0,0 +1,13 @@
# Serve LLMs locally

## Serve LLMs with Llama-cpp

```python
from fastserve.models import ServeLlamaCpp

model_path = "openhermes-2-mistral-7b.Q5_K_M.gguf"
serve = ServeLlamaCpp(model_path=model_path)
serve.run_server()
```

or run `python -m fastserve.models --model llama-cpp --model_path openhermes-2-mistral-7b.Q5_K_M.gguf` from the terminal.
22 changes: 22 additions & 0 deletions docs/fastserve/models/llms/vllm.md
@@ -0,0 +1,22 @@
# Serve LLMs at Scale with vLLM

## Serve vLLM

```python
from fastserve.models import ServeVLLM

app = ServeVLLM("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
app.run_server()
```

You can use the FastServe client, which will automatically apply the chat template for you:

```python
from fastserve.client import vLLMClient
from rich import print

client = vLLMClient("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
response = client.chat("Write a python function to resize image to 224x224", keep_context=True)
# print(client.context)
print(response["outputs"][0]["text"])
```
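
Because `keep_context=True` stores the conversation in `client.context`, a follow-up turn can build on the previous exchange. The short sketch below assumes the response structure shown above:

```python
# Follow-up turn that relies on the context kept by the previous call.
# Sketch only: assumes client.chat() returns the same response structure as above.
followup = client.chat("Now add type hints to that function.", keep_context=True)
print(followup["outputs"][0]["text"])

# Inspect the accumulated conversation history.
print(client.context)
```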
12 changes: 6 additions & 6 deletions src/fastserve/models/__init__.py
@@ -1,9 +1,9 @@
from fastserve.models.face_reco import FaceDetection as FaceDetection
from fastserve.models.huggingface import ServeHuggingFace as ServeHuggingFace
from fastserve.models.image_classification import (
from fastserve.models.cv.face_reco import FaceDetection as FaceDetection
from fastserve.models.cv.image_classification import (
    ServeImageClassification as ServeImageClassification,
)
from fastserve.models.llama_cpp import ServeLlamaCpp as ServeLlamaCpp
from fastserve.models.sdxl_turbo import ServeSDXLTurbo as ServeSDXLTurbo
from fastserve.models.image_gen.sdxl_turbo import ServeSDXLTurbo as ServeSDXLTurbo
from fastserve.models.llm.huggingface import ServeHuggingFace as ServeHuggingFace
from fastserve.models.llm.llama_cpp import ServeLlamaCpp as ServeLlamaCpp
from fastserve.models.llm.vllm import ServeVLLM as ServeVLLM
from fastserve.models.ssd import ServeSSD1B as ServeSSD1B
from fastserve.models.vllm import ServeVLLM as ServeVLLM
8 changes: 4 additions & 4 deletions src/fastserve/models/__main__.py
@@ -1,10 +1,10 @@
import argparse

from fastserve.models import ServeImageClassification
from fastserve.models.face_reco import FaceDetection
from fastserve.models.huggingface import ServeHuggingFace
from fastserve.models.llama_cpp import ServeLlamaCpp
from fastserve.models.sdxl_turbo import ServeSDXLTurbo
from fastserve.models.cv.face_reco import FaceDetection
from fastserve.models.image_gen.sdxl_turbo import ServeSDXLTurbo
from fastserve.models.llm.huggingface import ServeHuggingFace
from fastserve.models.llm.llama_cpp import ServeLlamaCpp
from fastserve.models.ssd import ServeSSD1B
from fastserve.utils import get_default_device

Empty file.
File renamed without changes.
@@ -6,10 +6,9 @@
import torch
from diffusers import AutoPipelineForText2Image
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

from fastserve import FastServe
from fastserve.utils import get_ui_folder
from pydantic import BaseModel


class PromptRequest(BaseModel):
Empty file.
File renamed without changes.
File renamed without changes.
File renamed without changes.
