Commit

reorganize models (#35)

* Add new models and update imports

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

* update

* fixes

* update

* update

* Add new FastServe models and documentation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
aniketmaurya and pre-commit-ci[bot] committed Mar 19, 2024
1 parent b58ec8b commit a3946b2
Showing 19 changed files with 170 additions and 160 deletions.
162 changes: 14 additions & 148 deletions README.md
@@ -1,10 +1,15 @@
# [FastServe](https://github.com/gradsflow/fastserve-ai)
<p align="center">
<img width="250" alt="logo" src="https://ik.imagekit.io/gradsflow/logo/v2/Gradsflow-gradient_TPwd2H3s4.png?updatedAt=1710283252606"/>
<br>
<strong>Machine Learning Serving focused on GenAI & LLMs with simplicity as the top priority.</strong>
</p>
<p align="center">
<a href="https://fastserve.gradsflow.com">Docs</a> |
<a href="https://github.com/gradsflow/fastserve-ai/tree/main/examples">Examples</a>
</p>

Machine Learning Serving focused on GenAI & LLMs with simplicity as the top priority.
---

> [![img_tag](https://img.youtube.com/vi/GfcmyfPB9qY/0.jpg)](https://www.youtube.com/watch?v=GfcmyfPB9qY)
>
> YouTube: How to serve your own GPT like LLM in 1 minute with FastServe

## Installation

@@ -18,129 +23,16 @@ pip install FastServeAI
pip install git+https://github.com/gradsflow/fastserve-ai.git@main
```

## Run locally

```bash
python -m fastserve
```

## Usage/Examples

<a href="https://www.youtube.com/watch?v=GfcmyfPB9qY">
<img src="https://img.youtube.com/vi/GfcmyfPB9qY/0.jpg" width=350px>
</a>

### Serve LLMs with Llama-cpp

```python
from fastserve.models import ServeLlamaCpp

model_path = "openhermes-2-mistral-7b.Q5_K_M.gguf"
serve = ServeLlamaCpp(model_path=model_path)
serve.run_server()
```

or run `python -m fastserve.models --model llama-cpp --model_path openhermes-2-mistral-7b.Q5_K_M.gguf` from the terminal.


### Serve vLLM

```python
from fastserve.models import ServeVLLM

app = ServeVLLM("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
app.run_server()
```

You can use the FastServe client, which will automatically apply the chat template for you:

```python
from fastserve.client import vLLMClient
from rich import print

client = vLLMClient("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
response = client.chat("Write a python function to resize image to 224x224", keep_context=True)
# print(client.context)
print(response["outputs"][0]["text"])
```
> YouTube: How to serve your own GPT like LLM in 1 minute with FastServe.

### Serve SDXL Turbo

```python
from fastserve.models import ServeSDXLTurbo

serve = ServeSDXLTurbo(device="cuda", batch_size=2, timeout=1)
serve.run_server()
```

or run `python -m fastserve.models --model sdxl-turbo --batch_size 2 --timeout 1` from the terminal.

This application comes with a UI. You can access it at [http://localhost:8000/ui](http://localhost:8000/ui).


<img src="https://raw.githubusercontent.com/gradsflow/fastserve-ai/main/assets/sdxl.jpg" width=400 style="border: 1px solid #F2F3F5;">


### Face Detection

```python
from fastserve.models import FaceDetection

serve = FaceDetection(batch_size=2, timeout=1)
serve.run_server()
```

or run `python -m fastserve.models --model face-detection --batch_size 2 --timeout 1` from the terminal.

### Image Classification

```python
from fastserve.models import ServeImageClassification

app = ServeImageClassification("resnet18", timeout=1, batch_size=4)
app.run_server()
```

or run `python -m fastserve.models --model image-classification --model_name resnet18 --batch_size 4 --timeout 1` from the terminal.

### Serve HuggingFace Models

Leveraging FastServe, you can seamlessly serve any HuggingFace Transformer model, enabling flexible deployment across various computing environments, from CPU-based systems to powerful GPU and multi-GPU setups.

Some models require a HuggingFace API token to be set in your environment before they can be downloaded from the HuggingFace Hub.
This is not needed for every model, but gated models (for example, those that require accepting terms of use) will ask for it. Check your model's page for its specific requirements.
```bash
export HUGGINGFACE_TOKEN=<your hf token>
```

The server can be initiated with a specific model. The example below uses `gpt2`; replace it with your model of choice. The `model_name` parameter is optional: if it is not provided, the class attempts to read the model name from the `HUGGINGFACE_MODEL_NAME` environment variable. You can also enable GPU acceleration with the `device` parameter, which defaults to `cpu`.

```python
from fastserve.models import ServeHuggingFace

# Initialize with GPU support if desired by setting `device="cuda"`.
# For CPU usage, you can omit `device` or set it to `cpu`.
app = ServeHuggingFace(model_name="gpt2", device="cuda")
app.run_server()
```

or run `python -m fastserve.models --model huggingface --model_name bigcode/starcoder --batch_size 4 --timeout 1 --device cuda` from the terminal.

To make a request to the server, send a JSON payload with the prompt you want the model to generate text for. Here's an example using requests in Python:
```python
import requests

response = requests.post(
    "http://localhost:8000/endpoint",
    json={"prompt": "Once upon a time", "temperature": 0.7, "max_tokens": 100},
)
print(response.json())
```
This setup allows you to easily deploy and interact with any Transformer model from HuggingFace's model hub, providing a convenient way to integrate AI capabilities into your applications.


Remember, for deploying specific models, ensure that you have the necessary dependencies installed and the model files accessible if they are not directly available from HuggingFace's model hub.


### Serve Custom Model

@@ -179,32 +71,6 @@ python fastserve.deploy.lightning --filename main.py \
--machine "CPU" # T4, A10G or A10G_X_4
```

## Containerization

To containerize your FastServe application, a Docker example is provided in the [examples/docker-compose-example](https://github.com/gradsflow/fastserve-ai/tree/main/examples/docker-compose-example) directory. The example is about face recognition and includes a `Dockerfile` for creating a Docker image and a `docker-compose.yml` for easy deployment. Here's a quick overview:

- **Dockerfile**: Defines the environment, installs dependencies from `requirements.txt`, and specifies the command to run your FastServe application.
- **docker-compose.yml**: Simplifies the deployment of your FastServe application by defining services, networks, and volumes.

To use the example, navigate to the `examples/docker-compose-example` directory and run:

```shell
docker-compose up --build
```

This will build the Docker image and start your FastServe application in a container, making it accessible on the specified port.

> **Note:** We provide an example using face recognition. If you need to serve other models, you will likely need to change `requirements.txt` or the `Dockerfile`. Don't worry; this example is intended as a quick start. Feel free to modify it as needed.

## Passing Arguments to Uvicorn in `run_server()`

FastServe leverages Uvicorn, a lightning-fast ASGI server, to serve machine learning models, making FastServe highly efficient and scalable.
The `run_server()` method supports passing additional arguments to Uvicorn via `*args` and `**kwargs`. This lets you customize the server's behavior without modifying the source code. For example:

```python
app.run_server(host='0.0.0.0', port=8000, log_level='info')
```

In this example, `host`, `port`, and `log_level` are passed directly to `uvicorn.run()` to specify the server's IP address, port, and logging level. You can pass any argument supported by `uvicorn.run()` to `run_server()` in this manner.

## Contribute

1 change: 1 addition & 0 deletions docs/404.md
@@ -0,0 +1 @@
# Oops! The page you are looking for does not exist.
28 changes: 28 additions & 0 deletions docs/fastserve/containerization.md
@@ -0,0 +1,28 @@
# Run and deploy with Docker container 🐳

## Containerization

To containerize your FastServe application, a Docker example is provided in the [examples/docker-compose-example](https://github.com/gradsflow/fastserve-ai/tree/main/examples/docker-compose-example) directory. The example is about face recognition and includes a `Dockerfile` for creating a Docker image and a `docker-compose.yml` for easy deployment. Here's a quick overview:

- **Dockerfile**: Defines the environment, installs dependencies from `requirements.txt`, and specifies the command to run your FastServe application.
- **docker-compose.yml**: Simplifies the deployment of your FastServe application by defining services, networks, and volumes.

To use the example, navigate to the `examples/docker-compose-example` directory and run:

```shell
docker-compose up --build
```

This will build the Docker image and start your FastServe application in a container, making it accessible on the specified port.
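
Once the container is running, a quick liveness check from Python can confirm the service is reachable. This is only a sketch: it assumes the compose file maps the app to `localhost:8000` and that FastAPI's default interactive docs route is enabled; adjust the host and port to your setup.

```python
import requests

# Sketch: a quick liveness check against the containerized service.
# Assumes the compose file maps the app to localhost:8000 and that FastAPI's
# default interactive docs route is enabled; adjust host/port to your setup.
resp = requests.get("http://localhost:8000/docs", timeout=5)
print(resp.status_code)  # 200 means the server is up and responding
```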

> **Note:** We provide an example using face recognition. If you need to serve other models, you will likely need to change `requirements.txt` or the `Dockerfile`. Don't worry; this example is intended as a quick start. Feel free to modify it as needed.

## Passing Arguments to Uvicorn in `run_server()`

FastServe leverages Uvicorn, a lightning-fast ASGI server, to serve machine learning models, making FastServe highly efficient and scalable.
The `run_server()` method supports passing additional arguments to Uvicorn via `*args` and `**kwargs`. This lets you customize the server's behavior without modifying the source code. For example:

```python
app.run_server(host='0.0.0.0', port=8000, log_level='info')
```

In this example, `host`, `port`, and `log_level` are passed directly to `uvicorn.run()` to specify the server's IP address, port, and logging level. You can pass any argument supported by `uvicorn.run()` to `run_server()` in this manner.
10 changes: 10 additions & 0 deletions docs/fastserve/models/face_detection.md
@@ -0,0 +1,10 @@
# Serve Face Detection

```python
from fastserve.models import FaceDetection

serve = FaceDetection(batch_size=2, timeout=1)
serve.run_server()
```

or run `python -m fastserve.models --model face-detection --batch_size 2 --timeout 1` from the terminal.
14 changes: 14 additions & 0 deletions docs/fastserve/models/image_classification.md
@@ -0,0 +1,14 @@
# Serve Image Classification models with FastServe

## Image Classification

```python
from fastserve.models import ServeImageClassification

app = ServeImageClassification("resnet18", timeout=1, batch_size=4)
app.run_server()
```

or run `python -m fastserve.models --model image-classification --model_name resnet18 --batch_size 4 --timeout 1` from the terminal.

17 changes: 17 additions & 0 deletions docs/fastserve/models/image_gen.md
@@ -0,0 +1,17 @@
# Serve GenAI - Image Generation Models

## Serve SDXL Turbo

```python
from fastserve.models import ServeSDXLTurbo

serve = ServeSDXLTurbo(device="cuda", batch_size=2, timeout=1)
serve.run_server()
```

or run `python -m fastserve.models --model sdxl-turbo --batch_size 2 --timeout 1` from the terminal.

This application comes with a UI. You can access it at [http://localhost:8000/ui](http://localhost:8000/ui).


<img src="https://raw.githubusercontent.com/gradsflow/fastserve-ai/main/assets/sdxl.jpg" width=400 style="border: 1px solid #F2F3F5;">
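
For programmatic access (instead of the bundled UI), the sketch below shows a hypothetical client request. The `/endpoint` route, the JSON `prompt` field, and the raw-bytes response are assumptions based on the other FastServe examples, not the documented SDXL Turbo API; check your running server's interactive API docs for the exact schema.

```python
import requests

# Hypothetical sketch: the endpoint path, payload shape, and response format
# are assumptions, not the documented SDXL Turbo API; verify them against the
# running server's interactive API docs.
response = requests.post(
    "http://localhost:8000/endpoint",
    json={"prompt": "a watercolor painting of a lighthouse at dawn"},
)
response.raise_for_status()

# Assuming the generated image is streamed back as raw bytes, save it to disk.
with open("output.jpg", "wb") as f:
    f.write(response.content)
```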
40 changes: 40 additions & 0 deletions docs/fastserve/models/llms/hf.md
@@ -0,0 +1,40 @@
# 🤗 Hugging Face

## Serve HuggingFace Models

Leveraging FastServe, you can seamlessly serve any HuggingFace Transformer model, enabling flexible deployment across various computing environments, from CPU-based systems to powerful GPU and multi-GPU setups.

Some models require a HuggingFace API token to be set in your environment before they can be downloaded from the HuggingFace Hub.
This is not needed for every model, but gated models (for example, those that require accepting terms of use) will ask for it. Check your model's page for its specific requirements.
```bash
export HUGGINGFACE_TOKEN=<your hf token>
```

The server can be initiated with a specific model. The example below uses `gpt2`; replace it with your model of choice. The `model_name` parameter is optional: if it is not provided, the class attempts to read the model name from the `HUGGINGFACE_MODEL_NAME` environment variable. You can also enable GPU acceleration with the `device` parameter, which defaults to `cpu`.

```python
from fastserve.models import ServeHuggingFace

# Initialize with GPU support if desired by setting `device="cuda"`.
# For CPU usage, you can omit `device` or set it to `cpu`.
app = ServeHuggingFace(model_name="gpt2", device="cuda")
app.run_server()
```

or run `python -m fastserve.models --model huggingface --model_name bigcode/starcoder --batch_size 4 --timeout 1 --device cuda` from the terminal.
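
Since `model_name` can also come from the environment, the following is a minimal sketch of an environment-driven setup, assuming the class reads `HUGGINGFACE_MODEL_NAME` at initialization time as described above:

```python
import os

from fastserve.models import ServeHuggingFace

# Sketch: select the model via the environment instead of passing model_name,
# assuming the class reads HUGGINGFACE_MODEL_NAME at initialization as
# described above. HUGGINGFACE_TOKEN is only needed for gated models.
os.environ["HUGGINGFACE_MODEL_NAME"] = "gpt2"
# os.environ["HUGGINGFACE_TOKEN"] = "<your hf token>"

app = ServeHuggingFace(device="cpu")  # defaults to CPU; set device="cuda" for GPU
app.run_server()
```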

To make a request to the server, send a JSON payload with the prompt you want the model to generate text for. Here's an example using requests in Python:
```python
import requests

response = requests.post(
    "http://localhost:8000/endpoint",
    json={"prompt": "Once upon a time", "temperature": 0.7, "max_tokens": 100},
)
print(response.json())
```
This setup allows you to easily deploy and interact with any Transformer model from HuggingFace's model hub, providing a convenient way to integrate AI capabilities into your applications.


Remember, for deploying specific models, ensure that you have the necessary dependencies installed and the model files accessible if they are not directly available from HuggingFace's model hub.
13 changes: 13 additions & 0 deletions docs/fastserve/models/llms/local_llm.md
@@ -0,0 +1,13 @@
# Serve LLMs locally

## Serve LLMs with Llama-cpp

```python
from fastserve.models import ServeLlamaCpp

model_path = "openhermes-2-mistral-7b.Q5_K_M.gguf"
serve = ServeLlamaCpp(model_path=model_path)
serve.run_server()
```

or run `python -m fastserve.models --model llama-cpp --model_path openhermes-2-mistral-7b.Q5_K_M.gguf` from the terminal.
22 changes: 22 additions & 0 deletions docs/fastserve/models/llms/vllm.md
@@ -0,0 +1,22 @@
# Serve LLMs at Scale with vLLM

## Serve vLLM

```python
from fastserve.models import ServeVLLM

app = ServeVLLM("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
app.run_server()
```

You can use the FastServe client, which will automatically apply the chat template for you:

```python
from fastserve.client import vLLMClient
from rich import print

client = vLLMClient("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
response = client.chat("Write a python function to resize image to 224x224", keep_context=True)
# print(client.context)
print(response["outputs"][0]["text"])
```
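
Because `keep_context=True` stores the conversation in `client.context`, a follow-up turn can build on the previous exchange. The short sketch below assumes the response structure shown above:

```python
# Follow-up turn that relies on the context kept by the previous call.
# Sketch only: assumes client.chat() returns the same response structure as above.
followup = client.chat("Now add type hints to that function.", keep_context=True)
print(followup["outputs"][0]["text"])

# Inspect the accumulated conversation history.
print(client.context)
```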
12 changes: 6 additions & 6 deletions src/fastserve/models/__init__.py
@@ -1,9 +1,9 @@
from fastserve.models.face_reco import FaceDetection as FaceDetection
from fastserve.models.huggingface import ServeHuggingFace as ServeHuggingFace
from fastserve.models.image_classification import (
from fastserve.models.cv.face_reco import FaceDetection as FaceDetection
from fastserve.models.cv.image_classification import (
    ServeImageClassification as ServeImageClassification,
)
from fastserve.models.llama_cpp import ServeLlamaCpp as ServeLlamaCpp
from fastserve.models.sdxl_turbo import ServeSDXLTurbo as ServeSDXLTurbo
from fastserve.models.image_gen.sdxl_turbo import ServeSDXLTurbo as ServeSDXLTurbo
from fastserve.models.llm.huggingface import ServeHuggingFace as ServeHuggingFace
from fastserve.models.llm.llama_cpp import ServeLlamaCpp as ServeLlamaCpp
from fastserve.models.llm.vllm import ServeVLLM as ServeVLLM
from fastserve.models.ssd import ServeSSD1B as ServeSSD1B
from fastserve.models.vllm import ServeVLLM as ServeVLLM
8 changes: 4 additions & 4 deletions src/fastserve/models/__main__.py
@@ -1,10 +1,10 @@
import argparse

from fastserve.models import ServeImageClassification
from fastserve.models.face_reco import FaceDetection
from fastserve.models.huggingface import ServeHuggingFace
from fastserve.models.llama_cpp import ServeLlamaCpp
from fastserve.models.sdxl_turbo import ServeSDXLTurbo
from fastserve.models.cv.face_reco import FaceDetection
from fastserve.models.image_gen.sdxl_turbo import ServeSDXLTurbo
from fastserve.models.llm.huggingface import ServeHuggingFace
from fastserve.models.llm.llama_cpp import ServeLlamaCpp
from fastserve.models.ssd import ServeSSD1B
from fastserve.utils import get_default_device

Empty file.
File renamed without changes.
@@ -6,10 +6,9 @@
import torch
from diffusers import AutoPipelineForText2Image
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

from fastserve import FastServe
from fastserve.utils import get_ui_folder
from pydantic import BaseModel


class PromptRequest(BaseModel):
Empty file.
File renamed without changes.
File renamed without changes.
File renamed without changes.
