* Add new models and update imports
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* update
* update
* fixes
* update
* update
* Add new FastServe models and documentation
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* update

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent b58ec8b · commit a3946b2
Showing 19 changed files with 170 additions and 160 deletions.
@@ -0,0 +1 @@
# Oops! The page you are looking for does not exist.

@@ -0,0 +1,28 @@
# Run and deploy with a Docker container 🐳

## Containerization

To containerize your FastServe application, a Docker example is provided in the [examples/docker-compose-example](https://github.com/gradsflow/fastserve-ai/tree/main/examples/docker-compose-example) directory. The example serves a face recognition model and includes a `Dockerfile` for creating a Docker image and a `docker-compose.yml` for easy deployment. Here's a quick overview:

- **Dockerfile**: Defines the environment, installs dependencies from `requirements.txt`, and specifies the command to run your FastServe application (a minimal sketch of such an application is shown after this list).
- **docker-compose.yml**: Simplifies the deployment of your FastServe application by defining services, networks, and volumes.
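
For orientation, here is a rough sketch of the kind of application such a `Dockerfile` could launch. The file name `app.py` is only an assumption about the example's layout, and the model is borrowed from the face-detection docs:

```python
# app.py - minimal FastServe app that a Dockerfile CMD might launch (illustrative sketch).
from fastserve.models import FaceDetection

# Batch up to 2 requests, waiting at most 1 second before running inference.
serve = FaceDetection(batch_size=2, timeout=1)

# Bind to 0.0.0.0 so the server is reachable from outside the container.
serve.run_server(host="0.0.0.0", port=8000)
```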

To use the example, navigate to the `examples/docker-compose-example` directory and run:

```shell
docker-compose up --build
```

This will build the Docker image and start your FastServe application in a container, making it accessible on the specified port.

> **Note:** This example uses face recognition. If you want to serve other models, you will likely need to change `requirements.txt` or the `Dockerfile`. Don't worry; the example is intended as a quick start, so feel free to modify it as needed.

## Passing Arguments to Uvicorn in `run_server()`

FastServe leverages Uvicorn, a lightning-fast ASGI server, to serve machine learning models, which makes FastServe highly efficient and scalable.

The `run_server()` method supports passing additional arguments to Uvicorn via `*args` and `**kwargs`. This lets you customize the server's behavior without modifying the source code. For example:

```python
app.run_server(host='0.0.0.0', port=8000, log_level='info')
```

In this example, `host`, `port`, and `log_level` are passed directly to `uvicorn.run()` to specify the server's IP address, port, and logging level. You can pass any argument supported by `uvicorn.run()` to `run_server()` in this manner.
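
For a self-contained illustration, here is a sketch that attaches those kwargs to a complete app; the model and values are only examples drawn from the other pages in these docs:

```python
from fastserve.models import ServeImageClassification

# Any FastServe app works here; resnet18 is just the example model used elsewhere in these docs.
app = ServeImageClassification("resnet18", timeout=1, batch_size=4)

# host, port and log_level are forwarded unchanged to uvicorn.run().
app.run_server(host="0.0.0.0", port=8080, log_level="debug")
```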

@@ -0,0 +1,10 @@
# Serve Face Detection

```python
from fastserve.models import FaceDetection

serve = FaceDetection(batch_size=2, timeout=1)
serve.run_server()
```

or, run `python -m fastserve.models --model face-detection --batch_size 2 --timeout 1` from the terminal.

@@ -0,0 +1,14 @@
# Serve Image Classification models with FastServe

## Image Classification

```python
from fastserve.models import ServeImageClassification

app = ServeImageClassification("resnet18", timeout=1, batch_size=4)
app.run_server()
```

or, run `python -m fastserve.models --model image-classification --model_name resnet18 --batch_size 4 --timeout 1` from the terminal.

@@ -0,0 +1,17 @@
# Serve GenAI - Image Generation Models

## Serve SDXL Turbo

```python
from fastserve.models import ServeSDXLTurbo

serve = ServeSDXLTurbo(device="cuda", batch_size=2, timeout=1)
serve.run_server()
```

or, run `python -m fastserve.models --model sdxl-turbo --batch_size 2 --timeout 1` from the terminal.

This application comes with a UI, which you can access at [http://localhost:8000/ui](http://localhost:8000/ui).

<img src="https://raw.githubusercontent.com/gradsflow/fastserve-ai/main/assets/sdxl.jpg" width=400 style="border: 1px solid #F2F3F5;">

@@ -0,0 +1,40 @@
# 🤗 Hugging Face

## Serve HuggingFace Models

Leveraging FastServe, you can seamlessly serve any HuggingFace Transformer model, enabling flexible deployment across various computing environments, from CPU-based systems to powerful GPU and multi-GPU setups.

Some models require a HuggingFace API token to be set up in your environment before they can be accessed from the HuggingFace Hub. This is not necessary for all models, but you may encounter it for models that require accepting terms of use or other steps; check your model's page for the specific requirements.

```shell
export HUGGINGFACE_TOKEN=<your hf token>
```

The server can be easily initiated with a specific model. The example below uses `gpt2`; replace it with your model of choice. The `model_name` parameter is optional; if it is not provided, the class attempts to read the model name from the `HUGGINGFACE_MODEL_NAME` environment variable. You can also specify whether to use GPU acceleration with the `device` parameter, which defaults to `cpu`.

```python
from fastserve.models import ServeHuggingFace

# Initialize with GPU support if desired by setting `device="cuda"`.
# For CPU usage, you can omit `device` or set it to `cpu`.
app = ServeHuggingFace(model_name="gpt2", device="cuda")
app.run_server()
```
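
If you prefer the environment-variable route described above, a minimal sketch (assuming the lookup behaviour works as stated) looks like this:

```python
import os

from fastserve.models import ServeHuggingFace

# Hypothetical: rely on HUGGINGFACE_MODEL_NAME instead of passing model_name explicitly.
os.environ["HUGGINGFACE_MODEL_NAME"] = "gpt2"

app = ServeHuggingFace()  # the model name is picked up from the environment, per the docs above
app.run_server()
```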

or, run `python -m fastserve.models --model huggingface --model_name bigcode/starcoder --batch_size 4 --timeout 1 --device cuda` from the terminal.

To make a request to the server, send a JSON payload with the prompt you want the model to generate text for. Here's an example using `requests` in Python:

```python
import requests

response = requests.post(
    "http://localhost:8000/endpoint",
    json={"prompt": "Once upon a time", "temperature": 0.7, "max_tokens": 100},
)
print(response.json())
```

This setup allows you to easily deploy and interact with any Transformer model from HuggingFace's model hub, providing a convenient way to integrate AI capabilities into your applications.

Remember, for deploying specific models, ensure that you have the necessary dependencies installed and the model files accessible if they are not directly available from HuggingFace's model hub.

@@ -0,0 +1,13 @@
# Serve LLMs locally

## Serve LLMs with Llama-cpp

```python
from fastserve.models import ServeLlamaCpp

model_path = "openhermes-2-mistral-7b.Q5_K_M.gguf"
serve = ServeLlamaCpp(model_path=model_path)
serve.run_server()
```

or, run `python -m fastserve.models --model llama-cpp --model_path openhermes-2-mistral-7b.Q5_K_M.gguf` from the terminal.
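
This page doesn't show a request example. As a rough sketch, and only assuming the llama-cpp server exposes the same `/endpoint` route and JSON payload as the Hugging Face example above (check the running server's OpenAPI docs for the actual schema), a request might look like:

```python
import requests

# Assumed payload shape, mirroring the Hugging Face serving example in these docs.
response = requests.post(
    "http://localhost:8000/endpoint",
    json={"prompt": "Write a haiku about servers", "temperature": 0.7, "max_tokens": 100},
)
print(response.json())
```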

@@ -0,0 +1,22 @@
# Serve LLMs at Scale with vLLM

## Serve vLLM

```python
from fastserve.models import ServeVLLM

app = ServeVLLM("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
app.run_server()
```

You can use the FastServe client, which will automatically apply the chat template for you:

```python
from fastserve.client import vLLMClient
from rich import print

client = vLLMClient("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
response = client.chat("Write a python function to resize image to 224x224", keep_context=True)
# print(client.context)
print(response["outputs"][0]["text"])
```

@@ -1,9 +1,9 @@
-from fastserve.models.face_reco import FaceDetection as FaceDetection
-from fastserve.models.huggingface import ServeHuggingFace as ServeHuggingFace
-from fastserve.models.image_classification import (
+from fastserve.models.cv.face_reco import FaceDetection as FaceDetection
+from fastserve.models.cv.image_classification import (
     ServeImageClassification as ServeImageClassification,
 )
-from fastserve.models.llama_cpp import ServeLlamaCpp as ServeLlamaCpp
-from fastserve.models.sdxl_turbo import ServeSDXLTurbo as ServeSDXLTurbo
+from fastserve.models.image_gen.sdxl_turbo import ServeSDXLTurbo as ServeSDXLTurbo
+from fastserve.models.llm.huggingface import ServeHuggingFace as ServeHuggingFace
+from fastserve.models.llm.llama_cpp import ServeLlamaCpp as ServeLlamaCpp
+from fastserve.models.llm.vllm import ServeVLLM as ServeVLLM
 from fastserve.models.ssd import ServeSSD1B as ServeSSD1B
-from fastserve.models.vllm import ServeVLLM as ServeVLLM
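
Since every class is still re-exported from `fastserve.models`, user-facing imports are unchanged by this reorganization. For example (a short sketch based on the re-exports above):

```python
# Public imports keep working exactly as before the module reshuffle.
from fastserve.models import FaceDetection, ServeHuggingFace, ServeVLLM
```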
Empty file.
File renamed without changes.
File renamed without changes.
Empty file.
File renamed without changes.
File renamed without changes.
File renamed without changes.