Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#289 - Add register, update and delete model group functionality to support Model Access Control #332

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
52 commits
Select commit Hold shift + click to select a range
a2ffe54
init
rawwar Nov 3, 2023
7bef831
add search, delete and update
rawwar Nov 4, 2023
61c3090
Merge branch 'main' of https://github.com/opensearch-project/opensear…
rawwar Nov 4, 2023
a89bb39
add tests for register model group
rawwar Nov 4, 2023
3bbea93
update cluster to 2.11
rawwar Nov 4, 2023
ca765ec
test skipif
rawwar Nov 4, 2023
4c6a7fd
fix
rawwar Nov 4, 2023
af60958
add tests
rawwar Nov 4, 2023
cde4e1f
update matrix
rawwar Nov 4, 2023
28040ee
cancel in progress
rawwar Nov 4, 2023
d8fc9b2
update concurrency
rawwar Nov 4, 2023
cda4c08
job level concurrency
rawwar Nov 4, 2023
1ebeb1a
fix
rawwar Nov 4, 2023
1b72089
fix
rawwar Nov 4, 2023
e27daa3
fix tests
rawwar Nov 4, 2023
7edafe9
tests passing
rawwar Nov 4, 2023
2e0f9c4
isort fix
rawwar Nov 4, 2023
f9bc8c8
fix action
rawwar Nov 4, 2023
24ea966
fix action
rawwar Nov 4, 2023
84848e2
fix action
rawwar Nov 4, 2023
08256f6
fix
rawwar Nov 4, 2023
9547db3
update changelog
rawwar Nov 4, 2023
06f097d
fix os dockerfile
rawwar Nov 4, 2023
594baad
test
rawwar Nov 5, 2023
203aa86
pass opensearch version
rawwar Nov 5, 2023
19fdf80
fix
rawwar Nov 5, 2023
6bdcfcd
fix
rawwar Nov 5, 2023
76fcd84
fix
rawwar Nov 5, 2023
50bacd4
update OS dockerfile
rawwar Nov 5, 2023
b5133b9
fix failing tests
rawwar Nov 5, 2023
2b529a3
update dockerfile for 2.11.0
rawwar Nov 5, 2023
b61c1e3
remove disable warning
rawwar Nov 6, 2023
d5d5830
fix upload model
rawwar Nov 6, 2023
8f46cf9
fix lint
rawwar Nov 6, 2023
d2f832e
fix lint
rawwar Nov 6, 2023
286716e
Merge branch 'main' of https://github.com/opensearch-project/opensear…
rawwar Nov 6, 2023
24cb85c
include reference
rawwar Nov 6, 2023
078443f
pr fixes
rawwar Nov 6, 2023
c5cbde6
lint fix
rawwar Nov 6, 2023
c8dd748
fix lint
rawwar Nov 6, 2023
05a42b2
fix tests
rawwar Nov 6, 2023
4e47809
skip
rawwar Nov 6, 2023
bdf1201
fix lint
rawwar Nov 6, 2023
df6c069
fix lint and increase coverage
rawwar Nov 6, 2023
62b5601
fix lint
rawwar Nov 6, 2023
1993945
fix
rawwar Nov 6, 2023
2c6b346
feedback fixes
rawwar Nov 6, 2023
3b16fed
fix
rawwar Nov 9, 2023
d8479ff
lint fix
rawwar Nov 9, 2023
99a3660
fix test cases
rawwar Nov 9, 2023
ba60da7
pr feedback fixes
rawwar Nov 13, 2023
075a763
revert
rawwar Nov 13, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 9 additions & 1 deletion .ci/opensearch/Dockerfile
Original file line number Diff line number Diff line change
@@ -1,10 +1,18 @@
ARG OPENSEARCH_VERSION
ARG OPENSEARCH_VERSION=latest
FROM opensearchproject/opensearch:$OPENSEARCH_VERSION
rawwar marked this conversation as resolved.
Show resolved Hide resolved

# OPENSEARCH_VERSION needs to be redefined as any arg before FROM is outside build scope.
# Reference: https://docs.docker.com/engine/reference/builder/#understand-how-arg-and-from-interact
ARG OPENSEARCH_VERSION=latest
ARG opensearch_path=/usr/share/opensearch
rawwar marked this conversation as resolved.
Show resolved Hide resolved
ARG opensearch_yml=$opensearch_path/config/opensearch.yml

ARG SECURE_INTEGRATION
RUN echo "plugins.ml_commons.only_run_on_ml_node: false" >> $opensearch_yml;
RUN echo "plugins.ml_commons.native_memory_threshold: 100" >> $opensearch_yml;
RUN if [ "$OPENSEARCH_VERSION" == "2.11.0" ] ; then \
echo "plugins.ml_commons.model_access_control_enabled: true" >> $opensearch_yml; \
rawwar marked this conversation as resolved.
Show resolved Hide resolved
echo "plugins.ml_commons.allow_registering_model_via_local_file: true" >> $opensearch_yml; \
echo "plugins.ml_commons.allow_registering_model_via_url: true" >> $opensearch_yml; \
fi
RUN if [ "$SECURE_INTEGRATION" != "true" ] ; then echo "plugins.security.disabled: true" >> $opensearch_yml; fi
1 change: 0 additions & 1 deletion .ci/run-opensearch.sh
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,6 @@
# to form a cluster suitable for running the REST API tests.
#
# Export the NUMBER_OF_NODES variable to start more than 1 node

script_path=$(dirname $(realpath -s $0))
source $script_path/imports.sh
set -euo pipefail
Expand Down
5 changes: 5 additions & 0 deletions .github/workflows/integration.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,10 @@ name: Integration tests

on: [push, pull_request]

concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true

jobs:
integration:
name: Integ
Expand All @@ -13,6 +17,7 @@ jobs:
secured: ["true"]
entry:
- { opensearch_version: 2.7.0 }
rawwar marked this conversation as resolved.
Show resolved Hide resolved
- { opensearch_version: 2.11.0 }
rawwar marked this conversation as resolved.
Show resolved Hide resolved

steps:
- name: Checkout
Expand Down
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
- Add script to trigger ml-models-release jenkins workflow with generic webhook by @thanawan-atc in ([#211](https://github.com/opensearch-project/opensearch-py-ml/pull/211))
- Add example notebook for tracing and registering a CLIPTextModel to OpenSearch with the Neural Search plugin by @patrickbarnhart in ([#283](https://github.com/opensearch-project/opensearch-py-ml/pull/283))
- Add support for train api functionality by @rawwar in ([#310](https://github.com/opensearch-project/opensearch-py-ml/pull/310))
- Add support for Model Access Control - Register, Update, Search and Delete by @rawwar in ([#332](https://github.com/opensearch-project/opensearch-py-ml/pull/332))

### Changed
- Modify ml-models.JenkinsFile so that it takes model format into account and can be triggered with generic webhook by @thanawan-atc in ([#211](https://github.com/opensearch-project/opensearch-py-ml/pull/211))
Expand Down
2 changes: 2 additions & 0 deletions opensearch_py_ml/ml_commons/ml_commons_client.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
MODEL_VERSION_FIELD,
TIMEOUT,
)
from opensearch_py_ml.ml_commons.model_access_control import ModelAccessControl
from opensearch_py_ml.ml_commons.model_execute import ModelExecute
from opensearch_py_ml.ml_commons.model_uploader import ModelUploader

Expand All @@ -35,6 +36,7 @@ def __init__(self, os_client: OpenSearch):
self._client = os_client
self._model_uploader = ModelUploader(os_client)
self._model_execute = ModelExecute(os_client)
self.model_access_control = ModelAccessControl(os_client)
rawwar marked this conversation as resolved.
Show resolved Hide resolved

def execute(self, algorithm_name: str, input_json: dict) -> dict:
"""
Expand Down
105 changes: 105 additions & 0 deletions opensearch_py_ml/ml_commons/model_access_control.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,105 @@
# SPDX-License-Identifier: Apache-2.0
# The OpenSearch Contributors require contributions made to
# this file be licensed under the Apache-2.0 license or a
# compatible open source license.
# Any modifications Copyright OpenSearch Contributors. See
# GitHub history for details.

from typing import List, Optional

from opensearchpy import OpenSearch
from opensearchpy.exceptions import NotFoundError

from opensearch_py_ml.ml_commons.ml_common_utils import ML_BASE_URI
from opensearch_py_ml.ml_commons.validators.model_access_control import (
validate_create_model_group_parameters,
validate_delete_model_group_parameters,
validate_search_model_group_parameters,
validate_update_model_group_parameters,
)


class ModelAccessControl:
API_ENDPOINT = "model_groups"

def __init__(self, os_client: OpenSearch):
self.client = os_client

def register_model_group(
self,
name: str,
description: Optional[str] = None,
access_mode: Optional[str] = "private",
backend_roles: Optional[List[str]] = None,
add_all_backend_roles: Optional[bool] = False,
):
validate_create_model_group_parameters(
name, description, access_mode, backend_roles, add_all_backend_roles
)

body = {"name": name, "add_all_backend_roles": add_all_backend_roles}
if description:
body["description"] = description
if access_mode:
body["access_mode"] = access_mode
if backend_roles:
body["backend_roles"] = backend_roles

return self.client.transport.perform_request(
method="POST", url=f"{ML_BASE_URI}/{self.API_ENDPOINT}/_register", body=body
)

def update_model_group(
self,
update_query: dict,
model_group_id: Optional[str] = None,
):
validate_update_model_group_parameters(update_query, model_group_id)
return self.client.transport.perform_request(
method="PUT",
url=f"{ML_BASE_URI}/{self.API_ENDPOINT}/{model_group_id}",
body=update_query,
)

def search_model_group(self, query: dict):
validate_search_model_group_parameters(query)
return self.client.transport.perform_request(
method="GET", url=f"{ML_BASE_URI}/{self.API_ENDPOINT}/_search", body=query
)

def search_model_group_by_name(
self,
model_group_name: str,
_source: Optional[List] = None,
size: Optional[int] = 1,
):
query = {"query": {"match": {"name": model_group_name}}, "size": size}
if _source:
query["_source"] = _source
return self.search_model_group(query)

def get_model_group_id_by_name(self, model_group_name: str):
try:
res = self.search_model_group_by_name(model_group_name)
if res["hits"]["hits"]:
rawwar marked this conversation as resolved.
Show resolved Hide resolved
return res["hits"]["hits"][0]["_id"]
else:
raise NotFoundError
except NotFoundError:
print(f"No model group found with name:{model_group_name}")
return None
rawwar marked this conversation as resolved.
Show resolved Hide resolved
except Exception as ex:
print(f"Error in get_model_group_id_by_name: {ex}")
return None

Check warning on line 93 in opensearch_py_ml/ml_commons/model_access_control.py

View check run for this annotation

Codecov / codecov/patch

opensearch_py_ml/ml_commons/model_access_control.py#L91-L93

Added lines #L91 - L93 were not covered by tests

def delete_model_group(self, model_group_id: str):
validate_delete_model_group_parameters(model_group_id)
return self.client.transport.perform_request(
method="DELETE", url=f"{ML_BASE_URI}/{self.API_ENDPOINT}/{model_group_id}"
)

def delete_model_group_by_name(self, model_group_name: str):
model_group_id = self.get_model_group_id_by_name(model_group_name)
if model_group_id is None:
rawwar marked this conversation as resolved.
Show resolved Hide resolved
raise NotFoundError(f"Model group {model_group_name} not found")
return self.delete_model_group(model_group_id=model_group_id)
6 changes: 6 additions & 0 deletions opensearch_py_ml/ml_commons/validators/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# SPDX-License-Identifier: Apache-2.0
# The OpenSearch Contributors require contributions made to
# this file be licensed under the Apache-2.0 license or a
# compatible open source license.
# Any modifications Copyright OpenSearch Contributors. See
# GitHub history for details.
97 changes: 97 additions & 0 deletions opensearch_py_ml/ml_commons/validators/model_access_control.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
# SPDX-License-Identifier: Apache-2.0
# The OpenSearch Contributors require contributions made to
# this file be licensed under the Apache-2.0 license or a
# compatible open source license.
# Any modifications Copyright OpenSearch Contributors. See
# GitHub history for details.

""" Module for validating model access control parameters """

from typing import List, Optional

ACCESS_MODES = ["public", "private", "restricted"]

NoneType = type(None)


def _validate_model_group_name(name: str):
rawwar marked this conversation as resolved.
Show resolved Hide resolved
if not name or not isinstance(name, str):
raise ValueError("name is required and needs to be a string")
rawwar marked this conversation as resolved.
Show resolved Hide resolved


def _validate_model_group_description(description: Optional[str]):
if not isinstance(description, (NoneType, str)):
raise ValueError("description needs to be a string")


def _validate_model_group_access_mode(access_mode: Optional[str]):
if access_mode is None:
return
if access_mode not in ACCESS_MODES:
raise ValueError(f"access_mode can must be in {ACCESS_MODES} or None")


def _validate_model_group_backend_roles(backend_roles: Optional[List[str]]):
if not isinstance(backend_roles, (NoneType, list)):
raise ValueError("backend_roles should either be None or a list of roles names")


def _validate_model_group_add_all_backend_roles(add_all_backend_roles: Optional[bool]):
if not isinstance(add_all_backend_roles, (NoneType, bool)):
raise ValueError("add_all_backend_roles should be a boolean")


def _validate_model_group_query(query: dict, operation: Optional[str] = None):
if not isinstance(query, dict):
raise ValueError("query needs to be a dictionary")

if operation and not isinstance(operation, str):
raise ValueError("operation needs to be a string")


def validate_create_model_group_parameters(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any specific reason we are not validating all the exception cases while creating model group? Like the ones we have here: https://github.com/opensearch-project/ml-commons/blob/6efb1ebeefb8a7930f6767262457d0d5b568ac75/plugin/src/main/java/org/opensearch/ml/model/MLModelGroupManager.java#L148

Copy link
Contributor Author

@rawwar rawwar Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rbhavna , since, these validations are already being done by ml-commons, having them here as well felt like double validations. Most checks I wrote here just check for type issues and helps to make a valid API call.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, the three exception cases that we are testing here are also being validated by ml-commons. Any reason we still have them here and not the others?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I do feel we shouldn't be having these checks in opensearch-py-ml. Because, if checks on ml-commons change, we again need to update this on the client as well. I was thinking, client libraries only purpose would be to make the calls and if any errors from the opensearch, it should be able to pass them to user without any change. Also, maybe add few utility functions to make things easier for the user.

@dhrubo-os , @rbhavna , what do you think about removing these validations form opensearch-py-ml? Or Should I go ahead with implementing the validations from ml-commons ?

Copy link
Collaborator

@dhrubo-os dhrubo-os Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As opensearch-py-ml is a client plugin for ml-commons, we should have only the client side testing similar like any web application. Some basic input validations which can be easily caught before the request reaching to the server. We don't need to add that type of validations for which we need to make a back-end call like testing if the user is admin or not.

Yeah we need to have the same basic validations in ml-commons too, as py-ml isn't the only client for ml-commons.

name: str,
description: Optional[str] = None,
access_mode: Optional[str] = "private",
backend_roles: Optional[List[str]] = None,
add_all_backend_roles: Optional[bool] = False,
):
_validate_model_group_name(name)
_validate_model_group_description(description)
_validate_model_group_access_mode(access_mode)
_validate_model_group_backend_roles(backend_roles)
_validate_model_group_add_all_backend_roles(add_all_backend_roles)

if access_mode == "restricted":
if not backend_roles and not add_all_backend_roles:
raise ValueError(
"You must specify either backend_roles or add_all_backend_roles=True for restricted access_mode"
)

if backend_roles and add_all_backend_roles:
raise ValueError(

Check warning on line 72 in opensearch_py_ml/ml_commons/validators/model_access_control.py

View check run for this annotation

Codecov / codecov/patch

opensearch_py_ml/ml_commons/validators/model_access_control.py#L72

Added line #L72 was not covered by tests
"You cannot specify both backend_roles and add_all_backend_roles=True at the same time"
)

elif access_mode == "private":
if backend_roles or add_all_backend_roles:
raise ValueError(
"You must not specify backend_roles or add_all_backend_roles=True for a private model group"
)


def validate_update_model_group_parameters(update_query: dict, model_group_id: str):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same doubt. Update model group also validates similar exception cases that are missing here. If we are not validating them at any other place, I would suggest we have them here.

Copy link
Collaborator

@dhrubo-os dhrubo-os Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but I think the limitation is, during update we don't know what are the values already for model group in index. So either we need to get the model group with a server call and then check or we call leave it to the server validation.

In this case I would prefer the second not to complicate things.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes agreed @dhrubo-os

if not isinstance(model_group_id, str):
raise ValueError("Invalid model_group_id. model_group_id needs to be a string")

if not isinstance(update_query, dict):
raise ValueError("Invalid update_query. update_query needs to be a dictionary")


def validate_delete_model_group_parameters(model_group_id: str):
if not isinstance(model_group_id, str):
raise ValueError("Invalid model_group_id. model_group_id needs to be a string")


def validate_search_model_group_parameters(query: dict):
_validate_model_group_query(query)
Loading
Loading