Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] knn_vector field type is not supported #124

Open
causalbody opened this issue May 4, 2023 · 7 comments
Open

[FEATURE] knn_vector field type is not supported #124

causalbody opened this issue May 4, 2023 · 7 comments

Comments

@causalbody
Copy link

The org.springframework.data.elasticsearch.annotations.Field has only dense_vector but not knn_vector. It looks we cannot annotate a field as knn_vector on the model through spring-data-opensearch libs.

@wbeckler
Copy link

How would you suggest this get addressed? Do we need to go up to the spring library to request a change to the options?

@noam-shoef
Copy link

One more vote for that.
I'm going to try and deal with it by not using the class mapping and use the indexOp put mapping with a json instead but this is not ideal.
It seems like the simplest solution would be to have spring adding it.
Ideally, I'd say allow any string so that any other/future fields can be added manually and not need to be added to the enum in order to use it.

@dblock
Copy link
Member

dblock commented Oct 13, 2023

@noam-shoef Want to give this implementation a shot?

@noam-shoef
Copy link

What we came up with is something like this:

Annotation class for knn vector:

@Retention(AnnotationRetention.RUNTIME)
@Target(AnnotationTarget.PROPERTY)
public annotation class KnnVectorField(
    val type: String = "knn_vector",
    val dimensions: Int = 384,
    val methodName: String = "hnsw",
    val spaceType: String = "l2",
    val engine: String = "nmslib",
    val efConstruction: Int = 1024,
    val m: Int = 64
)

A mapping from the annotation:

public fun KnnVectorField.toMapping(): Map<String, Any> = mapOf(
    "type" to type,
    "dimension" to dimensions,
    "method" to mapOf(
        "name" to methodName,
        "space_type" to spaceType,
        "engine" to engine,
        "parameters" to mapOf(
            "ef_construction" to efConstruction,
            "m" to m
        )
    )
)

An override of the mapping function to include handing of the extra annotation:

public fun IndexOperations.createCustomMapping(clazz: KClass<*>): Document {
    val mapping: Document = createMapping(clazz.java)
    val properties = mapping["properties"] as MutableMap<String, Any?>

    val knnVectorFields = clazz.declaredMemberProperties
        .associate { it.name to it.findAnnotation<KnnVectorField>()?.toMapping() }
        .filterNot { it.value.isNullOrEmpty() }

    properties.putAll(knnVectorFields)
    return mapping
}

This enable us to annotate our field with knn annotation and then we create our mapping using the createCustomMapping extension method instead of the usual createMapping and this knows how to handle the annotation.

@dblock
Copy link
Member

dblock commented Oct 16, 2023

Will let @reta comment on the implementation as I don't understand the codebase well enough, but I encourage you to submit a PR with tests & al.

@reta
Copy link
Collaborator

reta commented Oct 16, 2023

Thanks @noam-shoef , the implementation is tricky to do:

  • the integration is built on top of spring-data-elasticsearch and have to conform (largely) to FieldType exposed by its current APIs (which has no KNN vector equivalent as you pointed out)
  • more to that, knn_vector is not a core data type but provided by plugin, so it may not work for all deployments

IndexOperations is just one place, there are other places that need to be modified in order for integration to work properly (like repositories, query side, etc). I will try to look into that.

@noam-shoef
Copy link

@reta @dblock Just to clarify, what I posted is not a proposed solution to be added to the library but a workaround solution for those who want to work with the library and need to add the knn vector field.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants