Add Clustering and TSP apps (#265)
* Prelim commit

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* NEBM+SCIF merger: first pass

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Prelim commit

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* NEBM+SCIF merger: second pass

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Post code-review: removed `debug` from NEBM, added docstrings, reinstated `best_solution` in read_gate

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* First VRP app commit; incomplete, needs tests

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Adding helper functions to generate Q matrices
for a) clustering b) tsp.
TODO: Add Typing code

* Second VRP app commit; functional with VRPy. Includes tests.

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Second VRP app commit; solver complete, needs correct Q matrices

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Changes to clustering matrix complete.
Made changes to TSP API only.
Further work needed on TSP logic

* New tsp matrix generator with tests

* Changed formulation of distance in TSP.
Encoding is now accurate

* Added proper clustering Q matrix generator
(with test)

* VRP Solver: Almost there

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* NEBM+SCIF merger: first pass

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Prelim commit

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* NEBM+SCIF merger: second pass

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Post code-review: removed `debug` from NEBM, added docstrings, reinstated `best_solution` in read_gate

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Adding helper functions to generate Q matrices
for a) clustering b) tsp.
TODO: Add Typing code

* Resolved conflicts

* Prelim commit

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* NEBM+SCIF merger: first pass

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Prelim commit

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* NEBM+SCIF merger: second pass

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Post code-review: removed `debug` from NEBM, added docstrings, reinstated `best_solution` in read_gate

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Changes to clustering matrix complete.
Made changes to TSP API only.
Further work needed on TSP logic

* New tsp matrix generator with tests

* Changed formulation of distance in TSP.
Encoding is now accurate

* VRPSolver first milestone: successfully solves VRPs

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* SCIF CPU backend model minor change to remove `state_hist` and pass tests

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Delinting working VRPSolver

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Commented out a piece of code dependent on a draft PR

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Remove lint and add VRPy to PyProject.TOML

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Sparsification attempt #1: DistProxy with sign inversion and max cut-off

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Intermediate check point commit for scenario sweep

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Profiling and sparsification related improvements to VRPSolver and VRPConfig

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Script to sweep various scenarios for performance modelling of VRPSolver

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Profiling and sparsification related improvements to VRPSolver and VRPConfig

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Fixed the way to check if VRPy is installed

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Code clean-up refactoring in LCA module

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Corrected TSP Q matrix name

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Added edge-pruning based sparsification

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Tests and scripts for quantification of the effect of dist-mat sparsity on solution quality

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Sparsification attempt #1: DistProxy with sign inversion and max cut-off

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Delint VRP solver.py

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* First commit of clustering and TSP. Clustering is almost complete.

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Cleaner unittest for solver

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Functioning Clustering and TSP apps

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Delinting

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Removed VRP from this branch

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Removed VRP unittests

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* Clustering demo jupyter notebook added

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

* TSP demo jupyter notebook added

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>

---------

Signed-off-by: Risbud, Sumedh <sumedh.risbud@intel.com>
Co-authored-by: Ashish Rao Mangalore <ashish.rao.mangalore@intel.com>
srrisbud and ashishrao7 committed Nov 9, 2023
1 parent 821b81f commit 4ae9122
Showing 25 changed files with 2,290 additions and 11 deletions.
1 change: 1 addition & 0 deletions pyproject.toml
@@ -60,6 +60,7 @@ scipy = "^1.10.1"
nbformat = "^5.7.1"
seaborn = "^0.12.2"


[tool.poetry.dev-dependencies]
bandit = "1.7.4"
coverage = "^6.3.2"
157 changes: 157 additions & 0 deletions src/lava/lib/optimization/apps/clustering/problems.py
@@ -0,0 +1,157 @@
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: BSD-3-Clause
# See: https://spdx.org/licenses/

import networkx as ntx
import numpy as np
import typing as ty


class ClusteringProblem:
    """Problem specification for a clustering problem.

    N points need to be clustered into M clusters. The cluster centers
    are *given*. Clustering assigns a cluster ID to each point based on
    the closest cluster center.
    """
    def __init__(self,
                 point_coords: ty.List[ty.Tuple[int, int]],
                 center_coords: ty.List[ty.Tuple[int, int]],
                 edges: ty.Optional[ty.List[ty.Tuple[int, int]]] = None):
        """
        Parameters
        ----------
        point_coords : list(tuple(int, int))
            A list of integer tuples corresponding to the coordinates of
            the points to be clustered.
        center_coords : list(tuple(int, int))
            A list of integer tuples corresponding to the coordinates of
            the cluster centers.
        edges : list(tuple(int, int, float)), optional
            An optional list of edges connecting points and cluster
            centers, given as a list of triples (ID1, ID2, weight). See
            the note below for the ID scheme. If None, all-to-all
            connectivity between points is assumed, weighted by their
            pairwise distances.

        Notes
        -----
        IDs 1 to M correspond to cluster centers and (M+1) to (M+N)
        correspond to the points to be clustered.
        """
        super().__init__()
        self._point_coords = point_coords
        self._center_coords = center_coords
        self._num_points = len(self._point_coords)
        self._num_clusters = len(self._center_coords)
        self._cluster_ids = list(np.arange(1, self._num_clusters + 1))
        self._point_ids = list(np.arange(
            self._num_clusters + 1,
            self._num_clusters + self._num_points + 1))
        self._points = dict(zip(self._point_ids, self._point_coords))
        self._cluster_centers = dict(zip(self._cluster_ids,
                                         self._center_coords))
        if edges:
            self._edges = edges
        else:
            self._edges = []

        self._problem_graph = None

    @property
    def points(self):
        return self._points

    @points.setter
    def points(self, points: ty.Dict[int, ty.Tuple[int, int]]):
        self._points = points

    @property
    def point_ids(self):
        return self._point_ids

    @property
    def point_coords(self):
        return self._point_coords

    @property
    def num_points(self):
        return self._num_points

    @property
    def edges(self):
        return self._edges

    @property
    def cluster_centers(self):
        return self._cluster_centers

    @cluster_centers.setter
    def cluster_centers(self,
                        cluster_centers: ty.Dict[int, ty.Tuple[int, int]]):
        self._cluster_centers = cluster_centers

    @property
    def cluster_ids(self):
        return self._cluster_ids

    @property
    def center_coords(self):
        return self._center_coords

    @property
    def num_clusters(self):
        return self._num_clusters

    @property
    def problem_graph(self):
        """Create the NetworkX problem graph (on first access) and
        return it. If edges are specified, they are taken into account.

        Returns
        -------
        A graph object corresponding to the problem.
        """
        if not self._problem_graph:
            self._generate_problem_graph()
        return self._problem_graph

    def _generate_problem_graph(self):
        if len(self.edges) > 0:
            gph = ntx.DiGraph()
            # Add the points to be clustered as nodes
            gph.add_nodes_from(self.point_ids)
            # If there are user-provided edges, add them between the nodes
            gph.add_edges_from(self.edges)
        else:
            gph = ntx.complete_graph(self.point_ids,
                                     create_using=ntx.DiGraph())

        node_type_dict = dict(zip(self.point_ids,
                                  ["Point"] * len(self.point_ids)))
        # Associate the node type "Point" and the point coordinates as
        # node attributes
        ntx.set_node_attributes(gph, node_type_dict, name="Type")
        ntx.set_node_attributes(gph, self.points, name="Coordinates")

        # Add cluster centers as nodes
        gph.add_nodes_from(self.cluster_ids)
        # Associate the node type "Cluster Center" and the center
        # coordinates as node attributes
        cluster_center_type_dict = dict(
            zip(self.cluster_ids,
                ["Cluster Center"] * len(self.cluster_ids)))
        ntx.set_node_attributes(gph, cluster_center_type_dict, name="Type")
        ntx.set_node_attributes(gph, self.cluster_centers,
                                name="Coordinates")

        # Add one-way edges from every cluster center to every point
        for cid in self.cluster_ids:
            for pid in self.points:
                gph.add_edge(cid, pid)

        # Compute the Euclidean distance along all edges and assign it
        # as the edge weight
        # ToDo: Replace the loop with an independent distance matrix
        #  computation and then assign the distances as edge attributes
        for edge in gph.edges.keys():
            gph.edges[edge]["cost"] = np.linalg.norm(
                np.array(gph.nodes[edge[1]]["Coordinates"]) - np.array(
                    gph.nodes[edge[0]]["Coordinates"]))

        self._problem_graph = gph
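
A minimal usage sketch of the class above (not part of the diff; the coordinates are made up, and the `Type`, `Coordinates`, and `cost` attribute names follow `_generate_problem_graph`):

# Illustrative sketch: three cluster centers get IDs 1-3 and four
# points get IDs 4-7, per the ID scheme in the class docstring.
from lava.lib.optimization.apps.clustering.problems import ClusteringProblem

clp = ClusteringProblem(point_coords=[(1, 1), (2, 2), (8, 8), (9, 9)],
                        center_coords=[(0, 0), (10, 10), (5, 5)])

gph = clp.problem_graph  # the graph is built lazily on first access
print(gph.nodes[1])      # {'Type': 'Cluster Center', 'Coordinates': (0, 0)}
print(gph.edges[(1, 4)]["cost"])  # distance from (0, 0) to (1, 1), ~1.414
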
202 changes: 202 additions & 0 deletions src/lava/lib/optimization/apps/clustering/solver.py
@@ -0,0 +1,202 @@
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: BSD-3-Clause
# See: https://spdx.org/licenses/


import numpy as np
from pprint import pprint
from dataclasses import dataclass

from lava.lib.optimization.problems.problems import QUBO
from lava.lib.optimization.solvers.generic.solver import OptimizationSolver, \
    SolverReport
from lava.lib.optimization.apps.clustering.problems import ClusteringProblem
from lava.lib.optimization.apps.clustering.utils.q_matrix_generator import \
    QMatrixClust

import typing as ty
import numpy.typing as npty

from lava.magma.core.resources import (
    CPU,
    Loihi2NeuroCore,
    NeuroCore,
)
from lava.lib.optimization.solvers.generic.solver import SolverConfig

BACKENDS = ty.Union[CPU, Loihi2NeuroCore, NeuroCore, str]
CPUS = [CPU, "CPU"]
NEUROCORES = [Loihi2NeuroCore, NeuroCore, "Loihi2"]

BACKEND_MSG = f""" was requested as backend. However,
the solver currently supports only Loihi 2 and CPU backends.
These can be specified by calling solve with any of the following:
backend = "CPU"
backend = "Loihi2"
backend = CPU
backend = Loihi2NeuroCore
backend = NeuroCore
The explicit resource classes can be imported from
lava.magma.core.resources"""


@dataclass
class ClusteringConfig(SolverConfig):
    """Solver configuration for the clustering solver.

    Parameters
    ----------
    do_distance_sparsification : bool
        If True, the distance matrix is sparsified before the Q matrix
        is generated.
    sparsification_algo : str
        The sparsification algorithm to use. Defaults to "cutoff".
    max_dist_cutoff_fraction : float
        The fraction of the maximum distance used as the cut-off during
        sparsification.
    profile_q_mat_gen : bool
        If True, Q-matrix generation is profiled and the generation time
        is stored on the solver.
    only_gen_q_mat : bool
        If True, only the Q matrix is generated and the QUBO solve is
        skipped.

    Notes
    -----
    ClusteringConfig inherits from the `SolverConfig` class in
    `lava.lib.optimization.solvers.generic.solver`. Please refer to the
    documentation of `SolverConfig` for the other arguments that can be
    passed.
    """

    do_distance_sparsification: bool = False
    sparsification_algo: str = "cutoff"
    max_dist_cutoff_fraction: float = 1.0
    profile_q_mat_gen: bool = False
    only_gen_q_mat: bool = False


@dataclass
class ClusteringSolution:
    """Clustering solution, holding two dictionaries:

    - `clustering_id_map` maps a cluster center ID to the list of point
      IDs belonging to that cluster
    - `clustering_coords_map` maps cluster center coordinates to the
      coordinates of the points belonging to that cluster
    """
    clustering_id_map: dict = None
    clustering_coords_map: dict = None


class ClusteringSolver:
    """Solver for clustering problems, given cluster centers."""
    def __init__(self, clp: ClusteringProblem):
        self.problem = clp
        self._solver = None
        self._profiler = None
        self.dist_sparsity = 0.
        self.dist_proxy_sparsity = 0.
        self.q_gen_time = 0.
        self.q_shape = None
        self.raw_solution = None
        self.solution = ClusteringSolution()

    @property
    def solver(self):
        return self._solver

    @property
    def profiler(self):
        return self._profiler

    def solve(self, scfg: ClusteringConfig = ClusteringConfig()):
        """Solve a clustering problem using a given solver configuration.

        Parameters
        ----------
        scfg : ClusteringConfig
            Configuration parameters.

        Notes
        -----
        The solver object also stores profiling data as its attributes.
        """
        # 1. Generate the Q matrix for clustering
        node_list_for_clustering = self.problem.center_coords + \
            self.problem.point_coords
        # Number of binary variables = total_num_nodes * num_clusters
        mat_size = len(node_list_for_clustering) * self.problem.num_clusters
        q_mat_obj = QMatrixClust(
            node_list_for_clustering,
            num_clusters=self.problem.num_clusters,
            lambda_dist=1,
            lambda_points=100,
            lambda_centers=100,
            fixed_pt=True,
            fixed_pt_range=(-128, 127),
            clust_dist_sparse_params={
                "do_sparse": scfg.do_distance_sparsification,
                "algo": scfg.sparsification_algo,
                "max_dist_cutoff_fraction": scfg.max_dist_cutoff_fraction},
            profile_mat_gen=scfg.profile_q_mat_gen)
        q_mat = q_mat_obj.matrix.astype(int)
        self.dist_sparsity = q_mat_obj.dist_sparsity
        self.dist_proxy_sparsity = q_mat_obj.dist_proxy_sparsity
        if scfg.profile_q_mat_gen:
            self.q_gen_time = q_mat_obj.time_to_gen_mat
        self.q_shape = q_mat.shape
        # 2. Call the Lava QUBO solver
        if not scfg.only_gen_q_mat:
            prob = QUBO(q=q_mat)
            self._solver = OptimizationSolver(problem=prob)
            hparams = {
                'neuron_model': 'nebm-sa-refract',
                'refract': 10,
                'refract_scaling': 6,
                'init_state': np.random.randint(0, 2, size=(mat_size,)),
                'min_temperature': 1,
                'max_temperature': 5,
                'steps_per_temperature': 200
            }
            if not scfg.hyperparameters:
                scfg.hyperparameters.update(hparams)
            report: SolverReport = self._solver.solve(config=scfg)
            if report.profiler:
                self._profiler = report.profiler
                pprint(f"Clustering execution"
                       f" took {np.sum(report.profiler.execution_time)}s")
            # 3. Post-process the clustering solution
            self.raw_solution: npty.NDArray = \
                report.best_state.reshape(
                    (self.problem.num_clusters,
                     len(node_list_for_clustering))).T
        else:
            self.raw_solution = -1 * np.ones(
                (self.problem.num_clusters,
                 len(node_list_for_clustering))).T

        self.post_process_sol()

    def post_process_sol(self):
        """Post-process the clustering solution returned by `solve()`.

        The clustering solution returned by the `solve` method is a 2-D
        binary numpy array whose columns correspond to clusters and whose
        rows correspond to points or cluster centers. Entry (i, j) is 1
        if point/cluster center 'i' belongs to cluster 'j'.
        """
        coord_list = (self.problem.center_coords + self.problem.point_coords)
        id_map = {}
        coord_map = {}
        for j, col in enumerate(self.raw_solution.T):
            node_idxs = np.nonzero(col)
            # The ID of "this" cluster is the only nonzero row in this
            # column from row 0 to row 'num_clusters' - 1
            this_cluster_id = \
                (node_idxs[0][node_idxs[0] < self.problem.num_clusters] + 1)
            if len(this_cluster_id) != 1:
                raise ValueError(
                    f"Expected exactly one cluster center in cluster {j}, "
                    f"found {len(this_cluster_id)}. Clustering might not "
                    f"have converged to a valid solution.")
            node_idxs = node_idxs[0][
                node_idxs[0] >= self.problem.num_clusters]
            id_map.update({this_cluster_id.item(): (node_idxs + 1).tolist()})

            this_center_coords = np.array(coord_list)[this_cluster_id - 1, :]
            point_coords_this_cluster = np.array(coord_list)[node_idxs, :]
            point_coords_this_cluster = \
                [tuple(point) for point in point_coords_this_cluster.tolist()]
            coord_map.update({
                tuple(this_center_coords.flatten()):
                    point_coords_this_cluster})

        self.solution.clustering_id_map = id_map
        self.solution.clustering_coords_map = coord_map
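
Taken together, a minimal end-to-end sketch of the solver (not from the diff; it assumes `backend` is a field inherited from `SolverConfig`, and the coordinates and printed output are purely illustrative):

# Illustrative sketch, assuming a working Lava installation with the
# CPU backend; all values below are made up.
from lava.lib.optimization.apps.clustering.problems import ClusteringProblem
from lava.lib.optimization.apps.clustering.solver import (ClusteringConfig,
                                                          ClusteringSolver)

clp = ClusteringProblem(point_coords=[(1, 1), (2, 2), (8, 8), (9, 9)],
                        center_coords=[(0, 0), (10, 10)])

solver = ClusteringSolver(clp=clp)
scfg = ClusteringConfig(backend="CPU",  # assumed SolverConfig field
                        do_distance_sparsification=False)
solver.solve(scfg=scfg)

# With 2 centers (IDs 1-2) and 4 points (IDs 3-6), a valid solution
# could look like: {1: [3, 4], 2: [5, 6]}
print(solver.solution.clustering_id_map)
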