Merge pull request #1726 from NNPDF/hyperopt_loss

Hyperopt loss
NNPDF · Mar 7, 2024 · 4b00608 · 4b00608
2 parents 7cc3771 + 33b6d97
commit 4b00608
Show file tree

Hide file tree

Showing 10 changed files with 744 additions and 96 deletions.
diff --git a/doc/sphinx/source/n3fit/hyperopt.rst b/doc/sphinx/source/n3fit/hyperopt.rst
@@ -34,6 +34,8 @@ The desired features of this figure of merit can be summarized as:
 3. Be reliable even when the number of points is not very large.
 
 
+.. _hyperkfolding-label:
+
 K-folding cross-validation
 --------------------------
 A good compromise between all previous points is the usage of the cross-validation technique
@@ -320,6 +322,10 @@ Changing the hyperoptimization target
 -----------------------------------
 
 Beyond the usual :math:`\chi2`-based optimization figures above, it is possible to utilize other measures as the target for hyperoptimization.
+
+Future tests
+~~~~~~~~~~~~
+
 One possibility is to use a :ref:`future test<futuretests>`-based metric for which the goal is not to get the minimum :math:`\chi2` but to get the same :math:`\chi2` (with PDF errors considered) for different datasets. The idea is that this way we select models of which the prediction is stable upon variations in the dataset.
 In order to obtain the PDF errors used in the figure of merit it is necessary to run multiple replicas, luckily ``n3fit`` provides such a possibility also during hyperoptimization.
 
@@ -372,6 +378,98 @@ The figure of merit will be the difference between the :math:`\chi2` of the seco
    L_{\rm hyperopt} = \chi^{2}_{(1) \rm pdferr} - \chi^{2}_{(2)}
 
 
+New hyperoptimization metrics with fold and replica statistics
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The hyperopt measures discussed above are all based on performing a single replica fit per fold.
+However, one may also wish to run the hyperoptimization algorithm on fits consisting of many
+replicas per fold. This is a feasible option in ``n3fit``, since it has been optimised to
+efficiently run many replica fits in parallel on GPU.
+
+The combination of :ref:`k-folding <hyperkfolding-label>` and multi-replica experiments
+opens several possibilities for the choice of figure of merit. The simplest option would be to minimize
+the average of :math:`\chi^2` across both replica and k folds, *i.e.*,
+
+.. math::
+    L_{1} = \frac{1}{n_{\rm fold}} \sum_{k=1}^{n_{\rm fold}} \left< \chi^2_{k} \right>_{\rm rep}.
+
+In NNPDF, this hyperoptimisation metrics is selected via the following generic runcard:
+
+.. code-block:: yaml
+
+        dataset_inputs:
+        ...
+
+        kfold:
+          loss_type: chi2
+          replica_statistic: average
+          fold_statistic: average
+          partitions:
+          - datasets:
+            ...
+          - datasets:
+            ...
+
+        parallel_models: true
+
+By combining the ``average``, ``best_worst``, and ``std`` figures of merit discussed in :ref:`hyperkfolding-label`,
+several alternatives may arise. For example, one approach could involve minimizing
+the maximum value of the set of averaged-over-replicas :math:`\chi^2`,
+
+.. math::
+    L_{2} = {\rm max} \left ( \left< \chi^2_{1} \right>_{\rm rep}, \left< \chi^2_{2} \right>_{\rm rep}, ..., \left< \chi^2_{n_{\rm fold}} \right>_{\rm rep}\right),
+
+with correspond runcard ``kfold`` settings:
+
+.. code-block:: yaml
+
+        dataset_inputs:
+        ...
+
+        kfold:
+          loss_type: chi2
+          replica_statistic: average
+          fold_statistic: best_worst
+          partitions:
+          - datasets:
+            ...
+          - datasets:
+            ...
+
+An alternative metric that is
+sensitive to higher moments of the probability distribution
+has been defined in `NNPDF3.0 <https://link.springer.com/article/10.1007/JHEP04(2015)040>`_ [see Eq. (4.6) therein],
+namely, the :math:`\varphi` estimator. In the context of hyperopt, :math:`\varphi^{2}` can be calculated for each k-fold as
+
+.. math::
+  \varphi_{k}^2 = \langle \chi^2_k  [ \mathcal{T}[f_{\rm fit}], \mathcal{D} ] \rangle_{\rm rep} - \chi^2_k [ \langle \mathcal{T}[f_{\rm fit}] \rangle_{\rm rep}, \mathcal{D} ],
+
+where the first term represents the usual averaged-over-replicas :math:`\left< \chi^2_{k} \right>_{\rm rep}`
+calculated based on the dataset used in the fit (:math:`\mathcal{D}`) and
+the theory predictions from each fitted PDF (:math:`f_{\rm fit}`) replica.
+The second term involves the calculation of :math:`\chi2` but now with respect to the theory predictions from the central PDF.
+
+On the basis of :math:`\varphi`, we define the loss as
+
+.. math::
+    L_{3} = \left (\frac{1}{n_{\rm fold}} \sum_{k=1}^{n_{\rm fold}} \varphi_{k}^2 \right)^{-1},
+
+which selects hyperparameters that favor the most conservative extrapolation.
+In NNPDF, this figure of merit is chosen using the following settings:
+
+.. code-block:: yaml
+
+        kfold:
+          loss_type: phi2
+          fold_statistic: average
+            ...
+
+Alternatively, it is currently also possible to combine :math:`L_1` (which is only sensitive to the first moment)
+with :math:`L_3` (which provides information on the second moment).
+For example, one might minimize :math:`L_1` while monitoring the values of :math:`L_3` for a final model selection.
+The optimal approach for this combination is still under development.
+All the above options are implemented in the :class:`~n3fit.hyper_optimization.rewards.HyperLoss` class
+which is instantiated and monitored within :meth:`~n3fit.model_trainer.ModelTrainer.hyperparametrizable` method.
+
 Restarting hyperoptimization runs
 ---------------------------------
 

diff --git a/n3fit/runcards/examples/Basic_hyperopt.yml b/n3fit/runcards/examples/Basic_hyperopt.yml
@@ -81,7 +81,7 @@ hyperscan_config:
       activations: ['sigmoid', 'tanh']
 
 kfold:
-  target: average
+  fold_statistic: average
   penalties:
     - saturation
     - patience

diff --git a/n3fit/src/n3fit/checks.py b/n3fit/src/n3fit/checks.py
@@ -9,6 +9,7 @@
 
 from n3fit.hyper_optimization import penalties as penalties_module
 from n3fit.hyper_optimization import rewards as rewards_module
+from n3fit.hyper_optimization.rewards import IMPLEMENTED_LOSSES, IMPLEMENTED_STATS
 from reportengine.checks import CheckError, make_argcheck
 from validphys.core import PDF
 from validphys.pdfbases import check_basis
@@ -255,15 +256,33 @@ def check_kfold_options(kfold):
             raise CheckError(
                 f"The penalty '{penalty}' is not recognized, ensure it is implemented in hyper_optimization/penalties.py"
             )
-    loss_target = kfold.get("target")
-    if loss_target is not None:
-        if not hasattr(rewards_module, loss_target):
+
+    loss_type = kfold.get("loss_type")
+    if loss_type is not None:
+        if loss_type not in IMPLEMENTED_LOSSES:
+            raise CheckError(
+                f"Loss type '{loss_type}' is not recognized, "
+                "ensure it is implemented in the HyperLoss class in hyper_optimization/rewards.py."
+                "Options so far are 'chi2' or 'phi2'."
+            )
+    replica_statistic = kfold.get("replica_statistic")
+    if replica_statistic is not None:
+        if replica_statistic not in IMPLEMENTED_STATS:
             raise CheckError(
-                f"The hyperoptimization target '{loss_target}' loss is not recognized, "
-                "ensure it is implemented in hyper_optimization/rewards.py"
+                f"The replica statistic '{replica_statistic}' is not recognized, "
+                "ensure it is implemented in the HyperLoss class in hyper_optimization/rewards.py"
             )
+    fold_statistic = kfold.get("fold_statistic")
+    if fold_statistic is not None:
+        if fold_statistic not in IMPLEMENTED_STATS:
+            raise CheckError(
+                f"The fold statistic '{fold_statistic}' is not recognized, "
+                "ensure it is implemented in the HyperLoss class in hyper_optimization/rewards.py"
+            )
+
     partitions = kfold["partitions"]
     # Check specific errors for specific targets
+    loss_target = kfold.get("fold_statistic")  # TODO: haven't updated this
     if loss_target == "fit_future_tests":
         if len(partitions) == 1:
             raise CheckError("Cannot use target 'fit_future_tests' with just one partition")

diff --git a/n3fit/src/n3fit/hyper_optimization/hyper_scan.py b/n3fit/src/n3fit/hyper_optimization/hyper_scan.py
@@ -14,6 +14,7 @@
 """
 import copy
 import logging
+from typing import Callable
 
 import hyperopt
 from hyperopt.pyll.base import scope
@@ -24,6 +25,11 @@
 
 log = logging.getLogger(__name__)
 
+# Hyperopt uses these strings for a passed and failed run
+# it also has statuses "new", "running" and "suspended", but we don't use them
+HYPEROPT_STATUSES = {True: "ok", False: "fail"}
+
+
 HYPEROPT_SEED = 42
 
 
@@ -118,7 +124,7 @@ def hyper_scan_wrapper(replica_path_set, model_trainer, hyperscanner, max_evals=
         parameters of the best trial as found by ``hyperopt``
     """
     # Tell the trainer we are doing hpyeropt
-    model_trainer.set_hyperopt(True, keys=hyperscanner.hyper_keys, status_ok=hyperopt.STATUS_OK)
+    model_trainer.set_hyperopt(True, keys=hyperscanner.hyper_keys)
     # Generate the trials object
     trials = FileTrials(replica_path_set, parameters=hyperscanner.as_dict())
     # Initialize seed for hyperopt

diff --git a/n3fit/src/n3fit/hyper_optimization/penalties.py b/n3fit/src/n3fit/hyper_optimization/penalties.py
@@ -40,6 +40,11 @@ def saturation(pdf_model=None, n=100, min_x=1e-6, max_x=1e-4, flavors=None, **_k
         flavors: list(int)
             indices of the flavors to inspect
 
+    Returns
+    -------
+        NDArray
+            array of saturation penalties for each replica
+
     Example
     -------
     >>> from n3fit.hyper_optimization.penalties import saturation
@@ -72,8 +77,9 @@ def saturation(pdf_model=None, n=100, min_x=1e-6, max_x=1e-4, flavors=None, **_k
     return extra_loss
 
 
-def patience(stopping_object=None, alpha=1e-4, **_kwargs):
-    """Adds a penalty for fits that have finished too soon, which
+def patience(stopping_object, alpha: float = 1e-4, **_kwargs):
+    """
+    Adds a penalty for fits that have finished too soon, which
     means the number of epochs or its patience is not optimal.
     The penalty is proportional to the validation loss and will be 0
     when the best epoch is exactly at max_epoch - patience
@@ -85,6 +91,11 @@ def patience(stopping_object=None, alpha=1e-4, **_kwargs):
         alpha: float
             dumping factor for the exponent
 
+    Returns
+    -------
+        NDArray
+            patience penalty for each replica
+
     Example
     -------
     >>> from n3fit.hyper_optimization.penalties import patience
@@ -94,11 +105,11 @@ def patience(stopping_object=None, alpha=1e-4, **_kwargs):
     3.434143467595683
 
     """
-    epoch_best = np.take(stopping_object.e_best_chi2, 0)
+    epoch_best = np.array(stopping_object.e_best_chi2)
     patience = stopping_object.stopping_patience
     max_epochs = stopping_object.total_epochs
     diff = abs(max_epochs - patience - epoch_best)
-    vl_loss = np.take(stopping_object.vl_chi2, 0)
+    vl_loss = np.array(stopping_object.vl_chi2)
     return vl_loss * np.exp(alpha * diff)
 
 
@@ -109,6 +120,11 @@ def integrability(pdf_model=None, **_kwargs):
 
     The penalty increases exponentially with the growth of the integrability number
 
+    Returns
+    -------
+        NDArray
+            array of integrability penalties for each replica
+
     Example
     -------
     >>> from n3fit.hyper_optimization.penalties import integrability
@@ -121,8 +137,19 @@ def integrability(pdf_model=None, **_kwargs):
     """
     pdf_instance = N3PDF(pdf_model.split_replicas())
     integ_values = integrability_numbers(pdf_instance)
-    integ_overflow = np.sum(integ_values[integ_values > fitveto.INTEG_THRESHOLD])
-    if integ_overflow > 50.0:
-        # before reaching an overflow, just give a stupidly big number
-        return np.exp(50.0)
+
+    # set components under the threshold to 0
+    integ_values[integ_values <= fitveto.INTEG_THRESHOLD] = 0.0
+
+    # sum over flavours
+    integ_overflow = np.sum(integ_values, axis=-1)  # -1 rather than 1 so it works with 1 replica
+
+    # Limit components to 50 to avoid overflow
+    if isinstance(integ_overflow, np.ndarray):
+        # Case: multi-replica scenario
+        integ_overflow[integ_overflow > 50.0] = 50.0
+    elif isinstance(integ_overflow, (float, np.float64)):
+        # Case: single replica scenario
+        integ_overflow = min(integ_overflow, 50.0)
+
     return np.exp(integ_overflow) - 1.0