diffNuisances.py: adding per-nuisance delta NLL #827

Status: Open (11 commits, base: main)

@kcormi (Collaborator) commented Mar 22, 2023

This PR makes a few changes to diffNuisances.py. The largest is the ability to print and plot the change in log-likelihood between the background-only and signal+background fits for each parameter.

Because the likelihood factorizes into a Poisson term for each bin and a constraint term for each nuisance, the contribution of each bin or nuisance can be determined directly. These contributions simply sum to give the total delta NLL.
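Written out, the factorization argument looks roughly like this. The symbols are my own notation for illustration, not taken from the code: $n_i$ is the observed count in bin $i$, $\nu_i$ its expected yield, $p_j$ the constraint pdf of nuisance $\theta_j$ with auxiliary measurement $\tilde\theta_j$, and the sign convention (background-only minus S+B) is an assumption.

```latex
\begin{aligned}
L(\mu,\theta) &= \prod_{i \in \text{bins}} \mathrm{Pois}\big(n_i \mid \nu_i(\mu,\theta)\big)
                 \prod_{j \in \text{nuisances}} p_j\big(\tilde\theta_j \mid \theta_j\big) \\
-\ln L(\mu,\theta) &= \sum_i \Big[-\ln \mathrm{Pois}\big(n_i \mid \nu_i(\mu,\theta)\big)\Big]
                 + \sum_j \Big[-\ln p_j\big(\tilde\theta_j \mid \theta_j\big)\Big] \\
\Delta\mathrm{NLL} &= \mathrm{NLL}\big(0,\hat\theta_{b}\big) - \mathrm{NLL}\big(\hat\mu,\hat\theta_{s+b}\big)
                 = \sum_i \Delta_i + \sum_j \Delta_j, \\
\Delta_j &= -\ln p_j\big(\tilde\theta_j \mid \hat\theta_{j,b}\big) + \ln p_j\big(\tilde\theta_j \mid \hat\theta_{j,s+b}\big)
\end{aligned}
```

Here $\Delta_i$ is the analogous difference of the Poisson terms for bin $i$, so each term in the two sums can be read off on its own.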

When a workspace is passed, the constraint pdf of each nuisance is evaluated at its background-only best-fit point and at its S+B best-fit point to get the delta NLL. This is optional: if no workspace is passed, diffNuisances.py simply runs as it used to.
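To illustrate the idea outside of RooFit, here is a minimal sketch for a single nuisance. It assumes a Gaussian constraint and a background-only minus S+B sign convention; the function names are hypothetical, and the script itself evaluates the constraint pdfs stored in the workspace rather than anything like this.

```python
import math


def gaussian_constraint_nll(theta, theta_0=0.0, sigma=1.0):
    """Negative log of a Gaussian constraint pdf evaluated at theta (assumed form)."""
    pull = (theta - theta_0) / sigma
    return 0.5 * pull * pull + math.log(sigma * math.sqrt(2.0 * math.pi))


def constraint_delta_nll(theta_b, theta_sb, theta_0=0.0, sigma=1.0):
    """Constraint-term delta NLL: background-only fit minus S+B fit (assumed convention)."""
    nll_b = gaussian_constraint_nll(theta_b, theta_0, sigma)
    nll_sb = gaussian_constraint_nll(theta_sb, theta_0, sigma)
    return nll_b - nll_sb


if __name__ == "__main__":
    # A nuisance pulled to +1 sigma in the b-only fit that relaxes to 0 in the S+B fit.
    print(constraint_delta_nll(theta_b=1.0, theta_sb=0.0))
```

The normalization constant cancels in the difference, so only the change in the pull matters for a Gaussian constraint.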

A plot of the delta NLL is also made, ordered from largest to smallest delta NLL and showing a cumulative line. This helps to quickly identify whether the change in post-fit nuisances is contributing to the significance, and which nuisances in particular are responsible.
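The ordering and cumulative line can be sketched in a few lines of plain Python. This is only an illustration with a hypothetical helper name and made-up input values, not the plotting code in the script.

```python
from itertools import accumulate


def rank_with_cumulative(delta_nll):
    """Sort nuisances by delta NLL (largest first) and attach a running sum.

    delta_nll: dict mapping nuisance name -> delta NLL contribution.
    Returns a list of (name, delta_nll, cumulative_delta_nll) tuples.
    """
    ordered = sorted(delta_nll.items(), key=lambda item: item[1], reverse=True)
    running = accumulate(value for _, value in ordered)
    return [(name, value, total) for (name, value), total in zip(ordered, running)]


if __name__ == "__main__":
    # Illustrative numbers only.
    example = {"lumi": 0.25, "jes": 0.5, "btag": -0.125}
    for name, value, total in rank_with_cumulative(example):
        print(f"{name:10s} {value:+.3f} {total:+.3f}")
```

The last cumulative value is the total contribution of the constraint terms to the delta NLL.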

Other smaller changes:

  1. Added a parameter --max-nuis, which limits the number of nuisances per plot. If there are more nuisances than this, multiple plots of each type are created, each containing at most that many nuisances. I've also increased the bottom margin of the plots to help make the nuisance names visible.
  2. I've moved the diffNuisances.py script from test/ to scripts/ and made it executable, so that it can be run as a command-line tool. I've also removed the copy under data/tutorial/longexercise/, which had become slightly out of date, and updated the documentation in the exercise to call the script without the explicit python invocation.
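For the --max-nuis behaviour, the splitting amounts to chunking the list of nuisance names. A minimal sketch, with a hypothetical helper name and not the actual implementation in the script:

```python
def split_into_plots(nuisance_names, max_nuis):
    """Split nuisance names into consecutive groups of at most max_nuis each."""
    if max_nuis < 1:
        raise ValueError("max_nuis must be a positive integer")
    return [
        nuisance_names[start:start + max_nuis]
        for start in range(0, len(nuisance_names), max_nuis)
    ]


if __name__ == "__main__":
    names = ["n1", "n2", "n3", "n4", "n5"]
    # Five nuisances with at most two per plot gives three plots.
    print(split_into_plots(names, 2))
```

Only the last plot can end up with fewer than the maximum number of nuisances.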

A few thoughts for future PRs:

I'd like to add a similar plot of the delta NLL contribution per bin, including a cumulative (per-region) line. That is probably better placed somewhere else: it could go into FitDiagnostics directly, but I think PostFitShapesFromWorkspace is the better choice, to avoid jamming everything into FitDiagnostics. I'm open to suggestions.

It might also be useful to separate the table formatting into functions that are more general and reusable. OTOH, if we adopt more modern tools like pandas, that functionality already comes for free.

@nucleosynthesis (Contributor) commented

Does this work only for binned (template-based) analyses, or also for parametric and unbinned ones? If the former, the tool should probably halt when it is given anything else.
