
Tutorial ABCD method with RooParametricHist #1002

Merged

Conversation

cesarecazzaniga
Contributor

This pull request proposes a small tutorial illustrating the application of RooParametricHist for a per-bin ABCD method in Combine.
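For context, a sketch of the per-bin ABCD relation the tutorial is built around (the region labelling here is the conventional one and may differ from the tutorial's own): assuming the two discriminating variables factorize for the background, the background in each bin $i$ of the signal region A is estimated from the three control regions as

$$N_A^{\mathrm{bkg},\,i} = \frac{N_B^{\,i}\, N_C^{\,i}}{N_D^{\,i}},$$

with the control-region bin yields typically treated as free parameters of a RooParametricHist and the A prediction built per bin from this formula.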

Material added in the PR:

- documentation for the tutorial
- scripts to generate the input datasets, workspaces, and datacards needed to run the tutorial

To generate your own input data, run:

```
python utils/produce_input_histograms_and_analyse.py
```
Collaborator

I think this should be python3 (unfortunately, some systems will still default to python 2, for which this won't work)

Contributor Author

updated with python3 command in instructions

To run the workspace creation script:

```
python utils/create_workspace.py -m 1500
```
Collaborator

Same comment about python3.

But there also seems to be a small mismatch between this script (which looks for files under ./generated_histograms/) and the produce_input_histograms_and_analyse.py script, which creates them in the current working directory.

Contributor Author

I have harmonized where files are saved and read in the code, so this is handled automatically and users do not have to specify the path.
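(For illustration, a minimal sketch of one way to do this, resolving the histogram directory relative to the script itself so both steps agree on the same location; the directory name is taken from the comment above, the helper name is a placeholder:)

```python
import os

# Resolve the histogram directory relative to this script, so the production
# and workspace-creation steps always look in the same place.
HIST_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "generated_histograms")
os.makedirs(HIST_DIR, exist_ok=True)

def hist_path(filename):
    """Full path of an input/output histogram file inside HIST_DIR."""
    return os.path.join(HIST_DIR, filename)
```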

print ("Reading histogram: ", hist_nameC)
print ("Reading histogram: ", hist_nameD)
histA = input_file.Get(hist_nameA)
histA.SetDirectory(0)
Collaborator

I'm getting a crash here when running over the signal file, because it seems the histogram is saved as A/h_sgn_mPhi_1500_A, but the script is trying to read it as A/h_sgn_A.
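(For illustration, a sketch of how the names could be built consistently in both scripts so that signal histograms always carry the mass point; the signal name follows the one quoted above, while the background naming is a guess and not necessarily the tutorial's actual scheme:)

```python
def signal_hist_name(region, mass):
    # e.g. "A/h_sgn_mPhi_1500_A" for region "A" and mass 1500
    return "{r}/h_sgn_mPhi_{m}_{r}".format(r=region, m=mass)

def background_hist_name(region):
    # e.g. "A/h_bkg_A" (hypothetical background naming)
    return "{r}/h_bkg_{r}".format(r=region)
```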

Contributor Author

solved issue

Collaborator

I'm still seeing the same issue. I don't see a change in either the name of the root histograms that are produced in the previous step or the ones that are checked for here. Did I miss an update, or maybe it didn't get committed?



```
python utils/create_datacards.py -m 1500
```
Collaborator

python3

Contributor Author

done



The datacards can then be combined using the usual command:
Collaborator

I think for these commands, users will first have to change directory into example_analysis/datacards/mPhi1500/, so this should either be included in the tutorial or the scripts should be modified to create the cards in the working directory.
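(For concreteness, something along these lines; the per-region datacard file names below are placeholders, while ```combineCards.py``` is the standard Combine tool for merging cards and the combined card name matches the one used later in this thread:)

```
cd example_analysis/datacards/mPhi1500/
combineCards.py region_A.txt region_B.txt region_C.txt region_D.txt > combined_mPhi_1500_2018.txt
```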

Contributor Author

I have added the commands.

Collaborator

I don't see these added either. Are they pushed?



Using the output ```higgsCombineTest.FitDiagnostics.mH1500.root```, one can run the script ```$CMSSW_BASE/src/HiggsAnalysis/CombinedLimit/test/mlfitNormsToText.py``` to get the predictions for the normalizations.
Collaborator

I think this script needs the fitDiagnosticsTest.root output file, rather than the one listed here. It might also be helpful to give the full command explicitly.
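(Something like the following, reusing the combined datacard name and the FitDiagnostics options quoted later in this thread; the exact ```mlfitNormsToText.py``` invocation is a sketch of typical usage rather than the tutorial's final command:)

```
combine -M FitDiagnostics combined_mPhi_1500_2018.txt -m 1500 --saveShapes --saveWithUncertainties --saveNormalizations
python3 $CMSSW_BASE/src/HiggsAnalysis/CombinedLimit/test/mlfitNormsToText.py fitDiagnosticsTest.root
```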

Contributor Author

added in the tutorial text

@cesarecazzaniga
Contributor Author

I went through your comments and modified the tutorial accordingly. First, in the generation of the input files I have fixed the random seed, so we should all get the same results. Second, I noticed a small issue in the datacard generation code that was causing the large fitted r value you were seeing, and I fixed it. I have added the expected results and the new plots to the tutorial. Moreover, as suggested, I have added the commands needed to reproduce each step.
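(As an illustration of the kind of seed fix described above; the seed value and which random generators the script actually uses are assumptions:)

```python
import numpy as np
import ROOT

SEED = 123  # placeholder value, not necessarily the one used in the tutorial
np.random.seed(SEED)        # fix the numpy generator
ROOT.gRandom.SetSeed(SEED)  # fix ROOT's global generator, if it is used
```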

Collaborator
@kcormi kcormi left a comment

Thanks Cesare! Some of the issues are fixed, but I still ran into a few issues that need to be fixed when running through the tutorial.

## Generate input data
<a id="inputs"></a>

The histograms for the $z$ observable in the different regions A, B, C, D can be produced using the ```utils/produce_input_histograms_and_analyse.py``` script. In the script, the expected rates for different signal hypotheses (as a function of the $\Phi$ mass $m_{\Phi} \in \{1500, 2000, 3000, 4000, 5000 \}$ GeV) and the background yields are specified, as well as the distributions in $x,y,z$ of the signals and backgrounds. In the following steps of the tutorial we will just consider one of the mass points generated, $m_{\Phi} = 1500$ GeV, but the same analysis can be run separatelly on the other mass points as well. In $x,y$, the signal and the background are assumed to be distributed as multivariate Gaussians, with the background centred at $(0.2, 0.2)$ in $(x,y)$ while the signals are centred in the upper-right corner of the plane ($x,y>0.5$). For the $z$ feature, the background and the signal distributions are sampled from an exponential; for the signal, the tails of the exponential are enhanced with the mass parameter $m_{\Phi}$.
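(For illustration only, a minimal numpy sketch of toy inputs with this structure; this is not the actual ```produce_input_histograms_and_analyse.py``` script and all numbers, means, and widths are placeholders:)

```python
import numpy as np

rng = np.random.default_rng(123)  # fixed seed (placeholder value)
n_bkg, n_sig = 100000, 1000

# Background: multivariate Gaussian in (x, y) centred away from the signal corner.
bkg_xy = rng.multivariate_normal(mean=[0.2, 0.2], cov=[[0.1, 0.0], [0.0, 0.1]], size=n_bkg)
# Signal: centred in the upper-right corner of the (x, y) plane.
sig_xy = rng.multivariate_normal(mean=[0.7, 0.7], cov=[[0.05, 0.0], [0.0, 0.05]], size=n_sig)

# z observable: exponentially falling, with a harder signal tail for larger m_Phi.
m_phi = 1500.0
bkg_z = rng.exponential(scale=200.0, size=n_bkg)
sig_z = rng.exponential(scale=0.2 * m_phi, size=n_sig)

# The A, B, C, D regions are then defined by cuts in the (x, y) plane,
# e.g. a signal-enriched region with x > 0.5 and y > 0.5.
in_signal_region = (bkg_xy[:, 0] > 0.5) & (bkg_xy[:, 1] > 0.5)
```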
Collaborator

separatelly -> separately

Contributor Author

done



Using the output ```fitDiagnosticsTest.mH1500.root```, one can run the script ```$CMSSW_BASE/src/HiggsAnalysis/CombinedLimit/test/mlfitNormsToText.py``` to get the predictions for the normalizations:
Collaborator

The output file in this case does not include the mass parameter; it's just fitDiagnosticsTest.root.

Contributor Author

This is strange, since I get exactly that output file name running this command: ```combine -M FitDiagnostics combined_mPhi_1500_2018.txt -m 1500 --saveShapes --saveWithUncertainties --saveNormalizations```. I would expect the .mH1500. part since the mass parameter is specified.

Contributor Author

I have updated it; the filename is now consistent with what you get as well (I was running with an older version).

Collaborator

Are you sure? The filenames produced for the files which contain the limit tree (which do always contain the mass parameter, though that is not relevant here) do not follow the same output formatting as the fitDiagnostics file, which saves the shapes and RooFitResults. The higgsCombineTest.FitDiagnostics.mH1500.root file should also be produced, but that just contains the limit tree; the fitDiagnosticsTest.root file doesn't rely on the mass value and only uses the -n argument.

Moreover, you can run the script ```$CMSSW_BASE/src/HiggsAnalysis/CombinedLimit/data/tutorials/longexercise/postFitPlot.py``` to get pre-fit and post-fit plots in the signal region (```ch4``` in the combined datacard):

```
python3 $CMSSW_BASE/src/HiggsAnalysis/CombinedLimit/data/tutorials/longexercise/postFitPlot.py --input_file fitDiagnosticsTest.mH1500.root --shape_type <shapes_type> --region <region>
```
Collaborator

Same filename issue as above.

Contributor Author

Now changed; I was running with a slightly older version of Combine where the mass was added to the name of the file.

@cesarecazzaniga
Contributor Author

Added the uncommitted changes for create_workspace.py, updated the documentation to match the file naming from the most recent version of Combine, and updated the results to match those from the latest version.

Collaborator
@kcormi kcormi left a comment

Thanks Cesare! This is working for me now. I have one remaining minor comment. After that, I think this can be merged unless anyone else has some comments.

It might be nice at some point to expand this to e.g. show how to set up different mass points with a single datacard using keywords. But I don't think those kinds of future developments should stand in the way of including this as-is for now.



The datacards will be created in the directory ```example_analysis/datacards/mPhi1500/``` inside the tutorial directory. The datacards can then be combined using the usual command:
Collaborator

Can we just make sure to add a statement telling the user to change into this directory (or add the cd command below, or both)?

Contributor Author

I have updated the text as you suggested by adding an explicit instruction to enter the directory.

@kcormi kcormi merged commit a3e3b6c into cms-analysis:main Sep 26, 2024
8 checks passed