Merge pull request #751 from cms-analysis/rebase_742_onto_112x
rebase of 742
amarini authored Apr 6, 2022
2 parents df58aa8 + 750d148 commit 23a4bb2
Showing 2 changed files with 18 additions and 2 deletions.
2 changes: 1 addition & 1 deletion docs/part3/commonstatsmethods.md
@@ -659,7 +659,7 @@ The following algorithms are supported:

- **`AD`**: Compute a goodness-of-fit measure for binned fits using the *Anderson-Darling* test. It is based on the integral of the difference between the cumulative distribution function and the empirical distribution function over all bins. It also gives the tail ends of the distribution a higher weighting.

-The output tree will contain a branch called **`limit`** which contains the value of the test-statistic in each toy. You can make a histogram of this test-statistic $t$ and from this distribution ($f(t)$) and the single value obtained in the data ($t_{0}$) you can calculate the p-value $$p = \int_{t=t_{0}}^{\mathrm{+inf}} f(t) dt $$.
+The output tree will contain a branch called **`limit`** which contains the value of the test-statistic in each toy. You can make a histogram of this test-statistic $t$ and from this distribution ($f(t)$) and the single value obtained in the data ($t_{0}$) you can calculate the p-value $$p = \int_{t=t_{0}}^{\mathrm{+inf}} f(t) dt $$. Note: in rare cases the test statistic value for the toys can be undefined (for AD and KS), and in this case we set the test statistic value to -1. When plotting the test statistic distribution, those toys should be excluded. This is automatically taken care of if you use the GoF collection script in CombineHarvester described below.

When generating toys, the default behavior will be used. See the section on [toy generation](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/runningthetool/#toy-data-generation) for options on how to generate/fit nuisance parameters in these tests. It is recommended to use the *frequentist toys* (`--toysFreq`) when running the **`saturated`** model, and the default toys for the other two tests.
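
The p-value described in the documentation text above can be estimated from the toys by counting, which also makes the exclusion of the -1 toys explicit. The following is a minimal sketch and not part of combine or CombineHarvester: the function name `gofPValue` and its interface are hypothetical, and it assumes the toy values have already been read from the `limit` branch into a vector. It treats any negative value as the undefined-statistic flag, which is only appropriate for statistics such as KS and AD that are non-negative when defined.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of combine): estimates
//   p = integral from t0 to +inf of f(t) dt
// as the fraction of valid toys with test statistic t >= t0.
// Toys with a negative value (the -1 flag for an undefined statistic)
// are excluded from both the numerator and the denominator.
// Returns -1.0 if no valid toy remains.
double gofPValue(const std::vector<double>& toyStats, double t0) {
    std::size_t nValid = 0;  // toys with a defined test statistic
    std::size_t nAbove = 0;  // valid toys at least as extreme as the data
    for (double t : toyStats) {
        if (t < 0.0) continue;  // undefined test statistic, skip this toy
        ++nValid;
        if (t >= t0) ++nAbove;
    }
    if (nValid == 0) return -1.0;  // nothing to compare against
    return static_cast<double>(nAbove) / static_cast<double>(nValid);
}
```

For example, `gofPValue({0.5, 1.5, -1.0, 2.5, 3.5}, 2.0)` ignores the flagged toy and returns 0.5, since two of the four valid toys lie at or above the observed value.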

18 changes: 17 additions & 1 deletion src/GoodnessOfFit.cc
@@ -353,7 +353,11 @@ Double_t GoodnessOfFit::EvaluateADDistance(RooAbsPdf& pdf, RooAbsData& data, Roo
observable.setVal(observableval);
// observable.bin
current_cdf_val = cdf->getVal();
-empirical_df += d->second/s_data;
+if (d->second==0 && s_data ==0){
+  empirical_df = -1.;
+} else {
+  empirical_df += d->second/s_data;
+}

if (plotDir_ && makePlots_) {
hCdf->SetBinContent(bin+1, current_cdf_val);
@@ -366,6 +370,12 @@ Double_t GoodnessOfFit::EvaluateADDistance(RooAbsPdf& pdf, RooAbsData& data, Roo
std::cout << "Observable: " << observableval << "\tdata: " << d->second << "\tedf: " << empirical_df << "\tcdf: " << current_cdf_val << "\tdistance: " << distance << "\n";
}
if (distance > test_stat) test_stat = distance;
+if (empirical_df < 0.){
+  if(bin<1){
+    std::cout << "Warning, KS statistic not well defined in absence of data events. Setting test statistic to -1\n";
+  }
+  test_stat = empirical_df; //To set negative test stat in case the sum of data entries is 0.
+}
}else{
bin_prob = current_cdf_val-last_cdf_val;
distance = s_data*pow((empirical_df-current_cdf_val), 2)/current_cdf_val/(1.-current_cdf_val)*bin_prob;
@@ -377,6 +387,12 @@ Double_t GoodnessOfFit::EvaluateADDistance(RooAbsPdf& pdf, RooAbsData& data, Roo
}
// from L. Demortier, CDF/ANAL/JET/CDFR/3419
test_stat += distance;
+if(empirical_df < 0.){
+  if(bin<1){
+    std::cout << "Warning, AD statistic not well defined in absence of data events. Setting test statistic to -1\n";
+  }
+  test_stat = empirical_df; //To set negative test stat in case the sum of data entries is 0.
+}
}
if (plotDir_ && makePlots_) {
hDiff->SetBinContent(bin+1, distance);
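
The guard added in the hunks above can be illustrated with a standalone sketch that has no ROOT dependency. This is not the combine implementation: the function names, the vector-based interface, and the choice to skip bins where the model CDF is exactly 0 or 1 are assumptions made for illustration. The KS distance is assumed here to be the absolute difference between the empirical and model CDFs; the AD term follows the expression visible in the diff. Both functions return -1 when the data contain no events, since the empirical distribution function would otherwise be 0/0.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical standalone sketches, not the combine code.
// counts[i]: observed events in bin i.
// cdf[i]:    model CDF evaluated at the upper edge of bin i.

// Largest |EDF - CDF| over all bins, or -1 if there are no data events.
double ksDistance(const std::vector<double>& counts, const std::vector<double>& cdf) {
    assert(counts.size() == cdf.size());
    double total = 0.0;
    for (double c : counts) total += c;
    if (total == 0.0) return -1.0;  // EDF undefined: flag the result instead of returning NaN

    double edf = 0.0;
    double maxDistance = 0.0;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        edf += counts[i] / total;  // running empirical distribution function
        maxDistance = std::max(maxDistance, std::fabs(edf - cdf[i]));
    }
    return maxDistance;
}

// Sum over bins of N * (EDF - F)^2 / (F * (1 - F)) * (bin probability),
// or -1 if there are no data events.
double adDistance(const std::vector<double>& counts, const std::vector<double>& cdf) {
    assert(counts.size() == cdf.size());
    double total = 0.0;
    for (double c : counts) total += c;
    if (total == 0.0) return -1.0;  // same flag as in the KS case

    double edf = 0.0;
    double lastCdf = 0.0;
    double sum = 0.0;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        edf += counts[i] / total;
        const double f = cdf[i];
        const double binProb = f - lastCdf;
        lastCdf = f;
        // Assumption of this sketch: skip bins where the weight 1/(F(1-F)) diverges.
        if (f <= 0.0 || f >= 1.0) continue;
        sum += total * (edf - f) * (edf - f) / (f * (1.0 - f)) * binProb;
    }
    return sum;
}
```

Returning a sentinel that lies outside the valid range of the statistic lets downstream code filter such toys with a simple `t >= 0` selection.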
