LHCb Z 13 TeV dimuon 2022 #1996

Merged: 57 commits merged into master from LHCB_Z_13TEV_DIMUON_2022 on Jun 5, 2024

Conversation

@achiefa (Contributor) commented Mar 10, 2024

This PR implements the new measurements from LHCb for Z production at 13 TeV in the new commondata format. Please note the following:

  • The keys in the metadata relating to the FK tables will be modified once the FK tables have been produced.
  • I'm not sure about the keys in "nnpdf_metadata", so please check that they are correct (a sketch of the layout I'm assuming is given below).
  • Recall that the double-differential distribution in the boson transverse momentum and rapidity has the issue of a non-positive-definite correlation matrix. Hence, you won't find YAML files for this particular distribution.
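
For orientation, here is a minimal sketch of the metadata.yaml skeleton under discussion, written as a Python dict dumped with PyYAML. The keys follow the snippets quoted in the review below; the values and the exact field set are illustrative assumptions, not the final implementation.

import yaml

metadata = {
    "setname": "LHCB_Z0_13TEV_DIMUON",  # naming is settled later in this thread
    "nnpdf_metadata": {"nnpdf31_process": "DY NC", "experiment": "LHCB"},
    "implemented_observables": [
        {
            "observable_name": "PT",
            "observable": {
                "description": "Z transverse momentum distribution",
                "label": r"$\frac{d\sigma}{dp_T^Z}$",
                "units": "[pb]",
            },
            "process_type": "DY_Z_Y",  # must exist in validphys process_options
            "kinematic_coverage": ["pT", "y", "sqrts"],
            "kinematics": {"file": "kinematics_ZpT.yaml"},
            "data_central": "data_ZpT.yaml",
            "data_uncertainties": ["uncertainties_ZpT.yaml"],
        }
    ],
}

with open("metadata.yaml", "w") as f:
    yaml.safe_dump(metadata, f, sort_keys=False)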

@achiefa achiefa self-assigned this Mar 10, 2024
@scarlehoff scarlehoff changed the base branch from new_commondata_utils to master March 11, 2024 07:16
@scarlehoff scarlehoff changed the base branch from master to new_commondata_utils March 11, 2024 07:16
@scarlehoff (Member)

Hi @achiefa, could you cherry-pick or rebase the last few commits (so that the branch contains only your changes)?

@achiefa (Contributor Author) commented Mar 11, 2024

Hi @scarlehoff, done. The branch history should now be clean. I apologise for that: last night I didn't pay much attention to the rebase procedure and messed it up.

@RoyStegeman (Member)

Should this be on top of new_commondata_utils or master? Currently it contains commits that are in master but not in new_commondata_utils.

@achiefa (Contributor Author) commented Mar 11, 2024

My understanding from the last pheno meeting was to rebase on top of master, but I may be wrong. Should I rebase on top of new_commondata_utils?

@RoyStegeman (Member)

I wasn't there so I don't know what was discussed, but rebasing on master indeed seems reasonable to me.

@RoyStegeman RoyStegeman changed the base branch from new_commondata_utils to master March 11, 2024 10:28
@RoyStegeman (Member)

Is there a report (ideally comparefits) that includes this dataset?

@achiefa (Contributor Author) commented Mar 11, 2024

No, there isn't. How can I produce such a report?

@scarlehoff (Member) left a comment

At the moment you won't be able to produce a report using this dataset since the theory is missing.

There are a few things to fix, though.

@@ -0,0 +1,104 @@
setname: "LHCB_Z_13TEV_DIMUON_2022"
Member:

This can be either Z0 (for Z production) or Z0J (for Z + J production).
You cannot have both processes in the same dataset, even if they come from the same paper. Please break it down into two distinct datasets.

Also, I'd avoid using 2022 in the name of the dataset unless it is necessary to disambiguate.

Contributor Author:

The paper deals with Z production only. So I'll change it to Z0.

I thought the 2022 label was necessary to tell it apart from the older LHCB_Z0 datasets, which date back to 2016 and are already implemented.

Member:

ZpT is Z + jet; that's why we have those as Z0J.

Contributor Author:

Yes, but legacy datasets from the same experiment are Z0. See for instance LHCB_Z0_13TEV. So I suppose I should do the same here, shouldn't I?

Moreover, since the same name is used for the implementation linked above, should I keep the 2022 label?

@scarlehoff (Member), Mar 11, 2024:

With respect to the 2022 label, I guess we can live with that. But I'd prefer something that includes some information, like the integrated luminosity, since that is what we already have in a few places (XYPB).

With respect to the process names: at the moment we have no Z+J process for LHCB, so this would be the first one.

Note that the process label Z0 (or Z0J) refers in practice to the leading-order process being considered.
For the Z rapidity this corresponds to $p \ p > Z > l^{+} \ l^{-}$, which we are calling Z0, while for the Z pT it is instead $p \ p > Z + j$, where the measurement is the pT of the jet (and, if double differential, pT and rapidity) and $j$ is a jet; we are calling this one Z0J.

The difference is important since, for instance, this corresponds to distinct processes when computing predictions with matrix / nnlojet / madgraph.

Contributor Author:

Ok, now I understand. It wasn't clear to me that Z0 and Z0J must be broken down into two different datasets (i.e. two different folders), even if they come from the same measurement. I would suggest specifying that in the documentation, because it's not trivial for a novice (like me).

Re the 2022 label, I can change it to the integrated luminosity if you wish. Note that the legacy implementation for Z0 does not report this information in the name.

Member:

The documentation needs a lot of updates ^^U

setname: "LHCB_Z_13TEV_DIMUON_2022"

nnpdf_metadata:
  nnpdf31_process: "Z0PT"
Member:

Suggested change
nnpdf31_process: "Z0PT"
nnpdf31_process: "NC DY"

This is probably what you want here, but check the other Z0 and Z0J datasets to see what to write here.
This information is used for the theory covariance matrix so it is important that it matches between datasets.

Contributor Author:

I just checked ATLAS_Z0J_8TEV and LHCB_Z0_13TEV and it seems that nnpdf31_process is DY NC for both Z0 and Z0J.

Anyway, I appreciate that you also provided an explanation of what these values actually do.

Member:

These are used to group datasets. For instance, for the theory covariance matrix, datasets with the same process correlate both muR and muF, while datasets with different processes correlate only muF.
Beyond that, it is also used for plotting and organizational purposes.
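
To make the grouping rule concrete, here is a toy sketch (not the actual nnpdf code; the function name is made up):

# Datasets sharing the same nnpdf31_process correlate both muR and muF
# scale variations; datasets with different processes correlate only muF.
def correlated_scales(process_a, process_b):
    return {"muR", "muF"} if process_a == process_b else {"muF"}

assert correlated_scales("DY NC", "DY NC") == {"muR", "muF"}  # same process
assert correlated_scales("DY NC", "JETS") == {"muF"}          # different processes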

buildmaster/LHCB_Z_13TEV_DIMUON_2022/metadata.yaml (outdated)
label: '$\frac{d\sigma}{dp_T^Z}$',
units: "[pb]"}
observable_name: PT
process_type: Z0PT
Member:

This process type doesn't exist yet.

You will need to add a new process type for this dataset: https://github.com/NNPDF/nnpdf/blob/master/validphys2/src/validphys/process_options.py (at the bottom of the module you can find the ones currently implemented).

Contributor Author:

This part is not really clear to me. In LHCB_Z0_13TEV, process_type is set to EWK_RAP for the rapidity distribution. However, this does not appear at the bottom of the module that you mentioned. The same holds for ATLAS_Z0J_8TEV, although there for the double-differential distributions in pT. Am I missing something?

Member:

Basically, old datasets had the kinematic variables set to always be 3 variables, and always the same ones (for a given process; note that in this case "process" refers to both the physical process and the observable/measurement, so two datasets can have the same nnpdf31_process and different process_type... it's confusing, I know).

Anyway, the process_type basically defines how the variables are read by validphys and autogenerates some plotting labels. In the LHCB datasets this process_type still refers to the old process type because the kinematic variables are the same and written in the same order as before...
I guess we can talk about this on Wednesday, but basically the summary is that in order to add (new) DY data you will need to add a new process to that list that is able to understand the kinematic variables that you are using.
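
In practice the registration pattern looks roughly like the following self-contained sketch; the real _Process class and dictionary in process_options.py carry more fields and possibly different names, so treat everything here as illustrative (compare the actual diff quoted further down):

import dataclasses

@dataclasses.dataclass(frozen=True)
class _Process:  # stand-in for the real class in validphys/process_options.py
    name: str
    description: str
    accepted_variables: tuple

# An existing entry (fields illustrative)...
DY_W_ETA = _Process("DY_W_ETA", "DY W -> l nu rapidity", ("eta", "M2", "sqrts"))

# ...reused for the new process via dataclasses.replace, overriding only what differs:
DY_Z_Y = dataclasses.replace(DY_W_ETA, name="DY_Z_Y", description="DY Z -> ll rapidity")

# Registering it makes the process_type string in the metadata resolvable.
PROCESSES = {"DY_W_ETA": DY_W_ETA, "DY_Z_Y": DY_Z_Y}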

file: kinematics_ZpT_y.yaml
data_central: data_ZpT_y.yaml
data_uncertainties: uncertainties_ZpT_y.yaml
kinematic_coverage: [pT, yZ, sqrt_s]
Member:

I think (but again, check with the other LHCB_Z datasets that @Radonirinaunimi implemented) that for the rapidity the variable to use is y.

buildmaster/LHCB_Z_13TEV_DIMUON_2022/metadata.yaml (outdated)
@RoyStegeman (Member) commented Mar 11, 2024

At the moment you won't be able to produce a report using this dataset since the theory is missing.

Ah, I assumed that came with the PR. I should have read the first message more carefully.

@RoyStegeman (Member)

Could you please move this from buildmaster to validphys/datafiles/new_commondata?

@@ -252,6 +252,7 @@ def _displusjet_xq2map(kin_dict):
"HQP_PTQ": HQP_PTQ,
"HERAJET": HERAJET,
"HERADIJET": dataclasses.replace(HERAJET, name="HERADIJET", description="DIS + jj production"),
"DY_Z_Y" : dataclasses.replace(DY_W_ETA, name="DY_Z_Y", description="DY Z -> ll rapidity")
Contributor Author:

I made use of the process type implemented by MC, but I changed the name and description. Moreover, note that the accepted kinematic variable in DY_W_ETA is eta. Should I use y instead, since the LHCb data are given in the boson rapidity?

Member:

I think both are ok. Actually, we could also call it DY_RAP or something like that and unify both with a common description.

But changing these (internal) names is "as easy" as a search & replace.

Contributor Author:

Should I rebase on top of master again? Currently DY_W_ETA is not in process_options.

Member:

Yes, if you need to use it.

@@ -50,18 +51,19 @@ implemented_observables:
label: '$\frac{d^2\sigma}{d p_T^Z y^Z}$',
units: "[pb]"}
observable_name: DIMUON-PT-Y
process_type: EWK_PTRAP
process_type: _
Contributor Author:

Here I left it blank, because we are not going to use the double-differential distribution. But if you wish I can implement the corresponding process type anyway.

Member:

Why are we not going to use it?

(if we know already that we are not going to use it... why do we want to have it implemented?)

Contributor Author:

The problem is that the correlation matrix provided by the experimentalists is not positive definite. And ERN hasn't heard back from them for a while, so we cannot use this data set as it is.

Member:

I see, can you remove it then?

Just leave it implemented in the filter, but don't output this part.
That way, once we get it (if we get it), we have most of the work already done.

Otherwise we risk using it by mistake.

Contributor Author:

Do you mean removing this distribution from the metadata? Note that this part is not output in the YAML files for exactly that reason.

Member:

Yes, and don't include the kinematics/data/uncertainties files (which are at least referenced in the metadata).
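
A minimal sketch of that arrangement in the filter script, with hypothetical helper names (the actual filter's functions will differ):

def write_pt_outputs():
    ...  # write kinematics/data/uncertainties YAML for the 1D pT distribution

def write_y_outputs():
    ...  # same for the 1D rapidity distribution

def write_2d_outputs():
    ...  # double-differential tables: kept implemented, but not produced

WRITE_2D = False  # flip once a positive-definite correlation matrix arrives

if __name__ == "__main__":
    write_pt_outputs()
    write_y_outputs()
    if WRITE_2D:
        write_2d_outputs()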

treatment: MULT
type: CORR
bins:
- Statistical_uncertainty:
Contributor Author:

I have a feeling that I misunderstood how uncertainties must be listed in the file. Could you please check whether they are implemented correctly, @scarlehoff?

Member:

The uncertainties need to be listed one per bin:

bins:
- stat: 0.10
  some_name: 11.0
  some_other_name: 143
- stat: 0.20
  some_name: 143

and so on.
The statistical uncertainty should always be named stat.
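
For instance, a filter script could build that structure and dump it with PyYAML. This is only a sketch using the naming just described; the real files also carry a definitions block giving the treatment/type of each named source, as in the snippet quoted above.

import yaml

bins = [
    {"stat": 0.10, "some_name": 11.0, "some_other_name": 143.0},
    {"stat": 0.20, "some_name": 143.0},
]

with open("uncertainties_ZpT.yaml", "w") as f:
    yaml.safe_dump({"bins": bins}, f, sort_keys=False)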

Contributor Author:

But I've got correlated statistical uncertainties, so I guess I should write stat_corr_1, stat_corr_2, ..., right?

max: 2.2
sqrts:
  min: null
  mid: 13
Member:

Ah, gotcha! The problem you have is that you are using sqrts in TeV; if you write 13000 that should solve the problem you're seeing.

The same is probably true in the rapidity case as well. Make sure that the units are always GeV.
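
In the filter this is just a units choice when the kinematics entries are built, e.g. (a sketch; None serializes to YAML null):

SQRTS_TEV = 13.0
sqrts_bin = {"min": None, "mid": SQRTS_TEV * 1000.0, "max": None}  # 13000.0 GeV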

Contributor Author:

Yes, really my bad. I've fixed that and now the points are all within 1. Thanks for spotting it!

Comment on lines +2 to +5
- pT:
    min: 0.0
    mid: 1.1
    max: 2.2
Member:

@enocera I'm a bit worried about the kinematics here. For sure no fixed order calculation will be reliable at pT ~ 0, but I'm not even sure pT ~ 10 will be ok...

@enocera (Contributor), Mar 20, 2024:

Indeed. The cut on pT that we have for the ATLAS and CMS 8 TeV data sets is at 30 GeV, therefore anything lower than that cannot be described by FO pert QCD (but needs resummation). So are you saying that the maximum pT in that measurement is 10 GeV? If this is the case, then we may forget about it until we do pT resummation.

Contributor:

Looking at the paper we may have, say, 3 bins above pT ~ 30 GeV. Anyway, at this point I'd complete the implementation of the data set; it may be useful in the future.

@scarlehoff scarlehoff marked this pull request as ready for review May 8, 2024 14:09
@achiefa (Contributor Author) commented Jun 5, 2024

I think the PR is now complete. A few comments before merging:

  • I resolved the conflict with the master branch in process_options. Specifically:
    • There is now a single implementation for each of _dyboson_xq2map and _dybosonpt_xq2map. I kept my implementation.
    • process_options now contains two DY-process classes: DY_PT and DY_2L. The specification for a particular process type is done using dataclasses.replace in the PROCESS dictionary. For instance, DY_W_ETA is obtained from DY_2L, and the dataset implemented by @comane is safe.
  • I made the report on the kinematic coverage for both LHCb and CMS (the latter was implemented by Mark). You can find the report here: report
  • I've attached the data-theory comparison at NNLO for the Z0 data set.


@scarlehoff (Member)

Thanks! I think this can be merged now.

I made the report on the kinematic coverage for both LHCb and CMS (the latter was implemented by Mark). You can find the report here: report

This looks good (in that it looks like what I would expect: CMS is central while LHCB accesses the extremes).

@scarlehoff scarlehoff merged commit 6691f56 into master Jun 5, 2024
6 checks passed
@scarlehoff scarlehoff deleted the LHCB_Z_13TEV_DIMUON_2022 branch June 5, 2024 10:13