ThOr Functors
Abstract
- Build upon a basic DaVinci script to include more complex functors.
- Understand some basic principles about these complex functors.
- Know how to do basic calculations with functors.
Minimal Script
In the last section you learnt about lbexec and how to run a minimal DaVinci job. This required a Python file containing a main function and a yaml file to specify the configurables. Below is an example of a minimal script that we will build upon in this lesson. You can go ahead and try this example first: just copy the Python and yaml files into a directory on lxplus and run the command at the top of the Python script.
This new script, rather than just outputting the decay tree to the terminal, will generate an ntuple (ROOT file) containing the actual physics measurement promised in the physics introduction!
The trigger line we will be using selects both \(B_s^0\to (D_s^- \to K^- K^+ \pi^-) \pi^+\) and \(B^0\to (D_s^- \to K^- K^+ \pi^-) \pi^+\). We will need to include a decay descriptor in our DaVinci file, and it needs to match the one used when the trigger line was written.
The DST has a lot of information inside it. Besides many different events, only some of which will contain candidates for our decay, it also has a lot of physics information about the collected (or simulated) data. To include that information in our ROOT file we use objects called Functors. Functors can also compute additional information based on the vertex fit, for instance.
The following DaVinci script will go through a DST of simulated \(B_s^0\to (D_s^- \to K^- K^+ \pi^-) \pi^+\), select information from both the daughters and their respective mothers (specifically, the mass, the total momentum and the transverse momentum) and write a TTree Hlt2B2OC_BdToDsmPi_DsmToKpKmPim/DecayTree into a ROOT file named tuple.root using the class Funtuple. Take a look:
Running this script
Remember to change the name of your python file in the command (DV_basic.py here) and similarly for your yaml (tuple_options.yaml here)!
detdesc vs DD4hep
One thing to notice here is that, when running over MC, the platform on which the DaVinci version runs needs to be indicated (specified with the -c argument). In this case, it must contain +detdesc at the end. This is because the geometry of the detector and the conditions of data-taking are stored in two different places in Run 3. detdesc is the older method, still used for MC (the Simulation group are working on updating this). DD4hep is the more modern method, used for data, and the default for LHCb software if the platform is not specified.
'''
lb-run -c x86_64_v3-el9-gcc13+detdesc-opt+g DaVinci/v65r0 lbexec DV_basic:main tuple_options.yaml
'''
import Functors as F
import FunTuple.functorcollections as FC
from DaVinci import Options, make_config
from DaVinci.algorithms import create_lines_filter
from FunTuple import FunctorCollection
from FunTuple import FunTuple_Particles as Funtuple
from PyConf.reading import get_particles, get_pvs
from RecoConf.event_filters import require_pvs
def main(options: Options):
line = "Hlt2B2OC_BdToDsmPi_DsmToKpKmPim"
data = get_particles(f"/Event/HLT2/{line}/Particles")
line_prefilter = create_lines_filter(name=f"PreFilter_{line}", lines=[line])
pvs = get_pvs()
fields = {
"Bs": "[B0 -> (D_s- -> K- K+ pi-) pi+]CC", #Note here, we call the parent a B0 even though it's a Bs, because it needs to match the reconstruction in the trigger line
"Ds": "[B0 -> ^(D_s- -> K- K+ pi-) pi+]CC",
"Bs_pi": "[B0 -> (D_s- -> K- K+ pi-) ^pi+]CC",
"Ds_pi": "[B0 -> (D_s- -> K- K+ ^pi-) pi+]CC",
"Km": "[B0 -> (D_s- -> ^K- K+ pi-) pi+]CC",
"Kp": "[B0 -> (D_s- -> K- ^K+ pi-) pi+]CC",
}
all_vars = FunctorCollection({
"M": F.MASS,
"P": F.P,
"PT": F.PT
})
variables = {"ALL": all_vars}
funtuple = Funtuple(
name=line,
tuple_name="DecayTree",
fields=fields,
variables=variables,
inputs=data,
)
algs = {line: [line_prefilter, require_pvs(pvs), funtuple]}
return make_config(options, algs)
input_files:
- root://eoslhcb.cern.ch//eos/lhcb/wg/dpa/wp7/Run3SK/exampleDST/00257716_00000011_1.hlt2.dst
input_type: ROOT
output_type: ROOT
input_raw_format: 0.5
simulation: true
input_process: Hlt2
input_stream: b2oc
conddb_tag: sim10-2024.W37.39-v00.00-md100
dddb_tag: dddb-20240427
lumi: false
data_type: '2024'
evt_max: 1000
print_freq: 100
ntuple_file: tuple.root
These were explained in the last section, so now we will focus on including some more functors that might be useful for an analysis.
Note on ThOr functors
You might see LoKi functors in older documentation; these were the functors used in DaVinci for Run 1 and 2. We now have ThOr functors, which stands for Throughput Oriented functors. Detailed documentation on how these functors work can be found here.
Functors
There is a detailed list of all documented functors here. There is also a great glossary resource here. Below are explanations of some commonly used functors that are vital for any analysis at LHCb.
Impact Parameter (IP)
The IP is the shortest distance between a track and some vertex (e.g. PV).
Impact Parameter Chi Squared (IPCHI2)
The IPCHI2 provides a quantitative measure of how likely it is that the track originated from the vertex, taking into account the uncertainties associated with the impact parameter measurement. For example, if IPCHI2 = 0 the track is exactly consistent with originating from that vertex; the larger IPCHI2 gets, the larger the inconsistency between the track and the vertex.
IPCHI2 fineprint
The IPCHI2 is the increase in \(\chi^2\) of the vertex fit when the considered particle is included compared to when it is not. It behaves approximately like the square of the IP divided by its uncertainty, but is more stable. Consider the IP and its uncertainty for a track that was included in the fit of the primary vertex: there is an anti-correlation between the IP and the PV position, because the track pulls the PV towards it. Now take a particle that was not included in the fit (because it is further away, or because it is a composite particle like your B); here there is no such correlation. The quantity IP/uncertainty thus has different meanings depending on how the track was used. IPCHI2 doesn't have this problem: it always compares the PV fit with and without the particle.
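As a short sketch of how you might store these quantities in your tuple, using the OWNPV flavour of the functors that also appears in the full script further down (the collection name here is just illustrative, and the usual import Functors as F and from FunTuple import FunctorCollection from the script above are assumed):
ip_vars = FunctorCollection({
    "OWNPVIP": F.OWNPVIP,          # impact parameter with respect to the particle's own PV
    "OWNPVIPCHI2": F.OWNPVIPCHI2,  # IP chi2 with respect to the particle's own PV
})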
Distance Of Closest Approach
This gives the shortest distance between a pair of tracks. For example, in our decay \(B_{s}^0 \to (D_{s}^- \to K^- K^+ \pi^-) \pi^+\) we might be interested in the distance of closest approach between the children that form our \(D_{s}\) candidate, or between the \(D_{s}\) and the other \(\pi\). While theoretically this should be 0 (if they came from the same decay vertex), errors due to extrapolation, resolution and noise can give small non-zero DOCAs.
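As a sketch, the DOCA functors take the indices of the pair of daughters (F.DOCA(1, 2) is the DOCA between the first and second daughter), and F.MAXDOCA returns the largest DOCA between any pair; all of these appear in the full script below, and the collection name is illustrative:
doca_vars = FunctorCollection({
    "DOCA12": F.DOCA(1, 2),          # DOCA between the first and second daughter
    "DOCA12CHI2": F.DOCACHI2(1, 2),  # DOCA chi2 between the first and second daughter
    "MAXDOCA": F.MAXDOCA,            # largest DOCA between any pair of daughters
})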
DIRection Angle (DIRA)
This is the angle between the line drawn from the PV to the decay vertex and the summed momentum of the decay products. The F.DIRA functor actually returns cos(DIRA): if cos(DIRA) ≈ 1, the momentum points well back to the PV, which means it's more likely to be a real B candidate.
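For example, the OWNPV flavour used in the full script below stores this cosine with respect to the particle's own PV (the collection name is illustrative):
dira_vars = FunctorCollection({
    "OWNPV_DIRA": F.OWNPVDIRA,  # cos(DIRA) with respect to the particle's own PV
})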
PID Variables
We have two main types of PID variables that we use. The more basic functors are given by:
F.PID_P -> Proton identification
F.PID_K -> Kaon identification
F.PID_E -> Electron identification
F.PID_MU -> Muon identification
F.PID_PI -> Pion identification
These PID variables are explained nicely here. The basic idea is that each of the three subdetector systems that provide PID information (RICH, CALO and the MUON system) provides a log likelihood of a track being of a certain species. These are combined (the log likelihoods are summed) to form one overall log likelihood. The difference of this value for each particle hypothesis you are trying to identify, with respect to the pion hypothesis, is then calculated. These are known as DLLs (Delta Log Likelihoods). So PID_K compares the log likelihood for a kaon with respect to the pion hypothesis, and so on.
There are also more advanced PID functors called PROBNNs. To go beyond the simpler log likelihood approach of the PID variables, the PROBNNs were developed. These also take into account correlations between detector subsystems and tracking. An MVA is trained with these features to produce probabilities for each particle hypothesis. These come in the form of:
F.PROBNN_K
F.PROBNN_PI
F.PROBNN_P
F.PROBNN_E
F.PROBNN_MU
F.PROBNN_GHOST
These are more powerful PID variables that are commonly made use of in an analysis.
GHOSTs
A real track in LHCb should correspond to an actual particle: its hits should align well and point back to a physics vertex. Unfortunately this isn't always the case, and even after running your trigger line over the data and tupling we get these so-called ghost tracks. These are formed from a random combination of hits within the sub-detectors, which could come from multiple particles or even noise. They don't correspond to a real particle, so using functors such as F.PROBNN_GHOST can be extremely useful to help remove background in your sample.
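As a sketch, you could collect the DLL and PROBNN branches for the kaons only; the branch and collection names here are just illustrative, and the same functors appear in track_variables in the full script below:
kaon_pid_vars = FunctorCollection({
    "PIDK": F.PID_K,                 # DLL of the kaon hypothesis relative to the pion one
    "PROBNN_K": F.PROBNN_K,          # neural-net probability of being a kaon
    "PROBNN_GHOST": F.PROBNN_GHOST,  # neural-net probability of being a ghost track
})
# applied only to the kaon fields, e.g. variables = {"Kp": kaon_pid_vars, "Km": kaon_pid_vars}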
Functor Calculations
We can not only apply these functors as they are, but also do calculations with them. For example, we can use the functors:
F.MAX
F.MIN
F.SUM
to compute derived variables directly in DaVinci. This means the result is calculated for you and stored in your tuple as an extra branch. For example, if you wanted the maximum PT of the daughters of one of your composite particles, you could pass:
"MAX_PT": F.MAX(F.PT)
and apply this to either the \(B_{s}\) or \(D_{s}\).
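Put together, a small illustrative collection (the branch and collection names are ours, not fixed by DaVinci; these same branches appear in the full script below) could look like this:
daughter_aggregates = FunctorCollection({
    "MAX_PT": F.MAX(F.PT),  # highest PT among the daughters
    "MIN_PT": F.MIN(F.PT),  # lowest PT among the daughters
    "SUM_PT": F.SUM(F.PT),  # scalar sum of the daughters' PT
})
# e.g. variables = {"Bs": daughter_aggregates, "Ds": daughter_aggregates}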
There is also an important structure you will often see, such as:
"NHITS": F.VALUE_OR(-1) @ F.NHITS @ F.TRACK
This should be read from right to left: we use the functor F.TRACK, which gets us our track object, then apply F.NHITS to that track object to give us the number of hits for each track. Finally there is F.VALUE_OR(-1), which just means that if F.NHITS on this F.TRACK doesn't return a value, we give it a value of -1.
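The same right-to-left pattern works with other track quantities; as a sketch (the collection name is illustrative, and both branches also appear in the full script below):
track_fit_vars = FunctorCollection({
    "NHITS": F.VALUE_OR(-1) @ F.NHITS @ F.TRACK,  # total number of hits on the track, -1 if no track
    "TRNDOF": F.VALUE_OR(-1) @ F.NDOF @ F.TRACK,  # degrees of freedom of the track fit, -1 if no track
})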
Functor Operators
You can also use operators such as & (logical AND) to require multiple conditions to be true. Additionally, you can use comparison operators such as < (less than) and > (greater than) to specify numerical thresholds. Below is an example of using F.NINTREE to count the number of basic particle daughters in a decay tree that:
- Have an associated track (F.TRACK).
- Have transverse momentum (PT) greater than 1500 MeV.
"N_HIGHPT_TRACKS": F.NINTREE(F.ISBASICPARTICLE & (F.HAS_VALUE @ F.TRACK) & (F.PT > 1500)),
Example DaVinci Script
We will now demonstrate some of these new functors in action. We can use the same yaml file from before; here is an example Python script incorporating some of these new functors, with brief definitions of each one alongside them.
A note about units
By default in the LHCb software we use MeV, nanoseconds and millimetres!
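So a threshold written as F.PT > 1500 means 1500 MeV, i.e. 1.5 GeV. If you prefer explicit units, a sketch using the standard Gaudi unit constants could look like this (this assumes the GaudiKernel.SystemOfUnits module is available in your DaVinci environment, which it normally is):
from GaudiKernel.SystemOfUnits import GeV
cut_in_mev = F.PT > 1500       # implicit MeV
cut_in_gev = F.PT > 1.5 * GeV  # the same threshold, written with explicit units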
The only other change in the script below is how we assign different functors to each particle in our decay. Previously we just used the all_vars collection, which applied F.MASS, F.P and F.PT to all of the particles in our decay. But we might have functors we only want to apply to composite particles such as the \(B_{s}\) or the \(D_{s}\); for instance, you would want to apply the DOCA variables to these, since they return the distance of closest approach of their daughters. Equally, we might have variables we only want to apply to tracks, such as the PROBNN variables, since it is the kaons, pions, protons etc. that actually leave hits in the detector. So we now have four functor collections: all_vars, base_composite_variables, Ds_extra_doca and track_variables.
We then apply these in our variables collection like so:
variables = {
"ALL": all_vars, # Variables for all particles
"Bs": base_composite_variables, # Variables specific to Bs
"Ds": base_composite_variables + Ds_extra_doca, # Variables specific to Ds
"Bs_pi": track_variables, # Variables for pion
"Ds_pi": track_variables, # Variables for pion
"Kp": track_variables, # Variables for kaon
"Km": track_variables, # Variables for kaon
}
Note that we still have the ALL category, which applies the all_vars collection to all of the particles in our candidate!
Your Turn!
Now it's your turn to run this updated Python script. Make sure you are happy with the functors being used and try to add some more from the functor documentation here. You can use the same yaml file as before and just update the python file as below.
OWNPV vs BPV
You may see other examples using functors such as F.BPVFD(pvs) rather than F.OWNPVFD. BPV stands for Best Primary Vertex, and these functors need you to pass in the list of PVs. The OWNPV functors are equivalent, but have been shown to save CPU time since you don't need to pass the PV list in. The BPV functors are generally considered deprecated and should be avoided in favour of the OWNPV ones.
'''
lb-run -c x86_64_v3-el9-gcc13+detdesc-opt+g DaVinci/v65r0 lbexec DV_intermediate:main tuple_options.yaml
'''
import Functors as F
import FunTuple.functorcollections as FC
from DaVinci import Options, make_config
from DaVinci.algorithms import create_lines_filter
from FunTuple import FunctorCollection
from FunTuple import FunTuple_Particles as Funtuple
from PyConf.reading import get_particles, get_pvs
from RecoConf.event_filters import require_pvs
def main(options: Options):
line = "Hlt2B2OC_BdToDsmPi_DsmToKpKmPim"
data = get_particles(f"/Event/HLT2/{line}/Particles")
line_prefilter = create_lines_filter(name=f"PreFilter_{line}", lines=[line])
pvs = get_pvs()
fields = {
"Bs": "[B0 -> (D_s- -> K- K+ pi-) pi+]CC",
"Ds": "[B0 -> ^(D_s- -> K- K+ pi-) pi+]CC",
"Bs_pi": "[B0 -> (D_s- -> K- K+ pi-) ^pi+]CC",
"Ds_pi": "[B0 -> (D_s- -> K- K+ ^pi-) pi+]CC",
"Km": "[B0 -> (D_s- -> ^K- K+ pi-) pi+]CC",
"Kp": "[B0 -> (D_s- -> K- ^K+ pi-) pi+]CC",
}
all_vars = FunctorCollection({
"M": F.MASS,
"P": F.P,
"PT": F.PT,
"ENERGY": F.ENERGY,
"PX": F.PX,
"PY": F.PY,
"PZ": F.PZ,
"ID": F.PARTICLE_ID, # PDG ID of the particle
"Q": F.CHARGE, # Electric charge
"ETA": F.ETA, # Pseudorapidity
"PHI": F.PHI, # Azimuthal angle
"CHI2": F.CHI2, # χ²
"CHI2DOF": F.CHI2DOF, # χ² of degrees of freedom
"OWNPVIP": F.OWNPVIP, # Impact parameter wrt own PV
"OWNPVIPCHI2": F.OWNPVIPCHI2, # Impact parameter χ² wrt own PV
})
base_composite_variables = FunctorCollection({
"VTXCHI2NDOF": F.CHI2DOF, # Vertex fit χ²/ndf
"END_VX": F.END_VX, # x-coordinate of decay vertex
"END_VY": F.END_VY, # y-coordinate of decay vertex
"END_VZ": F.END_VZ, # z-coordinate of decay vertex
# OWNPV values
"OWNPV_X": F.OWNPVX, # x-coordinate of best PV
"OWNPV_Y": F.OWNPVY, # y-coordinate of best PV
"OWNPV_Z": F.OWNPVZ, # z-coordinate of best PV
"OWNPV_DIRA": F.OWNPVDIRA, # Direction angle cosine wrt own PV
"OWNPV_FD": F.OWNPVFD, # Flight distance wrt own PV
"OWNPV_FDCHI2": F.OWNPVFDCHI2, # Flight distance χ² wrt own PV
"OWNPV_VDRHO": F.OWNPVVDRHO, # Radial flight distance wrt own PV
"OWNPV_VDZ": F.OWNPVVDZ, # z-direction flight distance
"OWNPV_LTIME": F.OWNPVLTIME, # Proper lifetime
"OWNPV_DLS": F.OWNPVDLS, # Decay length significance
# DOCA
"DOCA12": F.DOCA(1, 2), # DOCA between first and second daughter
"DOCA12CHI2": F.DOCACHI2(1, 2), # DOCA χ² between first and second daughter
# Daughter Max, Min and Sums
"MAX_PT": F.MAX(F.PT), # Maximum PT of daughters
"MIN_PT": F.MIN(F.PT), # Minimum PT of daughters
"SUM_PT": F.SUM(F.PT), # Sum of daughters' PT
"MAX_P": F.MAX(F.P), # Maximum momentum of daughters
"MIN_P": F.MIN(F.P), # Minimum momentum of daughters
"SUM_P": F.SUM(F.P), # Sum of daughters' momentum
"MAX_OWNPVIPCHI2": F.MAX(F.OWNPVIPCHI2), # Max IP χ² of daughters
"MIN_OWNPVIPCHI2": F.MIN(F.OWNPVIPCHI2), # Min IP χ² of daughters
"SUM_OWNPVIPCHI2": F.SUM(F.OWNPVIPCHI2), # Sum of daughters' IP χ²
"MAXDOCACHI2": F.MAXDOCACHI2, # Maximum DOCA χ² between any daughters
"MAXDOCA": F.MAXDOCA, # Maximum DOCA between any daughters
"MAXSDOCACHI2": F.MAXSDOCACHI2, # Maximum signed DOCA χ²
"MAXSDOCA": F.MAXSDOCA, # Maximum signed DOCA
})
# Need extra DOCA combinations since Ds decays to 3 final state particles
Ds_extra_doca = FunctorCollection({
"DOCA13": F.DOCA(1, 3),
"DOCA23": F.DOCA(2, 3),
"DOCA13CHI2": F.DOCACHI2(1, 3),
"DOCA23CHI2": F.DOCACHI2(2, 3),
})
track_variables = FunctorCollection({
# Standard PID
"PIDp": F.PID_P, # Proton PID likelihood
"PIDK": F.PID_K, # Kaon PID likelihood
"PIDPi": F.PID_PI, # Pion PID likelihood
"PIDe": F.PID_E, # Electron PID likelihood
"PIDmu": F.PID_MU, # Muon PID likelihood
# PROBNNs
"PROBNN_pi": F.PROBNN_PI, # Neural net probability of being a pion
"PROBNN_p": F.PROBNN_P, # Neural net probability of being a proton
"PROBNN_K": F.PROBNN_K, # Neural net probability of being a kaon
"PROBNN_e": F.PROBNN_E, # Neural net probability of being an electron
"PROBNN_mu": F.PROBNN_MU, # Neural net probability of being a muon
"PROBNN_GHOST": F.PROBNN_GHOST, # Neural net probability of being a ghost track
"ISMUON": F.ISMUON, # Boolean: is it identified as a muon 0 or 1?
# Additional track related info
# F.TRACK gets the track object
# F.NDOF gets the NDOF for that track
# F.VALUE_OR(-1) means if no value exists return -1 instead of failing
"TRNDOF": F.VALUE_OR(-1) @ F.NDOF @ F.TRACK, # NDOF in track fit, if this is higher then more hits used in fit
"NHITS": F.VALUE_OR(-1) @ F.NHITS @ F.TRACK, # Total number of hits in all detectors
"NVPHITS": F.VALUE_OR(-1) @ F.NVPHITS @ F.TRACK, # Total number of hits in VELO phi sensors
"NUTHITS": F.VALUE_OR(-1) @ F.NUTHITS @ F.TRACK, # Total number of hits in UT
"NFTHITS": F.VALUE_OR(-1) @ F.NFTHITS @ F.TRACK, # Total number of hits in Fibre Tracker (SciFi)
"TRACKHISTORY": F.VALUE_OR(-1) @ F.TRACKHISTORY @ F.TRACK, # Track reconstruction history
})
variables = {
"ALL": all_vars, # Variables for all particles
"Bs": base_composite_variables, # Variables specific to Bs
"Ds": base_composite_variables + Ds_extra_doca, # Variables specific to Ds
"Bs_pi": track_variables, # Variables for pion
"Ds_pi": track_variables, # Variables for pion
"Kp": track_variables, # Variables for kaon
"Km": track_variables, # Variables for kaon
}
funtuple = Funtuple(
name=line,
tuple_name="DecayTree",
fields=fields,
variables=variables,
inputs=data,
)
algs = {line: [line_prefilter, require_pvs(pvs), funtuple]}
return make_config(options, algs)
This is just one example of running DaVinci, but there is a whole DaVinci tutorial section that can be found here. We can then take a look at the ROOT file.
Open up the root file using root:
root -l tuple.root
Navigate into the directory containing our tree:
Hlt2B2OC_BdToDsmPi_DsmToKpKmPim->cd()
List trees:
.ls
Print out the leaves:
DecayTree->GetListOfLeaves()->Print()
Practise plotting some variables:
DecayTree->Draw("Bs_M")
VS Code Extension
If you use VS Code you can also use an extension called "ROOT File Viewer", which works nicely for looking at small ROOT files!
This should produce a plot looking something like this (where for this example we've increased the number of candidates in the options yaml file):
Awesome! If you have that working then you can move to the next section on functor collections!