ThOr Functors

Abstract

  • Build upon a basic DaVinci script to include more complex functors.
  • Understand some basic principles about these complex functors.
  • Know how to do basic calculations with functors.

Minimal Script

In the last section you learnt about lbexec and how to run a minimal DaVinci job. This required a Python file containing a main function and a yaml file to specify the configurables. Below is an example of a minimal script that we will build upon in this lesson. You can go ahead and try this example first: just copy the Python and yaml files into a directory on lxplus and run the command at the top of the Python script.

This new script, rather than just outputting the decay tree to the terminal, will generate an ntuple (ROOT file) containing the actual physics measurement promised in the physics introduction!

The trigger line we will be using selects both \(B_s^0\to (D_s^- \to K^- K^+ \pi^-) \pi^+\) and \(B^0\to (D_s^- \to K^- K^+ \pi^-) \pi^+\). We will need to include a decay descriptor in our DaVinci file, and it needs to match the one used when the trigger line was written.

The DST has a lot of information inside it. Besides many different events, of which only some contain candidates for our decay, it also holds a lot of physics information about the collected (or simulated) data. To include that information in our ROOT file we use objects called Functors. Functors can also compute additional information, for instance based on the vertex fit.

The following DaVinci script will go through a DST of simulated \(B_s^0\to (D_s^- \to K^- K^+ \pi^-) \pi^+\), select information from both the daughters and their respective mothers (specifically, the mass, the total momentum and the transverse momentum) and write a TTree Hlt2B2OC_BdToDsmPi_DsmToKpKmPim/DecayTree into a ROOT file named tuple.root using the FunTuple_Particles class (imported below as Funtuple). Take a look:

Running this script

Remember to change the name of your python file in the command (DV_basic.py here) and similarly for your yaml (tuple_options.yaml here)!

detdesc vs DD4hep

One thing to notice here is that, when running over MC, the platform on which DaVinci runs needs to be specified (with the -c argument), and in this case it must contain +detdesc at the end. This is because, in Run 3, the detector geometry and the data-taking conditions can be provided in two different ways:

  • detdesc is an older method, still used by MC (the Simulation group are working on updating this).
  • DD4hep is the more modern method, used in data, and the default for LHCb software if the platform is not specified.
'''
lb-run -c x86_64_v3-el9-gcc13+detdesc-opt+g DaVinci/v65r0 lbexec DV_basic:main tuple_options.yaml
'''
import Functors as F
import FunTuple.functorcollections as FC
from DaVinci import Options, make_config
from DaVinci.algorithms import create_lines_filter
from FunTuple import FunctorCollection
from FunTuple import FunTuple_Particles as Funtuple
from PyConf.reading import get_particles, get_pvs
from RecoConf.event_filters import require_pvs


def main(options: Options):
    line = "Hlt2B2OC_BdToDsmPi_DsmToKpKmPim"
    data = get_particles(f"/Event/HLT2/{line}/Particles")
    line_prefilter = create_lines_filter(name=f"PreFilter_{line}", lines=[line])
    pvs = get_pvs()

    fields = {
        "Bs": "[B0 -> (D_s- -> K- K+ pi-) pi+]CC", #Note here, we call the parent a B0 even though it's a Bs, because it needs to match the reconstruction in the trigger line
        "Ds": "[B0 -> ^(D_s- -> K- K+ pi-) pi+]CC",
        "Bs_pi": "[B0 -> (D_s- -> K- K+ pi-) ^pi+]CC",
        "Ds_pi": "[B0 -> (D_s- -> K- K+ ^pi-) pi+]CC",
        "Km": "[B0 -> (D_s- -> ^K- K+ pi-) pi+]CC",
        "Kp": "[B0 -> (D_s- -> K- ^K+ pi-) pi+]CC",
    }

    all_vars = FunctorCollection({
        "M": F.MASS,
        "P": F.P,
        "PT": F.PT
    })

    variables = {"ALL": all_vars}

    funtuple = Funtuple(
        name=line,
        tuple_name="DecayTree",
        fields=fields,
        variables=variables,
        inputs=data,
    )

    algs = {line: [line_prefilter, require_pvs(pvs), funtuple]}

    return make_config(options, algs)
input_files:
- root://eoslhcb.cern.ch//eos/lhcb/wg/dpa/wp7/Run3SK/exampleDST/00257716_00000011_1.hlt2.dst
input_type: ROOT
output_type: ROOT
input_raw_format: 0.5
simulation: true
input_process: Hlt2
input_stream: b2oc
conddb_tag: sim10-2024.W37.39-v00.00-md100
dddb_tag: dddb-20240427
lumi: false
data_type: '2024'
evt_max: 1000
print_freq: 100
ntuple_file: tuple.root

These were explained in the last section, so now we will focus on including some more functors that might be useful for an analysis.

Note on ThOr functors

You might see LoKi functors in older documentation; these were the functors used in DaVinci during Run 1 and 2. They have now been replaced by ThOr (Throughput Oriented) functors. Detailed documentation on how these functors work can be found here.

Functors

There is a detailed list of all documented functors here. There is also a great glossary resource here. Below are explanations of some commonly used functors that are vital for any LHCb analysis.

Impact Parameter (IP)

The IP is the shortest distance between a track and some vertex (e.g. a PV).

Schematic diagram of the IP

Impact Parameter Chi Squared (IPCHI2)

The IPCHI2 provides a quantitative measure of how likely it is that the track originated from the vertex, taking into account the uncertainties on the impact parameter measurement. For example, if IPCHI2 = 0, the track is exactly consistent with originating from that vertex; the larger the IPCHI2, the larger the inconsistency between the track and the vertex.

IPCHI2 fineprint

The IPCHI2 is the difference in \(\chi^2\) of the vertex fit with and without the considered particle. It behaves approximately like the square of the IP divided by its uncertainty, but is more stable. Consider the IP and its uncertainty for a track that was included in the fit of the primary vertex: there is an anti-correlation between the IP and the PV position, because the track pulls the PV towards it. Now take a particle that was not included in the fit (because it is further away, or because it is a composite particle like your B); here there is no such correlation. The quantity IP/uncertainty thus has different meanings depending on how the track was used. IPCHI2 does not have this problem: it always compares the PV fit with and without the particle.
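
As a minimal sketch, these quantities can be added to a FunctorCollection using the same OWNPVIP and OWNPVIPCHI2 functors that appear in the full script further down (the collection name ip_vars is purely illustrative):

ip_vars = FunctorCollection({
    "OWNPVIP": F.OWNPVIP,          # Impact parameter wrt the particle's own PV
    "OWNPVIPCHI2": F.OWNPVIPCHI2,  # Increase in the PV-fit chi2 when the particle is added
})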

Distance Of Closest Approach

This gives the shortest distance between a pair of tracks. For example, in our decay \(B_{s}^0 \to (D_{s}^- \to K^- K^+ \pi^-) \pi^+\) we might be interested in the distance of closest approach between the children that form our \(D_{s}\) candidate, or between the \(D_{s}\) and the other \(\pi\). Whilst theoretically this should be 0 (if they came from the same decay vertex), extrapolation errors, resolution and noise can give small but non-zero DOCAs.

Schematic diagram of the DOCA
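
For a composite particle such as the \(D_{s}\), the DOCA functors take the (1-based) indices of the daughters, as used in the full script below. A small sketch (doca_vars is purely an illustrative name):

doca_vars = FunctorCollection({
    "DOCA12": F.DOCA(1, 2),          # DOCA between the first and second daughter
    "DOCA12CHI2": F.DOCACHI2(1, 2),  # Significance (chi2) of that DOCA
    "MAXDOCA": F.MAXDOCA,            # Largest DOCA between any pair of daughters
})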

DIRection Angle (DIRA)

This is the angle between the line drawn from the PV to the decay vertex and the summed four-momentum of the decay products. The F.DIRA functor actually returns cos(DIRA): if cos(DIRA) ≈ 1, the momentum points back towards the PV, which makes it more likely to be a real B candidate.

Schematic diagram of the DIRA
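
A sketch of how this could be tupled for a composite candidate, reusing the OWNPVDIRA and flight-distance functors from the full script below (pointing_vars is purely an illustrative name):

pointing_vars = FunctorCollection({
    "OWNPV_DIRA": F.OWNPVDIRA,  # cos(DIRA) wrt the PV associated to the candidate
    "OWNPV_FD": F.OWNPVFD,      # Flight distance from that PV to the decay vertex
})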

PID Variables

We use two main types of PID variables. The more basic functors are:

  • F.PID_P -> Proton identification
  • F.PID_K -> Kaon identification
  • F.PID_E -> Electron identification
  • F.PID_MU -> Muon identification
  • F.PID_PI -> Pion identification

These PID variables are explained nicely here. The basic idea is that each of the three subdetector systems providing PID information (RICH, CALO and the MUON system) gives a likelihood of a track being of a certain species, and these are combined into a single log likelihood. The difference of this value for each particle hypothesis you are trying to identify, with respect to the pion hypothesis, is then calculated. These are known as DLLs (Delta Log Likelihoods). So PID_K compares the log likelihood for the kaon hypothesis with that for the pion hypothesis, and so on.

There are also more advanced PID functors called PROBNNs, developed to go beyond the simpler log-likelihood approach. These also take into account correlations between the detector subsystems and the tracking. An MVA is trained on these features to produce a probability for each particle hypothesis. These come in the form of

  • F.PROBNN_K
  • F.PROBNN_PI
  • F.PROBNN_P
  • F.PROBNN_E
  • F.PROBNN_MU
  • F.PROBNN_GHOST

These are more powerful PID variables that are commonly used in analyses.
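
As a sketch, a small PID collection for final-state tracks could look like the following (pid_vars is purely an illustrative name; the full track_variables collection in the script below includes all of these and more):

pid_vars = FunctorCollection({
    "PIDK": F.PID_K,                 # DLL of the kaon vs pion hypothesis
    "PIDp": F.PID_P,                 # DLL of the proton vs pion hypothesis
    "PROBNN_K": F.PROBNN_K,          # Neural-net probability of being a kaon
    "PROBNN_GHOST": F.PROBNN_GHOST,  # Neural-net probability of being a ghost track
})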

GHOSTs

A real track in LHCb should correspond to an actual particle: its hits should align well and point back to a physics vertex. Unfortunately this isn't always the case, and even after running your trigger line over the data and tupling you will find so-called ghost tracks. These are formed from a random combination of hits within the sub-detectors, which could come from multiple particles or even noise. They don't correspond to a real particle, so functors such as F.PROBNN_GHOST can be extremely useful for removing background from your sample.

Functor Calculations

Not only can we apply these functors as they are, we can also do calculations with them. For example, we can use the functors:

F.MAX
F.MIN
F.SUM

to perform calculations on other variables in DaVinci, with the result stored in your tuple as an extra branch. For example, if you wanted the maximum PT of the daughters of one of your composite particles, you could pass:

"MAX_PT": F.MAX(F.PT)

and apply this to either the \(B_{s}\) or \(D_{s}\).
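
Grouped into a collection, this looks like the following sketch (daughter_summary_vars is purely an illustrative name; the same entries appear in base_composite_variables in the script below):

daughter_summary_vars = FunctorCollection({
    "MAX_PT": F.MAX(F.PT),  # Highest PT among the daughters
    "MIN_PT": F.MIN(F.PT),  # Lowest PT among the daughters
    "SUM_PT": F.SUM(F.PT),  # Scalar sum of the daughters' PT
})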

There is also an important structure you will often see, such as:

"NHITS": F.VALUE_OR(-1) @ F.NHITS @ F.TRACK

This should be read from right to left: F.TRACK gets us the track object, F.NHITS is then applied to that track object to give the number of hits on the track, and finally F.VALUE_OR(-1) means that if F.NHITS @ F.TRACK does not return a value, -1 is written instead.

Functor Operators

You can also use operators such as & (logical AND) to require multiple conditions to be true. Additionally, you can use comparison operators such as < (less than) and > (greater than) to specify numerical thresholds. Below is an example of using F.NINTREE to count the number of basic particle daughters in a decay tree that:

  • Have an associated track (F.TRACK).
  • Have transverse momentum (PT) greater than 1500 MeV.
"N_HIGHPT_TRACKS": F.NINTREE(F.ISBASICPARTICLE & (F.HAS_VALUE @ F.TRACK) & (F.PT > 1500)),

Example DaVinci Script

We will now demonstrate some of these new functors in action. We can use the same yaml file from before; here is an example Python script incorporating some of these new functors, with a brief definition of each one alongside it.

A note about units

By default, the LHCb software uses MeV, nanoseconds and millimetres!
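
If you prefer to write thresholds in other units, the Gaudi unit constants make the conversion explicit. As a sketch (assuming the GaudiKernel.SystemOfUnits constants are importable in the DaVinci environment), the F.PT > 1500 requirement from the previous section could equivalently be written as:

from GaudiKernel.SystemOfUnits import GeV

# Same 1500 MeV requirement as above, expressed in GeV
"N_HIGHPT_TRACKS": F.NINTREE(F.ISBASICPARTICLE & (F.HAS_VALUE @ F.TRACK) & (F.PT > 1.5 * GeV)),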

The only other change in the script below is how we assign different functors to each particle in our decay. Previously we just used the all_vars collection, which applied F.MASS, F.P and F.PT to all of the particles in our decay. But we might have functors we only want to apply to composite particles such as the \(B_{s}\) or the \(D_{s}\): you can appreciate that you would want to apply the DOCA variables to these, since they return the distance of closest approach of their daughters. Equally, we might have variables we only want to apply to tracks, such as the PROBNN variables, since it is the kaons, pions, protons etc. that actually leave hits in the detector. So we now have four functor collections: all_vars, base_composite_variables, Ds_extra_doca and track_variables. We then apply these in our variables dictionary like so:

variables = {
    "ALL": all_vars,                                    # Variables for all particles
    "Bs": base_composite_variables,                     # Variables specific to Bs
    "Ds": base_composite_variables + Ds_extra_doca,     # Variables specific to Ds 
    "Bs_pi": track_variables,                           # Variables for pion
    "Ds_pi": track_variables,                           # Variables for pion 
    "Kp":    track_variables,                           # Variables for kaon 
    "Km":    track_variables,                           # Variables for kaon 
}
We specify each particle in our decay and which collections we want it to have. Remember we still have our ALL category, which applies its collection to all of the particles in our candidate!

Your Turn!

Now it's your turn to run this updated Python script with the previous yaml file. Make sure you are happy with the functors being used and try to add some more from the functor documentation here. You can use the same yaml file as before and just update the Python file as below.

OWNPV vs BPV

You may see other examples using functors such as F.BPVFD(pvs) instead of F.OWNPVFD. BPV stands for Best Primary Vertex, and these functors require you to pass in the list of PVs. OWNPV is equivalent but has been shown to save CPU time, since the PV list does not need to be passed in.

BPV is generally considered deprecated and should be avoided in favour of OWNPV.
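
As a small sketch of the difference (pv_vars is purely an illustrative name), note that only the BPV variant needs the PV container passed in:

pvs = get_pvs()

pv_vars = FunctorCollection({
    "BPV_FD": F.BPVFD(pvs),  # Older style: the PV container is passed explicitly
    "OWNPV_FD": F.OWNPVFD,   # Preferred style: uses the PV already associated to the candidate
})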

'''
lb-run -c x86_64_v3-el9-gcc13+detdesc-opt+g DaVinci/v65r0 lbexec DV_intermediate:main tuple_options.yaml
'''
import Functors as F
import FunTuple.functorcollections as FC
from DaVinci import Options, make_config
from DaVinci.algorithms import create_lines_filter
from FunTuple import FunctorCollection
from FunTuple import FunTuple_Particles as Funtuple
from PyConf.reading import get_particles, get_pvs
from RecoConf.event_filters import require_pvs


def main(options: Options):
    line = "Hlt2B2OC_BdToDsmPi_DsmToKpKmPim"
    data = get_particles(f"/Event/HLT2/{line}/Particles")
    line_prefilter = create_lines_filter(name=f"PreFilter_{line}", lines=[line])
    pvs = get_pvs()

    fields = {
        "Bs": "[B0 -> (D_s- -> K- K+ pi-) pi+]CC",
        "Ds": "[B0 -> ^(D_s- -> K- K+ pi-) pi+]CC",
        "Bs_pi": "[B0 -> (D_s- -> K- K+ pi-) ^pi+]CC",
        "Ds_pi": "[B0 -> (D_s- -> K- K+ ^pi-) pi+]CC",
        "Km": "[B0 -> (D_s- -> ^K- K+ pi-) pi+]CC",
        "Kp": "[B0 -> (D_s- -> K- ^K+ pi-) pi+]CC",
    }

    all_vars = FunctorCollection({
        "M": F.MASS,
        "P": F.P,
        "PT": F.PT,
        "ENERGY": F.ENERGY,
        "PX": F.PX,
        "PY": F.PY,
        "PZ": F.PZ,
        "ID": F.PARTICLE_ID,            # PDG ID of the particle
        "Q": F.CHARGE,                  # Electric charge
        "ETA": F.ETA,                   # Pseudorapidity
        "PHI": F.PHI,                   # Azimuthal angle
        "CHI2": F.CHI2,                 # χ²
        "CHI2DOF": F.CHI2DOF,           # χ² of degrees of freedom
        "OWNPVIP": F.OWNPVIP,           # Impact parameter wrt own PV
        "OWNPVIPCHI2": F.OWNPVIPCHI2,   # Impact parameter χ² wrt own PV

    })

    base_composite_variables = FunctorCollection({
        "VTXCHI2NDOF": F.CHI2DOF,         # Vertex fit χ²/ndf
        "END_VX": F.END_VX,               # x-coordinate of decay vertex
        "END_VY": F.END_VY,               # y-coordinate of decay vertex
        "END_VZ": F.END_VZ,               # z-coordinate of decay vertex
        # OWNPV values
        "OWNPV_X": F.OWNPVX,              # x-coordinate of best PV
        "OWNPV_Y": F.OWNPVY,              # y-coordinate of best PV
        "OWNPV_Z": F.OWNPVZ,              # z-coordinate of best PV
        "OWNPV_DIRA": F.OWNPVDIRA,        # Direction angle cosine wrt own PV
        "OWNPV_FD": F.OWNPVFD,            # Flight distance wrt own PV
        "OWNPV_FDCHI2": F.OWNPVFDCHI2,    # Flight distance χ² wrt own PV
        "OWNPV_VDRHO": F.OWNPVVDRHO,      # Radial flight distance wrt own PV
        "OWNPV_VDZ": F.OWNPVVDZ,          # z-direction flight distance
        "OWNPV_LTIME": F.OWNPVLTIME,      # Proper lifetime
        "OWNPV_DLS": F.OWNPVDLS,          # Decay length significance
        # DOCA
        "DOCA12": F.DOCA(1, 2),           # DOCA between first and second daughter
        "DOCA12CHI2": F.DOCACHI2(1, 2),   # DOCA χ² between first and second daughter
        # Daughter Max, Min and Sums
        "MAX_PT": F.MAX(F.PT),            # Maximum PT of daughters
        "MIN_PT": F.MIN(F.PT),            # Minimum PT of daughters
        "SUM_PT": F.SUM(F.PT),            # Sum of daughters' PT
        "MAX_P": F.MAX(F.P),              # Maximum momentum of daughters
        "MIN_P": F.MIN(F.P),              # Minimum momentum of daughters
        "SUM_P": F.SUM(F.P),              # Sum of daughters' momentum
        "MAX_OWNPVIPCHI2": F.MAX(F.OWNPVIPCHI2),  # Max IP χ² of daughters
        "MIN_OWNPVIPCHI2": F.MIN(F.OWNPVIPCHI2),  # Min IP χ² of daughters
        "SUM_OWNPVIPCHI2": F.SUM(F.OWNPVIPCHI2),  # Sum of daughters' IP χ²
        "MAXDOCACHI2": F.MAXDOCACHI2,      # Maximum DOCA χ² between any daughters
        "MAXDOCA": F.MAXDOCA,              # Maximum DOCA between any daughters
        "MAXSDOCACHI2": F.MAXSDOCACHI2,    # Maximum signed DOCA χ²
        "MAXSDOCA": F.MAXSDOCA,            # Maximum signed DOCA

    })

    # Need extra DOCA combinations since Ds decays to 3 final state particles
    Ds_extra_doca = FunctorCollection({
        "DOCA13": F.DOCA(1, 3),
        "DOCA23": F.DOCA(2, 3),
        "DOCA13CHI2": F.DOCACHI2(1, 3),
        "DOCA23CHI2": F.DOCACHI2(2, 3),
    })

    track_variables = FunctorCollection({
        # Standard PID
        "PIDp": F.PID_P,              # Proton PID likelihood
        "PIDK": F.PID_K,              # Kaon PID likelihood
        "PIDPi": F.PID_PI,            # Pion PID likelihood
        "PIDe": F.PID_E,              # Electron PID likelihood
        "PIDmu": F.PID_MU,            # Muon PID likelihood
        # PROBNNs
        "PROBNN_pi": F.PROBNN_PI,        # Neural net probability of being a pion
        "PROBNN_p": F.PROBNN_P,          # Neural net probability of being a proton
        "PROBNN_K": F.PROBNN_K,          # Neural net probability of being a kaon
        "PROBNN_e": F.PROBNN_E,          # Neural net probability of being an electron
        "PROBNN_mu": F.PROBNN_MU,        # Neural net probability of being a muon
        "PROBNN_GHOST": F.PROBNN_GHOST,  # Neural net probability of being a ghost track
        "ISMUON": F.ISMUON,              # Boolean: is it identified as a muon 0 or 1?
        # Additional track related info
        # F.TRACK gets the track object
        # F.NDOF gets the NDOF for that track
        # F.VALUE_OR(-1) means if no value exists return -1 instead of failing
        "TRNDOF": F.VALUE_OR(-1) @ F.NDOF @ F.TRACK,                # NDOF in track fit, if this is higher then more hits used in fit
        "NHITS": F.VALUE_OR(-1) @ F.NHITS @ F.TRACK,                # Total number of hits in all detectors
        "NVPHITS": F.VALUE_OR(-1) @ F.NVPHITS @ F.TRACK,            # Total number of hits in VELO phi sensors
        "NUTHITS": F.VALUE_OR(-1) @ F.NUTHITS @ F.TRACK,            # Total number of hits in UT
        "NFTHITS": F.VALUE_OR(-1) @ F.NFTHITS @ F.TRACK,            # Total number of hits in Fibre Tracker (SciFi)
        "TRACKHISTORY": F.VALUE_OR(-1) @ F.TRACKHISTORY @ F.TRACK,  # Track reconstruction history
    })

    variables = {
        "ALL": all_vars,                                    # Variables for all particles
        "Bs": base_composite_variables,                     # Variables specific to Bs
        "Ds": base_composite_variables + Ds_extra_doca,     # Variables specific to Ds 
        "Bs_pi": track_variables,                           # Variables for pion
        "Ds_pi": track_variables,                           # Variables for pion 
        "Kp":    track_variables,                           # Variables for kaon 
        "Km":    track_variables,                           # Variables for kaon 
    }

    funtuple = Funtuple(
        name=line,
        tuple_name="DecayTree",
        fields=fields,
        variables=variables,
        inputs=data,
    )

    algs = {line: [line_prefilter, require_pvs(pvs), funtuple]}
    return make_config(options, algs)

This is just one example of running DaVinci, but there is a whole DaVinci tutorial section that can be found here. We can then check out the ROOT file.

Open up the root file using root:

root -l tuple.root

Navigate into the directory named after our trigger line, which contains the tree:

Hlt2B2OC_BdToDsmPi_DsmToKpKmPim->cd()

List the trees it contains:

.ls

Print out the leaves:

DecayTree->GetListOfLeaves()->Print()

Practise plotting some variables:

DecayTree->Draw("Bs_M")

VS Code Extension

If you use VS Code you can also use an extension called "ROOT File Viewer" which works nicely for looking at small root files!

This should produce a plot looking something like this (for this example we've increased the number of candidates processed in the options yaml file):

Example of plotting the Bs_M mass peak from the root file.

Awesome! If you have that working then you can move to the next section on functor collections!