FAQ
How are structures in the Probes and Drugs portal curated?
The structures are uploaded to the database in their original form as provided by a data source. The main reason is that we want to show the actual situation in the field of bioactive compounds. Another reason is that it is often very hard to determine the correct stereochemical form of a compound or, e.g., its active form (the compound vs. its salt).
How were the compound sets in the Probes & Drugs Portal selected?
There was no special screening process for selecting the compound sets currently contained in the Probes and Drugs portal. The sets were selected according to their availability and popularity within the scientific community. Not all compound sets we wanted to incorporate into our database could be incorporated, because of limited availability or license restrictions. The Probes and Drugs portal is open to the incorporation of new compound sets.
How is a compound tagged as a probe?
All compounds from probe compound sets are tagged as probes.
How is a compound tagged as a drug?
A compound is tagged as a drug when it belongs to one of the drug sets (DrugBank, DrugCentral, NIH Approved oncology drugs, ChEMBL Approved Drugs) or when it is classified as a drug by one of the external data sources (currently ChEMBL and Guide to PHARMACOLOGY).
What is the difference between original, standardized and nonisomeric compound?
The original compound is a compound extracted from a source file without any changes. The standardized compound is created from the original compound by standardizing its structure (unsalting, neutralization where possible, standardization of common functional groups, detection of the active pharmaceutical ingredient, API, in multi-component systems) while preserving its stereochemistry. The nonisomeric compound is the standardized compound with its stereochemistry stripped. That means that for each original compound there is exactly one standardized and one nonisomeric compound, but there can be several original compounds for one standardized compound and several standardized compounds for one nonisomeric compound. A standardized/nonisomeric compound joins not only compounds with the same API but also all of their associated data.
By default, standardized compounds are presented on the P&D portal, with the possibility to switch between the different compound types. However, users should be aware that in the original compounds’ view, many compounds can lack data associated with a different compound form (e.g., a different salt or its API); on the other hand, in the nonisomeric compounds’ view, even stereoisomers with different biological properties can be merged together, e.g., a chemical probe and its inactive analogue (control compound).
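The many-to-one relationships between the three compound types can be illustrated with a minimal Python sketch. The record fields, identifiers and target names are hypothetical and are not taken from the P&D database; the sketch only shows how annotations attached to several original compounds are pooled when viewed at the standardized or nonisomeric level.

```python
from collections import defaultdict

# Hypothetical records: each original compound maps to exactly one
# standardized and one nonisomeric compound (all identifiers are made up).
ORIGINALS = [
    {"original": "X hydrochloride", "standardized": "(R)-X", "nonisomeric": "X", "targets": {"T1"}},
    {"original": "X free base", "standardized": "(R)-X", "nonisomeric": "X", "targets": {"T2"}},
    {"original": "ent-X", "standardized": "(S)-X", "nonisomeric": "X", "targets": set()},
]


def pooled_targets(records, level):
    """Pool target annotations of all originals sharing the same compound at `level`."""
    pooled = defaultdict(set)
    for record in records:
        pooled[record[level]] |= record["targets"]
    return dict(pooled)
```

In this toy data, the original view of "X free base" lacks the T1 annotation reported for the hydrochloride, while the nonisomeric view merges the (S)-enantiomer, which has no annotated targets, with the annotated (R)-form.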
What are the differences between experimental and calculated probes?
Experimental denotes compounds from published probe characterisation experiments. These papers include profiling data for target modulation potency, selectivity, and possible secondary targets. Also important to note is that the experimental data largely originate from a single laboratory and are thus likely to have more consistent and reproducible data (e.g. where intra-laboratory assay variation is controlled via sufficient replicates and internal standards).
We use the term calculated here to denote in silico evaluation using the combination of public data with a custom scoring function (we intentionally avoided the term predicted in this context because machine learning methods were not used). The evaluation of probes at scale involves comparing public data at different stringencies according to availability. The seven key criteria are: 1) <100 nM target potency in vitro, 2) <1 µM for cell-based assays with evidence of direct target engagement, 3) target selectivity >100-fold, 4) absence of structural alerts indicating chemical liabilities, 5) identification of an inactive analogue as a control, with significantly lower potency or inactive against the primary target, 6) an orthogonal probe with a different chemotype against the same primary target, and 7) SAR data to increase confidence in specific target modulation. The simplest approach to assigning a compound as a probe is to use these criteria (or a subset thereof). However, this can be nuanced by weightings (e.g. for potency and selectivity) as well as more complex scoring (e.g. functions favouring compounds with a wider range of target profiling data). In contrast to experimental probes, the data for the evaluation of calculated probes come from different sources, can be less consistent, and may not have been acquired with the objective of developing and validating a probe per se.
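As a rough illustration of the "simplest approach" mentioned above, the following Python sketch reports which of the seven criteria a compound satisfies. The evidence dictionary and all of its keys are hypothetical; this is not the scoring function used on the portal, only a plain checklist under the stated thresholds.

```python
import math


def criteria_met(evidence):
    """Return the names of the criteria from the text that `evidence` satisfies.

    `evidence` is a hypothetical dict; a missing key counts as "no data".
    """
    checks = {
        "in vitro potency < 100 nM": evidence.get("potency_nm", math.inf) < 100,
        "cell potency < 1 uM with target engagement": (
            evidence.get("cell_potency_nm", math.inf) < 1000
            and evidence.get("target_engagement", False)
        ),
        "selectivity > 100-fold": evidence.get("selectivity_fold", 0) > 100,
        # Unknown alert status is treated as "not satisfied".
        "no structural alerts": not evidence.get("structural_alerts", True),
        "inactive control": evidence.get("has_inactive_control", False),
        "orthogonal probe": evidence.get("has_orthogonal_probe", False),
        "SAR data": evidence.get("has_sar_data", False),
    }
    return [name for name, satisfied in checks.items() if satisfied]
```

A caller could then require all seven names, or only a chosen subset, before treating a compound as a calculated probe.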
What is the P&D probe-likeness score?
The P&D probe-likeness score ranges from 0 to 100% and is calculated for compounds labelled as probes, for each of their directly annotated human targets of interest (if specified). It consists of 6 parameters (potency, selectivity, cell-potency, control compound, orthogonal probe and structural alerts) and a synergy part of the score. The definition of each parameter and its weight within the score can be found in the table below. The P&D probe-likeness score is simply the sum of all of its parts. If the P&D probe-likeness score for a compound-target pair is larger than 70% (not equal or larger, but LARGER), the compound is labelled as P&D approved. The score is designed so that the 70% threshold can't be surpassed unless the compound is potent, cell-potent on target and selective at the same time. You can easily filter all P&D approved chemical probes using the P&D approved tag.
Parameter
Weight
Description
Potency
<6.5 = 0%
6.5 <= X <= 7.0 = 10% -> 20% (linear)
>7.0 = 20%
Potency of a compound on target in a biochemical assay. If a compound's potency on target is at least 7 in -log(M) units (100 nM, common probe-like criterion value), its score is equal to 20%. However, even compounds with a slightly lower potency can serve as great chemical probes. Therefore, the potency can go as low as 6.5 (316 nM) with a linear decrease of the score to 10%. If the compound's potency is lower than this value, the score is equal to 0%. If a compound's potency in a biochemical assay is unknown, it is substituted by the on-target cell-potency value, if available.
Selectivity
<10-fold = 0%
10-fold <= X <= 30-fold = 10% -> 20% (linear)
>30-fold = 20%
Selectivity of a compound compared to its nearest target neighbor. If a compound's selectivity on target compared to its nearest less potent neighbor is at least 30-fold (common probe-like criterion value), its score is equal to 20%. However, even compounds with a slightly lower selectivity can serve as great chemical probes. Therefore, the selectivity can go as low as 10-fold with a linear decrease of the score to 10%. If the compound's selectivity is lower than this value, the score is equal to 0%. If a compound's selectivity can't be assessed, i.e., it has no other annotated target/inactive target, the score is also equal to 0%. The selectivity is calculated only in comparison to human protein targets, both single proteins and complexes (if they don't contain the assessed protein as a subunit). If a compound is selective for multiple targets belonging to the same target subfamily, it is labelled as family-selective.
Cell-potency
<5.5 = 0%
5.5 <= X <= 6.0 = 10% -> 20% (linear)
>6.0 = 20%
Potency of a compound on target in a cell-based assay. If a compound's potency on target in a cell-based assay is at least 6 in -log(M) units (1 µM, common probe-like criterion value), its score is equal to 20%. However, even compounds with a slightly lower cell-potency can serve as great chemical probes. Therefore, the potency can go as low as 5.5 (3.16 µM) with a linear decrease of the score to 10%. If the compound's cell-potency is lower than this value, or unknown, the score is equal to 0%.
Potency-Selectivity Synergy
10%
Synergy between potency, selectivity and cell-potency. If a compound has a non-zero potency, selectivity and cell-potency score, the synergy score is equal to 10%, otherwise to 0%. Therefore, if one of the scores (potency, selectivity or cell potency) is equal to 0%, the compound can't surpass the >70% P&D probe-likeness score value to be labelled as P&D approved.
Control compound
10%
Chemical probe's control compound. If a control compound for a chemical probe is defined, the score is equal to 10%, otherwise to 0%.
Orthogonal probe
10%
Chemical probe's orthogonal probe. If a structurally distinct chemical probe for the same target is defined, the score is equal to 10%, otherwise to 0%. Orthogonal probes can't share the same structural scaffold (Bemis-Murcko scaffold extracted from a generic compound, i.e., all atoms converted to carbon atoms and all bonds converted to single bonds), and their similarity must not exceed 40% (Tanimoto similarity using the Morgan fingerprint with radius 2, similar to ECFP4).
Structural alert
10% (PAINS filters, aggregators, nuisance compounds)
-30% (obsolete compounds)
The presence of a structural alert. If a compound is not labelled as a PAINS compound (SMARTS match), an aggregator (exact structural match) or a nuisance compound (exact structural match), the score is equal to 10%; if it is labelled as an obsolete/historic compound, the score is equal to -30%; otherwise it is equal to 0%.
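The weights in the table can be combined as in the following Python sketch. It is an illustration under stated assumptions, not the portal's implementation: the function and parameter names are hypothetical, the selectivity ramp is assumed to be linear in fold units, a substituted cell-potency value is assumed to be scored against the biochemical potency thresholds, and the obsolete penalty is assumed to take precedence over the other alert flags.

```python
def ramp(value, low, high):
    """0 below `low`, 10 -> 20 linearly between `low` and `high`, 20 above `high`."""
    if value is None or value < low:
        return 0.0
    if value >= high:
        return 20.0
    return 10.0 + 10.0 * (value - low) / (high - low)


def probe_likeness(potency=None, cell_potency=None, selectivity_fold=None,
                   has_control=False, has_orthogonal=False,
                   flagged=False, obsolete=False):
    """Return (score in %, P&D approved?) for one compound-target pair.

    Potencies are in -log(M) units; `selectivity_fold` is None when the
    selectivity cannot be assessed.
    """
    # Unknown biochemical potency is substituted by the cell-based value.
    potency_score = ramp(potency if potency is not None else cell_potency, 6.5, 7.0)
    selectivity_score = ramp(selectivity_fold, 10.0, 30.0)
    cell_score = ramp(cell_potency, 5.5, 6.0)
    # Synergy requires all three scores to be non-zero.
    synergy = 10.0 if min(potency_score, selectivity_score, cell_score) > 0 else 0.0
    # Assumption: the obsolete penalty takes precedence over the other flags.
    if obsolete:
        alert_score = -30.0
    elif flagged:  # PAINS, aggregator or nuisance compound
        alert_score = 0.0
    else:
        alert_score = 10.0
    score = (potency_score + selectivity_score + cell_score + synergy
             + 10.0 * has_control + 10.0 * has_orthogonal + alert_score)
    # Approval requires a score strictly larger than 70%.
    return score, score > 70.0
```

For example, a compound with full potency and selectivity scores, a control compound, an orthogonal probe and no alerts, but no cell-based data, reaches exactly 70% (20 + 20 + 0 + 0 + 10 + 10 + 10) and is therefore not P&D approved.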
What is the High-Quality Chemical Probes (HQCP) set?
The High-Quality Chemical Probes set (HQCP) emerged as an outcome of our data study published in RSC Medicinal Chemistry with the title "Will the chemical probes please stand up?". HQCP partly consists of chemical probes that are members of one of the high-quality probe sets. These P&D sets are:
- Bromodomains chemical toolbox
- Chemical Probes for Understudied Kinases
- Gray Laboratory Probes
- Open Science Probes (SGC Donated Chemical Probes)
- opnMe Portal
- Protein methyltransferases chemical toolbox
- SGC Probes
Next, we employed the P&D probe-likeness score (PDPS, see What is the P&D probe-likeness score?) for the addition of P&D approved experimental probes plus those P&D approved calculated probes that are in at least one established tool compound set. We have thus partitioned 5 compound sets from P&D (in the original paper there are only 4, in the meantime, the EUbOPEN Chemogenomics Library was added):
- Concise Guide to Pharmacology - a set extracted from a biennial series of publications providing concise overviews of the key properties of ∼1800 human drug targets with an emphasis on selective pharmacology.
- EUbOPEN Chemogenomics Library - a combination of EUbOPEN consortium chemical probes with compounds selected using the less stringent criteria applied to small molecules in chemogenomics compound sets, which enables coverage of a larger target space.
- Kinase chemogenomics set (KCGS) - a collection of narrow-spectrum small molecule kinase inhibitors assembled by the SGC-UNC to study the biology of dark kinases. This is the most diverse and highly annotated public collection of kinase inhibitors.
- Kinase inhibitors (best-in-class) - extracted from a series of Molecular Cell papers by Wang and Gray summarising recently-reported kinase inhibitors.
- Novartis Chemogenetic Library (NIBR MoA Box) - compiled via data mining and institutional crowdsourcing. It is regularly updated and used widely both within Novartis and by their external collaborators.
Below, you can find the criteria employed for the selection of the HQCP set. The compound is assigned into the set if:
- it belongs to one of the high-quality probe sets
- or its Chemical Probes portal rating for use in cells or for use in organisms is at least 75% (i.e. three out of four stars in the original Chemical Probes Portal rating system)
- or it’s a P&D approved experimental probe
- or it’s a P&D approved probe belonging to one of the non-commercial high-quality sets
- and it’s not labelled as a historical (obsolete/nuisance) compound
The HQCP set can be “easily” re-created using the P&D portal’s filtering system. If you incorporate all the filters mentioned above, you will get a query containing 18 interconnected filters (with proper Boolean operators) and 4 brackets. To call the query, you can use this link including all necessary parameters.
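The Boolean structure of the selection criteria can be sketched in Python as follows. The compound dictionary and its keys are hypothetical, and the sketch assumes that the "non-commercial high-quality sets" are the five tool compound sets listed above; it does not reproduce the portal's actual filter query.

```python
HIGH_QUALITY_PROBE_SETS = {
    "Bromodomains chemical toolbox",
    "Chemical Probes for Understudied Kinases",
    "Gray Laboratory Probes",
    "Open Science Probes (SGC Donated Chemical Probes)",
    "opnMe Portal",
    "Protein methyltransferases chemical toolbox",
    "SGC Probes",
}

# Assumption: these five sets are the "non-commercial high-quality sets".
TOOL_COMPOUND_SETS = {
    "Concise Guide to Pharmacology",
    "EUbOPEN Chemogenomics Library",
    "Kinase chemogenomics set (KCGS)",
    "Kinase inhibitors (best-in-class)",
    "Novartis Chemogenetic Library (NIBR MoA Box)",
}


def in_hqcp(compound):
    """Apply the HQCP rule: (A or B or C or D) and not historical."""
    sets = set(compound.get("sets", ()))
    approved = compound.get("pd_approved", False)
    # Ratings are percentages; 75 corresponds to three out of four stars.
    best_rating = max(compound.get("cpp_cells", 0), compound.get("cpp_organisms", 0))
    qualifies = (
        bool(sets & HIGH_QUALITY_PROBE_SETS)
        or best_rating >= 75
        or (approved and compound.get("experimental_probe", False))
        or (approved and bool(sets & TOOL_COMPOUND_SETS))
    )
    return qualifies and not compound.get("historical", False)
```

Note that the final "and not historical" condition applies to the whole disjunction, so a historical compound is excluded even if it belongs to a high-quality probe set.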
What is the bio-chem and cell-based selectivity?
The selectivity of a compound quantifies the concentration window between its first and second associated targets (ranked by potency), based on the compound’s potency on each target (i.e., concentration in the form of pKi, pIC50, pEC50, etc.). On P&D, it is simply represented by the fold change between the first and second concentration, e.g., if, on the negative logarithmic scale, the first potency value is 7 (100 nM) and the second 5 (10 µM), the fold change is 100x (i.e., the compound is 100-fold selective for the first target). The selectivity is calculated separately for values measured in biochemical (bio-chem selectivity) and cell-based (cell-based selectivity) assays, and only for human targets. If only one target with an assigned activity value is associated with the compound, the result depends on whether the compound also has at least one inactive target association. If not, the selectivity is not calculated/assigned; if yes, the value 4 (i.e., 100 µM potency) is arbitrarily used as the bottom threshold for the calculation.
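The fold-change calculation can be written as a short Python sketch. The function name and signature are hypothetical; the caller is assumed to pass only potencies for human targets measured in one assay type (biochemical or cell-based), expressed in -log(M) units.

```python
def selectivity_fold(potencies, has_inactive_target=False, floor=4.0):
    """Fold change between the two highest potencies (in -log(M) units).

    Returns None when the selectivity cannot be calculated.
    """
    ranked = sorted(potencies, reverse=True)
    if not ranked:
        return None
    if len(ranked) == 1:
        if not has_inactive_target:
            return None
        # Single active target plus an inactive one: use the bottom threshold.
        ranked.append(floor)
    # A difference of d log units corresponds to a 10**d fold change.
    return 10 ** (ranked[0] - ranked[1])
```

With potencies of 7 and 5, the difference of two log units gives a 100-fold selectivity, matching the example in the text.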
What is a structural alert?
A structural alert is a specific tag that warns the user about possibly problematic behavior of a compound in the context of biological screening. It is associated either with a compound’s biological properties (e.g., a non-selective or not sufficiently potent compound) or with structural features that may cause unwanted effects within an assay (e.g., non-specific reaction with a protein). Currently integrated types of structural alerts can be found in the table below:
Alert
Identification
Sources
PAINS (Pan Assay INterference)
SMARTS match
Aggregators
Exact structure match
What is a chemical space network?
A chemical space network (CSN) is a representation of chemical space in the form of a network where some of the displayed compounds/points (in extreme cases, all of them or none) are interconnected by edges (links) according to a defined molecular similarity threshold. Its default value on the P&D portal is set to 0.7, i.e., 70% similarity, but any number in the range from 0.3 to 1 can be used.
Another parameter that influences the number of edges in the CSN is the maximum number of nearest neighbours (Max NN). Using this parameter, the number of most similar compounds connected to a particular compound can be limited (even though more compounds may meet the similarity threshold). If a compound is already connected to its maximum number of NNs but also belongs to another compound's NNs (and is not already connected to it), then this pair is also connected. If you want to connect all compounds meeting the similarity threshold, leave the Max NN parameter empty.
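The interplay of the similarity threshold and Max NN can be sketched in Python. This is a minimal illustration, not the portal's implementation: fingerprints are represented as plain sets of on-bits, Tanimoto similarity is assumed as the similarity measure, and the function names are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def csn_edges(fingerprints, threshold=0.7, max_nn=None):
    """Return the edges (sorted name pairs) of a chemical space network."""
    neighbours = {name: [] for name in fingerprints}
    for a, b in combinations(fingerprints, 2):
        similarity = tanimoto(fingerprints[a], fingerprints[b])
        if similarity >= threshold:
            neighbours[a].append((similarity, b))
            neighbours[b].append((similarity, a))
    edges = set()
    for name, candidates in neighbours.items():
        # Keep only the max_nn most similar neighbours (all if max_nn is None).
        candidates.sort(reverse=True)
        for _, other in candidates[:max_nn]:
            # A pair is connected if either compound lists the other as a NN.
            edges.add(tuple(sorted((name, other))))
    return edges
```

Because an edge is added when either compound selects the other, a compound can end up with more edges than Max NN, which corresponds to the behaviour described above.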
For further information about CSNs see Chemical space networks: a powerful new paradigm for the description of chemical space by Maggiora and Bajorath.
Why should I register at the Probes & Drugs portal?
Only registered and logged-in users can create their own custom sets (more on custom sets here), which can be further used in new queries. More functions available only to registered users will be added in the future.
What are custom sets?
Custom sets are arbitrary, user-defined compound sets intended to store advanced queries, with the possibility to manually add or remove single compounds.
Since custom sets are bound to a user, only registered and logged-in users can create them. First, a custom set has to be initialized in the Custom Sets tab. Once a custom set is created, single or multiple (batch) compounds can be added from the Compounds tab: a single compound by clicking on the arrow in the top right corner when hovering over a compound's image; multiple compounds by clicking on the larger arrow on the right side of the second navigation tab. Compounds can be removed from a custom set only in that custom set's view (Custom Sets > click on a custom set) using the cross icon. Again, both single and multiple compounds can be removed.
Currently, the maximum number of custom sets per user is limited to 5.
Which web browsers are recommended for the Probes & Drugs portal?
The P&D portal is a complex web application that utilizes the latest functionality integrated in HTML5 and CSS3 technologies. For that reason, using older or less common web browsers can lead to unexpected appearance and behaviour of the portal. We recommend using one of the following browsers in the specified or a newer version:
For the smoothest user experience, we recommend using Chrome.
What is the Potency-Selectivity score? (discontinued)
The Potency-Selectivity score was discontinued in P&D 02.2021.