Data sources
In addition to the data extracted from the source compound set files, a large amount of data is harvested from external data sources: names, external identifiers, bioactivity data, associated pathways, and target and pathway ontologies.
The table below lists all external data sources and services used, together with a description of the data harvested from each:
| Source | Version | Comment | License |
|---|---|---|---|
| | 34 | All potency values (pValues) for binding and functional assays are extracted for specific targets, cell lines, and whole organisms. When more than one value is available for a ligand-target complex, the average of these values is calculated. | |
| | 2024.02 | All activity data are extracted, including records for which no bioactivity value is known. | |
| Reactome | 89 | Reactome pathways are matched according to target UniProt IDs. | |
| | 08.2024 | Data are matched according to target UniProt IDs. | |
| UniChem | 08.2024 | The UniChem service is used to harvest external IDs from all available external sources. The chembl_webresource_client Python package is used for harvesting. | |
| PubChem | 08.2024 | PubChem is the main source for the manual extraction of compound structures. Generally, when a compound's structure is missing or wrong, the structure is looked up by the name given by the supplier/provider. In many cases, compounds are also identified by their PubChem CIDs/SIDs. | |
| MolPort | 08.2024 | Information about the in-stock availability of compounds is taken from MolPort. | |
| Mcule | 08.2024 | Information about the in-stock availability of compounds is taken from Mcule. | |
| BindingDB | 08.2024 | Compounds are matched according to their BindingDB ligand IDs harvested from UniChem. | |
| Chemspace | 02.2021 | Information about the in-stock availability of compounds is taken from Chemspace. | |
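
The averaging of multiple potency values for one ligand-target complex, mentioned above, can be illustrated with a minimal sketch. It assumes an arithmetic mean (the text only says "average") and a simple list of (ligand, target, pValue) records. The function name, the record layout, and the identifiers are hypothetical and are not taken from the actual pipeline.

```python
from collections import defaultdict
from statistics import mean


def average_pvalues(records):
    """Average pValues per (ligand, target) pair.

    `records` is an iterable of (ligand_id, target_id, pvalue) tuples.
    This layout is an assumption made for illustration only.
    """
    grouped = defaultdict(list)
    for ligand_id, target_id, pvalue in records:
        grouped[(ligand_id, target_id)].append(pvalue)
    # Arithmetic mean is assumed; the source does not specify the kind of average.
    return {pair: mean(values) for pair, values in grouped.items()}


# Hypothetical example data: two measurements for one pair, one for another.
records = [
    ("ligand_A", "target_X", 6.0),
    ("ligand_A", "target_X", 7.0),
    ("ligand_B", "target_X", 5.5),
]
averaged = average_pvalues(records)
```

Here the two measurements for the first pair collapse into a single value, while the pair with only one measurement keeps it unchanged.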