List of chemical databases
This is a list of websites that contain lists of chemicals, or databases of chemical information. There is further detail on the content of these and other resources in a Wikibook of information sources.
Abbreviation | Full name | Operator | Selects | Contains | ID prefix | Quality | Link | Entries |
---|---|---|---|---|---|---|---|---|
ACToR | | Environmental Protection Agency | | toxicology information; occurrence | | | "ACToR". | 893,280 |
AtomWork | Inorganic Material Database | National Institute for Materials Science | | crystal structures | | | "AtomWork". | 82,000 |
Beilstein | Beilstein database | Elsevier | organic compounds | properties | | closed access | | |
BIAdb | Benzylisoquinoline Alkaloid Database | | | | | | "BIAdb". | 846 |
BindingDB | The Binding Database | Skaggs School of Pharmacy and Pharmaceutical Sciences at the University of California, San Diego | noncovalent association of molecules in solution | ChEMBL SMILES InChIKey targets | | | "BindingDB". | |
BindingMOAD | Binding Mother of All Databases | | protein-ligand structures | | | | "BindingMOAD". | 36047 |
BMDB | Bovine Metabolome Database | Collaborative Drug Discovery | | | BMDB | manually selected and checked | "BMDB". | 7859 |
BMRB | Biological Magnetic Resonance Data Bank | University of Wisconsin | biological molecules including ligands, cofactors, peptides, saccharides | NMR spectroscopy | | | "BMRB". | |
BRENDA | | Technical University of Braunschweig | enzymes, ligands | | | | "BRENDA". | |
Carotenoids Database | | | carotenoids | | CA | | "Carotenoids". | 1195 |
CCCBDB | Computational Chemistry Comparison and Benchmark DataBase | National Institute of Standards and Technology | gas-phase molecules | | | | "CCCBDB". | 2069 |
CCRIS | Chemical Carcinogenesis Research Information System | National Library of Medicine | substances that affect tumors | | CCRIS | from primary literature, reviewed by experts | "CCRIS subset of PubChem". | 9562[1][2] |
CDD Public | | | drug candidates | | | limited access | | 3,000,000 |
ChEBI | Chemical Entities of Biological Interest | ELIXIR | small chemical compounds | from PDBeChem, ChEMBL, KEGG, IntEnz | | | "ChEBI". | 60,000 |
Chematica | | Merck | organic chemicals | reaction pathway calculation; Beilstein CAS SMILES | | proprietary | | 7,000,000 |
ChEMBL | Chemicals from European Molecular Biology Laboratory | EMBL | molecules with drug-like properties | | | | "ChEMBL". | 1,961,000 |
cheML.io | | Departments of Computer Science and Chemistry at Nazarbayev University | de novo molecules generated by ML models | SMILES, computed properties | | artificially generated | "cheML.io".[3] | 2,800,000 |
ChemDB | chemical database | | small molecules | | | | "ChemDB". | 5,000,000 |
ChemExper | Chemexper Chemical Directory | | catalogue chemicals | CASno Structure SMILES | | | "ChemExper". | |
Chemxpert Database | Chemxpert Chemical Database | | small molecules | buyers, suppliers | | | "ChemxpertDB". | 1,000,000 |
Chemical Book | | East West University | commercially available compounds | CASno, suppliers, properties | | | "Chemical Book". | 200,000 |
Chemical Register | | | from 20,000 vendors | CASno; mainly from larger-scale suppliers | | | "Chemical Register". | 1,750,000 |
ChemIDplus | | National Library of Medicine | other NLM databases; regulated substances | CASNo UNII structure | | | https://chem.nlm.nih.gov/chemidplus/chemidlite.jsp | 400,000 |
ChemSpider | | Royal Society of Chemistry | from 275 data sources | | | | "ChemSpider". | 88,000,000 |
ChemIndex | chemical database | | substances | CAS search; suppliers | | | "Chemindex". | |
Clival Database | Clinical Trial Database | Clinical Trial Data Solutions | molecules with clinical trial data | Phase 0 to IV indications | | | "clival". | 50,000 |
CMNPD | Comprehensive Marine Natural Products Database | Peking University | from literature and other databases | structural classification; species | CMNPD | curated | https://www.cmnpd.org/ | 31,561 |
COD | Crystallography Open Database | Vilnius University | small molecules (open source) | crystal structure atomic coordinates | COD | curated | "COD". | 478,715 |
Common Chemistry | | American Chemical Society | | structure, CAS, SMILES, InChI | | | https://commonchemistry.cas.org/[4] | ~500,000 |
Compendium of Pesticide Common Names | | British Crop Production Council | pesticides with ISO common names | structure, CASNo, IUPAC name, SMILES, InChI | | curated | "Compendium of Pesticide Common Names". | 1,800 |
CompTox | CompTox Chemicals Dashboard | US Environmental Protection Agency | chemicals evaluated for potential health risks | | | | "CompTox". | |
CosIng | Cosmetic Ingredients | European Commission | cosmetic ingredients | | | | "CosIng". | |
CrystalWorks | | Science and Technology Facilities Council | | | | | "CrystalWorks". | |
CSD | Cambridge Structural Database | Cambridge Crystallographic Data Centre | | | | | "CSD". | 1,038,250 |
CSDB | Carbohydrate Structure Database | Zelinsky Institute of Organic Chemistry | carbohydrates | structures, references | CSDB ID | | "CSDB". | |
CTD | Comparative Toxicogenomics Database | Department of Biological Sciences at North Carolina State University | | MeSH CASNo ChEBI PubChem genes, pathways | | | "CTD". | |
DDB | Dortmund Data Bank | | pure compounds, mixtures, gas hydrates | physical properties | | | "DDB". | |
Dissociation Constants | IUPAC Digitized pKa Dataset | IUPAC | | dissociation constants | | | "Dissociation Constants". GitHub. | |
DETHERM | | DECHEMA | | thermophysical properties | | | "DETHERM". | 75,000 |
DrugBank | | University of Alberta | drugs | | | | "DrugBank". | |
DrugCentral | | University of New Mexico | pharmaceuticals | products containing substance | | | "DrugCentral". | |
DTP/NCI | DTP Open Compound collection | National Cancer Institute Development Therapeutics Program | cancer therapeutics | Cancer Chemotherapy National Service Center number | | | "DTP/NCI". | 250,000 |
ECHA | REACH database | European Chemicals Agency | EINECS ELINCS NLP | CASNo, H-phrases, pictograms, tonnage | | | "ECHA/REACH". | 245,000 |
EAWAG-BBD | Biocatalysis/Biodegradation Database | Eawag: Swiss Federal Institute of Aquatic Science and Technology | | CAS SMILES PubChem pathways | | | "EAWAG-BBD". | 1396 |
eMolecules | | | drug screening chemicals | list of suppliers and catalog numbers | | | "eMolecules". | 8,000,000[5] |
ENCS | Japanese Existing and New Chemical Substances Inventory | | regulated chemicals | | | | "ENCS (in Japanese)". | |
Evaluated Kinetic Data | | IUPAC | | rate constants | | curated | "Evaluated Kinetic Data". | |
FDA SRS | Food and Drug Administration Substance Registration System | U.S. National Library of Medicine | ingredients in FDA-regulated products | UNII InChIKey | | | "FDA SRS". | 781,000 |
FEMA | Flavor Ingredient Library | Flavor and Extract Manufacturers Association | | CAS CFR FEMA number | | | "FEMA". | |
FooDB | Food Database | University of Alberta | food components and additives | | | | "FooDB". | 70926 |
GlyTouCan | international glycan structure repository | Ministry of Education, Culture, Sports, Science and Technology (Japan) | glycans | WURCS GlycoCT PubChem CID | G | | "Glycan Repository". | 122194 |
Gmelin | Gmelin database | Elsevier | inorganic and organometallic compounds | | | closed access | | 1,500,000 |
G-SRS | Global Substance Registration System | | | CAS PubChem ChEMBL INN UNII | | | "G-SRS". | 109,260 |
GMD | Golm Metabolome Database | | | GC/MS of metabolites | | | "GMD". | |
Guide to PHARMACOLOGY | | IUPHAR | drugs and targets | INN CAS ChEBI ChEMBL DrugBank PubChem | | | "Guide to PHARMACOLOGY". | |
Henry's law constants | | Max Planck Institute for Chemistry | volatile compounds | Henry's law constants | | from literature | "Henry's law constants". | 46434 |
HMDB | Human Metabolome Database | Genome Canada | metabolites found in the human body | biochemical data, clinical data | HMDB | | "HMDB". | 114,222[6] |
HugeMDB | Huge Molecular Database | Elegant Mathematics LLC | small molecules (most entries have fewer than 100 atoms) | major conformers with 3D structures, searchable | M | agrees well with PubChem for data also available in PubChem | "HugeMDB". | 102 million |
ICSC | ILO International Chemical Safety Cards | International Labour Organization | | CAS, EC number, UN number | | | "ICSC". | 1784 |
ICSD | Inorganic Crystal Structure Database | FIZ Karlsruhe GmbH | | | | | "ICSD". | 161,030 |
IEDB | Immune Epitope Database | National Institute of Allergy and Infectious Diseases | epitopes, mainly peptides and carbohydrates | | | | "IEDB". | 3,002 non-peptides |
IUPAC-NIST Solubility Database | | | | | | | https://srdata.nist.gov/solubility/index.aspx | |
JECDB | Japan Existing Chemical Database | | | CAS EINECS RTECS SDBS TSCA; graph of the number of articles per year | | | "JECDB". | |
J-GLOBAL | Nikkaji | Japan Science and Technology Agency | | | | | "J-GLOBAL". | |
KEGG | Kyoto Encyclopedia of Genes and Genomes | Kyoto University Bioinformatics Center | compounds, glycans (also enzymes, reactions, pathways) | CAS ChEBI ChEMBL MASSBANK NIKKAJI PubChem PDB-CCD | | | "KEGG". | |
Ki Database | | PDSP | | ligand binding | | | "Ki Database". | |
KNApSAcK | | Nara Institute of Science and Technology | | InChI CAS SMILES organisms | C00 | | "KNApSAcK". | |
LINCS | Library of Integrated Network-based Cellular Signatures | | small molecules | PubChem ChEMBL SMILES InChI | LSM | | "LINCS". | 43,700 |
LipidBank | | Japanese Conference on the Biochemistry of Lipids | lipids | | | | "LipidBank". | 7,009 |
LMSD | LIPID MAPS Structure Database | | lipids | HMDB ChEBI PubChem InChI | LMFA | | "LMSD". | 44701 |
LOLI | List of Lists | | | safety data sheets, regulation | | | "LOLI". | |
Mcule | | | supplied chemicals | InChI, SMILES, SDF, physicochemical properties | | | "Mcule". | 45,000,000 |
MediaDB | | Institute for Systems Biology | growth media | | | | "MediaDB". | 288 |
Merck Index | | Royal Society of Chemistry | drugs | | | | "Merck-Index". | 11,500 |
MeSH | Medical Subject Headings | US National Library of Medicine | biomedical thesaurus | hierarchy of descriptors linked to the literature, with MeSH IDs | | | "MeSH". | |
MetaCyc | | SRI International | metabolic pathways; metabolites | | | | "MetaCyc". | |
MetaboLights | | EMBL-EBI | | | MTBL | | "MetaboLights". | |
MetaNetX | SIB Swiss Institute of Bioinformatics | metabolic networks, metabolites, biochemical reactions, cellular compartments | metabolic models, SBML, InChI, InChIKey, SMILES | MNXM | unified namespace for metabolites and biochemical reactions in the context of metabolic models | "MetaNetX". | 240 metabolic models, 1292154 metabolites, 74613 reactions, 44 compartments | |
METLIN | Metabolite and Chemical Entity Database | | | tandem mass spectrometry of metabolites | | | "METLIN". | 960,000 |
MINAS | Metal Ions in Nucleic AcidS | University of Zurich | | | | | https://www.minas.uzh.ch/ | |
ModelSeed | | | | KEGG, MetaCyc metabolic pathways | CPD | | "ModelSeed". | |
MolPort | | | catalog chemicals | | | | "MolPort". | |
MoNA | MassBank of North America | | mass spectra | SPLASH KEGG ChemSpider PubChem ChEBI CAS | | | "MoNA". | 200,000 |
npatlas | The Natural Products Atlas | Simon Fraser University | microbial and fungal products | SMILES, organism | NPA | | npatlas[7] | 33434 |
NIOSH pocket guide | NIOSH Pocket Guide to Chemical Hazards | National Institute for Occupational Safety and Health | commonly used chemicals | exposure limits | | | "NIOSH". 2 August 2024. | 677 |
NIST Webbook | NIST Chemistry WebBook | National Institute of Standards and Technology | | spectra, CAS, ionization energy, mass spectrum, InChI | C+CAS | | "NIST Webbook". | |
NMRShiftDB | | University of Cologne | organic compounds | nuclear magnetic resonance spectra | | | "NMRShiftDB". | 43,581 |
NORMAN SLE | NORMAN Suspect List Exchange | | environmental monitoring | | | | "NORMAN SLE". | 110,000 |
OMG | Open Macromolecular Genome | Jackson group at University of Illinois at Urbana-Champaign | synthetically accessible linear homopolymers | SMILES of linear homopolymers | | | GitHub / Zenodo | 12,886,131 |
ORD | Open Reaction Database | ORD consortium | organic reactions | machine-readable reaction schemes | | | "ORD".[8] | 2,000,000 |
OrgSyn | Organic Syntheses | Organic Syntheses, Inc. | reliable chemical reactions | searchable experimental procedures | | peer reviewed | "OrgSyn search". | |
PDBe | Protein Data Bank in Europe | EMBL-EBI | proteins, as well as some chemicals | | | | "PDBe". | |
PATENTSCOPE | | WIPO | | | | | "PATENTSCOPE". | 16,000,000 |
PDB | RCSB Protein Data Bank | | | | | | "PDB". | 166,891 |
PharmGKB | | Shriram Center for Bioengineering and Chemical Engineering | drugs, targets | prescribing info | | curated | "PharmGKB". | |
PHAROS | Illuminating the Druggable Genome | National Institutes of Health | drug ligands; targets[9] | | | | https://pharos.nih.gov/ | 355,932 ligands; 20,412 targets |
Phenol-Explorer | | | polyphenols found in food | | | | "Phenol-Explorer". | 500 |
Phosida | PHOsphorylation SIte DAtabase | | protein modifications | | | | "Phosida". | |
PoLyInfo | Polymer Database | National Institute for Materials Science | | physical properties | | | "PoLyInfo". | 26,000 |
PPDB | Pesticide Properties Database | Agriculture & Environment Research Unit, University of Hertfordshire | Pesticides and their metabolites | Chemical structure, physicochemical properties, human health and ecotoxicological data | curated | "PPDB". | 2000[10] | |
Probes and Drugs | ||||||||
ProCarDB | Prokaryotic Bacterial Carotenoid DataBase | IMTECH | | spectra, references | | | "ProCarDB". | 1800 |
PubChem | | National Library of Medicine National Center for Biotechnology Information | from 748 data sources | Structures, Names and Identifiers, Chemical and Physical Properties, Spectral Information, Related Records, Chemical Vendors, Pharmacology and Biochemistry, Use and Manufacturing, Safety and Hazards, Toxicity, Literature, Patents, Biomolecular Interactions and Pathways, Biological Test Results | | | "PubChem". | 103,000,000 |
Reaxys | | Elsevier | chemical compounds | searchable chemical reactions | | | "About Reaxys". | 118,000,000 |
Ref-DB | Re-referenced Protein Chemical Shift Database | | proteins from BioMagResBank | re-referenced NMR chemical shifts | | | "Ref-DB". | 2162 |
Rhea | | Swiss Institute of Bioinformatics | biochemical reactions | ChEBI | | curated | "Rhea". | |
RÖMPP | | Thieme Gruppe | | | | | "RÖMPP". | |
RTECS | Registry of Toxic Effects of Chemical Substances | Dassault Systèmes | | toxicity, literature | | | "Biovia-RTECS". 8 September 2023. | 160,000 |
RxNav | | U.S. National Library of Medicine | drugs | interactions | | | "RxNav". | |
SaguaroChem | | De Novo Chem | chemical reactions from the patent literature | chemical reaction SMILES, annotated procedures, characterization data, reference metadata | | curated from patent literature | "SaguaroChem". 4 July 2024. | 2,091,105 |
SciFinder | | Chemical Abstracts Service of the American Chemical Society | organic and inorganic chemicals, proteins | CASNo | | paid access only | | 130,000,000 |
ScrubChem | | | scraped from PubChem | | | | "ScrubChem". | 2,282,992 |
SDBS | Spectral Database for Organic Compounds | National Institute of Advanced Industrial Science and Technology (AIST), Japan | organic compounds | spectra: IR, Raman, mass, ESR, 1H NMR, 13C NMR | SDBS No | curated | "SDBS". | 34,000 |
Serum Metabolome Database | | The Metabolomics Innovation Centre | metabolites found in blood serum | | | | "Serum Metabolome DB". | 4,651 |
Solvent Selection Tool | | ACS Green Chemistry Institute | solvents | principal component analysis of physical properties | | curated | "Solvent Selection Tool". | 272[11] |
SPRESIweb | | InfoChem Gesellschaft für chemische Information mbH | organic molecules and reactions | organic structures | | from literature | "SPRESI". | 5,800,000 |
SpringerMaterials | | Springer | solid materials | CAS InChI physical properties | | from literature | "SpringerMaterials". | 155,165 + 494,942 |
STITCH | | EMBL | from BioCarta, BioCyc, GO, KEGG, and Reactome | chemical-protein interactions | | curated and predicted | "STITCH". | 500,000 |
SuperDRUG2 | | Structural Bioinformatics Group | drugs, targets | targets, dose, side effects, Canonical SMILES, Standard InChI, Standard InChIKey, DrugBank, ChEMBL, DrugCentral, KEGG, PubChem, CASRN | SD | | "SuperDRUG2". | 4,600 |
Super Natural II | | | natural product chemicals | SMILES, vendors | SN00 | | "Super Natural II". | 325,508 |
SureChEMBL | | European Molecular Biology Laboratory | substances in patents | patent text | | | "SureChEMBL". | |
SwissLipids | | Swiss Institute of Bioinformatics | lipids | | SLM: | | "SwissLipids". | |
TDR Targets | Tropical Disease Research | Trypanosomatics Laboratory | drugs and targets | | | | "TDR Targets". | 2,000,000 |
TTD | Therapeutic Targets Database | Zhejiang University | drugs and targets | SMILES InChI CAS PubChem | | | "TTD". | 37,316 |
T3DB | Toxin and Toxin-Target Database (Toxic Exposome Database) | University of Alberta | toxins and toxin targets | | T3D | | "T3DB". | 3,678 |
UniChem | | EMBL-EBI | pointers to existing chemicals; indexes 41 databases[12] | structure; StdInChI; links to databases | | automated loads | "Compound Sources Search". | >2,000,000 |
UniProt | UniProt Knowledgebase | | proteins | sequence, modifications, location, organism, similar proteins | | | "UniProt". | |
US DOT | US Department of Transportation Emergency Response Guidebook | DOT + others | bulk-transported chemicals | UN number, hazard response guide | | | "Emergency response guidebook" (PDF). | 3000 |
UV/VIS Spectral Atlas | The MPI-Mainz UV/VIS spectral atlas of gaseous molecules of atmospheric interest | Max Planck Institute for Chemistry | gaseous molecules | absorption cross sections | from literature | "UV/VIS Spectral Atlas". | 7313 | |
YMDB | Yeast Metabolome Database | The Metabolomics Innovation Centre | metabolites of yeast | 48 data fields | YMDB | "YMDB". | 16042 | |
ZINC | ZINC is not commercial | University of California, San Francisco | purchasable substances | EPA DSSTox, ChEMBL, HMDB, KEGG, PDB, SMILES | | | "ZINC".[13] | 37 × 10⁹ |
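
Many entries in the table cross-reference records by CAS Registry Number (listed as CAS, CASNo or CASRN) or by InChIKey. Before merging data from two of these sources on such identifiers, a purely syntactic check can catch transcription errors. The following is a minimal Python sketch, not taken from any of the listed databases; the function names are hypothetical. It assumes the hyphenated CAS form (2 to 7 digits, 2 digits, 1 check digit, where the check digit is the sum of the other digits weighted 1, 2, 3, ... from the right, modulo 10) and the 14-10-1 letter layout of an InChIKey. A passing check only shows that an identifier is well formed, not that it is registered or assigned to the expected substance.

```python
import re

# Hyphenated CAS RN: 2-7 digits, 2 digits, 1 check digit (assumed input form).
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")

# InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` is well formed and its check digit matches."""
    match = CAS_PATTERN.fullmatch(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight each digit by its position counted from the right: 1, 2, 3, ...
    weighted_sum = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return weighted_sum % 10 == check_digit


def is_wellformed_inchikey(key: str) -> bool:
    """Return True if `key` has the 14-10-1 InChIKey layout (format only)."""
    return INCHIKEY_PATTERN.fullmatch(key.strip()) is not None


if __name__ == "__main__":
    for cas in ("7732-18-5", "71-43-2", "7732-18-4"):
        print(cas, is_valid_cas(cas))
    print(is_wellformed_inchikey("XLYOFNOQVPJJNP-UHFFFAOYSA-N"))
```

For water (7732-18-5) the weighted sum is 8×1 + 1×2 + 2×3 + 3×4 + 7×5 + 7×6 = 105, and 105 mod 10 = 5 matches the check digit, so the first call returns True while the altered 7732-18-4 returns False. Using `fullmatch` rather than `match` rejects identifiers with trailing characters.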
References
- ^ "Chemical Carcinogenesis Research Information System (CCRIS) - PubChem Data Source". pubchem.ncbi.nlm.nih.gov. Retrieved 2020-08-07.
- ^ "Download CCRIS (Chemical Carcinogenesis Research Information System) Data". www.nlm.nih.gov. Retrieved 2020-08-07.
- ^ Zhumagambetov, Rustam; Kazbek, Daniyar; Shakipov, Mansur; Maksut, Daulet; Peshkov, Vsevolod A.; Fazli, Siamac (2020-12-17). "cheML.io: an online database of ML-generated molecules". RSC Advances. 10 (73): 45189–45198. Bibcode:2020RSCAd..1045189Z. doi:10.1039/D0RA07820D. ISSN 2046-2069. PMC 9058596. PMID 35516285.
- ^ Jacobs, Andrea; Williams, Dustin; Hickey, Katherine; Patrick, Nathan; Williams, Antony J.; Chalk, Stuart; McEwen, Leah; Willighagen, Egon; Walker, Martin; Bolton, Evan; Sinclair, Gabriel; Sanford, Adam (13 May 2022). "CAS Common Chemistry in 2021: Expanding Access to Trusted Chemical Information for the Scientific Community". Journal of Chemical Information and Modeling. 62 (11): 2737–2743. doi:10.1021/acs.jcim.2c00268. PMC 9199008. PMID 35559614.
- ^ "Vision - eMolecules". www.emolecules.com. Retrieved 2020-07-27.
- ^ "Human Metabolome Database: About the Human Metabolome Database". hmdb.ca. Retrieved 2020-07-27.
- ^ Van Santen, Jeffrey A.; Jacob, Grégoire; Singh, Amrit Leen; et al. (2019). "The Natural Products Atlas: An Open Access Knowledge Base for Microbial Natural Products Discovery". ACS Central Science. 5 (11): 1824–1833. doi:10.1021/acscentsci.9b00806. PMC 6891855. PMID 31807684.
- ^ Kearnes, Steven M.; Maser, Michael R.; Wleklinski, Michael; et al. (2021). "The Open Reaction Database". Journal of the American Chemical Society. 143 (45): 18820–18826. doi:10.1021/jacs.1c09820.
- ^ "Pharos: Illuminating the Druggable Genome". pharos.nih.gov. Retrieved 2024-10-02.
- ^ Lewis, Kathleen A.; Tzilivakis, John; Warner, Douglas J.; Green, Andrew (2016). "An international database for pesticide risk assessments and management". Human and Ecological Risk Assessment. 22 (4): 1050–1064. Bibcode:2016HERA...22.1050L. doi:10.1080/10807039.2015.1133242. hdl:2299/17565. S2CID 87599872.
- ^ Diorazio, Louis J.; Hose, David R. J.; Adlington, Neil K. (2016). "Toward a More Holistic Framework for Solvent Selection". Organic Process Research & Development. 20 (4): 760–773. doi:10.1021/acs.oprd.6b00015.
- ^ "UniChem". www.ebi.ac.uk. Retrieved 2024-10-02.
- ^ Tingle, Benjamin I.; Tang, Khanh G.; Castanon, Mar; Gutierrez, John J.; Khurelbaatar, Munkhzul; Dandarchuluun, Chinzorig; Moroz, Yurii S.; Irwin, John J. (2023). "ZINC-22─A Free Multi-Billion-Scale Database of Tangible Compounds for Ligand Discovery". Journal of Chemical Information and Modeling. 63 (4): 1166–1176. doi:10.1021/acs.jcim.2c01253. PMC 9976280. PMID 36790087.