Biomedical data science
Biomedical data science is a multidisciplinary field that leverages large volumes of data to promote biomedical innovation and discovery. It draws on fields including biostatistics, biomedical informatics, and machine learning, with the goal of understanding biological and medical data, and can be viewed as the study and application of data science to solve biomedical problems.[1] Modern biomedical datasets often have features that make them difficult to analyze, including:
- Large numbers of features (sometimes billions), typically far exceeding the number of samples (often tens or hundreds)
- Noisy and missing data
- Privacy concerns (e.g., electronic health record confidentiality)
- Interpretability requirements from decision makers and regulatory bodies
Many biomedical data science projects apply machine learning to such datasets.[2][3] These characteristics, while also present in many other data science applications, make biomedical data science a distinct field. Examples of biomedical data science research include:
- Computational genomics
- Computational imaging[3][4]
- Electronic health records data mining
- Biomedical network science[5]
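Two of the dataset characteristics listed above, missing values and far more features than samples, can be illustrated with a small sketch. The function names (`impute_mean`, `pearson`, `strongest_chance_correlation`) and the simulated Gaussian data are assumptions made for illustration only and do not come from any cited study. Mean imputation is just one simple way to handle missing entries, and the simulation shows why small sample sizes are hazardous: among thousands of pure-noise features, some will correlate strongly with an outcome purely by chance.

```python
import math
import random


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_samples, n_features, seed=0):
    """Largest |correlation| between a random outcome and pure-noise features.

    Every feature is simulated independently of the outcome, so any
    correlation found here is spurious.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, outcome)))
    return best


if __name__ == "__main__":
    print(impute_mean([1.0, None, 3.0]))
    # Few samples, many features: a strong spurious correlation appears.
    print(strongest_chance_correlation(n_samples=8, n_features=5000))
    # Many samples, few features: spurious correlations stay small.
    print(strongest_chance_correlation(n_samples=800, n_features=50))
```

In practice, analyses of such data typically combine regularization, multiple-testing correction, and held-out validation to guard against this effect; the sketch only demonstrates the problem, not a remedy.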
Training in Biomedical Data Science
The National Library of Medicine of the US National Institutes of Health (NIH) identified key biomedical data scientist attributes in an NIH-wide review: general biomedical subject matter knowledge; programming language expertise; predictive analytics, modeling, and machine learning; team science and communication; and responsible data stewardship.[6]
University Departments and Programs
- Johns Hopkins University’s Department of Biomedical Engineering offers biomedical data science training at the undergraduate, master's, and PhD levels. It was the first university to offer programs at both the undergraduate and graduate levels.
- Dartmouth College's Geisel School of Medicine houses the Department of Biomedical Data Science where Quantitative Biomedical Sciences programs are available at the master's and PhD levels.
- Imperial College London’s Faculty of Medicine and Data Science Institute offer an MRes in Biomedical Research (Data Science).
- Mount Sinai’s Icahn School of Medicine offers a Master of Science in Biomedical Data Science.
- Stanford University’s Department of Biomedical Data Science offers multiple biomedical informatics graduate programs (MS, PhD, and MD/PhD).
- The University of Exeter’s College of Healthcare and Medicine offers an MSc in Health Data Science.
Biomedical Data Science Research in Academia
Scholarly Journals
The first journal dedicated to biomedical data science, the Annual Review of Biomedical Data Science, appeared in 2018. The journal describes its scope as follows:
“The Annual Review of Biomedical Data Science provides comprehensive expert reviews in biomedical data science, focusing on advanced methods to store, retrieve, analyze, and organize biomedical data and knowledge. The scope of the journal encompasses informatics, computational, and statistical approaches to biomedical data, including the sub-fields of bioinformatics, computational biology, biomedical informatics, clinical and clinical research informatics, biostatistics, and imaging informatics. The mission of the journal is to identify both emerging and established areas of biomedical data science, and the leaders in these fields.”[7]
Other journals, such as Health Data Science[8] and Nature Machine Intelligence,[9] have a more general scope but regularly publish biomedical data science research. Data science depends on curated datasets, and the field has seen the rise of journals dedicated to describing and validating such datasets, some of which are useful for biomedical applications, including Scientific Data,[10] Biomedical Data Journal,[11] and Data.[12]
Example
The Human Genome Project (HGP), which uncovered the DNA sequences that compose human genes, would not have been possible without biomedical data science. Significant computational resources were required to process the HGP data, as the diploid human genome contains over 6 billion DNA base pairs.[13] Scientists constructed the genome by piecing together small fragments of DNA, and computing the overlaps between these sequences alone required over 10,000 CPU hours. At this scale, scientists relied on advanced algorithms for data processing steps such as sequence assembly and sequence alignment for quality control.[14] Some of these algorithms, such as BLAST, are still used in modern bioinformatics. Scientists in the HGP also had to address complexities often associated with biomedical data, including noisy data, such as DNA read errors, and the privacy rights of research subjects.[15] The HGP, declared complete in 2003, has had an immense impact both biologically, by shedding light on human evolution, and medically, by helping to establish the field of bioinformatics and leading to technologies such as genetic screening and gene therapy.
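The idea of piecing fragments together by computing overlaps can be sketched in a few lines. This is a toy illustration, not the method used by the HGP: the function names (`overlap`, `greedy_assemble`), the `min_length` threshold, and the example reads are all assumptions, and the sketch ignores read errors, repeats, and reverse-complement strands that real assemblers must handle. It also compares every pair of reads, so the work grows quadratically with the number of reads, which hints at why overlap computation was so expensive at genome scale.

```python
from itertools import permutations


def overlap(a, b, min_length=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if no such overlap of at least `min_length` characters exists.
    """
    start = 0
    while True:
        # Find the next place in `a` where the first characters of `b` occur.
        start = a.find(b[:min_length], start)
        if start == -1:
            return 0
        # Check whether everything from there to the end of `a` starts `b`.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_length=3):
    """Repeatedly merge the pair of reads with the longest exact overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_pair = 0, None
        # Compare every ordered pair of reads (quadratic in the read count).
        for a, b in permutations(reads, 2):
            n = overlap(a, b, min_length)
            if n > best_len:
                best_len, best_pair = n, (a, b)
        if best_pair is None:
            break  # no remaining overlaps; leave the pieces unmerged
        a, b = best_pair
        reads.remove(a)
        reads.remove(b)
        reads.append(a + b[best_len:])
    return reads


if __name__ == "__main__":
    fragments = ["ATGGCGT", "GCGTACC", "TACCTGA"]
    print(overlap("ATGGCGT", "GCGTACC"))  # 4 (shared "GCGT")
    print(greedy_assemble(fragments))     # ['ATGGCGTACCTGA']
```

Greedy merging can produce incorrect assemblies when a genome contains repeated regions, which is one reason practical assemblers rely on graph-based formulations and error-tolerant alignment instead.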
References
[1] Altman, Russ; Levitt, Michael (2018). "What is Biomedical Data Science and Do We Need an Annual Review of It?". Annual Review of Biomedical Data Science. 1: i–iii. doi:10.1146/annurev-bd-01-041718-100001. S2CID 134950609.
[2] Baldi, Pierre (2018). "Deep learning in biomedical data science". Annual Review of Biomedical Data Science. 1: 181–205. doi:10.1146/annurev-biodatasci-080917-013343. S2CID 67381478.
[3] Ronneberger, Olaf; Fischer, Philipp; Brox, Thomas (2015). "U-net: Convolutional networks for biomedical image segmentation". International Conference on Medical Image Computing and Computer-Assisted Intervention. arXiv:1505.04597.
[4] Duncan, James S; Insana, Michael F; Ayache, Nicholas (2020). "Biomedical imaging and analysis in the age of big data and deep learning [scanning the issue]". Proceedings of the IEEE. 108: 3–10. doi:10.1109/JPROC.2019.2956422. S2CID 210077608.
[5] Su, Chang; Tong, Jie; Zhu, Yongjun; Cui, Peng; Wang, Fei (2020). "Network embedding in biomedical data science". Briefings in Bioinformatics. 21 (1): 182–197. doi:10.1093/bib/bby117. PMID 30535359.
[6] Zaringhalam, Maryam; Federer, Lisa; Huerta, Michael. "Core Skills for Biomedical Data Scientists" (PDF). US National Library of Medicine. US National Institutes of Health. Retrieved 2022-02-21.
[7] "Annual Review of Biomedical Data Science". annualreviews.org. Retrieved 2022-02-21.
[8] "Health Data Science". spj.sciencemag.org. Retrieved 2022-07-05.
[9] "Nature Machine Intelligence". nature.com. Retrieved 2022-07-05.
[10] "Scientific Data". nature.com. Retrieved 2022-07-05.
[11] "Biomedical Data Journal". biomed-data.eu. Retrieved 2022-07-05.
[12] "Data". mdpi.com. Retrieved 2022-07-05.
[13] Piovesan, Allison; Pelleri, Maria C; Antonaros, Francesca; Strippoli, Pierluigi; Vitale, Lorenza (2019). "On the length, weight and GC content of the human genome". BMC Research Notes. 12 (1): 106. doi:10.1186/s13104-019-4137-z. PMC 6391780. PMID 30813969.
[14] Altschul, Stephen F; Gish, Warren; Miller, Webb; Myers, Eugene W; Lipman, David J (1990). "Basic local alignment search tool". Journal of Molecular Biology. 215 (3): 403–410. doi:10.1016/S0022-2836(05)80360-2. PMID 2231712. S2CID 14441902.
[15] Venter, J. Craig; et al. (2001). "The sequence of the human genome". Science. 291 (5507): 1304–1351. Bibcode:2001Sci...291.1304V. doi:10.1126/science.1058040. PMID 11181995.