Submit Your Research

Digital Public Health

Health Data Science welcomes research that furthers its mission—data for better health—including research concerning digital public health.

Journal profile

The open access journal Health Data Science, published in association with PKU, publishes innovative, scientifically rigorous research to advance health data science.

Editorial board

Health Data Science's Editorial Board is led by Qimin Zhan (Chinese Academy of Engineering and Peking University) and comprises active researchers and experts in health data science from around the world.


Visit our news page to read about the latest developments with Health Data Science, including news releases and the announcement of our 2021 Reviewers of the Year!

Latest Articles

Review Article

Knowledge Graph Applications in Medical Imaging Analysis: A Scoping Review

Background. There is an increasing trend to represent domain knowledge in structured graphs, which provide efficient knowledge representations for many downstream tasks. Knowledge graphs are widely used to model prior knowledge in the form of nodes and edges to represent semantically connected knowledge entities, which several works have adopted into different medical imaging applications. Methods. We systematically searched over five databases to find relevant articles that applied knowledge graphs to medical imaging analysis. After screening, evaluating, and reviewing the selected articles, we performed a systematic analysis. Results. We looked at four applications in medical imaging analysis, including disease classification, disease localization and segmentation, report generation, and image retrieval. We also identified limitations of current work, such as the limited amount of available annotated data and weak generalizability to other tasks. We further identified the potential future directions according to the identified limitations, including employing semisupervised frameworks to alleviate the need for annotated data and exploring task-agnostic models to provide better generalizability. Conclusions. We hope that our article will provide the readers with aggregated documentation of the state-of-the-art knowledge graph applications for medical imaging to encourage future research.


Surveillance of Noncommunicable Diseases: Opportunities in the Era of Big Data

Research Article

Stratification of Patients with Diabetes Using Continuous Glucose Monitoring Profiles and Machine Learning

Background. Continuous glucose monitoring (CGM) offers an opportunity for patients with diabetes to modify their lifestyle to better manage their condition and for clinicians to provide personalized healthcare and lifestyle advice. However, analytic tools are needed to standardize and analyze the rich data that emerge from CGM devices. This would allow glucotypes of patients to be identified to aid clinical decision-making. Methods. In this paper, we develop an analysis pipeline for CGM data and apply it to 148 diabetic patients with a total of 8632 days of follow-up. The pipeline projects CGM data to a lower-dimensional space of features representing centrality, spread, size, and duration of glycemic excursions and the circadian cycle. We then use principal components analysis and k-means to cluster patients’ records into one of four glucotypes and analyze cluster membership using multinomial logistic regression. Results. Glucotypes differ in the degree of control, the amount of time spent in range, and the presence and timing of hyper- and hypoglycemia. Patients on the program had statistically significant improvements in their glucose levels. Conclusions. This pipeline provides a fast automatic function to label raw CGM data without manual input.
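
As a rough illustration of the feature-extraction idea in the abstract above, the sketch below reduces one day of CGM readings to a few summary features of centrality, spread, and time in range. It is not the authors' pipeline: the function name `summarize_cgm`, the chosen features, the 70 to 180 mg/dL target range, and the sample readings are all assumptions made for illustration, and the PCA and k-means steps are not shown.

```python
from statistics import mean, pstdev


def summarize_cgm(readings_mg_dl, low=70.0, high=180.0):
    """Summarize one day of CGM readings (mg/dL) as a small feature dict.

    The feature set and the default 70-180 mg/dL target range are
    illustrative assumptions, not the features used in the paper.
    """
    if not readings_mg_dl:
        raise ValueError("need at least one reading")
    n = len(readings_mg_dl)
    return {
        "mean": mean(readings_mg_dl),          # centrality
        "sd": pstdev(readings_mg_dl),          # spread (population SD)
        "time_in_range": sum(low <= g <= high for g in readings_mg_dl) / n,
        "time_above": sum(g > high for g in readings_mg_dl) / n,
        "time_below": sum(g < low for g in readings_mg_dl) / n,
    }


# Hypothetical readings, invented for this example.
day = [65, 90, 110, 150, 200, 170, 120, 100]
features = summarize_cgm(day)
print(features)
```

Feature vectors of this kind, one per patient-day, are the sort of input that a dimensionality-reduction and clustering step could then group into glucotypes.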

Research Article

Large-Scale Social Media Analysis Reveals Emotions Associated with Nonmedical Prescription Drug Use

Background. The behaviors and emotions associated with and reasons for nonmedical prescription drug use (NMPDU) are not well-captured through traditional instruments such as surveys and insurance claims. Publicly available NMPDU-related posts on social media can potentially be leveraged to study these aspects unobtrusively and at scale. Methods. We applied a machine learning classifier to detect self-reports of NMPDU on Twitter and extracted all public posts of the associated users. We analyzed approximately 137 million posts from 87,718 Twitter users in terms of expressed emotions, sentiments, concerns, and possible reasons for NMPDU via natural language processing. Results. Users in the NMPDU group express more negative emotions and fewer positive emotions, more concerns about family, the past, and body, and fewer concerns related to work, leisure, home, money, religion, health, and achievement compared to a control group (i.e., users who never reported NMPDU). NMPDU posts tend to be highly polarized, indicating potential emotional triggers. Gender-specific analyses show that female users in the NMPDU group express more content related to positive emotions, anticipation, sadness, joy, concerns about family, friends, home, health, and the past, and less about anger than males. The findings are consistent across distinct prescription drug categories (opioids, benzodiazepines, stimulants, and polysubstance). Conclusion. Our analyses of large-scale data show that substantial differences exist between the texts of the posts from users who self-report NMPDU on Twitter and those who do not, and between males and females who report NMPDU. Our findings can enrich our understanding of NMPDU and the population involved.

Review Article

A Review of Three-Dimensional Medical Image Visualization

Importance. Medical images are essential for modern medicine and an important research subject in visualization. However, medical experts are often not aware of the many advanced three-dimensional (3D) medical image visualization techniques that could increase their capabilities in data analysis and assist the decision-making process for specific medical problems. Our paper provides a review of 3D visualization techniques for medical images, intending to bridge the gap between medical experts and visualization researchers. Highlights. Fundamental visualization techniques are revisited for various medical imaging modalities, from computed tomography to diffusion tensor imaging, featuring techniques that enhance spatial perception, which is critical for medical practices. The state of the art of medical visualization is reviewed based on a procedure-oriented classification of medical problems for studies of individuals and populations. This paper summarizes free software tools for different modalities of medical images designed for various purposes, including visualization, analysis, and segmentation, and it provides respective Internet links. Conclusions. Visualization techniques are a useful tool for medical experts to tackle specific medical problems in their daily work. Our review provides a quick reference to such techniques given the medical problem and modalities of associated medical images. We summarize fundamental techniques and readily available visualization tools to help medical experts better understand and utilize medical imaging data. This paper could contribute to the joint effort of the medical and visualization communities to advance precision medicine.

Research Article

Cost-Utility Analysis of Screening for Diabetic Retinopathy in China

Background. Diabetic retinopathy (DR) has been identified as a primary cause of vision impairment and blindness, yet no studies have focused on the cost-utility of telemedicine-based and community screening programs for DR in China in rural and urban areas separately. Methods. We developed a Markov model to calculate the cost-utility of screening programs for DR in patients with diabetes mellitus (DM) in rural and urban settings from the societal perspective. The incremental cost-utility ratio (ICUR) was calculated for the assessment. Results. In the rural setting, the community screening program obtained 1 QALY with a cost of $4179 (95% CI 3859 to 5343), and the telemedicine screening program had an ICUR of $2323 (95% CI 1023 to 3903) compared with no screening, both of which satisfied the criterion of a significantly cost-effective health intervention. Likewise, community screening programs in urban areas generated an ICUR of $3812 (95% CI 2906 to 4167) per QALY gained, with telemedicine screening at an ICUR of $2437 (95% CI 1242 to 3520) compared with no screening, and both were also cost-effective. Compared with community screening programs, telemedicine screening yielded an ICUR of $1212 (95% CI 896 to 1590) per incremental QALY gained in the rural setting and $1141 (95% CI 859 to 1403) in the urban setting, both of which meet the criterion for a significantly cost-effective health intervention. Conclusions. Both telemedicine and community screening for DR in rural and urban settings were cost-effective in China, and telemedicine screening programs were more cost-effective.
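
For readers unfamiliar with the ICUR reported above, it is the extra cost of one strategy over a comparator divided by the extra quality-adjusted life years (QALYs) it yields. The sketch below shows only that arithmetic. The function name `icur` and all input numbers are hypothetical and are not taken from the study, and the Markov model that would produce the cost and QALY totals is not shown.

```python
def icur(cost_new, cost_old, qaly_new, qaly_old):
    """Incremental cost-utility ratio: extra cost per extra QALY gained."""
    delta_qaly = qaly_new - qaly_old
    if delta_qaly == 0:
        raise ValueError("no QALY difference; ICUR is undefined")
    return (cost_new - cost_old) / delta_qaly


# Hypothetical per-patient totals, invented for this example only.
ratio = icur(cost_new=1500.0, cost_old=1000.0, qaly_new=10.25, qaly_old=10.0)
print(f"ICUR: ${ratio:.0f} per QALY gained")
```

A strategy is usually judged cost-effective when this ratio falls below a chosen willingness-to-pay threshold. The abstract does not state the threshold the study used.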