Health Data Science

Submit Your Research on Ethics Issues in Health Data Science!

The journal publishes research on the application of cutting-edge technologies and analytic approaches for the health field, including research concerning ethics issues.

Journal profile

The open access journal Health Data Science, published in association with PKU, publishes innovative, scientifically rigorous research to advance health data science.

Editorial board

Health Data Science's Editorial Board is led by Qimin Zhan (Chinese Academy of Engineering and Peking University) and comprises active researchers and experts in health data science from around the world.

Why publish with us?

• Rapid publication: We use the best systems and processes to ensure efficiency and quality.

• Open access: Articles are free to publish through December 2024 and will always be free to read for everyone.

• Impact: Journal articles are promoted by our expert marketing team.

Latest Articles

Research Article

Characterizing Discourse about COVID-19 Vaccines: A Reddit Version of the Pandemic Story

It has been one year since the outbreak of the COVID-19 pandemic. The good news is that vaccines developed by several manufacturers are being actively distributed worldwide. However, as more vaccines become available to the public, concerns about them have become the primary barriers to vaccination. Considering the complexity of these concerns and their potential hazards, this study aims to offer a clear understanding of the underlying concerns that different population groups, particularly those active on Reddit, express when they talk about COVID-19 vaccines. This is done by applying latent Dirichlet allocation (LDA) and Linguistic Inquiry and Word Count (LIWC) to characterize the relevant discourse, with insights generated through a combination of quantitative and qualitative comparisons. Findings include the following: (1) during the pandemic, the proportion of Reddit comments dominated by conspiracy theories outweighed that of any other topic; (2) each subreddit has its own user base, so information posted in one subreddit may not reach users of other subreddits; and (3) since users’ concerns vary across time and subreddits, communication strategies must be adjusted according to specific needs. The results of this study reveal challenges as well as opportunities in designing effective communication and immunization programs.

Research Article

A Framework for Assessing Import Costs of Medical Supplies and Results for a Tuberculosis Program in Karakalpakstan, Uzbekistan

Background. Import of medical supplies is common, but limited knowledge about import costs and their structure introduces uncertainty to budget planning, cost management, and cost-effectiveness analysis of health programs. We aimed to estimate the import costs of a tuberculosis (TB) program in Uzbekistan, including the import costs of specific imported items. Methods. We developed a framework that applies costing and cost accounting to import costs. First, transport costs, customs-related costs, cargo weight, unit weights, and quantities ordered were gathered for a major shipment of medical supplies from the Médecins Sans Frontières (MSF) Procurement Unit in Amsterdam, the Netherlands, to a TB program in Karakalpakstan, Uzbekistan, in 2016. Second, air freight, land freight, and customs clearance cost totals were estimated. Third, total import costs were allocated to different cargos (standard, cool, and frozen), items (e.g., TB drugs), and units (e.g., one tablet) based on imported weight and quantity. Data sources were order invoices, waybills, the local MSF logistics department, and an MSF standard product list. Results. The shipment contained 1.8 million units of 85 medical items of standard, cool, and frozen cargo. The average import cost for the TB program was 9.0% of the shipment value. Import cost varied substantially between cargos (8.9–28% of the cargo value) and items (interquartile range 4.5–35% of the item value). The largest portion of the total import cost was attributable to transport (82–99% of the cargo import cost) and was allocated based on imported weight. Ten (14%) of the 69 items imported as standard cargo were associated with 85% of the standard cargo import cost. Standard cargo items could be grouped according to whether they contributed to import costs predominantly through unit weight (e.g., fluids), imported quantity (e.g., tablets), or the combination of unit weight and imported quantity (e.g., items in powder form). Conclusion. The cost of importing medical supplies to a TB program in Karakalpakstan, Uzbekistan, was sizable, variable, and driven by a subset of imported items. The framework used to measure and account for import costs can be adapted to other health programs.
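The weight-based allocation step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Item` class, the `allocate_import_cost` function, and all item names, weights, prices, and cost totals are hypothetical and chosen only for illustration. It covers only the allocation of one cargo's import cost to items and units in proportion to imported weight.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    unit_weight_kg: float  # weight of a single unit
    quantity: int          # number of units imported
    unit_price: float      # purchase price of a single unit


def allocate_import_cost(items, total_import_cost):
    """Split one cargo's import cost across items in proportion to imported weight."""
    total_weight = sum(i.unit_weight_kg * i.quantity for i in items)
    rows = []
    for i in items:
        item_weight = i.unit_weight_kg * i.quantity
        item_cost = total_import_cost * item_weight / total_weight
        rows.append({
            "name": i.name,
            "item_import_cost": item_cost,
            # Cost carried by a single unit (e.g., one tablet).
            "unit_import_cost": item_cost / i.quantity,
            # Import cost expressed as a fraction of the item's purchase value.
            "share_of_item_value": item_cost / (i.unit_price * i.quantity),
        })
    return rows


# Hypothetical items and cost total, for illustration only.
items = [
    Item("tablets", unit_weight_kg=0.001, quantity=100_000, unit_price=0.10),
    Item("infusion fluid", unit_weight_kg=0.5, quantity=800, unit_price=1.00),
]
rows = allocate_import_cost(items, total_import_cost=1_000.0)
for row in rows:
    print(row)
```

In this toy example the heavy, low-value fluid carries a far larger import cost relative to its value than the light tablets, which mirrors the kind of between-item variation the abstract reports.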

Review Article

Social and Behavioral Determinants of Health in the Era of Artificial Intelligence with Electronic Health Records: A Scoping Review

Background. There is growing evidence that social and behavioral determinants of health (SBDH) have a substantial effect on a wide range of health outcomes. Electronic health records (EHRs) have been widely employed to conduct observational studies in the age of artificial intelligence (AI). However, there has been limited review of how to make the most of SBDH information from EHRs using AI approaches. Methods. A systematic search was conducted in six databases to find relevant, recently published peer-reviewed publications. Relevance was determined by screening and evaluating the articles. Based on the selected studies, a methodological analysis of AI algorithms leveraging SBDH information in EHR data was provided. Results. Our synthesis was driven by an analysis of SBDH categories, the relationship between SBDH and healthcare-related statuses, natural language processing (NLP) approaches for extracting SBDH from clinical notes, and predictive models using SBDH for health outcomes. Discussion. The associations between SBDH and health outcomes are complicated and diverse; several pathways may be involved. Using NLP technology to extract SBDH and other clinical concepts simplifies the identification of essential concepts in clinical data, efficiently unlocks unstructured data, and helps resolve issues related to unstructured data. Conclusion. Despite known associations between SBDH and diseases, SBDH factors are rarely investigated as interventions to improve patient outcomes. Gaining knowledge about SBDH and how SBDH data can be collected from EHRs using NLP approaches and predictive models improves the chances of influencing health policy change for patient wellness, ultimately promoting health and health equity.

Review Article

Analysis of COVID-19 Guideline Quality and Change of Recommendations: A Systematic Review

Background. Hundreds of coronavirus disease 2019 (COVID-19) clinical practice guidelines (CPGs) and expert consensus statements have been developed and published since the outbreak of the epidemic. However, these CPGs are of widely variable quality. This review therefore aims to systematically evaluate the methodological and reporting qualities of COVID-19 CPGs, explore factors that may influence their quality, and analyze how recommendations in CPGs changed as evidence was published. Methods. We searched five electronic databases and five websites from 1 January to 31 December 2020 to retrieve all COVID-19 CPGs. The methodological and reporting qualities of CPGs were assessed using the AGREE II instrument and the RIGHT checklist. Recommendations, and the evidence used to make them, regarding selected treatments for COVID-19 (remdesivir, glucocorticoids, hydroxychloroquine/chloroquine, interferon, and lopinavir-ritonavir) were also systematically assessed. Statistical inference was performed to identify factors associated with the quality of CPGs. Results. We included a total of 92 COVID-19 CPGs developed by 19 countries. Overall, the RIGHT checklist reporting rate of COVID-19 CPGs was 33.0%, and the AGREE II domain score was 30.4%. The overall methodological and reporting qualities of COVID-19 CPGs gradually improved during 2020. Factors associated with high methodological and reporting qualities included an evidence-based development process, management of conflicts of interest, and use of established rating systems to assess the quality of evidence and strength of recommendations. The recommendations of only seven (7.6%) CPGs were informed by a systematic review of evidence; these seven CPGs had relatively high methodological and reporting qualities, and six of them fully met the Institute of Medicine (IOM) criteria for guidelines. Among the seven, a rapid advice CPG developed by the World Health Organization (WHO) received the highest overall scores for methodological (72.8%) and reporting (83.8%) quality. Many CPGs published by different countries or organizations covered the same clinical questions (namely, the effectiveness of remdesivir, glucocorticoids, hydroxychloroquine/chloroquine, interferon, and lopinavir-ritonavir in COVID-19 patients). Although randomized controlled trials and systematic reviews on the effectiveness of these treatments for patients with COVID-19 have been published, the recommendations on them still varied greatly across COVID-19 CPGs published in different countries or regions, which may suggest that the CPGs do not make sufficient use of the latest evidence. Conclusions. Both the methodological and reporting qualities of COVID-19 CPGs increased over time, but there is still room for further improvement. The lack of effective use of available evidence and of management of conflicts of interest were the main reasons for the low quality of the CPGs. The use of formal rating systems for the quality of evidence and strength of recommendations may help to improve the quality of CPGs in the context of the COVID-19 pandemic. During the pandemic, we suggest developing a living guideline whose recommendations are supported by a systematic review, because it can facilitate the timely translation of the latest research findings into clinical practice. We also suggest that CPG developers register their guidelines on a registration platform at the outset, because doing so can reduce duplicative development of guidelines on the same clinical question, increase the transparency of the development process, and promote cooperation among guideline developers all over the world. Now that the International Practice Guideline Registry Platform has been created, developers can register guidelines prospectively and internationally on this platform.

Review Article

Cognitive Computing-Based CDSS in Medical Practice

Importance. The last decade has witnessed advances in cognitive computing technologies that learn at scale and reason with purpose in medical research. From the diagnosis of diseases to the generation of treatment plans, cognitive computing encompasses both data-driven and knowledge-driven machine intelligence to assist health care professionals in clinical decision-making. This review provides a comprehensive perspective on both research and industrial efforts in cognitive computing-based CDSS over the last decade. Highlights. (1) A holistic review of both research papers and industrial practice concerning cognitive computing-based CDSS is conducted to identify the necessity, the characteristics, and the general framework for constructing such a system. (2) Several typical applications of cognitive computing-based CDSS, as well as existing systems in real medical practice, are introduced in detail under the general framework. (3) The limitations of current cognitive computing-based CDSS are discussed to shed light on future work in this direction. Conclusion. Unlike medical content providers, cognitive computing-based CDSS provides probabilistic clinical decision support by automatically learning and inferring from medical big data. Its ability to manage multimodal data and computerize medical knowledge distinguishes cognitive computing-based CDSS from other categories. Given the current state of primary health care, with its high diagnostic error rate and shortage of medical resources, it is time to introduce cognitive computing-based CDSS to the medical community, which should be more open-minded and embrace the convenience, low cost, and high efficiency that such systems bring.

Research Article

Mobile Phone-Based Population Flow Data for the COVID-19 Outbreak in Mainland China

Background. Human migration is one of the driving forces that amplify localized infectious disease outbreaks into widespread epidemics. During the outbreak of COVID-19 in China, population travel from Wuhan furthered the spread of the virus, as the period coincided with the world’s largest population movement, which takes place to celebrate the Chinese New Year. Methods. We have collected and made public an anonymous and aggregated national-level mobility dataset extracted from mobile phones, describing the outflows of population travel from Wuhan. We evaluated the correlation between population movements and the spread of the virus using the dates on which numbers of diagnosed cases were documented. Results. From Jan 1 to Jan 22 of 2020, a total of 20.2 million movements of at-risk population occurred from Wuhan to other regions in China. A large proportion of these movements occurred within Hubei province (84.5%), and a substantial increase in travel was observed even before the beginning of the official Chinese Spring Festival travel period. The outbound flows from Wuhan before the lockdown were found to be strongly correlated with the number of diagnosed cases in the destination cities (log-transformed). Conclusions. The regions receiving the highest volumes of at-risk population were identified. The movements of the at-risk population were strongly associated with the spread of the virus. These results, together with province-by-province reports, have been provided to governmental authorities to aid policy decisions at both the state and provincial levels. We believe that the effort to make these data available is extremely important for COVID-19 modelling and prediction.
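The correlation analysis mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not say which correlation coefficient was used, so Pearson's r is an assumption, as is the choice to log-transform only the case counts. The `pearson` helper and all numbers are hypothetical and are not taken from the published dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for five destination cities, for illustration only.
outflows = [120_000, 45_000, 30_000, 8_000, 2_500]  # movements received from the origin city
cases = [350, 110, 90, 20, 9]                       # diagnosed cases in each destination

# Log-transform the case counts before correlating (assumed base 10).
log_cases = [math.log10(c) for c in cases]

r = pearson(outflows, log_cases)
print(f"Pearson r between outflow and log10(cases): {r:.3f}")
```

A rank-based coefficient such as Spearman's would be a reasonable alternative when the relationship is monotonic but not linear.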