Accessible Precision Health Information for Every American in Real Time
Sixty to eighty percent of the preventable risk for health disparities in marginalised populations is attributable to social, structural, and environmental factors such as food or housing insecurity, systemic racism, or chronic stress (refs. 1,2). The complexity, multidimensionality, and heterogeneity of these determinants have made systematic approaches to addressing them challenging (Fig. 1). To better characterise and, ultimately, improve health and well-being through strategies tailored to individual context and need, cutting-edge precision health approaches increasingly rely on large-scale person-generated health data from smartphones and wearables (refs. 3,4). Applying artificial intelligence (AI) and machine learning (ML) to data generated by individuals themselves, including their social, structural, and environmental exposures, behaviours, biometrics, and health outcomes, allows an unprecedented assessment of recursive, networked, and latent associations between everyday life and health. As a result, precision health presents a significant opportunity to close the gap in health outcomes between more privileged and less advantaged populations.
However, this potential to improve health equity is unlikely to be realised unless benchmark training datasets of person-generated health data are made available to the research community. Such datasets would allow for the creation of precision health models that are equally effective across diverse populations. An AI or ML system is only as good as its training data: its validity and generalizability are inextricably linked to the data on which it was trained. To instil the highest standards of scientific transparency and rigour into model development, validation, and evaluation, the ideal benchmark dataset should feature high-quality, well-characterized data that comprehensively represent the target population.
Convenience sampling and/or ‘bring your own device’ designs are used to collect person-generated health data in the majority of commercial studies and in the US National Institutes of Health’s All of Us research programme. As a result, people who are traditionally underrepresented in research (the elderly, people of colour, indigenous peoples, the economically disadvantaged, and people in poorer health) are routinely left out of these datasets (refs. 5,6). Although nationally representative, the National Health and Nutrition Examination Survey is limited in its ability to assess temporal effects and account for seasonality by its cross-sectional design and 1-week accelerometer measurement period. Without a benchmark dataset against which results can be compared, precision health models risk compounding bias, widening health gaps, and causing further harm to already vulnerable populations (refs. 7,8).
For this reason, we developed American Life in Realtime (ALiR), a publicly accessible benchmark dataset, cohort, and research infrastructure for person-generated health data. ALiR has four primary goals that advance equitable precision health: promoting inclusive representation; encouraging methodological rigour in AI and ML; fostering interdisciplinary collaboration and transparency; and facilitating comprehensive exploration of the dynamic interplay between daily life and health. Here, we highlight a few of the design choices made to accomplish these goals, as well as applications of the data and infrastructure in precision health.
Several methods were employed to ensure that the ALiR cohort (n = 1,038) would be a broad cross-section of the adult population in the United States with respect to demographic, socioeconomic, and health-related factors (ref. 9). Members of the Understanding America Study (ref. 10) panel, a well-established probability-based survey panel drawn from a national random sample of US addresses, were invited to take part in the study. All study participants were given a Fitbit Inspire 2 as an incentive, and those without access to a computer were given a 4G Samsung Galaxy Tablet with internet access. We made sure that the study’s participant app worked on a wide variety of smartphones and operating systems, and we kept a support hotline open for those who needed it. To account for potential underrepresentation due to factors beyond digital inclusion, such as mistrust or privacy concerns, we also oversampled people of colour (Native American, Polynesian, African-American, Asian, and Hispanic participants) and people with less than a bachelor’s degree.
There are many benefits to using ALiR
Probability sampling improves the accuracy and validity of inference. It supports generalizable predictions of health outcomes in response to population-level stressors such as current events, systemic racism, natural disasters, or surges in cases of SARS-CoV-2 infection. Our findings show that supplying hardware levelled participation across demographic groups. Because historically underrepresented and marginalised populations were oversampled, the cohort has the statistical power to detect subgroup-specific differences, such as heterogeneity in outcomes experienced by Black and Latinx people. Weights are also provided to rebalance the sample’s demographic composition to match the US population.
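To illustrate why weights matter when subgroups are deliberately oversampled, the following is a minimal sketch of post-stratification on a single grouping variable. It is not the procedure used to construct the ALiR weights, which are supplied with the dataset; the function names, group labels, population shares, and outcome values are all hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return one weight per respondent so that the weighted share of each
    group matches its (assumed known) share of the target population."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical toy sample: group "B" is oversampled
# (50% of the sample, but only 20% of the population).
groups = ["A", "A", "B", "B"]
outcome = [10.0, 10.0, 20.0, 20.0]
shares = {"A": 0.8, "B": 0.2}

weights = poststratification_weights(groups, shares)
unweighted = sum(outcome) / len(outcome)   # treats the sample as if it were the population
weighted = weighted_mean(outcome, weights)  # rebalanced to the population shares

print(f"unweighted mean: {unweighted:.2f}")
print(f"weighted mean:   {weighted:.2f}")
```

In this toy case the unweighted mean overstates the contribution of the oversampled group, while the weighted mean reflects the assumed population composition. Subgroup-specific analyses still benefit from the larger oversampled group regardless of weighting.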
To strengthen the scientific rigour of AI and ML, we developed a systematic approach to data collection that includes validated, longitudinal measures (labels) of participant exposures, behaviours, and outcomes over extended periods of time (Table 1). The measures are based on consensus instruments developed by experts in the field, such as those of the Health and Retirement Study at the University of Michigan, the Patient Reported Outcomes Measurement Information System, and the Phenotypes and eXposures consensus measures developed by the US National Institutes of Health. A bespoke mobile app encourages long-term engagement through points that can be redeemed for monetary compensation, streamlines Fitbit integration, delivers electronic surveys, and sends push notifications for announcements and reminders.
To encourage development, collaboration, and openness, we built a robust yet adaptable infrastructure. Because it serves as both a data repository and a research hub, ALiR can facilitate fast, low-cost community collaboration. After each cohort-year of data is curated, beginning with year 1 in mid-2023, the data collected through ALiR will be made available to registered users through the Understanding America Study website. Making data publicly available promotes transparency, reproducibility, and explainability of results from statistical, AI, and ML analyses, provided that proper privacy and data-security safeguards are in place.
The research platform can be extended with new features: data streams from application programming interfaces (APIs) for wearables, medical devices, the Internet of Things, genomics, and biomarkers; survey designs with preloaded information and skip logic, randomization, experiments, and ecological momentary assessments; and interactive communications with participants through notifications and visual dashboards. By leveraging the Understanding America Study’s planned growth to 20,000 participants by 2025 and incorporating special populations, such as people with specific diseases managed by academic and industry partners, ALiR aims to achieve a truly large-scale sample. The source code will be made publicly available to promote harmonised data collection by other parties.
The ultimate goal of ALiR is to make precision health research more accessible, open, and multidisciplinary. Engineers can use the system to test their devices and sensors on a wide range of users. Methodologists may identify the causes or amplifying factors of selection biases, such as social and structural patterning of study participation and attrition, data quality, and ‘missingness’, and then create and test strategies to mitigate these biases. Social scientists may examine the clustering and significance of social determinants in different populations to better allocate public health resources. Behavioural researchers could develop just-in-time interventions, such as passive detection of influenza-like symptoms from Fitbit data that triggers a recommendation for SARS-CoV-2 testing. Operations researchers may examine integrations with caseworkers and healthcare systems.
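As a purely illustrative sketch of what such passive detection might look like, the snippet below flags days on which a participant's resting heart rate is unusually high relative to their own recent baseline. This is a simple heuristic chosen for illustration, not a validated detection method and not a method described for ALiR; the function name, the 7-day baseline window, the z-score threshold, and the sample values are all assumptions.

```python
from statistics import mean, stdev


def flag_elevated_days(resting_hr, baseline_days=7, z_threshold=2.0):
    """Return indices of days whose resting heart rate is more than
    z_threshold standard deviations above the mean of the preceding
    baseline_days days (a rolling personal baseline)."""
    flagged = []
    for day in range(baseline_days, len(resting_hr)):
        baseline = resting_hr[day - baseline_days:day]
        mu, sigma = mean(baseline), stdev(baseline)
        # Skip days with a flat baseline to avoid dividing by zero.
        if sigma > 0 and (resting_hr[day] - mu) / sigma > z_threshold:
            flagged.append(day)
    return flagged


# Hypothetical daily resting heart rates (beats per minute) for one participant.
daily_rhr = [60, 61, 59, 60, 62, 60, 61, 60, 68, 69]
flagged_days = flag_elevated_days(daily_rhr)
print("days flagged for follow-up:", flagged_days)
```

In a real intervention, a flag like this would only prompt a recommendation (for example, to consider testing), and the threshold would need to be evaluated for equal performance across the subgroups represented in the cohort.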
Together, the principles of ALiR provide a blueprint for fostering diversity, equity, inclusion, transparency, and multi-disciplinary collaboration within the field of precision health.