Apple Watch Data Can Predict Your Health With 92% Accuracy

An Apple-supported study introduces a new foundation model, the Wearable Behavior Model (WBM), trained on behavioral data from wearables to predict health conditions, demonstrating accuracy up to 92% and outperforming traditional sensor-based models in numerous tasks.

The preprint, titled “Beyond Sensor Data: Foundation Models of Behavioral Data from Wearables Improve Health Predictions,” draws on the Apple Heart and Movement Study (AHMS). Researchers trained WBM on over 2.5 billion hours of wearable data, and the model matched and, in some instances, surpassed existing models that rely on low-level sensor data. Previous health-related foundation models predominantly used raw sensor streams, such as photoplethysmography (PPG) data from heart rate sensors or electrocardiogram (ECG) data. WBM instead learns directly from higher-level behavioral metrics, including step count, gait stability, mobility, and VO₂ max, all of which the Apple Watch generates in abundance.

The study elaborates on the rationale behind focusing on behavioral data over raw sensor inputs. Consumer wearables like smartwatches and fitness trackers provide extensive information across various health domains. Detecting a static health state, such as a history of smoking, a past diagnosis of hypertension, or current medication use like beta-blockers, is a critical aspect of health monitoring. Similarly, identifying a transient health state, such as sleep quality or pregnancy, is also crucial. The data required for these predictions typically aligns with the temporal resolution of human behavior, spanning days and weeks, rather than the second-level time scales at which raw sensor data is collected from wearables.

While much of the previous research concentrated on modeling low-level sensor data or its simplified features, higher-level behavioral information from wearables, including physical activity, cardiovascular fitness, and mobility metrics, represents a more natural data type for these detection tasks. These higher-level behavioral metrics are computed using validated algorithms derived from raw sensor data. Experts intentionally select these metrics because they align with physiologically relevant quantities and health states.

Crucially, these data points are sensitive to an individual’s behaviors, rather than being driven solely by physiology. These characteristics render behavioral data particularly promising for health detection tasks. For instance, mobility metrics that characterize walking gait and overall activity levels may serve as important behavioral factors in detecting changing health states, such as pregnancy.


Although the Apple Watch gathers raw sensor data, this data can be noisy, overwhelming, and may not consistently correlate with meaningful health events. While WBM’s metrics are derived from this sensor data, the information is refined to emphasize real-world behaviors and health-relevant trends. These refined metrics are more stable, easier to interpret, and better structured for modeling long-term health patterns. Essentially, WBM learns from the patterns present in processed behavioral data, rather than directly from raw sensor signals.

WBM was trained using data from 161,855 participants in the AHMS, incorporating Apple Watch and iPhone data. Instead of raw data streams, the model processed 27 human-interpretable behavioral metrics. These included active energy, walking pace, heart rate variability, respiratory rate, and sleep duration. The data was organized into weekly blocks and processed through a new architecture built upon Mamba-2, which demonstrated superior performance compared to traditional Transformers, such as those forming the basis for GPT, in this specific application.
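To make the "weekly blocks" idea concrete, here is a minimal Python sketch of grouping per-day metric readings into 7-day blocks. It is an illustration only, not the study's pipeline: the daily resolution, the three metric names in `METRICS` (a hypothetical subset of the 27), the function name `to_weekly_blocks`, and the choice to mark missing readings with `None` are all assumptions.

```python
from typing import Dict, List, Optional

# Hypothetical subset of the 27 behavioral metrics; names are illustrative.
METRICS = ["active_energy", "walking_pace", "sleep_duration"]


def to_weekly_blocks(
    daily: Dict[int, Dict[str, float]], n_days: int
) -> List[List[List[Optional[float]]]]:
    """Group per-day readings into 7-day blocks.

    `daily` maps a 0-based day index to {metric_name: value}.
    Returns blocks[week][day_in_week][metric_index], with None marking
    a missing reading. A trailing partial week is dropped.
    """
    blocks = []
    full_days = n_days - n_days % 7  # ignore any incomplete final week
    for start in range(0, full_days, 7):
        week = []
        for day in range(start, start + 7):
            readings = daily.get(day, {})
            week.append([readings.get(name) for name in METRICS])
        blocks.append(week)
    return blocks


if __name__ == "__main__":
    sample = {0: {"active_energy": 300.0}, 8: {"walking_pace": 1.2}}
    for i, week in enumerate(to_weekly_blocks(sample, n_days=16)):
        print(f"week {i}: {week}")
```

Keeping missing readings explicit, rather than filling them in, reflects the fact that wearable data is irregular: people do not wear the watch every day, and a model has to cope with those gaps.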

The researchers evaluated WBM on 57 health-related tasks. On the static health prediction tasks, which include determining whether an individual takes beta-blockers, WBM outperformed a strong PPG-based model in 18 of 47. On the dynamic tasks, such as detecting pregnancy, sleep quality, or respiratory infection, WBM came out ahead in all but one; the sole exception was diabetes, where PPG alone yielded better results. Combining the WBM and PPG data representations produced the most accurate results overall.

The hybrid model that combines WBM and PPG representations achieved an accuracy of 92% for pregnancy detection and showed consistent improvements in tasks related to sleep quality, infection, injury, and cardiovascular conditions such as AFib detection. The study concludes that WBM is meant to complement sensor data, not replace it. WBM captures long-range behavioral signals, PPG identifies short-term physiological changes, and using the two together improves the early detection of significant health shifts.
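One simple way to picture combining two representations is to join the two embedding vectors and feed the result to a small classifier. The Python sketch below does that with plain concatenation and a logistic score. The article does not say how the study fused the two models, so the concatenation, the function names `combine` and `linear_probe`, and every number in the example are assumptions for illustration only.

```python
import math
from typing import List, Sequence


def combine(wbm_emb: Sequence[float], ppg_emb: Sequence[float]) -> List[float]:
    """Concatenate a behavioral embedding and a PPG embedding (assumed fusion)."""
    return [*wbm_emb, *ppg_emb]


def linear_probe(
    features: Sequence[float], weights: Sequence[float], bias: float
) -> float:
    """Logistic score in (0, 1) from a linear layer over the features."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(f * w for f, w in zip(features, weights))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Made-up embeddings and weights, purely to show the data flow.
    features = combine([0.5, -1.0], [2.0])
    print(linear_probe(features, weights=[0.4, 0.1, 0.3], bias=-0.2))
```

With this layout the classifier can weigh slow behavioral signals and fast physiological signals side by side, which is the intuition behind the study's conclusion that the two sources complement each other.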

