Category: Product

Moving Beyond Diagnoses: Using Real-World PHQ-9 Data to Identify Appetite-Related Metabolic Risk in Depression and Anxiety

Post author By Jenessa Mozdy
Post date July 8, 2026

Depression and anxiety are common conditions associated with a range of physical health outcomes, including changes in body weight. However, risk is not uniform across all patients. Two individuals with the same diagnosis may have very different symptom profiles and very different metabolic risks.

This symptom variability raises an important question: Can routinely collected symptom data help identify patients who may be at higher risk for clinically meaningful weight outcomes?

Many real-world data sources capture diagnoses, procedures, and medications, but few contain the depth of clinical information needed to understand symptom-level variation within a disease. Integrated electronic health record data can provide access to routinely collected patient-reported outcomes, such as responses to the Patient Health Questionnaire-9 (PHQ-9), a questionnaire that assesses mental health symptoms across several domains. Integrating PHQ-9 responses with clinical measurements like body mass index (BMI) and comorbidities creates an opportunity to move beyond diagnosis codes and evaluate how specific symptoms may relate to meaningful health outcomes.

To explore this question, we analyzed real-world clinical data from nearly 2 million encounters among adults with depression and/or anxiety who had documented BMI measurements and item-level responses to the PHQ-9 within the OMNY Health real-world data platform. Rather than focusing only on diagnosis-level measures, we examined PHQ-9 item 5, which captures appetite-related symptoms (“poor appetite or overeating”).

Our findings showed that appetite dysregulation was associated with meaningful differences in BMI outcomes. Higher severity of appetite-related symptoms was associated with increased likelihood of both underweight and severe obesity, suggesting that appetite-related symptoms may identify patients at risk for weight extremes.

Importantly, the relationship was not simply driven by obesity overall. The strongest pattern was observed for class II-III obesity, while class I obesity remained relatively stable across appetite symptom severity levels. This result suggests that symptom-level data may help identify patients with more clinically significant metabolic risk profiles.
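
As a rough illustration of the kind of comparison described above, the sketch below buckets BMI into the standard adult categories and tabulates them by PHQ-9 item 5 score (0 to 3). The record layout, function names, and sample values are hypothetical and are not drawn from the study.

```python
from collections import Counter, defaultdict

def bmi_category(bmi: float) -> str:
    """Standard adult BMI categories (kg/m^2)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    if bmi < 35:
        return "obesity class I"
    return "obesity class II-III"

def category_shares(records):
    """Share of each BMI category within each PHQ-9 item 5 score."""
    counts = defaultdict(Counter)
    for item5_score, bmi in records:
        counts[item5_score][bmi_category(bmi)] += 1
    return {
        score: {cat: n / sum(c.values()) for cat, n in c.items()}
        for score, c in counts.items()
    }

# Hypothetical (item 5 score, BMI) pairs, for illustration only.
sample = [(0, 23.1), (0, 31.0), (3, 17.9), (3, 41.2), (3, 36.5), (3, 24.0)]
shares = category_shares(sample)
```

This is an unadjusted tabulation only; it does not account for the demographic, medication, or comorbidity adjustments that the analysis applied.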

These findings persisted even after accounting for demographic characteristics, antidepressant and antipsychotic use, and cardiometabolic comorbidities including diabetes, hypertension, and dyslipidemia.

A key takeaway is that routinely collected clinical information, such as PHQ-9 responses, when documented and accessible, can provide value beyond traditional diagnosis categories. Item-level patient-reported outcomes may offer scalable opportunities to better understand heterogeneity within populations and support more personalized approaches to care.

As real-world data continues to expand, leveraging the depth of information already captured in clinical workflows may help uncover new insights into disease patterns, patient risk, and opportunities for intervention.

——-

This work was presented as a podium presentation at the 2026 ISPOR Annual Meeting in Philadelphia, highlighting the value of rich real-world clinical data and patient-reported outcomes for generating actionable evidence.

Product

OMNY Health at ISPOR 2026: Driving the Future of Evidence Generation

Post author By Jenessa Mozdy
Post date June 4, 2026

What a week at ISPOR 2026!

The OMNY Health team had a highly productive and impactful showing at this year’s annual conference. We anchored our presence with one podium presentation and six poster presentations—including a standout client-led poster utilizing OMNY Health data.

Our featured research spanned a diverse array of therapeutic areas, economic evaluations, and advanced methodological frameworks, demonstrating our robust capabilities in health economics and outcomes research (HEOR). From mental health and metabolic outcomes to respiratory and immunology research, these studies highlight the critical role that high-quality real-world data (RWD) plays in modern evidence generation.

Beyond our own presentations, our team spent the week engaging with the definitive themes shaping the future of HEOR: the evolution of AI adoption and advanced evidence generation in oncology.

A major takeaway from the sessions on AI and Large Language Models (LLMs) is the critical shift from manual abstraction to automated, scalable data enrichment.

The Challenge: The industry faces an unprecedented demand for RWD, but a massive volume of clinical narrative remains locked in unstructured electronic health records (EHRs) and physician notes. Traditional manual abstraction is costly, slow, and unsustainable.

The Methodological Solution: Discussions highlighted advanced AI optimization frameworks—splitting AI into Machine Learning (ML) for predictive risk profiling and Natural Language Processing (NLP) for clinical text recognition. Leading methodologies are deploying multi-layered approaches (combining web scraping, PDF processing, and rule-based parsing with prompt-specific LLMs) alongside quality frameworks like the Kahn Framework to eliminate false negatives and capture missing variables (such as nuanced Social Determinants of Health).

The OMNY Advantage: At OMNY Health, this is exactly where our AI services add distinct value. By converting unstructured text into structured, normalized data spaces, we bypass traditional site-visitation. We augment existing HEOR methods to deliver high-fidelity, generalizable datasets ready for advanced applications like digital twins and regulatory-grade evidence generation.

Oncology continues to be a frontier where conventional evidence paradigms fall short, especially as discussed in the cancer plenary on the financial toxicity of care and the rare cancer forums.

The Challenge: Generating credible real-world evidence (RWE) in oncology—particularly rare cancers—is routinely restricted by small sample sizes, single-arm designs, and highly fragmented data across the care continuum. Furthermore, regulatory bodies and Health Technology Assessment bodies (HTAbs) demand strict adherence to data quality frameworks.

The Methodological Solution: To accelerate market access and build stakeholder trust, the industry is moving toward advanced methods like target trial emulation, external control arms, and robust EHR-derived data quality frameworks to validate longitudinal patient journeys.

The OMNY Advantage: Our specialized oncology data offering directly answers this call. OMNY’s large, US-based oncology EHR-derived database is built with rigorous regulatory data quality frameworks in mind. By integrating medicines, molecular testing, and medical services across the care continuum, we provide life sciences companies and researchers with the deep, longitudinally complete data required to evaluate complex variables, track treatment adherence, and demonstrate true patient-centered value.

A huge thank you to everyone who stopped by our sessions, visited our posters, or took the time to connect with our team! We loved catching up with familiar faces and sparking new collaborations.

The conversations we had reinforce how vital high-quality RWD and intelligent AI extraction are to shaping the future of healthcare. We’re looking forward to continuing these discussions, supporting regulatory buy-in, and driving the industry forward together.

Missed us at the event but want to learn more about our advanced oncology data capabilities or AI-driven RWD solutions? Let’s connect!

Product

Lung Function Assessment in Patients with Persistent Asthma: Impact of Disease Severity and Acute Exacerbation Status ©

Post author By Jenessa Mozdy
Post date December 17, 2025

By Lawrence Rasouliyan, Amanda G. Althoff, and Danae A. Black | OMNY Health

Understanding the Role of Lung Function in Asthma Management

Lung function monitoring is important in the management of asthma, and it provides valuable information to clinicians on disease control and patient response.

Pulmonary function tests are common tests conducted in clinical settings, yet they are infrequently documented as measurements in free-text notes or tabulated sources of real-world data.

Meanwhile, disease severity and acute exacerbation status are usually available from diagnosis codes such as ICD-10.

However, the association between coded severity and measured indices of lung function has not been well characterized in routine clinical care data on a large scale.

To address this gap, our team at OMNY Health analyzed the relationship between lung function measures and severity and exacerbation status in patients with persistent asthma.

Study Overview

Using electronic health record (EHR) data from 2017 to 2024, we analyzed information from three integrated delivery networks included in the OMNY Health real-world data platform.

Patients were included if they had an ICD-10 code for persistent asthma classified as mild, moderate, or severe—either with or without an acute exacerbation. The relevant ICD-10 codes used for classification are shown below.

To be included, patients also needed at least one documented lung function measurement—specifically, forced expiratory volume in one second (FEV₁) percent predicted (pp), forced vital capacity (FVC) pp, or FEV₁/FVC pp—associated with an asthma-related encounter.
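
A minimal sketch of how coded severity and exacerbation status could be derived from a diagnosis code, assuming the public ICD-10-CM persistent asthma codes (J45.3x, J45.4x, J45.5x). The exact code list used in the study is not reproduced here, and the function name is hypothetical.

```python
# ICD-10-CM persistent asthma subcategories (assumed inclusion list).
SEVERITY = {"J45.3": "mild", "J45.4": "moderate", "J45.5": "severe"}

# Final character of the code: 0 = uncomplicated,
# 1 = with (acute) exacerbation, 2 = with status asthmaticus.
STATUS = {
    "0": "uncomplicated",
    "1": "acute exacerbation",
    "2": "status asthmaticus",
}

def classify_asthma_code(code: str):
    """Return (severity, status) for a persistent asthma code, else None."""
    code = code.strip().upper()
    if len(code) != 6:
        return None
    severity = SEVERITY.get(code[:5])
    status = STATUS.get(code[5])
    if severity is None or status is None:
        return None
    return severity, status
```

For example, `classify_asthma_code("J45.41")` returns `("moderate", "acute exacerbation")`, while an intermittent asthma code such as `J45.20` returns `None`.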

Key Findings

Out of approximately one million patients identified with an asthma ICD-10 code indicating severity and exacerbation status, 14,003 patients (across 31,463 encounters) had corresponding lung function data available.

Across all severities, lung function metrics declined with increasing asthma severity, and patients experiencing exacerbations consistently had lower lung function compared to those without exacerbations.

What the Data Suggests

The findings show a considerable decrease in mean lung function values as asthma severity increased, regardless of exacerbation status.

Patients experiencing exacerbations had consistently lower FEV₁, FVC, and FEV₁/FVC values than those without exacerbations.

Most notably, when we compared ICD-10-coded severity with typical clinical cutoffs for lung function, the correspondence was not strong.

This suggests that ICD-10 coding alone may not fully capture physiological severity, emphasizing the importance of integrating structured and unstructured lung function data into real-world datasets.

Why It Matters

By leveraging structured EHR data, this study highlights the potential to better understand asthma progression and treatment outcomes across real-world populations.

The results reinforce the value of using lung function metrics—not just diagnosis codes—to assess disease burden and guide more precise asthma management strategies.

References

Levy ML, et al. NPJ Prim Care Respir Med. 2023;33(1):7.

Firoozi F, et al. Thorax. 2007;62(7):581–7.

Gronkiewicz C, et al. Chest. 2015;147(4):1152–1160.

Xie F, et al. JMIR AI. 2025;4:e69132.

Tags asthma, data-driven insights, healthcare, lungfunction, rwd

Product

Development and Validation of an N-gram Model to Differentiate Between Melena and Hematochezia Using Unstructured EHR Notes ©

Post author By Jenessa Mozdy
Post date December 3, 2025

By Vikas Kumar, Yan Wang, and Lawrence Rasouliyan | OMNY Health

Extracting Meaningful Insights from Free-Text EHR Data

Extracting meaningful information from unstructured data is never easy, and it is often time-consuming. In structured EHR data, for instance, values such as diagnosis codes are often insufficient to capture the full context. The same holds for gastrointestinal (GI) bleeding, where a slight inaccuracy can have significant implications for both diagnosis and treatment.

One such challenge that our team at OMNY Health set out to solve is differentiating between two forms of GI bleeding: melena (black, tarry stools, usually caused by upper GI bleeding) and hematochezia (bright red blood in the stool, usually from lower GI bleeding). Although both are captured under general ICD-10 codes, namely K92.1 and K92.2, these codes do not distinguish between the two. Our objective was to bridge this gap using unstructured data and machine learning.

Building the Model: From Clinical Notes to Meaningful Insights

We conducted a retrospective observational study using the OMNY Health Real-World Data Platform (2017–2025). Patients with ICD-10 codes starting with “K” (gastrointestinal diseases) or “E” (endocrine diseases) were included. A clinical domain expert reviewed 1,000 random clinical notes from patients with GI bleed–related codes (K92.1 or K92.2) to identify phrases that indicated either melena or hematochezia.

Through this manual review, our team identified 28 phrases for melena and 51 phrases for hematochezia, which were then used to build two separate N-gram models. These models searched millions of notes across the dataset to identify encounters associated with each condition.
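
The phrase-matching step can be sketched as follows. This is a minimal illustration, not the models used in the study: the phrases shown are hypothetical stand-ins for the curated lists, and the helper names are assumptions.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def ngrams(tokens, n):
    """Set of all n-token sequences in the note, joined by spaces."""
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def build_matcher(phrases):
    """Group normalized phrases by token length, then match note n-grams."""
    by_length = {}
    for phrase in phrases:
        tokens = tokenize(phrase)
        if tokens:
            by_length.setdefault(len(tokens), set()).add(" ".join(tokens))

    def matches(note):
        tokens = tokenize(note)
        return any(ngrams(tokens, n) & targets for n, targets in by_length.items())

    return matches

# Hypothetical example phrases, not the study's phrase lists.
melena_match = build_matcher(["black tarry stool", "melena"])
hematochezia_match = build_matcher(["bright red blood per rectum", "hematochezia"])
```

A matcher like this does not handle negation (for example, "denies melena" would still match), so a production pipeline would need additional rules for that.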

These models were validated against real-world clinical outcomes to ensure reliability. We compared the rates of upper versus lower GI diagnoses, endoscopic procedures (EGD vs. colonoscopy), and pharmacologic treatments within 30 days following each encounter.

Results: Real-World Validation That Reflects Clinical Reality

Our validation showed that the N-gram models accurately differentiated between the two GI bleeding types.

Precision: 96% for melena; 98% for hematochezia

Recall: 7.9% for melena; 5.3% for hematochezia
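
For readers less familiar with these metrics, the short sketch below shows how precision and recall are computed. The counts are hypothetical and chosen only to show how a model can combine high precision with low recall; they are not the study's confusion matrix.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Of the encounters the model flagged, the share that were correct."""
    return true_pos / (true_pos + false_pos)

def recall(true_pos: int, false_neg: int) -> float:
    """Of all true cases, the share that the model flagged."""
    return true_pos / (true_pos + false_neg)

# Hypothetical counts for illustration only.
tp, fp, fn = 48, 2, 552
p = precision(tp, fp)  # 48 / 50
r = recall(tp, fn)     # 48 / 600
```

With these illustrative counts, almost every flagged encounter is correct, yet most true cases go unflagged, which is the typical trade-off for a curated phrase list.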

TABLE 1 — Samples of Phrases Used for Melena and Hematochezia N-gram

(Note: samples shown; complete list of phrases used to train each model is available in OMNY Health’s internal dataset.)

Patients identified with melena were more likely to have an upper GI diagnosis and to undergo esophagogastroduodenoscopy (EGD). Conversely, patients with hematochezia were more likely to have lower GI diagnoses and receive colonoscopy procedures. These results aligned closely with clinical expectations, reinforcing the accuracy and validity of our models.

Figure 1. Validation Outcomes for Melena and Hematochezia N-gram Models

Why It Matters: A Step Toward Richer Real-World Evidence

The ability to differentiate between melena and hematochezia in unstructured EHR data enables more granular, clinically meaningful insights. This allows researchers and healthcare organizations to:

Better characterize patient populations
Refine outcome measures for GI bleeding studies
Support drug safety and effectiveness research with higher precision

The OMNY Health platform helps researchers unlock the full potential of real-world clinical information by turning free-text notes into actionable insights, thereby improving care delivery and research quality.

Looking Ahead

The study demonstrates how natural language processing (NLP) can be effective in bridging gaps in structured EHR data. As we continue validating these models, our primary focus remains on empowering researchers, clinicians, and life science partners with trustworthy, real-world data solutions.

Tags Clinical Notes, data-driven insights, EHR, healthcare, life sciences

Product

Characterizing Latent Tuberculosis and Screening Among Individuals Diagnosed with Active Tuberculosis Disease in the United States ©

Post author By Jenessa Mozdy
Post date November 3, 2025

By Danae A. Black, Amanda Mummert, Amanda G. Althoff, and Lawrence Rasouliyan | OMNY Health

Tuberculosis (TB) remains a persistent public health challenge in the United States. Although numerous screening and treatment options are available, the burden of active TB disease continues to rise. Almost 80% of TB cases result from reactivation of latent TB infection (LTBI). Identifying at-risk individuals early and understanding their screening patterns are essential to reducing disparities in care.

Building the Study: From EHR Data to Insights

Our research team at OMNY Health conducted a retrospective, observational study using the OMNY Health Real-World Data Platform (2020–2024), which integrates electronic health records (EHRs) from multiple U.S. health systems. The goal was to describe individuals diagnosed with respiratory TB and evaluate patterns of TB screening and latent TB diagnosis before the onset of active TB disease.

Study Design and Methods

Patients with respiratory TB (ICD-10-CM: A15) were identified in the OMNY Health dataset. The earliest date of respiratory TB diagnosis was considered the index date. Demographic characteristics and social determinants of health (SDoH) were summarized at the index date or during the pre-index period.

Utilization of TB screening procedures (CPT: 86480, 86481, 86580; ICD-10-CM: Z11.1) or diagnosis of latent TB (ICD-10-CM: Z22.7, Z86.15) was evaluated during the pre-index period. Descriptive statistics were reported for all variables of interest.
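
The cohort logic above can be sketched as follows, using the codes listed in this section. This is a simplified illustration: the event layout and function name are hypothetical, and the pre-index period is assumed to be any time before the index date because its length is not stated here.

```python
from datetime import date

TB_DX_PREFIX = "A15"                                     # respiratory TB
SCREENING_CODES = {"86480", "86481", "86580", "Z11.1"}   # CPT and ICD-10-CM
LATENT_TB_CODES = {"Z22.7", "Z86.15"}                    # ICD-10-CM

def summarize_patient(events):
    """events: list of (date, code) tuples for one patient."""
    tb_dates = [d for d, code in events if code.startswith(TB_DX_PREFIX)]
    if not tb_dates:
        return None  # no respiratory TB diagnosis, so not in the cohort
    index_date = min(tb_dates)  # earliest respiratory TB diagnosis
    pre_index_codes = {code for d, code in events if d < index_date}
    return {
        "index_date": index_date,
        "screened_pre_index": bool(pre_index_codes & SCREENING_CODES),
        "latent_tb_pre_index": bool(pre_index_codes & LATENT_TB_CODES),
    }

# Hypothetical patient history, for illustration only.
example = summarize_patient([
    (date(2021, 3, 1), "86480"),
    (date(2021, 6, 1), "Z22.7"),
    (date(2022, 1, 15), "A15.0"),
    (date(2022, 2, 1), "86580"),
])
```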

Results: Identifying Patterns in Screening and Latent TB

Between 2020 and 2024, a total of 6,538 cases of respiratory TB were diagnosed. Among these cases, 238 had a latent TB diagnosis code before the index date, and 20% showed evidence of TB screening prior to diagnosis.

TB testing was significantly higher among females and younger people and lower among nonwhite and Hispanic groups. Latent TB was recorded more often among Hispanic individuals, which is consistent with the high-risk profiles often seen among people born outside the United States or those who travel to countries where TB is common.

Approximately 5% of the population had data available on social determinants of health, which revealed transportation and education barriers that may impede prevention measures or treatment adherence.

Figure 1. TB Prevention Pathway

Figure 2. Study Population Demographic Characteristics, by TB Status

Figure 3. Percentage of Affirmative Responses Across Social Determinants of Health Domains, by TB Status

Why It Matters: Addressing Gaps in TB Prevention

The study underscores the importance of early detection and screening in preventing TB reactivation. The differences in screening rates indicate that demographic and social factors play a vital role in TB prevention. Identifying populations with limited access to screening and care would allow more targeted interventions to reduce inequities and improve outcomes.

Looking Ahead

OMNY Health leverages real-world EHR data to better understand patient journeys and enhance preventive care. Future research aims to expand the integration of SDoH into predictive modeling and public health decision-making, helping to bridge the gap between data and actionable outcomes.

References

Centers for Disease Control and Prevention. National Data: Reported Tuberculosis in the United States, 2023. 2024 Nov 7. Accessed February 10, 2025. https://www.cdc.gov/tb-surveillance-report-2023/summary/national.html

US Preventive Services Task Force. Screening for Latent Tuberculosis Infection in Adults: US Preventive Services Task Force Recommendation Statement. JAMA. 2023;329(17):1487–1494. doi:10.1001/jama.2023.4899

Tags healthcare, research

Product

Unlocking the Hidden Details: Why Unstructured Clinical Notes Are Crucial for Life Sciences

Post author By Jenessa Mozdy
Post date October 20, 2025

In the world of life sciences, data is king. For decades, the focus has largely been on structured data, including neatly organized tables, registries with predefined fields, clinical trial results captured in clinical report forms (CRFs), and insurance claims data. While undeniably valuable, this structured data often tells only part of the story.

The real goldmine, often overlooked and underutilized, lies within unstructured clinical text. These free-text narratives – physician notes, discharge summaries, pathology reports, and radiology findings, to name a few types – contain a wealth of detailed, nuanced, and patient-specific information that rows and columns simply cannot capture fully.

For life sciences companies, understanding and extracting insights from this unstructured data is no longer a luxury, but a necessity.

Here’s why:

The Limitations of Structured Data

Imagine trying to understand a complex patient journey solely from a checklist. Structured data, by its very nature, simplifies and categorizes. It’s excellent for tracking demographics, diagnosis codes, medication lists, and lab results. However, it often misses:

Nuance and Context: The why behind a diagnosis, the specific symptoms a patient described, or the subtle changes in their condition over time. A coded diagnosis of “headache” doesn’t reveal if it’s a throbbing migraine, a dull ache, or accompanied by visual disturbances. In fact, most initial visit notes contain a History of Present Illness (HPI) that documents the eight cardinal features of the patient’s reason for the encounter:

- Onset: When did the symptoms start? (The beginning time/date).

- Location: Where on the body is the symptom? Does it radiate or travel anywhere else?

- Duration: How long does the symptom last when it occurs? (e.g., seconds, hours, constant).

- Character (or Quality): What does it feel like or look like? (e.g., sharp, dull, throbbing, crushing, burning).

- Aggravating/Alleviating Factors: What makes it better or worse? (e.g., movement, rest, food, medication).

- Radiation (or Related Symptoms): Does the symptom move to another part of the body (Radiation)? Or are there any other symptoms that occur with the primary one (Associated Symptoms)?

- Timing (or Temporal characteristics): When does it occur? (e.g., constantly, intermittently, only in the morning, with exertion).

- Severity (or Scale): How bad is the symptom? (Usually rated on a scale, such as 1-10 for pain).

Patient History Beyond Codes: Family history details, lifestyle factors, or environmental exposures that might not fit into a predefined field.

- Family History: Particularly important for oncology and cardiovascular disease, involves identifying which nuclear/extended family members had related conditions.

- Lifestyle Factors: Smoking, alcohol usage, illicit drug use, living situation, social determinants of health, and sexual health are known to be important risk factors but often omitted from structured data.

- Occupational / Environmental Exposures: Extra information that sheds light on risk factors for diseases including cancer and asthma.

Treatment Rationale and Adjustments: Why a particular treatment was chosen / switched to / switched from / discontinued, how a patient responded to it, and subsequent modifications.

Rare Disease Insights: For conditions that lack structured data elements of their own, such as dedicated ICD-10 codes, the narrative of clinical notes becomes even more critical.

Four Applications of Unstructured Data

Now that we have established how notes can be delivered to researchers, how exactly can they be used to enhance clinical knowledge?

After de-identification, the possibilities are endless. Below we describe four common applications.

Extraction of Disease Severity: Unlike the clinical trial world, in which key outcomes are dutifully and regularly recorded, real-world researchers rely on physician record-keeping to track the waxing and waning of disease. Often, these outcomes, also known as severity measures, are found in free-text notes. Learn more about how researchers at OMNY Health have been extracting information about disease severity from notes since 2022, first with transformer-based pipelines and more recently with large language models (LLMs).

Identifying Reasons for Treatment Discontinuation: Rollouts of newly developed drugs cost pharmaceutical companies millions of dollars; with that amount of investment, it becomes imperative to know more about why new drugs are being discontinued by patients/physicians, and which drugs are taking their place. At OMNY Health, we have built various pipelines for extracting this information from clinical notes, again using both transformer-based methods and LLMs.

Researching Rare Diseases: For rare diseases, clinical note repositories can be particularly useful in pooling large numbers of patients having such diseases and establishing basic clinical knowledge about them – e.g. What patients are at risk? Why do some patients experience flares? What treatments work best? An example is work that OMNY Health completed in partnership with a life sciences company on generalized pustular psoriasis (GPP). Notes can also be used to find patients exhibiting symptoms or characteristics that might be consistent with undiagnosed rare diseases. Clinical Notes can help identify patients that are candidates for genetic tests that could potentially validate a rare disease diagnosis.

Training AI Models: In the age of Generative AI and LLMs, it is becoming more important than ever to find reputable sources of healthcare data (read: not the Internet) with which to train healthcare-specific LLMs that can reason without harmful biases. Need a proven source of de-identified clinical notes from diverse populations and provider mixes with which to train your LLM? We have made it possible.

Learn More about OMNY Notes – Contact Us!

At OMNY Health, we would love to discuss how our OMNY Notes product combined with our structured data offerings can support your clinical research initiatives and ultimately improve health outcomes. Please contact us at info@omnyhealth.com. We look forward to hearing from you!

Product

Powering Research with LLMs and OMNY Health’s De-identified Clinical Notes in Cystic Fibrosis

Post author By Jenessa Mozdy
Post date September 29, 2025

OMNY Health Notes, when used in tandem with publicly accessible large language models, can generate novel insights that previously were only accessible through labor-intensive chart reviews.

Large language models (LLMs) have advanced dramatically since they were first introduced to the mainstream public a few years ago. When combined with large, nationally representative de-identified data sets like the OMNY Notes product, they can deliver insights in just a few minutes that previously would have required months of effort, and at a fraction of the cost. Check out this brief demonstration on how today’s LLMs and OMNY Notes can be used together to enhance clinical research, using notes for patients treated for cystic fibrosis as a sample use case.

Thank you for watching our demo. Please contact us using the email address/link provided in the video to discover how your research team can harness novel insights using OMNY Notes.

Product

Unlocking the Full Story: The Power of Clinical Notes in Real-World Data

Post author By Jona Kerma
Post date July 21, 2025

The year is 2025, more than fifteen years since the enactment of the HITECH Act and Meaningful Use. Almost all of the clinical data recorded from ordinary Americans’ physician office visits and hospital stays has now shifted to electronic format. As a result, increasing emphasis is being placed on the value of real-world data, with the hope that medical knowledge resulting in care improvements can be extracted from the vast amount of information that exists in electronic health records.

Structured vs. Unstructured Clinical Data

This electronic clinical data can be subdivided into two categories – structured and unstructured. Examples of structured data include demographic information, diagnoses and procedures (in the form of clinical codes), medication prescription information, insurance records, and vital signs, while examples of unstructured data include free-text clinical narratives and imaging and test reports.

Both types of clinical information are important and perform complementary functions in real-world data. Structured data contains many basic data elements and is traditionally easier to process, due to its tabular nature. However, unstructured data has been estimated to comprise 80% of clinical data by volume and often provides insights that are absent from structured clinical data and claims data [1]. There is an old saying among medical professionals that “90% of diagnoses can be made using the patient history, and 10% using the physical exam [2]” (notably, both elements are virtually absent from structured EHR data).

What are some of the details captured in the well-written clinical note that are typically excluded from structured EHR data?

Information Extraction from Unstructured Data in an LLM-World

A well-written clinical note contains many details about the patient that are absent from both structured tabular data and claims data. Until just a couple of years ago, the challenge was extracting information from a clinical note into a usable format. However, with the advent of large language models (LLMs), one can present a note as context and ask a favorite LLM questions about the note, such as “Where is the location of this patient’s pain?” or “Why did the patient discontinue lisinopril?” Adaptation of this method enables extraction of information from the note as structured categorical data, which can then be used as structured data.
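
A minimal sketch of this pattern, assuming a generic text-in, text-out model: the note is passed as context, the question is constrained to a fixed set of labels, and the reply is normalized into a categorical value. The `ask_llm` callable, the label set, and the function names are hypothetical placeholders and do not refer to any specific vendor API.

```python
FALLBACK = "not stated"

def build_prompt(note: str, question: str, categories: list[str]) -> str:
    """Present the note as context and constrain the answer to fixed labels."""
    options = ", ".join(categories + [FALLBACK])
    return (
        "Answer the question using only the clinical note below.\n"
        f"Reply with exactly one of: {options}.\n\n"
        f"Note:\n{note}\n\n"
        f"Question: {question}\n"
    )

def extract_category(note, question, categories, ask_llm) -> str:
    """ask_llm is any callable that takes a prompt string and returns text."""
    raw = ask_llm(build_prompt(note, question, categories))
    answer = raw.strip().strip(".").lower()
    # Anything outside the allowed labels is mapped to the fallback value.
    return answer if answer in categories else FALLBACK
```

Restricting the model to a closed label set, with an explicit fallback, is one way to keep free-text answers from leaking into a column that is meant to be categorical.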

OMNY Notes: A First-of-its-Kind Clinical Notes Data Product

OMNY Notes is one of our exciting new data products that makes billions of de-identified clinical notes from diverse health systems available to the end-user. Researchers no longer must rely solely on structured EHR and claims data; they can now view the full patient journey with our HIPAA-compliant de-identified linked structured EHR, claims and notes solutions representing more than 75M individuals. No other solution available today provides the combined depth, breadth, and scale of OMNY structured and unstructured data to support improving quality, safety, and efficiency of healthcare delivery and overall public health.

OMNY Foundation
OMNY Linked Claims
OMNY MedTech

References:

[1] https://healthtechmagazine.net/article/2023/05/structured-vs-unstructured-data-in-healthcare-perfcon.

[2] Tsukamoto, Tomoko, et al. “The contribution of the medical history for the diagnosis of simulated cases by medical students.” International Journal of Medical Education 3 (2012).

Tags data-driven insights, healthcare, OMNY Platform, research

Product

Cognitive Impairment in Alzheimer’s Disease: Patient Characteristics and Treatments in the Real-World Setting

Post author By Jona Kerma
Post date July 8, 2025

OMNY Health’s recent study is shedding light on the role of cognitive impairment in patients diagnosed with Alzheimer’s disease. Alzheimer’s disease (AD) is the most frequent type of neurodegenerative disease that develops over several years. It is characterized by multiple cognitive deficits that progress over time, including memory deterioration. Newly approved monoclonal antibodies, unlike traditional medications that primarily relieve symptoms, have been shown to slow the progression of Alzheimer’s disease by approximately 30%.

At OMNY Health, researchers recently completed a retrospective analysis (2017 to 2024) of electronic health records from over 150,000 patients who had received Alzheimer’s disease care in the United States. The study aimed to characterize the cognitive state of patients and to describe their treatment regimens. OMNY’s researchers focused on nearly 7,000 Alzheimer’s disease patients in the dataset who had either Montreal Cognitive Assessment (MoCA) or Mini-Mental State Exam (MMSE) scores available.

Key Findings

The findings provide insight into the cognitive state of patients as they are first observed within the health system and receive a diagnosis of Alzheimer’s disease. On average, patients were 78 years of age at the time of their first observed diagnosis. Following the first observed diagnosis of Alzheimer’s, patients were categorized into four groups based on their first reported cognitive scores: normal, mild, moderate, and severe.

The distribution of cognitive severity among patients was as follows:

Normal: 15%
Mild: 37%
Moderate: 35%
Severe: 13%

OMNY’s research also found that with increasing cognitive impairment (normal, mild, moderate, severe), there was a monotonic rise in the proportions of female patients (51%, 60%, 64%, 67%) and nonwhite patients (15%, 17%, 22%, 27%).

The study additionally assessed treatment patterns within 30 days of diagnosis. Many patients (59%) received cholinesterase inhibitors or NMDA receptor antagonists—therapies aimed at symptom management—with usage rates consistent across impairment levels. Fewer than 1% of patients had received treatment with newer monoclonal antibodies.

Why It Matters

As disease-modifying therapies become more accessible, OMNY’s insights can support the research community in evaluating treatment effectiveness, informing clinical decision-making, and advancing understanding of disease progression across patient populations.

Tags data-driven insights, healthcare, OMNY Platform, research

Product

Mining the Clinical Note: Extracting FEV1/FVC with LLMs for Scalable RWE

Post author By Jona Kerma
Post date May 22, 2025

An OMNY Health Study on Severity Measure Extraction in Respiratory Care

Understanding disease severity is essential to supporting treatment decisions, yet many critical severity metrics—especially in respiratory conditions—are often buried in unstructured EHR notes. At ISPOR 2025, OMNY Health presented findings on how large language models (LLMs) can accurately extract FEV1/FVC scores, a core pulmonary function test (PFT) metric, directly from free-text clinical notes.

Why It Matters

Clinical measures like FEV1/FVC are pivotal in evaluating lung function and diagnosing COPD and asthma. However, these measures are not consistently captured in structured EHR fields, making them difficult to access at scale. OMNY Health’s study explored whether retrieval-augmented LLMs could help close that gap—automating severity extraction while preserving accuracy.

Study Overview

OMNY Health researchers sampled 50 random note excerpts containing the phrase “FEV1/FVC” from the OMNY Health platform and categorized them as:

Simple (S): One FEV1/FVC score present
No Value (NV): No score provided
Complex (C): Multiple scores present

OMNY Health researchers tested two Gemini LLMs (Flash and Pro) using a structured prompt and evaluated:

Accuracy in correct extraction
Hallucination rate (false outputs when no score existed)
Latency (processing time in slot milliseconds)
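
The first two evaluation criteria can be sketched as follows. The record layout is a hypothetical assumption; the category labels follow the S / NV / C scheme above, and a hallucination is counted when a value is returned for a note that contains none.

```python
def evaluate(results):
    """results: dicts with 'category' ('S', 'NV', 'C'), 'truth', 'predicted'.

    'truth' and 'predicted' hold the FEV1/FVC value, or None if absent.
    """
    simple = [r for r in results if r["category"] == "S"]
    no_value = [r for r in results if r["category"] == "NV"]
    accuracy = (
        sum(r["predicted"] == r["truth"] for r in simple) / len(simple)
        if simple else None
    )
    hallucination_rate = (
        sum(r["predicted"] is not None for r in no_value) / len(no_value)
        if no_value else None
    )
    return {"accuracy_simple": accuracy, "hallucination_rate": hallucination_rate}

# Hypothetical model outputs, for illustration only.
metrics = evaluate([
    {"category": "S", "truth": 0.72, "predicted": 0.72},
    {"category": "S", "truth": 0.65, "predicted": 0.56},
    {"category": "NV", "truth": None, "predicted": None},
    {"category": "NV", "truth": None, "predicted": None},
])
```

Complex (C) notes are left out of both metrics in this sketch because a single expected value is not well defined when multiple scores are present.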

Key Findings

1. High Accuracy in Simple Notes
Flash extracted FEV1/FVC scores with 90% accuracy, outperforming Pro (73.3%).

2. No Hallucinations in NV Notes
Neither model generated phantom data when no score was present.

3. Pro Recognized Complex Context Better
While Flash returned one of the multiple scores, Pro acknowledged the complexity—useful for nuanced documentation.

4. Flash Was Faster
Flash processed data using 6,210 slot milliseconds versus Pro’s 23,576—offering speed advantages for scale.

Example Outputs

Performance Snapshot

What’s Next

This study highlights how LLMs can reliably extract clinical severity measures from real-world EHR data. Moving forward, the team plans to evaluate performance across more complex note structures, investigate how interpretability and robustness vary by model, and assess the trade-offs between cost, speed, and depth of output. The long-term vision is to expand this approach to other therapeutic areas where structured fields fall short—enabling deeper, more scalable evidence generation.

Tags data-driven insights, healthcare, OMNY Platform, research
