
Lung Function Assessment in Patients with Persistent Asthma: Impact of Disease Severity and Acute Exacerbation Status

By Lawrence Rasouliyan, Amanda G. Althoff, and Danae A. Black | OMNY Health 

Understanding the Role of Lung Function in Asthma Management 

Lung function monitoring is important in asthma management, providing clinicians with valuable information on disease control and how patients respond to treatment.

Pulmonary function tests are commonly performed in clinical settings, yet their results are infrequently documented as measurements, whether in free-text notes or in tabulated sources of real-world data.

By contrast, disease severity and acute exacerbation status are usually available through diagnosis codes such as ICD-10.

However, the association between coded severity and measured indices of lung function has not been well characterized in routine clinical care data on a large scale. 

To address this gap, our team at OMNY Health analyzed the relationship between lung function measures and both severity and exacerbation status in patients with persistent asthma.

Study Overview 

Using electronic health record (EHR) data from 2017 to 2024, we analyzed information from three integrated delivery networks included in the OMNY Health real-world data platform.  

Patients were included if they had an ICD-10 code for persistent asthma classified as mild, moderate, or severe—either with or without an acute exacerbation. The relevant ICD-10 codes used for classification are shown below. 

To be included, patients also needed at least one documented lung function measurement—specifically, forced expiratory volume in one second (FEV₁) percent predicted (pp), forced vital capacity (FVC) pp, or FEV₁/FVC pp—associated with an asthma-related encounter. 
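The inclusion logic above can be sketched as an encounter-level filter. This is a minimal illustration under stated assumptions, not the study's code: the code table referenced above is not reproduced here, so the mapping assumes the standard ICD-10-CM persistent-asthma codes (J45.3x mild, J45.4x moderate, J45.5x severe, with x0 uncomplicated and x1 with acute exacerbation), and `Encounter`, `ASTHMA_CODES`, and `classify` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed ICD-10-CM mapping (the study's exact code list is not reproduced):
# J45.3x = mild, J45.4x = moderate, J45.5x = severe persistent asthma;
# x0 = uncomplicated, x1 = with (acute) exacerbation.
# Status asthmaticus codes (x2) are left out of this sketch.
ASTHMA_CODES = {
    "J45.30": ("mild", False),
    "J45.31": ("mild", True),
    "J45.40": ("moderate", False),
    "J45.41": ("moderate", True),
    "J45.50": ("severe", False),
    "J45.51": ("severe", True),
}


@dataclass
class Encounter:
    icd10_code: str
    fev1_pp: Optional[float] = None      # FEV1 percent predicted
    fvc_pp: Optional[float] = None       # FVC percent predicted
    fev1_fvc_pp: Optional[float] = None  # FEV1/FVC percent predicted


def classify(enc: Encounter) -> Optional[tuple[str, bool]]:
    """Return (severity, exacerbation) if the encounter is eligible, else None."""
    label = ASTHMA_CODES.get(enc.icd10_code.strip().upper())
    if label is None:
        return None  # not one of the assumed persistent-asthma codes
    measured = (enc.fev1_pp, enc.fvc_pp, enc.fev1_fvc_pp)
    # Require at least one documented lung function value.
    return label if any(v is not None for v in measured) else None


print(classify(Encounter("J45.41", fev1_pp=68.0)))
print(classify(Encounter("J45.41")))
```

A patient would then qualify if any of their asthma-related encounters returns a label rather than `None`.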

Key Findings 

Out of approximately one million patients identified with an asthma ICD-10 code indicating severity and exacerbation status, 14,003 patients (across 31,463 encounters) had corresponding lung function data available. 

Lung function metrics declined with increasing asthma severity, and at every severity level, patients experiencing exacerbations had lower lung function than those without exacerbations.

What the Data Suggests 

Mean lung function values decreased considerably as asthma severity increased, regardless of exacerbation status.

Patients experiencing exacerbations had consistently lower FEV₁, FVC, and FEV₁/FVC values than patients without exacerbations.
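Summaries like these come from grouping measurements by coded severity and exacerbation status and averaging within each group. A minimal sketch, assuming encounter-level rows of (severity, exacerbation, FEV₁ pp); the function name `mean_by_group` and the toy values are hypothetical and are not study results.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(rows):
    """rows: iterable of (severity, exacerbation, fev1_pp).

    Returns {(severity, exacerbation): mean FEV1 pp}, skipping missing values.
    """
    groups = defaultdict(list)
    for severity, exacerbation, fev1_pp in rows:
        if fev1_pp is not None:
            groups[(severity, exacerbation)].append(fev1_pp)
    return {key: mean(vals) for key, vals in groups.items()}


# Illustrative values only; these are NOT study results.
toy_rows = [
    ("mild", False, 90.0), ("mild", False, 86.0), ("mild", True, 80.0),
    ("severe", False, 64.0), ("severe", True, 55.0), ("severe", True, None),
]
for (severity, exacerbation), m in sorted(mean_by_group(toy_rows).items()):
    label = "exacerbation" if exacerbation else "no exacerbation"
    print(severity, label, round(m, 1))
```

Whether the study averaged at the encounter or patient level is not stated; averaging per patient first would keep frequently seen patients from dominating a group.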

Notably, when we compared ICD-10-coded severity with typical clinical cutoffs for lung function, the two did not correspond closely.

This suggests that ICD-10 coding alone may not fully capture physiological severity and underscores the importance of integrating structured and unstructured lung function data into real-world datasets.
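The comparison of coded severity with lung function cutoffs can be sketched as mapping each FEV₁ pp value to a severity band and tabulating agreement. The post does not say which cutoffs were used; this sketch assumes bands modeled on the NAEPP EPR-3 adult FEV₁ ranges (above 80% predicted for mild, 60–80% for moderate, below 60% for severe), and `severity_from_fev1` and `agreement` are hypothetical helpers.

```python
from collections import Counter


def severity_from_fev1(fev1_pp: float) -> str:
    """Assumed bands modeled on NAEPP EPR-3 adult FEV1 ranges.

    >80% predicted -> mild, 60-80% -> moderate, <60% -> severe.
    Boundary handling (exactly 80 or 60 -> moderate) is an assumption.
    """
    if fev1_pp > 80:
        return "mild"
    if fev1_pp >= 60:
        return "moderate"
    return "severe"


def agreement(pairs):
    """pairs: iterable of (coded_severity, fev1_pp).

    Returns the proportion of exact agreement and a Counter of
    (coded, measured) severity pairs, i.e. a sparse cross-tabulation.
    """
    table = Counter((coded, severity_from_fev1(fev1)) for coded, fev1 in pairs)
    total = sum(table.values())
    if total == 0:
        raise ValueError("no (coded, FEV1) pairs supplied")
    matches = sum(n for (coded, measured), n in table.items() if coded == measured)
    return matches / total, table


# Toy pairs for illustration only.
prop, table = agreement(
    [("mild", 92.0), ("moderate", 85.0), ("severe", 55.0), ("severe", 70.0)]
)
print(prop, table.most_common())
```

Raw percentage agreement is the simplest summary; a chance-corrected statistic such as Cohen's kappa would be a natural refinement.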

Why It Matters 

By leveraging structured EHR data, this study highlights the potential to better understand asthma progression and treatment outcomes across real-world populations. 

The results reinforce the value of using lung function metrics—not just diagnosis codes—to assess disease burden and guide more precise asthma management strategies. 


© 2025 OMNY Health  


Development and Validation of an N-gram Model to Differentiate Between Melena and Hematochezia Using Unstructured EHR Notes

By Vikas Kumar, Yan Wang, and Lawrence Rasouliyan | OMNY Health 

Extracting Meaningful Insights from Free-Text EHR Data 

Extracting meaningful information from unstructured data is rarely easy and often time-consuming. Yet in structured EHR data, values such as diagnosis codes are often insufficient to capture the full clinical context. This is especially true for gastrointestinal (GI) bleeding, where a slight inaccuracy can have significant implications for both diagnosis and treatment.

One such challenge our team at OMNY Health set out to solve is differentiating between two forms of GI bleeding: melena (black, tarry stools, usually caused by upper GI bleeding) and hematochezia (bright red blood in the stool, usually from lower GI bleeding). Although both are commonly captured under the general ICD-10 codes K92.1 and K92.2, these codes do not reliably distinguish between the two. Our objective was to bridge this gap using unstructured data and machine learning.

Building the Model: From Clinical Notes to Meaningful Insights 

We conducted a retrospective observational study using the OMNY Health Real-World Data Platform (2017–2025). Patients with ICD-10 codes starting with “K” (diseases of the digestive system) or “E” (endocrine, nutritional, and metabolic diseases) were included. A clinical domain expert reviewed 1,000 randomly selected clinical notes from patients with GI bleeding–related codes (K92.1 or K92.2) to identify phrases indicating either melena or hematochezia.

Through this manual review, our team identified 28 phrases for melena and 51 phrases for hematochezia, which were then used to build two separate N-gram models. These models searched millions of notes across the dataset to identify encounters associated with each condition. 
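One plausible reading of this phrase-based N-gram approach is sketched below: each expert-curated phrase is stored as a token n-gram, and a note is flagged when any phrase occurs verbatim in its token stream. The phrases are hypothetical stand-ins, since the study's 28 melena and 51 hematochezia phrases are not published here, and `build_model` and `matches` are illustrative names rather than OMNY Health's implementation.

```python
import re

# Hypothetical example phrases; the study's phrase lists are not reproduced.
MELENA_PHRASES = ["melena", "black tarry stool", "dark tarry stools"]
HEMATOCHEZIA_PHRASES = ["hematochezia", "bright red blood per rectum", "brbpr"]


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """All contiguous n-token windows (empty if the note is shorter than n)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_model(phrases):
    """Store each phrase as a token tuple, grouped by its length n."""
    model = {}
    for phrase in phrases:
        toks = tuple(tokenize(phrase))
        model.setdefault(len(toks), set()).add(toks)
    return model


def matches(note: str, model) -> bool:
    """True if any phrase n-gram occurs verbatim in the note's tokens."""
    tokens = tokenize(note)
    return any(ngrams(tokens, n) & grams for n, grams in model.items())


melena_model = build_model(MELENA_PHRASES)
hematochezia_model = build_model(HEMATOCHEZIA_PHRASES)

note = "Patient reports black tarry stool for two days."
print(matches(note, melena_model), matches(note, hematochezia_model))
```

This naive matcher has no negation handling, so a note reading "no melena" would still be flagged; that kind of context is one reason phrase selection and downstream validation matter.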
 
These models were validated against real-world clinical outcomes to assess their reliability. We compared the rates of upper versus lower GI diagnoses, endoscopic procedures (EGD vs. colonoscopy), and pharmacologic treatments within 30 days following each encounter.
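The 30-day outcome comparison can be sketched as a windowed lookup: for each encounter flagged by a model, check whether a given procedure follows within 30 days. This assumes an inclusive window starting on the encounter date and simple in-memory tuples; the data layout and the function `rate_within_window` are hypothetical.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=30)  # assumed inclusive 30-day follow-up window


def rate_within_window(index_encounters, procedures, procedure_name):
    """Share of flagged encounters followed by `procedure_name` within the window.

    index_encounters: list of (patient_id, encounter_date) flagged by a model
    procedures:       list of (patient_id, procedure_name, procedure_date)
    """
    if not index_encounters:
        return 0.0
    hits = 0
    for pid, start in index_encounters:
        end = start + WINDOW
        if any(
            p == pid and name == procedure_name and start <= when <= end
            for p, name, when in procedures
        ):
            hits += 1
    return hits / len(index_encounters)


# Toy data for illustration only.
melena_index = [("p1", date(2024, 1, 10)), ("p2", date(2024, 3, 1))]
procs = [
    ("p1", "EGD", date(2024, 1, 12)),
    ("p2", "colonoscopy", date(2024, 3, 5)),
    ("p2", "EGD", date(2024, 5, 1)),
]
print(rate_within_window(melena_index, procs, "EGD"))
```

Running the same function over hematochezia-flagged encounters and comparing EGD and colonoscopy rates between the two groups mirrors the validation described above; the same pattern extends to diagnoses and treatments.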

Results: Real-World Validation That Reflects Clinical Reality 

Our validation showed that the N-gram models differentiated between the two GI bleeding types with high precision, although recall was low.

  • Precision: 96% for melena; 98% for hematochezia 
  • Recall: 7.9% for melena; 5.3% for hematochezia 
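The precision and recall figures follow the standard definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN). High precision with low recall means that encounters the models flag are almost always correct, but most true cases go unflagged. The sketch below uses hypothetical counts, not the study's data; the post does not describe how false negatives were ascertained.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN). Returns 0.0 on empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical counts for illustration only (not the study's data).
p, r = precision_recall(tp=90, fp=10, fn=910)
print(f"precision={p:.0%}, recall={r:.0%}")  # precision=90%, recall=9%
```

For cohort-building, this trade-off favors purity over completeness: flagged encounters can be trusted, but they represent only a fraction of all true cases.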

TABLE 1 — Samples of Phrases Used for Melena and Hematochezia N-gram 

(Note: samples shown; complete list of phrases used to train each model is available in OMNY Health’s internal dataset.) 

Patients identified with melena were more likely to have an upper GI diagnosis and to undergo esophagogastroduodenoscopy (EGD). Conversely, patients with hematochezia were more likely to have lower GI diagnoses and to undergo colonoscopy. These results aligned closely with clinical expectations, supporting the accuracy and validity of our models.

Figure 1. Validation Outcomes for Melena and Hematochezia N-gram Models 

Why It Matters: A Step Toward Richer Real-World Evidence 

The ability to differentiate between melena and hematochezia in unstructured EHR data enables more granular, clinically meaningful insights. This allows researchers and healthcare organizations to:

  • Better characterize patient populations
  • Refine outcome measures for GI bleeding studies
  • Support drug safety and effectiveness research with higher precision

The OMNY Health platform helps researchers unlock the full potential of real-world clinical information by turning free-text notes into actionable insights, improving both care delivery and research quality.

Looking Ahead 

This study demonstrates how natural language processing (NLP) can help bridge gaps in structured EHR data. As we continue validating these models, our primary focus remains on empowering researchers, clinicians, and life science partners with trustworthy real-world data solutions.
