In the world of life sciences, data is king. For decades, the focus has largely been on structured data, including neatly organized tables, registries with predefined fields, clinical trial results captured in case report forms (CRFs), and insurance claims data. While undeniably valuable, this structured data often tells only part of the story.
The real goldmine, often overlooked and underutilized, lies within unstructured clinical text. These free-text narratives – physician notes, discharge summaries, pathology reports, and radiology findings, to name a few types – contain a wealth of detailed, nuanced, and patient-specific information that rows and columns simply cannot capture fully.
For life sciences companies, understanding and extracting insights from this unstructured data is no longer a luxury, but a necessity.
Here’s why:
The Limitations of Structured Data
Imagine trying to understand a complex patient journey solely from a checklist. Structured data, by its very nature, simplifies and categorizes. It’s excellent for tracking demographics, diagnosis codes, medication lists, and lab results. However, it often misses:
- Nuance and Context: The why behind a diagnosis, the specific symptoms a patient described, or the subtle changes in their condition over time. A coded diagnosis of “headache” doesn’t reveal whether it’s a throbbing migraine, a dull ache, or accompanied by visual disturbances. In fact, most initial visit notes contain a History of Present Illness (HPI) that documents eight cardinal features of the patient’s reason for the encounter:
  - Onset: When did the symptoms start? (The beginning time/date).
  - Location: Where on the body is the symptom? Does it radiate or travel anywhere else?
  - Duration: How long does the symptom last when it occurs? (e.g., seconds, hours, constant).
  - Character (or Quality): What does it feel like or look like? (e.g., sharp, dull, throbbing, crushing, burning).
  - Aggravating/Alleviating Factors: What makes it better or worse? (e.g., movement, rest, food, medication).
  - Radiation (or Related Symptoms): Does the symptom move to another part of the body (Radiation)? Or are there any other symptoms that occur with the primary one (Associated Symptoms)?
  - Timing (or Temporal characteristics): When does it occur? (e.g., constantly, intermittently, only in the morning, with exertion).
  - Severity (or Scale): How bad is the symptom? (Usually rated on a scale, such as 1-10 for pain).
- Patient History Beyond Codes: Family history details, lifestyle factors, or environmental exposures that might not fit into a predefined field.
  - Family History: Particularly important in oncology and cardiovascular disease, where notes identify which nuclear or extended family members had related conditions.
  - Lifestyle Factors: Smoking, alcohol usage, illicit drug use, living situation, social determinants of health, and sexual health are known to be important risk factors but are often omitted from structured data.
  - Occupational / Environmental Exposures: Extra information that sheds light on risk factors for diseases including cancer and asthma.
- Treatment Rationale and Adjustments: Why a particular treatment was chosen / switched to / switched from / discontinued, how a patient responded to it, and subsequent modifications.
- Rare Disease Insights: For conditions that are poorly represented in structured data, such as those lacking their own ICD-10 codes, the narrative in clinical notes becomes even more critical.
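To make the gap concrete, here is a minimal Python sketch that contrasts a coded encounter with the HPI features listed above. The class names, field names, patient ID, and example values are all hypothetical and chosen only for illustration; they do not describe any particular data model or product.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class StructuredEncounter:
    """What a coded record typically carries for the visit (hypothetical schema)."""
    patient_id: str
    diagnosis_code: str  # illustrative diagnosis code for "headache"


@dataclass
class HpiRecord:
    """HPI features that usually live only in the free-text note (hypothetical schema)."""
    onset: Optional[str] = None
    location: Optional[str] = None
    duration: Optional[str] = None
    character: Optional[str] = None
    aggravating_alleviating: Optional[str] = None
    radiation_or_associated: Optional[str] = None
    timing: Optional[str] = None
    severity: Optional[int] = None

    def documented_features(self) -> List[str]:
        """Return the names of the features that the note actually documents."""
        return [name for name, value in asdict(self).items() if value is not None]


# The coded view: a patient and a diagnosis code, nothing more.
coded = StructuredEncounter(patient_id="P001", diagnosis_code="R51.9")

# The note-derived view of the same visit (made-up example values).
hpi = HpiRecord(
    onset="3 days ago",
    character="throbbing",
    radiation_or_associated="visual disturbances",
    severity=7,
)

print(hpi.documented_features())
# prints ['onset', 'character', 'radiation_or_associated', 'severity']
```

Every field in `HpiRecord` is optional because real notes rarely document all of the features; the point is simply that none of this detail is recoverable from the coded record alone.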
Four Applications of Unstructured Data
Given how much structured data leaves out, how exactly can clinical notes be used to enhance clinical knowledge?
After de-identification, notes can support a wide range of research. Below we describe four common applications.
- Extraction of Disease Severity: Unlike clinical trials, in which key outcomes are recorded dutifully and regularly, real-world research relies on physician record-keeping to track the waxing and waning of disease. Often these outcomes, also known as severity measures, are found in free-text notes. Learn more about how researchers at OMNY Health have been extracting information about disease severity from notes since 2022, first with transformer-based pipelines and more recently with large language models (LLMs).
- Identifying Reasons for Treatment Discontinuation: Rollouts of newly developed drugs cost pharmaceutical companies millions of dollars; with that amount of investment, it becomes imperative to know more about why new drugs are being discontinued by patients/physicians, and which drugs are taking their place. At OMNY Health, we have built various pipelines for extracting this information from clinical notes, again using both transformer-based methods and LLMs.
- Researching Rare Diseases: For rare diseases, clinical note repositories can be particularly useful for pooling large numbers of patients with such diseases and establishing basic clinical knowledge about them – e.g., which patients are at risk? Why do some patients experience flares? What treatments work best? An example is work that OMNY Health completed in partnership with a life sciences company on generalized pustular psoriasis (GPP). Notes can also be used to find patients exhibiting symptoms or characteristics that might be consistent with undiagnosed rare diseases, and to identify patients who are candidates for genetic tests that could potentially validate a rare disease diagnosis.
- Training AI Models: In the age of Generative AI and LLMs, it is becoming more important than ever to find reputable sources of healthcare data (read: not the Internet) with which to train healthcare-specific LLMs that can reason without harmful biases. Need a proven source of de-identified clinical notes from diverse populations and provider mixes with which to train your LLM? We have made it possible.
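As a rough illustration of the first application, the sketch below pulls a 0-10 severity score out of a free-text note with a regular expression. This is a deliberately simple rule-based stand-in, not the transformer-based or LLM pipelines mentioned above; the pattern, the function name, and the sample note are assumptions made up for this example.

```python
import re
from typing import Optional

# Hypothetical pattern: matches phrases such as "pain 7/10" or
# "severity: 4 out of 10" within a single sentence.
SEVERITY_PATTERN = re.compile(
    r"\b(?:pain|severity)\b[^.\d]{0,20}?(\d{1,2})\s*(?:/|out of)\s*10\b",
    re.IGNORECASE,
)


def extract_severity(note: str) -> Optional[int]:
    """Return the first 0-10 severity score found in the note, or None."""
    match = SEVERITY_PATTERN.search(note)
    if match is None:
        return None
    score = int(match.group(1))
    # Discard values that cannot be a valid score on a 0-10 scale.
    return score if 0 <= score <= 10 else None


note = "Patient reports throbbing headache, pain 7/10, worse in the morning."
print(extract_severity(note))  # prints 7
```

A pattern like this breaks down quickly on negation, abbreviations, and disease-specific scales, which is one reason model-based extraction is typically preferred for real clinical text.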
Learn More about OMNY Notes – Contact Us!
At OMNY Health, we would love to discuss how our OMNY Notes product combined with our structured data offerings can support your clinical research initiatives and ultimately improve health outcomes. Please contact us at info@omnyhealth.com. We look forward to hearing from you!







