Five Natural Language Processing Subtasks that Large Language Models Can Perform to Improve Healthcare

Jan 12, 2024

In late 2022, the technology world was turned upside down as OpenAI released ChatGPT, its new artificial intelligence (AI) model.  Unlike most previous natural language processing (NLP) models, this model contained billions of parameters, was trained on a corpus of unstructured data of unprecedented size, and underwent a novel alignment process to better orient the model toward human needs.  The result was a model that provided superior performance on various NLP tasks and professional certification exams (including medical ones) and could produce realistic-sounding text whose quality even the most hardened AI cynics could not deny.  Technological and legal impacts were felt almost immediately.  Many technology companies began to invest in the new field of generative artificial intelligence (genAI) with an eye toward the potential financial benefits.

However, lost in all the excitement is the fact that genAI relies largely on the bread-and-butter NLP methods and subtasks that have supported and improved many industries, including healthcare, over the past decade or two, with the additional capability of text generation.  Unlike previous NLP models, newer large language models (LLMs) can perform many different NLP subtasks with a single model.

So, what are the different subtasks that a state-of-the-art LLM can perform? Without further ado, here are five NLP subtasks that can be programmed into an LLM to improve healthcare:

Text Classification

Text classification, or the task of assigning text to categories, is often seen as the most fundamental NLP subtask.  A classic non-healthcare example of this subtask is categorizing reviews (e.g., critiques of products or movies) as having positive or negative sentiment.  Text can be classified at the sentence or paragraph level, depending on the use case.

In healthcare, the clinical notes for patients can be classified as to whether they identify patients as having certain attributes or disease characteristics.  Using that information, appropriate personalized interventions and treatments can then be used to improve health.  Over the past year, work at OMNY Health has focused on several text classification projects, including the following:

  1. Classifying patients with generalized pustular psoriasis (GPP), a rare, devastating form of the skin disease psoriasis, by their disease status (flare versus non-flare), in collaboration with a life sciences partner. 
  2. Classifying psoriasis patients as to whether they received joint assessments and/or rheumatology referrals in the presence of psoriatic arthritis symptoms, in collaboration with life sciences partners. 
  3. Identifying patients adversely affected by social determinants of health, including illiteracy and financial and housing insecurity. 

The results of these studies are available (View Report).
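To make the flare-versus-non-flare task concrete, here is a minimal sketch of a sentence-level classifier.  A keyword heuristic stands in for a fine-tuned LLM; the label names and cue words are illustrative assumptions, not the actual OMNY Health model.

```python
# Toy sentence-level classifier: a keyword heuristic standing in for a
# fine-tuned LLM.  Labels and cue words are illustrative only.
FLARE_CUES = {"flare", "pustules", "erythema", "worsening", "fever"}

def classify_flare_status(note: str) -> str:
    """Label a clinical-note snippet as 'flare' or 'non-flare'."""
    tokens = {t.strip(".,;:").lower() for t in note.split()}
    return "flare" if tokens & FLARE_CUES else "non-flare"

print(classify_flare_status("Patient presents with widespread pustules and fever."))
```

In practice, the decision rule above would be replaced by a prompted or fine-tuned LLM, but the input (a note snippet) and output (a category label) keep the same shape.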

Named Entity Recognition

Also known as token classification, named entity recognition (NER) involves classifying individual words or phrases as reflecting an entity, for example, a person, place, or organization.  It differs from text classification in that labels are assigned to individual words rather than to whole sentences or paragraphs.

In healthcare, common entities detected include symptoms, medications, and protected health information (PHI).  PHI entities, such as names, addresses, and contact information, are typically detected as part of larger efforts to deidentify clinical notes before they are input to an NLP model.  In fact, at OMNY Health, more than 20 PHI entity types are identified and removed from clinical text before NLP processing to protect patient privacy and to comply with the Health Insurance Portability and Accountability Act (HIPAA) of 1996.
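As a simplified illustration of PHI detection and removal, the sketch below redacts two identifier types with regular expressions.  A production deidentification system would use a trained NER model covering many more HIPAA identifier categories; the two patterns here are assumptions for illustration.

```python
import re

# Minimal PHI scrubber: regex patterns standing in for a trained NER model.
# Real deidentification covers far more identifier types; these two
# patterns (phone numbers and dates) are illustrative only.
PHI_PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}

def redact_phi(text: str) -> str:
    """Replace each detected PHI span with its entity label."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact_phi("Seen on 03/14/2023; callback 555-867-5309."))
# → "Seen on [DATE]; callback [PHONE]."
```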

Relation Extraction

Relation extraction (RE) represents a relatively more complex subtask in which the relationships between various entities are determined.  For example, if two “person” entities are detected in the text, an RE task may be used to identify whether the two entities are married.  It is usually performed downstream of an NER step.  Previously, RE tasks often required expensive annotation and labeling processes to be performed successfully, although this requirement has faded with the advent of newer LLMs that can label and annotate notes automatically.

In healthcare, RE applications include identifying patient-doctor relationships for clinical note understanding, as well as more general clinical RE to model medical knowledge ontologies such as the Systematic Nomenclature in Medicine – Clinical Terms (SNOMED-CT).  At OMNY Health, work has been performed to identify relations between detected symptom and medication entity-pairs to determine if they comprised an adverse drug event (ADE).
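The ADE use case can be sketched as a step downstream of NER: pair each detected medication with symptoms mentioned nearby, yielding candidate relations for a model to classify.  The sentence-level co-occurrence rule and entity lists below are simplifying assumptions, not the actual production logic.

```python
# Toy relation-extraction step downstream of NER: pair each detected
# medication with symptoms in the same sentence as candidate adverse
# drug events (ADEs).  A real system would classify each candidate pair
# with a trained model rather than keep every co-occurrence.
def candidate_ades(sentences, medications, symptoms):
    """Return (medication, symptom) pairs that co-occur in a sentence."""
    pairs = []
    for sent in sentences:
        low = sent.lower()
        meds = [m for m in medications if m in low]
        syms = [s for s in symptoms if s in low]
        pairs.extend((m, s) for m in meds for s in syms)
    return pairs

sents = ["Started methotrexate last month.",
         "Reports severe nausea since starting methotrexate."]
print(candidate_ades(sents, ["methotrexate"], ["nausea"]))
# → [('methotrexate', 'nausea')]
```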

Closed-domain Question Answering

Closed-domain question answering is an NLP subtask in which a text is paired with a question about the text, like a reading comprehension question on a standardized exam.  The question-context pair is then passed to an NLP model, which then extracts the answer to the question from the text.

In healthcare, question-answering pipelines can be used to extract practically any conceivable information about a patient from clinical notes, when paired with the correct prompt.  The answer can then be processed and refined using additional NLP steps.  As an example, a clinical note about medication discontinuation could be passed as input to an NLP model paired with the question, “Why was the medication discontinued?”  The answer could then be passed to a text classification model which categorizes the reason for treatment discontinuation.  At OMNY Health, we have constructed such a pipeline to determine reasons for treatment discontinuation for various medications (View Report).  A second application of question answering is the extraction of severity scores from clinical notes for various diseases.
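The question-context pairing described above can be sketched as simple prompt construction.  The `call_llm` function in the comment is a hypothetical stand-in for whatever extractive QA model or LLM endpoint is in use; only the prompt-building step is shown.

```python
# Sketch of a closed-domain QA call: pair a clinical-note context with a
# question and send both to a model.  `call_llm` is hypothetical.
def build_qa_prompt(context: str, question: str) -> str:
    """Combine a note and a question into a single closed-domain prompt."""
    return (
        "Answer the question using only the note below.\n\n"
        f"Note: {context}\n\n"
        f"Question: {question}\nAnswer:"
    )

note = "Adalimumab was discontinued due to injection-site reactions."
prompt = build_qa_prompt(note, "Why was the medication discontinued?")
# response = call_llm(prompt)  # hypothetical model call
```

Constraining the model to answer from the supplied note, rather than from its general knowledge, is what makes this closed-domain rather than open-domain question answering.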

Text Generation

Finally, text generation is the process of composing new, original text in response to a provided prompt.

The basic mechanism revolves around predicting the next word based on the input prompt and the sequence of previously generated words.  As alluded to in the introduction, modern LLMs undergo alignment to prevent toxic, irrelevant, and/or unhelpful text from being generated, which has resulted in improved output.  Text generation can be used for open-domain question answering, in which a general prompt is provided with no corresponding answer in the context (e.g., “What is the purpose of life?”).  Text generation is also used on the downstream end of retrieval-augmented generation pipelines, in which a query first causes the NLP pipeline to retrieve the relevant documents from a database, extract an answer, and then generate text to present the answer to the user.
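The retrieval half of such a pipeline can be sketched as scoring documents against the query and handing the best match to a generator.  Real systems use embedding-based similarity; the word-overlap score, the sample documents, and the `generate` call in the comment are illustrative assumptions.

```python
# Sketch of the retrieval step in a retrieval-augmented generation (RAG)
# pipeline: score each document by word overlap with the query, then
# pass the best match to a text generator.  Embeddings would replace
# this overlap score in a real system.
def retrieve(query: str, docs: list) -> str:
    """Return the document sharing the most words with the query."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

docs = ["Dupilumab dosing guidance for atopic dermatitis.",
        "Coding guidelines for telehealth visits."]
best = retrieve("What is the dupilumab dosing schedule?", docs)
# answer = generate(f"Context: {best}\nQuestion: ...")  # hypothetical LLM call
```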

In healthcare, potential applications of text generation include helping physicians write clinical notes to ease documentation burden, as well as simplifying routine tasks for health analysts at pharmaceutical companies.  A true challenge is to accomplish these use cases while protecting patient privacy and preventing any PHI leakage in responses to prompts.  At OMNY Health, we strive to accomplish these and related use cases responsibly.


In conclusion, there is more to LLMs than generating text.  Combining text generation functionality with the subtasks mentioned in this article potentiates LLMs to complete powerful tasks that can help improve healthcare.
