Doctoral Thesis: Practical Considerations For the Deployment of Clinical NLP Systems

Wednesday, May 1
1:30 pm - 3:00 pm

E25, Room 117

By: Eric Lehman

Supervisor: Peter Szolovits

Additional Location Details:

https://mit.zoom.us/j/6036483463

Abstract: Although recent advances in scaling large language models (LLMs) have yielded improvements on many NLP tasks, it remains unclear whether these models, trained primarily on general web text, are the right tool for highly specialized, safety-critical domains such as healthcare. A healthcare system attempting to automate a clinical task must weigh all approaches with respect to safety, efficacy, and efficiency. This thesis investigates the challenges and implications of deploying LLMs in clinical settings, focusing on these three considerations.

We first explore the biases that LLMs used in a zero- or few-shot setting might introduce into downstream patient care, and find that LLMs can propagate, or even amplify, harmful societal biases across a number of clinical tasks. We then examine the privacy implications of pretraining a language model on clinical text containing protected health information (PHI), and find that simple probing methods are unable to meaningfully extract sensitive information from an encoder-only language model pretrained on non-deidentified electronic health record (EHR) notes. Finally, we conduct an extensive empirical analysis of 12 language models, ranging from 220M to 175B parameters, measuring their performance on three clinical tasks that test their ability to parse and reason over EHRs. We show that relatively small specialized clinical models are substantially more effective than larger general-text models used via in-context learning. Further, we find that pretraining on clinical tokens yields smaller, more parameter-efficient models that match or outperform much larger language models trained on general text.

We argue that a pretrained language model specific to clinical text offers an efficient, effective, and privacy-conscious approach, enabling a tailored and ethically responsible application of AI in healthcare.