AI Agents in Healthcare

AI agents are an emerging technology in the field of artificial intelligence. Simply put, an AI agent is a large language model that has been given a well-defined set of tools it can use to complete a task. Imagine you gave ChatGPT the following prompt: "Buy me a present for my brother-in-law's birthday next week". Assuming you did this today, you would probably get a response containing a list of generic present suggestions for a stereotyped middle-aged male (I got "Tech Gadgets" and "Sports Equipment" as two suggestions when I tried this). A thoughtful present that captures some element of your shared relationship wouldn't end up on your brother-in-law's doorstep on the day of his birthday, as you implicitly requested. However, let's imagine we enhanced ChatGPT with a set of tools to help with this task: the ability to search for and buy presents from Amazon, access to your contacts to identify your brother-in-law's exact birthday and his address, and a copy of your messaging history to get an understanding of presents he might like (perhaps drawing upon inside jokes you share). Given only these three tools, the model can now find a personalised gift from you and have it delivered to your family member. A simple prompt is the only input needed to complete a non-trivial task with multiple steps.
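To make this concrete, here is a minimal sketch of the tool-calling loop at the heart of an agent. Everything in it is illustrative: the tool functions and the `call_llm` helper are hypothetical placeholders rather than a real API.

```python
# Minimal sketch of an agent: a language model repeatedly chooses a tool,
# we execute it, and the result is fed back in until the task is complete.
# All tool functions and the call_llm helper are hypothetical placeholders.

def search_amazon(query: str) -> list[dict]: ...          # find candidate gifts
def lookup_contact(name: str) -> dict: ...                 # birthday and address
def search_messages(name: str) -> list[str]: ...           # shared interests, inside jokes
def place_order(item_id: str, address: str) -> str: ...    # returns an order reference

TOOLS = {
    "search_amazon": search_amazon,
    "lookup_contact": lookup_contact,
    "search_messages": search_messages,
    "place_order": place_order,
}

def run_agent(task: str, call_llm) -> str:
    """Loop until the model decides the task is finished."""
    history = [{"role": "user", "content": task}]
    while True:
        step = call_llm(history, tools=list(TOOLS))  # model picks a tool, or finishes
        if step["action"] == "finish":
            return step["answer"]
        result = TOOLS[step["action"]](**step["arguments"])
        history.append({"role": "tool", "content": str(result)})

# run_agent("Buy me a present for my brother-in-law's birthday next week", call_llm=...)
```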

As more and more of healthcare is conducted in the digital realm, AI agents have growing utility. I split their potential uses into the following two categories:

  1. Operating the system - managing waiting lists, supply chain logistics, staffing, patient communication, estate management etc.
  2. Carrying out clinical work - conducting initial consultations, requesting relevant tests or imaging needed to work through a differential diagnosis, counselling patients about lifestyle and medications, and calculating risk scores

Despite the broad space of problems these agents are suited to, their inability to navigate real-world clinical environments and to gain additional knowledge from physically examining patients will likely delay the emergence of an AI agent that fully assumes the role of a doctor until significant advancements have been made in the field of robotics.

What agents could do

Operating the system

Healthcare systems are deeply complex, and possess all of the functions present in any large organisation: supply chain logistics, staffing, patient communication, and estate management, to list a few.

AI agents will likely slot into this existing system and adopt roles similar to those that already exist. There may be an agent responsible for managing surgery waiting lists, given tools for emailing, texting, or phoning patients to invite them to the hospital in light of a last-minute cancellation, and for ordering the list by calling on an algorithm pre-defined by a human review board that weights factors such as the patient's clinical need and how long they've been waiting. There may also be an agent responsible for managing patient appointment bookings. This agent's tool-set may include tools to book, reschedule, and cancel appointments, and to email patients with requests for additional information.
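As a rough sketch of what the waiting-list agent's tool-set could look like: the field names, weights, and tool functions below are all assumptions for illustration, and in practice the weighting would be set by a human review board.

```python
from dataclasses import dataclass

@dataclass
class WaitingPatient:
    patient_id: str
    clinical_need: int    # e.g. 1 (routine) to 5 (urgent), assigned by clinicians
    days_waiting: int

def prioritise(waiting_list: list[WaitingPatient],
               need_weight: float = 10.0,
               wait_weight: float = 0.1) -> list[WaitingPatient]:
    """Order the list using a board-approved weighting of clinical need and waiting time."""
    return sorted(
        waiting_list,
        key=lambda p: need_weight * p.clinical_need + wait_weight * p.days_waiting,
        reverse=True,
    )

# Communication tools the agent can call when a last-minute slot opens up.
def send_email(patient_id: str, message: str) -> None: ...
def send_text(patient_id: str, message: str) -> None: ...
def phone_patient(patient_id: str, script: str) -> None: ...
```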

Carrying out clinical work

To quote the mantra of Zak Kohane (head of the Department of Biomedical Informatics at Harvard Medical School), "medicine at its core is an information processing discipline"1. If we think about the essence of much of the clinical work that doctors do, they take in lots of information about a patient through speaking to them, reading clinical notes, reviewing blood results and imaging, and conducting examinations. They then formulate a plan to either gather further diagnostic information (e.g. further tests) or recommend a treatment. Currently, the latest generation of AI foundation models are able to take in increasingly large volumes of text data, process the data, and give us useful text outputs. This makes them well-suited to tasks such as writing discharge summaries and summarising a transcribed consultation.

However, the architecture that makes up language models (and therefore agents), known as the transformer, isn't restricted to understanding the modality of text alone. Transformers are data agnostic, and can identify patterns across multiple data modalities. New models have recently been released (think GPT-4o) that take in both text and images and generate text responses based not only upon the text input, but also on what is contained in the image. These multimodal models have also been applied to the medical domain. For example, XrayGPT allows you to submit an x-ray image and a question, such as "What are the main findings from this x-ray?"2. The model will then give a response in the style of a radiology report.
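For a feel of how image-plus-text input works in practice, here is a small sketch using the OpenAI Python SDK with the general-purpose GPT-4o model (not XrayGPT); the image URL is a placeholder and the output will of course vary.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What are the main findings from this x-ray?"},
            # Placeholder URL - point this at an image the model can fetch.
            {"type": "image_url", "image_url": {"url": "https://example.com/chest_xray.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```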

There's no theoretical reason why any form of data is off the table. Soon, agents will have the ability to simultaneously take in and process information such as text, images, video, audio signals, and tabular blood results taken over multiple days. A further impressive characteristic of these models is that the data doesn't necessarily need to be well-structured. You can just throw in the entirety of the written notes from a patient's recent admissions, and the model will try and make sense of it regardless.

That's not to say that data quality no longer matters. A reading from a mis-calibrated temperature probe, mistranscribed speech from a patient's history, or blood results attributed to the wrong patient after a mislabelled blood bottle will all mislead the model. Data still needs to correctly represent the true state of the patient if it is to lead to truthful responses.

Again, affording a multimodal model the use of tools gives us an agent that can carry out a number of clinical tasks. For example, let's say we task a model with helping us conduct a full assessment of a patient who has come in for an annual clinical review. The model starts by taking in all of the patient's history, including doctor's notes, blood test results, and a chest x-ray collected during a recent admission to hospital after a fall. It first decides to use one of its tools to calculate a risk score for cardiovascular disease (such as QRISK3). It finds that the patient exceeds the threshold score of 10%, meaning they have a high risk of developing cardiovascular disease within the next 10 years, and therefore recommends that they start a statin. It uses another tool to request a blood test to check the patient's liver function, which is standard practice when starting this medication. Finally, it calls a text-to-speech tool so that it can have a real-world conversation with the patient, counsel them about taking this medication, and answer any questions they might have.
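A sketch of that workflow, as an agent might execute it, is given below. The tool functions (calculate_qrisk3, order_blood_test, speak_to_patient) are hypothetical stand-ins for real clinical systems, and the 10% threshold mirrors the QRISK3 guidance described above.

```python
# Hypothetical tools the annual-review agent can call; stand-ins for real clinical systems.
def calculate_qrisk3(patient_record: dict) -> float: ...          # % 10-year cardiovascular risk
def order_blood_test(patient_id: str, panel: str) -> None: ...    # e.g. liver function tests
def speak_to_patient(patient_id: str, message: str) -> None: ...  # text-to-speech counselling

QRISK3_THRESHOLD = 10.0  # % risk above which a statin is recommended

def annual_review(patient_record: dict) -> list[str]:
    """Walk through the cardiovascular part of the annual review described above."""
    plan = []
    risk = calculate_qrisk3(patient_record)
    if risk > QRISK3_THRESHOLD:
        plan.append("Start a statin")
        # Baseline liver function tests are standard practice before starting a statin.
        order_blood_test(patient_record["id"], panel="liver_function")
        speak_to_patient(
            patient_record["id"],
            "Your 10-year heart-disease risk is above 10%; I'd like to discuss starting a statin.",
        )
    return plan
```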

What agents probably won't do (in the short-term)

In the field of robotics, there exists a concept known as Moravec's paradox. Put simply, this posits that, contrary to what many people believe, higher-order reasoning and complex cognitive abilities require relatively little computational resource, whereas perceiving and reliably navigating complex environments (and normal, everyday environments are heavily complex) requires significantly more. This makes sense from an evolutionary perspective. Humans only recently gained the ability to reason and carry out long-term planning (in the last 100,000 years or so). However, since the existence of some of the earliest lifeforms in the primordial soup, life has been forever trying to better sense, understand, and navigate its environment (we now talk on the order of 4 billion years). Following this paradox to its logical conclusion, the tasks that will be hardest to solve in healthcare are those that involve interaction with the environment around us.

Let me now pose a scenario that demonstrates the limitations of AI agents in healthcare. An elderly woman comes into the emergency department after a collapse. She is clearly unwell: she is confused and struggles to respond to your questions, her heart rate is high, her blood pressure low, and she has a fever. You are concerned that she might have sepsis, so you conduct an examination. You first feel her pulse by placing your forefinger and middle finger over her wrist - the slight tactile sensation of her pulse against your fingers reveals a thready character. You carefully position the lady forward in the hospital bed to listen to her chest with a stethoscope. Whilst moving her, you ensure not to pull on the taut tube delivering IV fluids into a cannula positioned precariously on the lateral aspect of her right hand. You position the stethoscope at different locations on her back, making small adjustments so as to hear her breathing more clearly. All lung zones are clear, making you less convinced the infection is from a respiratory source. You conduct a full examination, making sure to check all areas of the skin. Upon lifting her leg, you notice on the back of her calf an inflamed, well-demarcated, broken patch of skin about 5 cm in diameter. You suspect this is cellulitis and that you have found the source of her septic presentation. You start her on broad-spectrum IV antibiotics. You take a gentle swab of the area and send it to a lab, which grows a pathogenic strain of bacteria - the likely causative agent. You've made your diagnosis and initiated an appropriate treatment plan.

Almost all clinical interactions involve some degree of physical interaction with the patient. Considering the typical scenario above, there are many points in this small clinical encounter where reaching the diagnosis required a trained individual to carefully manoeuvre a complex environment and an unwell, frail patient. Even if 90% of the useful information needed to make diagnoses or recommend treatments can be gleaned from the patient's history (which could feasibly be collected by a language model that can produce and transcribe speech) and from looking at digital test results and imaging, you still miss the remaining 10% of useful information that can get you to that critical diagnosis. Until we have capable robots possessing a full suite of sensory modalities, able to safely and effectively navigate a complex environment and interact with unwell patients, skilled medical practitioners will still be needed.

Final thoughts

Multimodal autonomous agents with the right set of tools have huge potential to improve not only the efficiency of operating healthcare systems, but also the quality of diagnoses and treatments. The last mile in healthcare, where humans will still have a critical role, lies in navigating complex environments and gleaning the hidden information that is often critical to ensuring correct diagnoses and treatments.

Footnotes

  1. https://www.zaklab.org/blog/

  2. https://arxiv.org/abs/2306.07971

© Seth Howes.