Cost-effectiveness of artificial intelligence aided vessel occlusion detection in acute stroke: an early health technology assessment

Background Limited evidence is available on the clinical impact of artificial intelligence (AI) in radiology. Early health technology assessment (HTA) is a methodology to assess the potential value of an innovation at an early stage. We use early HTA to evaluate the potential value of AI software in radiology. As a use-case, we evaluate the cost-effectiveness of AI software aiding the detection of intracranial large vessel occlusions (LVO) in stroke in comparison to standard care. We used a Markov-based model from a societal perspective of the United Kingdom predominantly using stroke registry data complemented with pooled outcome data from large randomized trials. Different scenarios were explored by varying missed diagnoses of LVOs, AI costs and AI performance. Other input parameters were varied to demonstrate model robustness. Results were reported in expected incremental costs (IC) and effects (IE) expressed in quality adjusted life years (QALYs).
Results Applying the base case assumptions (6% missed diagnoses of LVOs by clinicians, $40 per AI analysis, 50% reduction of missed LVOs by AI) resulted in cost savings and incremental QALYs over the projected lifetime (IC: −$156, −0.23%; IE: +0.01 QALYs, +0.07%) per suspected ischemic stroke patient. For each yearly cohort of patients in the UK this translates to a total cost saving of $11 million.
Conclusions AI tools for LVO detection in emergency care have the potential to improve healthcare outcomes and save costs. We demonstrate how early HTA may be applied for the evaluation of clinically applied AI software for radiology.
Supplementary Information The online version contains supplementary material available at 10.1186/s13244-021-01077-4.


Background
Currently, more than one hundred CE-certified artificial intelligence (AI) software products for radiology are on the market, addressing a wide range of applications. Vendors often make broad claims about improving healthcare and increasing efficiency; however, evidence of their impact on healthcare is generally lacking [1]. For detection of acute ischemic stroke specifically, six regulatory-cleared (CE and/or FDA) AI products are commercially available that detect large vessel occlusions (LVOs) on computed tomography angiography (CTA). The main aim of these products is to improve the diagnostic accuracy for LVOs, because fast and accurate diagnosis is crucial to provide appropriate treatment, especially when intra-arterial thrombectomy (IAT) is possible, which overall results in improved patient outcome [2][3][4].
There is some evidence on the stand-alone diagnostic accuracy of LVO detection software in peer-reviewed publications (n = 2) and FDA submissions (n = 3), indicating radiologist-level performance regarding sensitivity [5][6][7][8][9]. However, little research has been published on the added value of these algorithms for diagnosis, treatment decisions, patient outcomes and societal outcomes [10]. It therefore remains unknown to what extent the use of AI software for LVO detection affects diagnostic accuracy and, consequently, healthcare outcomes and overall costs.
Early health technology assessment (HTA) is a methodology to assess the potential value of an innovation at an early stage, before it has been implemented [11,12]. The aim of early HTA is to provide insight into the potential value of a new technology to inform further development (is it worthwhile to further develop the technology and perform research on it), positioning of the technology within the treatment pathway (for instance, should it be an addition to the current pathway or replace something else), required specifications (minimal needed effectiveness, pricing) and future research (what outcomes should be included in future clinical research). Outcomes provide guidance in allocating healthcare resources efficiently, promoting value-based healthcare. We hypothesize that early HTA is also a useful method to assess the potential impact of AI applications on healthcare outcomes and costs. Therefore, the aim of this study was to use early HTA to evaluate the potential cost-effectiveness of using an AI tool in ischemic stroke for intracranial LVO detection on CTA in comparison to standard of care.

Strategies
To demonstrate the potential value of AI-aided LVO detection, two strategies were compared with regard to costs and effects. The first strategy comprised the current standard of care. Patients receive head CTA with or without CT perfusion (CTP) when suspected of ischemic stroke based on symptoms and exclusion of other causes as demonstrated by non-contrast CT. The images are evaluated by a radiologist and/or neurologist on duty, after which IAT follows if the patient is deemed eligible according to current guidelines [13].
The second strategy was defined as a theoretical strategy in which AI is used as an aid for LVO detection on CTA. For both strategies, only vessel occlusions in the proximal anterior circulation (ICA, A1, M1, M2) were regarded as relevant for patient selection for IAT, in accordance with the recommendations in the current stroke guidelines [13]. In this strategy, we assume that AI software is capable of increasing the diagnostic sensitivity, especially for the detection of M2 occlusions, without a decrease in specificity. False positives of the AI software are expected to be neutralized by the judgement of the reader in order to prevent overtreatment.
Some vendors claim that, besides providing a more accurate diagnosis, the use of AI may lead to a shorter time to treatment, especially when it allows the radiologist to be bypassed [10]. As most currently available commercial products focus on triage and interactive decision support, we only assessed the claim that the use of AI could provide a more accurate diagnosis, i.e. reduce the number of missed LVOs.

Model structure
We used a decision tree followed by a Markov model to estimate the costs and health outcomes of the two strategies. The decision tree represents the acute phase (first 90 days) for both strategies (Fig. 1). The modified Rankin Scale (mRS) score at 90 days, describing the level of disability, was the initial state of the subsequent Markov model, which models outcomes over a lifetime horizon. After each yearly cycle, over a total of 70 cycles, patients could remain in their current health state, have a recurrent stroke, or die according to mortality probabilities of the general population [14]. The analysis was conducted from a societal perspective in the context of the United Kingdom. Modelling was done in Microsoft Excel 16.
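The cohort mechanics of such a yearly Markov model can be sketched as follows. This is a minimal illustration and not the authors' Excel model: the cycle count and the 2.84% yearly recurrence probability are taken from the text, whereas the starting mRS distribution, the post-recurrence distribution, the mortality function, the order of events within a cycle and the rule that a recurrent stroke can only keep or worsen the mRS state are all assumptions made for the example.

```python
# Sketch of a yearly Markov cohort trace over mRS states 0-6.
N_CYCLES = 70          # yearly cycles (from the text)
P_RECURRENT = 0.0284   # yearly recurrent-stroke probability (from the text)
DEAD = 6               # mRS 6 = death, treated as an absorbing state

# HYPOTHETICAL placeholder inputs, not values from the paper.
START_DIST = [0.10, 0.20, 0.20, 0.20, 0.15, 0.05, 0.10]      # mRS 0..6 at 90 days
RECURRENT_DIST = [0.05, 0.10, 0.15, 0.20, 0.25, 0.10, 0.15]  # mRS 0..6 after recurrence


def yearly_mortality(age):
    """HYPOTHETICAL stand-in for general-population life tables."""
    return min(1.0, 0.01 * 1.09 ** (age - 66))


def after_recurrence(state):
    """mRS distribution after a recurrent stroke starting from `state`.

    Assumption: probability mass of better states is moved to the current
    state, so a patient stays the same or gets worse.
    """
    dist = [0.0] * 7
    for target, p in enumerate(RECURRENT_DIST):
        dist[max(target, state)] += p
    return dist


def run_cohort(start_dist, start_age=66):
    """Return the state occupancy at the start and after each yearly cycle."""
    occupancy = list(start_dist)
    trace = [occupancy]
    for cycle in range(N_CYCLES):
        p_die = yearly_mortality(start_age + cycle)
        nxt = [0.0] * 7
        nxt[DEAD] = occupancy[DEAD]          # the dead stay dead
        for state in range(DEAD):
            dying = occupancy[state] * p_die
            survivors = occupancy[state] - dying
            nxt[DEAD] += dying
            # Survivors either remain in their state or have a recurrent stroke.
            nxt[state] += survivors * (1.0 - P_RECURRENT)
            for target, p in enumerate(after_recurrence(state)):
                nxt[target] += survivors * P_RECURRENT * p
        occupancy = nxt
        trace.append(occupancy)
    return trace


trace = run_cohort(START_DIST)
```

Multiplying each row of `trace` by state-specific utilities and costs, and discounting the result, would give lifetime QALYs and costs per strategy; the two strategies would differ only in the starting distribution produced by the decision tree.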

Population
We based our cohort (n = 71,840) predominantly on stroke registry data from the UK [15]. The initial cohort, on which the AI software would be applied, consisted of patients who received CTA in the diagnostic work-up of acute stroke. We excluded late presenters of stroke (last seen well > 4 h) as available mRS data was based on early presenters. Of the patients, 30.6% had an LVO, of whom 43.8% were eligible for IAT treatment [15].
van Leeuwen et al. Insights Imaging (2021) 12:133
Little is known about the percentage of missed LVOs in standard of care, but estimates ranging between 6 and 20% have been reported in the literature [16,17]. We therefore tested a range of values for this variable in our model and zoom in on the conservative scenario in which 6% of LVOs are missed. This number was based on an observer study with three neuroradiologists using CTAs and CTPs for LVO detection [16].
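To give a sense of scale, the cohort figures above can be chained into a rough patient count. The sketch below is illustrative arithmetic only: it assumes that the miss rate applies uniformly to all LVO patients and is independent of IAT eligibility, which the text does not state, and it uses the base case values of 6% missed LVOs and a 50% reduction by AI. The function name and the returned keys are hypothetical.

```python
COHORT = 71_840      # patients receiving CTA per year (from the text)
P_LVO = 0.306        # share of the cohort with an LVO (from the text)
P_ELIGIBLE = 0.438   # share of LVO patients eligible for IAT (from the text)


def extra_detections(missed_rate, reduction):
    """Rough yearly patient counts for one scenario.

    Assumption: missed LVOs are a random sample of all LVOs, so the
    eligibility share also applies to the recovered cases.
    """
    lvo = COHORT * P_LVO
    missed = lvo * missed_rate
    recovered = missed * reduction
    return {
        "lvo": lvo,
        "missed": missed,
        "recovered": recovered,
        "recovered_eligible": recovered * P_ELIGIBLE,
    }


base_case = extra_detections(missed_rate=0.06, reduction=0.50)
```

Under these assumptions, roughly 22,000 patients per year have an LVO, about 1,300 of these are missed in standard of care, and about 290 additional IAT-eligible patients per year would be identified with AI support.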
Other variables needed to complete the model were compiled from previously performed large stroke studies. Population age was set at 66 years based on the MR CLEAN trial [3]. mRS outcomes at 90 days were based on pooled data from large randomized trials (SWIFT PRIME, MR CLEAN, REVASCAT, EXTEND-IA, ESCAPE) [2]. The yearly average probability of recurrent stroke after the acute phase (first 90 days) was 2.84% [18]. Recurrent stroke in the acute phase was included in the mRS outcome at 90 days. In patients who suffered a recurrent stroke, worse outcomes with higher mRS states were possible. The mRS state distribution in patients with a recurrent stroke was assumed to be equal to that of a group of patients who did not receive IAT. Table 1 provides an overview of these model inputs.

Costs
Costs were derived from a previous study (Table 2) [19]. Short-term costs (< 90 days) consisted of costs for treatment, hospitalization and management of adverse events. Long-term costs were attributed to the different mRS states, based on the OXVASC study, including costs of personal social services, such as nursing and residential care costs [20].
All costs were inflated to the level of 2019 according to the Consumer Price Index from the Office for National Statistics of the UK and were discounted at 4% annually [21,22]. We report all costs in US dollars to ease interpretation by applying the 2019 average exchange rate (£1 = $1.283) [23].

Health outcome
The health outcomes are expressed in quality adjusted life years (QALYs), with 0 meaning death and 1 meaning perfect health. The mRS scores were utility weighted based on the MR CLEAN trial (Table 3) [26]. Utilities were discounted at a rate of 1.5% annually [21].
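The effect of the two discount rates can be illustrated with a short present-value calculation. The rates of 4% for costs and 1.5% for utilities are taken from the text; leaving the first cycle undiscounted is an assumed convention, and the ten-year streams in the usage lines (a utility of 0.7 and a yearly cost of $5,000) are hypothetical values chosen only for the example.

```python
COST_RATE = 0.04       # annual discount rate for costs (from the text)
UTILITY_RATE = 0.015   # annual discount rate for utilities (from the text)


def present_value(yearly_values, rate):
    """Discount a stream of yearly values to present value.

    Assumption: the first entry (cycle 0) is not discounted.
    """
    return sum(v / (1.0 + rate) ** t for t, v in enumerate(yearly_values))


# HYPOTHETICAL ten-year streams for a single health state.
discounted_qalys = present_value([0.7] * 10, UTILITY_RATE)
discounted_costs = present_value([5_000.0] * 10, COST_RATE)
```

Because costs are discounted at a higher rate than utilities, cost savings that accrue far in the future weigh relatively less in the result than health gains accruing over the same period.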

Analysis
In multiple-scenario analyses, we assessed the differences in health outcomes and costs at different algorithm performance values (percentage of LVOs detected by the AI tool that would otherwise have been missed), costs of the AI tool (US dollars per analysis) and percentages of missed LVOs in standard of care. A base case analysis was performed using a combination of assumptions for these parameters. The stand-alone sensitivity of AI tools for the detection of LVOs has been reported to be in the range of 87.8-97.9% [5][6][7][8]. However, for this analysis it is relevant to know how much the diagnostic accuracy improves when a radiologist uses the AI tool as a detection aid. As there is no evidence available on the percentage reduction of missed LVOs through the use of commercial AI products, we varied this number around an assumed base case performance of 50%. The price of the AI per case was set at $40 for the base case scenario. As previously described, the base case assumption of missed LVOs in standard of care was set at 6%. The reference value for one QALY was £20,000 ($25,662) [27].
We performed multiple one-way sensitivity analyses in which we varied the parameters that were not included in the scenario analysis to evaluate the robustness of the model.
This was an observational study for which Institutional Review Board approval was waived. The model is made available at https://www.AIforRadiology.com with the possibility to adapt the variables [28].

Results
Applying AI for LVO detection has the potential to save costs and improve patient outcomes. Table 4 shows the outcomes in different scenarios regarding the percentage of missed LVOs in usual care and the percentage of missed LVOs that are detected by the innovation. No costs for the innovation were included in these analyses. The potential cost savings therefore represent the maximum cost of the innovation at which it is dominant over usual care, i.e., results in both health gain and cost savings. For example, if 1% of LVOs are currently missed and the innovation detects 25% of these, the cost per AI analysis could be at most $16 for the innovation to be dominant. Figure 2 illustrates the cost saving per patient when varying the costs per AI analysis and the percentage reduction of missed LVOs by the AI tool, assuming a current missed LVO rate of 6%. Here, it becomes apparent at what cost or performance the AI tool will be cost saving. The filled green bullet marks the base case scenario (6% missed diagnoses, $40 per AI analysis, 50% reduction of missed LVOs by AI).
Table 5 zooms in on the results of the base case scenario. Here, the model predicted that the AI strategy results in both cost reduction and improved patient outcome compared to the standard of care. For the projected lifetime per ischemic stroke patient, the incremental costs and incremental efficacy were −$156 (−0.23%) and +0.0095 QALYs (+0.07%), respectively. Using the reference value of $25,662 per QALY, 0.0095 QALY would translate to $244. For each yearly cohort of patients in the UK this translates to a total cost saving of $11 million and a QALY gain of 682 ($17.5 million).
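The base case figures above can be reproduced with a few lines of arithmetic, and the same numbers allow a rough break-even price to be estimated for other scenarios. The sketch below is not the published model. It assumes that the AI price enters the incremental cost once per patient and additively (so the saving excluding the AI price is $156 + $40 = $196), and that this saving scales linearly with the product of the missed-LVO rate and the reduction achieved by AI. The linear scaling is an assumption, although it is consistent with the $16 reported above for the 1% missed and 25% detected scenario. All function names are hypothetical.

```python
COHORT = 71_840          # yearly UK cohort (from the text)
WTP_PER_QALY = 25_662    # reference value per QALY in USD (from the text)
BASE_IC = -156.0         # base case incremental cost per patient, AI price included
BASE_IE = 0.0095         # base case incremental QALYs per patient
BASE_PRICE = 40.0        # base case AI price per analysis
BASE_MISSED = 0.06       # base case share of LVOs missed in standard of care
BASE_REDUCTION = 0.50    # base case share of missed LVOs recovered with AI


def qaly_value(qalys, wtp=WTP_PER_QALY):
    """Monetary value of a QALY gain at the reference value."""
    return qalys * wtp


def break_even_price(missed_rate, reduction):
    """Highest AI price per analysis at which the AI strategy is still cost saving.

    Assumption: savings excluding the AI price scale linearly with
    missed_rate * reduction, calibrated on the base case.
    """
    saving_without_ai = BASE_PRICE - BASE_IC   # 40 + 156 = 196 USD
    scale = (missed_rate * reduction) / (BASE_MISSED * BASE_REDUCTION)
    return saving_without_ai * scale


cohort_saving = -BASE_IC * COHORT    # about 11.2 million USD per yearly cohort
cohort_qalys = BASE_IE * COHORT      # about 682 QALYs per yearly cohort
```

With these inputs, `break_even_price(0.06, 0.50)` returns the $196 base case ceiling and `break_even_price(0.01, 0.25)` returns roughly $16, so the per-analysis price of $40 would only be cost saving in scenarios where substantially more than 0.25% of all LVOs are additionally detected.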
Within a ninety-day window, the intervention scenario led to incremental costs (IC: +$60) due to the increased utilization of the more expensive IAT treatment. Negligible healthcare improvement (IE: +0.0001 QALY) was observed. A cost reduction and larger QALY gain are expected for the lifetime duration after the acute phase (IC: −$216, IE: +0.0094 QALY). Figure 3 shows the results of the one-way sensitivity analysis taking the base case scenario as its starting point. In all cases the incremental costs remained negative and the incremental efficacy positive. The long-term stroke costs were the parameter causing most variation in the results, followed by the starting age of the patient population. Sensitivity analyses of the mRS probabilities and utility values are reported in Additional file 1. The analyses did not include extra costs for possible increased reading time for false positive cases, as this only resulted in negligible extra costs ($0.07 for each percentage point of false positives), as shown in Additional file 2.
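The procedure behind a one-way sensitivity analysis such as the one in Figure 3 can be sketched generically: each parameter is moved to a low and a high value while all others stay at their base case value, and the parameters are ranked by the spread they cause. The toy cost function below is a hypothetical stand-in for the full model. It only reuses the acute-phase and long-term totals quoted above, assumes that the +$60 acute-phase cost is the $40 AI price plus $20 of extra treatment cost, and applies an arbitrary range of plus or minus 20%, none of which comes from the paper.

```python
def one_way_sensitivity(model, base_params, ranges):
    """Vary one parameter at a time and rank parameters by output spread."""
    rows = []
    for name, (low, high) in ranges.items():
        out_low = model(**{**base_params, name: low})
        out_high = model(**{**base_params, name: high})
        rows.append((name, out_low, out_high, abs(out_high - out_low)))
    # Largest spread first, as in a tornado diagram.
    return sorted(rows, key=lambda row: row[3], reverse=True)


def toy_incremental_cost(ai_price, acute_extra_cost, long_term_saving):
    """HYPOTHETICAL stand-in for the model: incremental cost per patient in USD."""
    return ai_price + acute_extra_cost - long_term_saving


# 40 + 20 - 216 = -156 USD; the split of the acute-phase 60 USD is an assumption.
base = {"ai_price": 40.0, "acute_extra_cost": 20.0, "long_term_saving": 216.0}

# ILLUSTRATIVE ranges of plus or minus 20% around the base value.
ranges = {
    "acute_extra_cost": (16.0, 24.0),
    "long_term_saving": (172.8, 259.2),
}

tornado = one_way_sensitivity(toy_incremental_cost, base, ranges)
```

In this toy setting the long-term saving dominates the spread and the incremental cost stays negative across both ranges, which mirrors the qualitative pattern described above without reproducing the actual analysis.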

Discussion
We applied early health technology assessment (HTA) to demonstrate the potential cost-effectiveness of an AI tool. Current AI research focuses mainly on the performance of algorithms (the means) instead of assessing the impact on healthcare outcomes and costs, especially in the early stage of development.
In this study, we demonstrate with the use of early HTA that AI software for the aided detection of intracranial LVO in ischemic stroke has the potential to improve healthcare outcomes and save overall costs. The sensitivity analyses demonstrate the robustness to variation of model parameters and assumptions. The improved LVO detection leads to better clinical outcome on both the short term as well as long term resulting in reduced overall healthcare consumption.

Table 2 Model input-costs
* Cost of the average treatment was adjusted for the estimated frequency of the treatment according to different studies. For IAT eligible patients, we assumed 85% to receive both IAT and intravenous thrombolysis (IVT), 10% to receive IAT only, and 5% to receive IVT and going for IAT but who appeared revascularised during angiography [3,15,24]. For non-IAT-eligible patients we assumed 40% to receive IVT based on previous study and our local medical center data [25]
IAT, intra-arterial thrombectomy; mRS, modified Rankin Scale
Costs are rounded to the nearest integer

Early HTA analyses are not meant to provide a firm 'go' or 'no-go' recommendation for the development or purchasing of an innovation, but provide insight into the direction to take regarding development, implementation and reimbursement [11]. For example, in this study the analyses show that cost benefits are obtained in the long term, while the costs for the software are incurred in the short term and are usually covered by the radiology department or hospital. This observation could contribute to the debate on the investments, financial accountability and reimbursement for the clinical use of AI technology.
The difficulty with doing an early HTA is that not all data is yet known, hence it is called 'early'. Therefore, we used data from previous studies and made some assumptions to complete the model. The sensitivity analysis demonstrates the effects of varying the parameters and provides context on bias and confidence. The model is made publicly available so that the model and its results can be tailored to a local setting by adapting the parameter values.
One of the main assumptions was the percentage of missed LVOs. This was assumed to be 6% for the base case scenario, but may vary depending on the clinical context [16,17]. Because of the limited evidence, we based this rate on a study with two neuroradiologists and one neuroradiology resident as observers. However, in many clinical settings there might not be a specialized physician within reach. Some centers have limited exposure to stroke patients and during out-of-office hours the studies are generally first evaluated by less experienced observers (residents). We therefore consider 6% to be a conservative estimate. With higher percentages of missed LVOs, the benefits of the use of the AI software increase.
We also made an assumption on the capability of the AI to reduce missed LVOs in the base case scenario, as there is limited evidence of the added value of AI tools in LVO detection. The results of this study only hold provided that AI tools have a positive impact on the diagnostic accuracy for LVOs. Therefore, we have demonstrated the effects for an AI tool with the capability to reduce the percentage of missed LVOs in the range of 0% to 100%. It should be noted that this percentage cannot be directly translated to the sensitivity of an AI algorithm applied stand-alone (ranging between 87.8% and 97.9% [5-8]), as we assume that the cases that were missed by a physician are also more likely to be missed by an algorithm (e.g. M2 occlusions). For that reason, we chose a conservative 50% reduction of missed diagnoses in the base case scenario, and we advise against using sensitivity measures provided by a vendor directly as the input to this model.
There are some points of note to our analysis. First, in this economic evaluation, we considered a pay-per-use business model and assume costs per analysis. When vendors employ a software license model, the proposed evaluation method can still be applied. By dividing the yearly license fee by the expected population, one obtains an estimate of the per-analysis costs.
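The conversion from a license model to a per-analysis cost described above amounts to a single division. In the sketch below the function name, the yearly license fee of $30,000 and the volume of 1,000 analyses per year are hypothetical and serve only to show how the result can be compared with the $40 per analysis assumed in the base case.

```python
def per_analysis_cost(yearly_license_fee, expected_analyses_per_year):
    """Estimate the cost per AI analysis under a yearly license model."""
    if expected_analyses_per_year <= 0:
        raise ValueError("expected_analyses_per_year must be positive")
    return yearly_license_fee / expected_analyses_per_year


# HYPOTHETICAL centre: 30,000 USD per year, 1,000 suspected stroke CTAs per year.
cost = per_analysis_cost(30_000.0, 1_000)   # 30.0 USD per analysis
below_base_case_price = cost <= 40.0        # 40 USD is the base case price
```

A low-volume centre paying the same fee would face a higher per-analysis cost, so the expected number of analyses is as important as the license fee itself when applying the model locally.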
Second, we used data from early presenters as the healthcare outcome data was available for this group only. Currently the indication for IAT is being expanded to also include late presenters [13]. The majority of the UK cohort (86%) was made up of early presenters. We hypothesize that cost savings and health gain remain positive, but will be slightly reduced when including the late presenters in the model. Other AI tools aimed at assessing eligibility for IAT, such as CTP analysis, collateral status assessment, or ASPECT scoring, may have more impact on the late presenting population, but are beyond the scope of this study.
Lastly, we did not model the interaction between the physician and the AI software in detail. We expect limited effects of false positives when present at an acceptable rate. We assumed that false positive predictions of the AI software would be neutralized by the reader and would not lead to unnecessary invasive procedures (diagnostic subtraction angiogram). False positives may contribute to a slightly longer reading time, while true positives may speed up the reading process. As shown in the sensitivity analyses, the slightly longer reading time resulted in negligible extra costs.
In this work, we compared standard of care with use of an AI tool to increase diagnostic efficacy in LVO detection in ischemic stroke. Depending on the center and the way that stroke care is organized, there may be alternative strategies or a combination of strategies to reach the same goal that we did not consider, such as the use of advanced imaging techniques including CTP, additional training of the physicians interpreting the CTA scans, a dedicated (neuro)radiologist on call outside of office hours, or even other types of AI tools. In further cost-effectiveness studies these alternatives could be explored to determine if implementing an AI tool for LVO detection is the most cost-efficient solution.

Conclusion
We demonstrate the potential cost-effectiveness of computer-aided LVO detection in ischemic stroke by performing an early health technology assessment. Such an analysis can be used to indicate the potential efficacy of an AI tool in an early phase to guide development and implementation. The most important next step is to validate the outcomes of the early health technology assessment in clinical practice. With an increasing number of AI tools implemented in the clinic, it is important to assess the impact of AI tools on our healthcare system. Real-life outcome measures should be used to gain insight into how to apply AI tools in a sensible and safe way. This is a prerequisite to prove the claim that AI is making healthcare better and more affordable.