Skip to main content
Fig. 3 | Insights into Imaging

Fig. 3

From: T-staging pulmonary oncology from radiological reports using natural language processing: translating into a multi-language setting

Fig. 3

MedStruct pipeline. Schematic overview of the MedStruct pipeline, in which five different microservices are present: preprocessing, spaCy, pyContextNLP, measurement extractor and T-stage classifier. The report can be processed either from an Excel file or direct from the graphical user interface (GUI). All components use an intermediate JavaScript Object Notation (JSON) annotation format to chain the pipeline components and can be consumed over REpresentational State Transfer (REST) or chained using a message broker. The use of a JSON annotation format simplifies reusability of the different components, enables mixing programming languages, prevents for duplicate processing and guarantees token alignment between components. This implementation saves annotations at token level instead of sentence level, which enables precise highlighting of annotations in a GUI. Detected tumor and lymph nodes are stored as objects in a list, allowing for detection of concurrent mentions. Documents can now be processed individually with the same rule-based algorithm

Back to article page