Since the IBM Watson project began in 2007, researchers have continuously pursued the development of medical artificial intelligence (AI). A usable and powerful medical AI system has enormous potential to reshape all aspects of modern medicine, enabling smarter, more accurate, more efficient, and more inclusive care, benefiting medical workers and patients, and thereby greatly improving human health. Over the past 16 years, medical AI researchers have accumulated results in many narrow subfields, but they have not yet been able to turn science fiction into reality.

This year, with revolutionary advances in AI technology such as ChatGPT, medical AI has made great progress on several fronts. First, the capabilities of medical AI have seen unprecedented breakthroughs: Nature has published a series of studies on medical large language models and medical imaging foundation models, and Google released Med-PaLM and its successor, which reached expert level on US Medical Licensing Examination (USMLE)-style questions. Second, major academic journals are focusing on medical AI: Nature published an outlook on foundation models for generalist medical AI; following a series of reviews on AI in medicine earlier this year, the New England Journal of Medicine (NEJM) published its first digital health review on November 30 and launched the first issue of its sub-journal NEJM AI on December 12. Third, the groundwork for deploying medical AI is maturing: a JAMA sub-journal published a global medical image data sharing initiative, and the US Food and Drug Administration (FDA) is developing draft guidelines for the regulation of medical AI.

Below, we review the significant progress that researchers around the world made toward usable medical AI in 2023.


Medical AI Foundation Models

The construction of medical AI foundation models is undoubtedly the hottest research focus of this year. During the year, Nature journals published review articles on generalist medical foundation models and on large language models in healthcare. Medical Image Analysis, a leading journal in the field, reviewed the challenges and opportunities of foundation model research in medical image analysis and proposed the concept of a “pedigree of foundation models” to summarize and guide the development of foundation model research in medical AI. The future of foundation models for healthcare is becoming clearer. Drawing on the success of large language models such as ChatGPT, and using more advanced self-supervised pre-training methods and vast amounts of training data, researchers in medical AI are trying to build 1) disease-specific foundation models, 2) general-purpose foundation models, and 3) multimodal large models that integrate a wide range of modalities, with massive parameter counts and superior capabilities.

AI Models for Medical Data Acquisition

Large AI models play a major role in downstream clinical data analysis, and generative AI models are now emerging in upstream clinical data acquisition as well. AI algorithms can significantly improve the process, speed, and quality of data acquisition.


Earlier this year, Nature Biomedical Engineering published a study from Boğaziçi University in Turkey that used generative AI to address a problem in pathology image-assisted diagnosis in clinical practice. Artifacts in frozen-section tissue prepared during surgery are an obstacle to rapid diagnostic evaluation. Although formalin-fixed, paraffin-embedded (FFPE) tissue provides higher-quality samples, its preparation is time-consuming, often taking 12 to 48 hours, which makes it unsuitable for use during surgery. The research team therefore proposed an algorithm called AI-FFPE, which makes tissue in frozen sections look similar to FFPE tissue. The algorithm corrected the artifacts of frozen sections and improved image quality while retaining clinically relevant features. In clinical validation, the AI-FFPE algorithm significantly improved pathologists' diagnostic accuracy for tumor subtypes while greatly shortening clinical diagnosis time.

Cell Reports Medicine reported work by a team from the Third Clinical College of Jilin University, the Department of Radiology at Zhongshan Hospital Affiliated to Fudan University, and Shanghai University of Science and Technology [25]. The study proposes a general-purpose framework that fuses deep learning with iterative reconstruction (Hybrid DL-IR). The framework is versatile and flexible, and it shows excellent image reconstruction performance in fast MRI, low-dose CT, and fast PET. The algorithm can complete single-organ, multi-sequence MR scanning in 100 seconds, reduce the CT radiation dose to only 10% while removing noise, and reconstruct small lesions from PET acquisitions accelerated 2 to 4 times, while reducing the effect of motion artifacts.

Medical AI in Collaboration with Medical Workers

The rapid development of medical AI has also led medical professionals to seriously consider and explore how to collaborate with AI to improve clinical processes. In July this year, DeepMind and a multi-institutional research team jointly proposed an AI system called Complementarity-Driven Deferral to Clinical Workflow (CoDoC). In this process, a predictive AI system first produces a diagnosis, a second AI system then judges that result, and if there is doubt, the final diagnosis is made by a clinician, which improves diagnostic accuracy while balancing efficiency. In breast cancer screening, CoDoC reduced the false positive rate by 25% at the same false negative rate and reduced clinician workload by 66%, compared with the current “double-read arbitration” process in the UK. In TB classification, the false positive rate was reduced by 5 to 15 percent at the same false negative rate, compared with standalone AI and standalone clinical workflows.
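The deferral idea described above can be illustrated with a minimal sketch. This is not the CoDoC implementation: the fixed thresholds `low` and `high`, the `Decision` class, and the function names are all hypothetical, chosen only to show how uncertain AI outputs might be routed to a clinician while confident ones are accepted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Decision:
    label: int   # 1 = positive finding, 0 = negative finding
    source: str  # "ai" if the AI output was accepted, "clinician" if deferred


def decide(ai_score: float, clinician_label: int,
           low: float = 0.2, high: float = 0.8) -> Decision:
    """Accept confident AI outputs; defer uncertain ones to the clinician.

    `low` and `high` are hypothetical thresholds chosen for illustration.
    """
    if ai_score <= low:
        return Decision(label=0, source="ai")
    if ai_score >= high:
        return Decision(label=1, source="ai")
    return Decision(label=clinician_label, source="clinician")


def clinician_workload(decisions: List[Decision]) -> float:
    """Fraction of cases that still require a clinician read."""
    if not decisions:
        return 0.0
    deferred = sum(1 for d in decisions if d.source == "clinician")
    return deferred / len(decisions)


# Toy example with made-up scores and clinician reads.
scores = [0.05, 0.50, 0.95, 0.30]
clinician_reads = [1, 1, 0, 0]
decisions = [decide(s, c) for s, c in zip(scores, clinician_reads)]
print([(d.label, d.source) for d in decisions])
print(clinician_workload(decisions))
```

In a real system the deferral rule would presumably be learned from data rather than set by hand, and the workload fraction is the kind of quantity behind figures such as the reported reduction in clinician reads.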

Similarly, Annie Y. Ng et al., of Kheiron Medical Technologies in London, UK, introduced an additional AI reader (working with human readers) to re-examine cases that the double-read arbitration process had not recalled. This reduced missed detections in early breast cancer screening, and the process produced almost no false positives. Another study, led by a team at the University of Texas McGovern Medical School and conducted at four stroke centers, applied AI based on computed tomography angiography (CTA) to automate the detection of large vessel occlusion (LVO) ischemic stroke. Clinicians and radiologists receive real-time alerts on their mobile phones within minutes of CT imaging being completed, notifying them of a possible LVO. This AI process improves in-hospital workflows for acute ischemic stroke, reducing the door-to-groin time from admission to treatment and improving the chances of successful treatment. The findings are published in JAMA Neurology.

Medical AI Models for Universal Benefit

2023 also saw a lot of good work that uses medical AI to find features invisible to the human eye in more readily available data, enabling universal diagnosis and early screening at scale. At the beginning of the year, Nature Medicine published a study by the Zhongshan Ophthalmic Center of Sun Yat-sen University and the Second Affiliated Hospital of Fujian Medical University. Using smartphones as the application terminal, the researchers used cartoon-like videos to attract children's gaze and recorded their gaze behavior and facial features. They then analyzed abnormal patterns with deep learning models and successfully identified 16 eye diseases, including congenital cataracts, congenital ptosis, and congenital glaucoma, with an average screening accuracy of more than 85%. This provides an effective and easily deployed technical means for large-scale early screening of visual impairment and related eye diseases in infants.

At the end of the year, Nature Medicine reported work by more than 10 medical and research institutions around the world, including the Shanghai Institute of Pancreatic Disease and the First Affiliated Hospital of Zhejiang University. The authors applied AI to pancreatic cancer screening of asymptomatic people in physical examination centers, hospitals, and similar settings, detecting lesion features in non-contrast CT images that are difficult to see with the naked eye, so as to achieve efficient and non-invasive early detection of pancreatic cancer. In a review of data from more than 20,000 patients, the model also identified 31 clinically missed lesions, which significantly improved clinical outcomes.

Sharing of Medical Data

In 2023, more mature data sharing mechanisms and successful cases emerged around the world, enabling multi-center cooperation and data openness while protecting data privacy and security.

First, AI researchers have used AI technology itself to advance the sharing of medical data. Qi Chang and others from Rutgers University in the United States published an article in Nature Communications proposing DSL, a federated learning framework based on distributed generative adversarial networks. It trains generative models on each center's own data and then uses the generated data in place of the real multi-center data. This allows AI training on multi-center big data while protecting data privacy. The same team also open-sourced a dataset of generated pathology images and their corresponding annotations. A segmentation model trained on the generated dataset can achieve results similar to one trained on real data.

The team of Dai Qionghai from Tsinghua University published a paper in npj Digital Medicine proposing Relay Learning, which uses multi-site big data to train AI models while preserving local data sovereignty and requiring no cross-site network connection. It balances data security and privacy concerns with the pursuit of AI performance. The same team subsequently developed and validated CAIMEN, a chest CT pan-mediastinal tumor diagnosis system based on federated learning, in collaboration with the First Affiliated Hospital of Guangzhou Medical University and 24 hospitals across the country. The system, which covers 12 common mediastinal tumors, achieved 44.9 percent higher accuracy when used alone than human experts working alone, and improved human experts' diagnostic accuracy by 19 percent when it assisted them.
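To make the idea of training across sites without pooling data more concrete, here is a minimal sketch of a relay-style loop. It is an assumption-laden illustration, not the published Relay Learning method: the one-variable linear model, the plain SGD update, the toy site data, and the function names `train_locally` and `relay_train` are all hypothetical. The only point is that model parameters travel from site to site while each site's records stay where they are.

```python
from typing import List, Tuple

# Each site holds its own (x, y) pairs; these never leave the site.
Site = List[Tuple[float, float]]


def train_locally(w: float, b: float, data: Site,
                  lr: float = 0.05, epochs: int = 200) -> Tuple[float, float]:
    """Run plain SGD for y = w * x + b on one site's local data."""
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def relay_train(sites: List[Site], rounds: int = 2) -> Tuple[float, float]:
    """Pass only the model parameters from site to site, like a relay baton."""
    w, b = 0.0, 0.0
    for _ in range(rounds):
        for data in sites:
            # Only (w, b) is handed over; raw data is read locally.
            w, b = train_locally(w, b, data)
    return w, b


# Toy data: every site samples the same underlying line y = 2x + 1.
sites = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
    [(-1.0, -1.0), (0.5, 2.0)],
]
w, b = relay_train(sites)
print(round(w, 3), round(b, 3))
```

In this toy setting all sites agree on the same relationship, so sequential training converges; with heterogeneous sites, later sites can overwrite what earlier sites taught the model, which is one reason real frameworks need more than this simple loop.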

In addition, several initiatives are under way to build secure, global, large-scale medical datasets. In November 2023, Agustina Saenz and others from the Department of Biomedical Informatics at Harvard Medical School published online in Lancet Digital Health a global framework for sharing medical image data called Medical AI Data for All (MAIDA). They are working with healthcare organizations around the world to provide comprehensive guidance on data collection and de-identification, using the U.S. Federal Demonstration Partnership (FDP) template to standardize data sharing. They plan to gradually release datasets collected in different regions and clinical settings around the world. The first dataset is expected to be released in early 2024, with more to come as the partnership expands. The project is an important attempt to build a global, large-scale, and diverse set of publicly available AI data.

Following this proposal, the UK Biobank has set an example. On November 30, the UK Biobank released new data from whole-genome sequencing of its 500,000 participants. The database, which contains the complete genome sequence of each of the 500,000 British volunteers, is the largest complete human genome database in the world. Researchers around the world can request access to these de-identified data and use them to probe the genetic basis of health and disease. Genetic data have long been regarded as highly sensitive, and this historic achievement by the UK Biobank shows that it is possible to build an open, privacy-preserving, global, large-scale database. With such technology and databases, medical AI is poised for its next leap.

Verification and Evaluation of Medical AI

Compared with the rapid development of medical AI technology itself, the verification and evaluation of medical AI have lagged behind. Validation and evaluation methods from the general AI field often ignore what clinicians and patients actually require of AI. Traditional randomized controlled clinical trials are too laborious to keep pace with the rapid iteration of AI tools. Establishing a verification and evaluation system suited to medical AI tools as soon as possible is the most important step in helping medical AI make the leap from research and development to clinical deployment.

In Google’s research paper on Med-PaLM, published in Nature, the team also released the MultiMedQA evaluation benchmark, which is used to assess the ability of large language models to encode clinical knowledge. The benchmark combines six existing professional medical Q&A datasets, covering professional medical knowledge, research, and other aspects, with a dataset of medical questions searched online, which reflects online doctor-patient Q&A. Together they attempt to assess from many angles whether AI can perform like a qualified doctor. In addition, the team proposes a human evaluation framework that considers multiple dimensions, including factuality, comprehension, reasoning, and possible bias. This is one of the most representative research efforts on evaluating AI in healthcare published this year.

However, does the fact that large language models encode clinical knowledge at a high level mean that they are competent for real-world clinical tasks? Just as a medical student who passes the medical licensing exam with a perfect score is still far from being an independent chief physician, the evaluation criteria proposed by Google may not be a complete answer to the question of how to evaluate medical AI models. As early as 2021 and 2022, researchers proposed reporting guidelines such as DECIDE-AI, SPIRIT-AI, and INTRPRT, hoping to guide the early development and validation of medical AI while taking into account factors such as clinical practicality, safety, human factors, and transparency/interpretability. Just recently, the journal Nature Medicine published a study by researchers from Oxford University and Stanford University on whether to use “external validation” or “recurring local validation” to validate AI tools.

The fairness of AI tools is also an important evaluation direction that received attention this year in articles in both Science and NEJM. AI often exhibits bias because it is limited by its training data. This bias may reflect social inequality, which can further evolve into algorithmic discrimination. The National Institutes of Health recently launched the Bridge2AI initiative, estimated to cost $130 million, to build diverse datasets (in line with the goals of the MAIDA initiative mentioned above) that can be used to validate the fairness of medical AI tools. MultiMedQA does not consider these aspects. The question of how to measure and validate medical AI models still needs extensive and in-depth discussion.
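One simple way to probe the kind of bias discussed above is to compare a model's sensitivity across patient subgroups. The sketch below is only an illustration under stated assumptions: the record format `(group, true_label, predicted_label)`, the group names, the toy data, and the function names are hypothetical, and sensitivity gap is just one of many possible fairness measures, not one prescribed by the initiatives mentioned here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (subgroup, true label, predicted label); 1 = disease present / flagged.
Record = Tuple[str, int, int]


def sensitivity_by_group(records: List[Record]) -> Dict[str, float]:
    """Sensitivity (true positive rate) computed separately per subgroup."""
    true_pos: Dict[str, int] = defaultdict(int)
    positives: Dict[str, int] = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            positives[group] += 1
            if y_pred == 1:
                true_pos[group] += 1
    # Groups with no positive cases are skipped to avoid dividing by zero.
    return {g: true_pos[g] / positives[g] for g in positives}


def sensitivity_gap(records: List[Record]) -> float:
    """Largest difference in sensitivity between any two subgroups."""
    rates = sensitivity_by_group(records)
    return max(rates.values()) - min(rates.values())


# Made-up validation records for two hypothetical subgroups.
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 1, 0), ("B", 0, 1),
]
print(sensitivity_by_group(records))
print(sensitivity_gap(records))
```

A large gap on a diverse validation set would suggest the model misses disease more often in one subgroup, which is exactly why diverse evaluation datasets matter.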

In January, Nature Medicine published an opinion piece titled “The Next Generation of Evidence-Based Medicine” by Vivek Subbiah of the University of Texas MD Anderson Cancer Center. It reviews the limitations of clinical trials exposed during the COVID-19 pandemic and points out the tension between innovation and adherence to the clinical research process. Finally, it outlines a future in which clinical trials are restructured: a next generation of clinical trials that uses AI to find key evidence in large volumes of historical research data, real-world data, multimodal clinical data, and wearable device data. Does this mean that AI technology and AI clinical validation processes may reinforce each other and co-evolve in the future? This is the open and thought-provoking question of 2023.

Regulation of Medical AI

The advancement of AI technology also poses challenges for the regulation of AI, and policymakers around the world are responding cautiously. In 2019, the FDA first published its Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (discussion paper), detailing its potential approach to premarket review of AI and machine learning-driven software modifications. In 2021, the FDA proposed the “Artificial Intelligence/Machine Learning-based Software as a Medical Device Action Plan”, which set out five specific regulatory measures for medical AI. This year, the FDA reissued its guidance on the Content of Premarket Submissions for Device Software Functions, which provides recommendations on the information to include in premarket submissions for the FDA’s evaluation of the safety and effectiveness of device software functions, including some functions that use machine learning models trained through machine learning methods. The FDA’s regulatory policy has evolved from an initial proposal into practical guidance.

Following the publication of the European Health Data Space in July last year, the EU has now reached a provisional agreement on the Artificial Intelligence Act. The former aims to make the best use of health data to provide high-quality healthcare, reduce inequalities, and support the use of data for prevention, diagnosis, treatment, scientific innovation, decision-making, and legislation, while ensuring that EU citizens have greater control over their personal health data. The latter makes clear that medical diagnosis systems are high-risk AI systems that require targeted strong supervision, whole-life-cycle supervision, and pre-evaluation supervision. The European Medicines Agency (EMA) has published a draft reflection paper on the use of AI to support drug development, regulation, and use, with an emphasis on improving the credibility of AI to ensure patient safety and the integrity of clinical research results. Overall, the EU’s regulatory approach is gradually taking shape, and the final implementation rules may be more detailed and stricter. In stark contrast to the EU’s stringent regulation, the UK’s AI regulatory blueprint makes clear that the government plans to take a light-touch approach and will not enact new legislation or set up new regulators for now.

In China, the Medical Device Technical Review Center of the National Medical Products Administration (NMPA) has previously issued documents such as “Review Points of Deep Learning Assisted Decision Software”, “Guiding Principles for the Registration Review of Artificial Intelligence Medical Devices (Draft for Comment)” and “Circular on Guiding Principles for the Classification and Definition of Artificial Intelligence Medical Software Products (No. 47 in 2021)”. This year, it also released the “Summary of the first medical device product classification results in 2023”. This series of documents makes the definition, classification, and regulation of AI medical software products clearer and easier to apply, gives companies in the industry clear guidance for product positioning and registration strategies, and provides a framework and management decisions for the scientific regulation of AI medical devices. Also worth watching is the China Medical Artificial Intelligence Conference, held in Hangzhou from December 21 to 23, whose agenda includes a special forum on digital medical governance and the high-quality development of public hospitals, as well as an industry development forum on the standardization of testing and evaluation technology for AI medical devices. Officials from the National Development and Reform Commission and the NMPA are expected to attend and may release new information.


In 2023, medical AI began to integrate into the entire medical process, upstream and downstream, covering hospital data collection, fusion, analysis, diagnosis and treatment, and community screening, and to collaborate closely with medical and disease control workers, showing its potential to benefit human health. Usable medical AI is beginning to come into view. In the future, the progress of medical AI will depend not only on technological development itself, but also on full cooperation among industry, academia, and medicine and on the support of policymakers and regulators. This cross-domain collaboration is the key to achieving AI-integrated medical services and will help advance human health.

Post time: Dec-30-2023