Development of automated LLM knowledge graph extraction to inform clinical research of infectious and immune mediated diseases (Topic 136)

Abstract Using the vast amount of available data to inform biomedical research, from basic through clinical discovery and development, is increasingly daunting. While Artificial Intelligence (AI) can provide a powerful means for distilling such information, representing the data in formats that allow for understanding remains a challenge. Knowledge Graphs (KGs) can help solve this problem through meaningful data representation. We postulate that advancements in large language models (LLMs), combined with algorithmic fine-tuning, will result in automated KG extraction at super-human levels. We plan to develop an advanced Natural Language Processing (NLP) pipeline and accompanying web service on our existing commercial software platform to streamline the process of knowledge distillation for researchers in the infectious and immune-mediated diseases community. For Phase I, we will create an LLM specific to infectious and immune-mediated diseases, capable of extracting knowledge from unstructured PDF publications. Our software is designed to be agnostic of specific existing KGs, ensuring effortless integration into any KG framework. The ability to extract data from published literature will be supplemented with additional data sets, including ClinicalTrials.gov and anonymized patient medical data, to support clinical research programs.