July 6, 2024

Assessing The Biological Reasoning Abilities Of Large Language Models

Researchers from the University of Georgia and Mayo Clinic have conducted a study to evaluate the biological knowledge and reasoning skills of various large language models (LLMs). In their paper, posted as a preprint on arXiv, the researchers report that OpenAI’s GPT-4 outperforms other leading LLMs on biology-related reasoning tasks.

Large language models are deep learning models that generate text in response to prompts. They have gained popularity in applications such as document summarization, brand name generation, and question answering. However, their ability to reason about biological concepts had not been extensively studied until now.

The researchers aimed to assess and compare the performance of different LLMs, including GPT-4, GPT-3.5, PaLM2, Claude2, and SenseNova, in comprehending and reasoning through biology-related questions. They designed a 108-question multiple-choice test covering various areas, such as molecular biology, biological techniques, metabolic engineering, and synthetic biology.

Multiple-choice tests were chosen as the evaluation method because they allow for easy grading and comparison of LLMs’ performance. The researchers asked each LLM the same questions multiple times with different phrasings to assess both their average performance and variation in answers. This approach aimed to mimic real-world scenarios where questions may be asked in different ways.
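The grading arithmetic behind this kind of protocol can be illustrated with a minimal Python sketch. It is not the researchers' code: the function names, the four-question answer key, and the three trials are hypothetical toy values chosen for illustration, and the use of the population standard deviation as the measure of variation is an assumption.

```python
from statistics import mean, pstdev


def score_trial(answers, answer_key):
    """Return the number of answers that match the answer key."""
    if len(answers) != len(answer_key):
        raise ValueError("a trial must answer every question exactly once")
    return sum(given == correct for given, correct in zip(answers, answer_key))


def summarize(trials, answer_key):
    """Score each rephrased trial, then report the average and the spread."""
    scores = [score_trial(trial, answer_key) for trial in trials]
    return {
        "scores": scores,
        "mean": mean(scores),      # average performance across trials
        "spread": pstdev(scores),  # variation across trials (assumed metric)
    }


# Hypothetical toy data: 4 questions, each asked in 3 different phrasings.
answer_key = ["A", "C", "B", "D"]
trials = [
    ["A", "C", "B", "D"],
    ["A", "C", "B", "A"],
    ["A", "B", "B", "D"],
]

summary = summarize(trials, answer_key)
print(summary)
```

A model that answers consistently across phrasings produces a small spread, which is the kind of trial-to-trial stability the study set out to measure alongside the average score.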

The results of the study suggest that LLMs, particularly GPT-4, show promising abilities in responding to biology-related questions and accurately understanding concepts rooted in fundamental molecular biology and other subfields. GPT-4 achieved an average score of 90 on the multiple-choice tests, surpassing the other models.

GPT-4 also gave consistent answers across the trials, suggesting that it is more reliable at biological reasoning than its peers. These findings highlight the potential of GPT-4 to assist biology research and education, including the development of interactive learning tools and testable hypotheses.

The researchers believe that the integration of advanced AI, particularly LLMs, with the field of biology can lead to important scientific discoveries and advancements in education. They plan to conduct further studies, focusing on overcoming computational demands and privacy concerns associated with using GPT-4. This may involve creating open-source LLMs for tasks like gene annotation and phenotype-genotype pairing.

Additionally, the researchers aim to explore the multimodal analysis capabilities of GPT-4V, the vision-enabled version of GPT-4. They will investigate the chemical and biosynthetic pathways of natural drug molecules, particularly those with unknown biosynthetic pathways, to advance drug discovery and development in synthetic biology.

This study represents a significant step in leveraging the potential of advanced AI, specifically LLMs, in the complex field of biology. It positions AI not only as a supportive tool but also as a central element in navigating and deciphering the vast biological landscape. The researchers are hopeful that further advancements in LLMs and their training on biological data will lead to scientific breakthroughs and more advanced educational tools.
