A review of the IFB AI hackathon

From 1 to 3 June 2026, IFB/ELIXIR-FR brought together around thirty participants at the CNRS campus in Villejuif for a hackathon dedicated to artificial intelligence applied to biology, combining presentations and technical sessions.
Organised as part of the IFB’s strategic AI focus and the MUDIS4LS project, the event gathered teams from the IFB-core, member platforms, contributors and associated teams with two objectives: to take stock of AI practices and tools within the IFB and its platforms, and to outline a collective strategy for the coming years.

From the state of the art to current projects: AI at IFB/ELIXIR-FR

The first half-day was devoted to the current state of artificial intelligence in biology. Following an overview entitled ‘Panorama and prospects for the IFB’ led by Christophe Blanchet, Alban Gaignard and Jacques van Helden, Romuald Marin (IFB/ELIXIR-FR) presented the fundamentals of large language models, or LLMs (tokenisation, embeddings, training, prompting), and highlighted their limitations: computational cost, hallucinations, non-determinism and sensitivity to query phrasing. Finally, Nicolas Servant (Institut Curie) presented several concrete, illustrated applications of RAG (Retrieval-Augmented Generation).

Several internal initiatives at the IFB were then presented. Romuald Marin introduced ViromeChat, a chatbot designed to explore interactions between viruses, their hosts and their environments, enabling users to query taxonomic and host-virus interaction databases and answer questions from virologists. Anakim Gualdoni then presented his PhD project on the use of LLMs to make predictions about fungal species, as well as ideas for enriching the metadata in the madbot database. Alban Gaignard, meanwhile, explained how LLMs and ontologies can coexist, and more specifically the ability to populate knowledge graphs from text, or conversely to query an LLM via requests.

A look back at these days in pictures. ©️IFB/ELIXIR-FR

Six use cases developed during the hackathon

On the second day, the group hacking sessions provided an opportunity to work on six of the eight planned use cases, illustrating the wide range of potential applications for LLMs within the bioinformatics ecosystem:

The EDAM-terms-recommender group, led by Alban Gaignard and Baptiste Rousseau, worked on the automatic annotation of bioinformatics tools with the EDAM ontology. Their adapter allows switching between a local LLM (BioMistral), the Albert API (academic API) and the Groq API (commercial LLM). Assessing the relevance of annotations remains a challenge, particularly because of the hierarchy of EDAM classes. The code is available on GitHub.
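The idea of an adapter that switches between LLM backends can be sketched as follows. This is a minimal illustration and not the group's actual code: the class names (`LLMBackend`, `CallableBackend`, `EdamRecommender`), the prompt wording and the stub backends are all assumptions made for the example, and no real BioMistral, Albert or Groq client is called.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol


class LLMBackend(Protocol):
    """Anything that can turn a prompt into a text completion (hypothetical interface)."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class CallableBackend:
    """Wraps any prompt -> text function: a local model, an HTTP client or a stub."""

    name: str
    send: Callable[[str], str]

    def complete(self, prompt: str) -> str:
        return self.send(prompt)


class EdamRecommender:
    """Asks the selected backend for EDAM terms and keeps only known labels."""

    def __init__(self, backends: Dict[str, LLMBackend], vocabulary: List[str]):
        self.backends = backends
        self.vocabulary = set(vocabulary)

    def recommend(self, tool_description: str, backend: str) -> List[str]:
        prompt = (
            "Suggest EDAM terms, comma-separated, for this tool:\n"
            + tool_description
        )
        raw = self.backends[backend].complete(prompt)
        candidates = [term.strip() for term in raw.split(",")]
        # Drop anything the model returned that is not in the vocabulary.
        return [term for term in candidates if term in self.vocabulary]


# Stub backends stand in for real services (assumption: no network access here).
backends = {
    "local": CallableBackend("local", lambda p: "Sequence alignment, Read mapping"),
    "remote": CallableBackend("remote", lambda p: "Sequence alignment, Not a real term"),
}
recommender = EdamRecommender(backends, ["Sequence alignment", "Read mapping"])
print(recommender.recommend("Aligns short reads to a genome", backend="local"))
print(recommender.recommend("Aligns short reads to a genome", backend="remote"))
```

Because the recommender only depends on the `complete` method, swapping one backend for another is a change to the `backend` argument rather than to the annotation logic. Filtering against a fixed vocabulary is one simple guard against invented terms; it does not address the hierarchy-related evaluation problem mentioned above.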

The working group on ‘Using AI to facilitate and improve concept definitions in EDAM’, led by Jacques van Helden, explored how a large language model (LLM) can help revise and enrich the terms of an ontology, using transcriptional regulation as an example. ChatGPT proved most effective following iterative dialogue, producing a table of relevant but incomplete results, whilst Albert demonstrated limitations related to context size (3,400 lines of EDAM). Moving forward, the aim would be to investigate the possibility of conducting these analyses on sovereign AI systems.

The AI Assistant for Tool Selection in Biosphere group, led by Matis Zouari, Audrey Bihouée, Christophe Blanchet and Hervé Ménager, set out to develop a chatbot capable of guiding Biosphere users to the virtual machines best suited to their needs. The results are encouraging, although the chatbot occasionally gave incorrect responses to complex queries that were not covered by the limited dataset used. The code is available on GitLab.

The Spatial RAG for Earth Virome Exploration group, led by Paul Tissot, Pauline Le-Corre and Romuald Marin, developed an agent that queries a dataset comprising 6 million lines of BioSample metadata from the Virome@tlas project. Three tools were developed: the first produces a textual description of a sample from its identifier; the second generates a summary of samples for a given country; the third searches for viruses present in the vicinity of a city by calculating distances. The code is available on GitLab.
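The third tool, a proximity search around a city, can be illustrated with a great-circle distance filter. The report does not say which distance calculation the group used, so the haversine formula below is an assumption, as are the `Sample` record, the function names and the example coordinates and identifiers.

```python
import math
from typing import Iterable, List, NamedTuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


class Sample(NamedTuple):
    """Hypothetical minimal view of a sample record with coordinates."""

    sample_id: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def samples_near(
    samples: Iterable[Sample], lat: float, lon: float, radius_km: float
) -> List[Sample]:
    """Return the samples within radius_km of (lat, lon), nearest first."""
    scored = [(haversine_km(lat, lon, s.lat, s.lon), s) for s in samples]
    within = [pair for pair in scored if pair[0] <= radius_km]
    return [s for _, s in sorted(within, key=lambda pair: pair[0])]


# Illustrative records only; identifiers and coordinates are made up for the example.
records = [
    Sample("sample-villejuif", 48.79, 2.36),
    Sample("sample-lyon", 45.764, 4.8357),
]
paris = (48.8566, 2.3522)
print([s.sample_id for s in samples_near(records, *paris, radius_km=50)])
```

A linear scan like this is easy to reason about but would be slow over millions of rows; at that scale a spatial index or a database with geographic queries would be a more likely choice.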

The Benchmarking Group for Nextflow Workflow Implementation Solutions, led by Philippe Hupé, Frédéric Jarlier, Nicolas Servant, Corentin Raoux, Baptiste Roelens, Fabrice Leclerc and Quentin Duvert, compared Albert, Seqera AI, Claude and Gemini in generating a complete single-cell RNA-seq pipeline. Result: the most recent and largest models deliver better results, pre-planning with AI significantly improves code quality, and commercial models continue to outperform academic solutions. None of the tools tested produced a functional pipeline.

The Microbiome Metadata group, led by Hélène Chiapello, Nicolas Pons, Liliana Ballesteros-Mejia, Alban Gaignard, Imane Messak and Thomas Denecker, tested the ability of large language models (Perplexity / Mistral, Albert API) to automatically extract metadata on microbiome samples from scientific articles. This work forms part of a constellation of complementary projects already underway within the community (FAIR-Checker, MIASSM Cloud4Sams, MicrobiomeSchemas, the MNHN biodiversity repository, madbot).

For more information, please see the summary of these different use cases.

Strategic discussions on building academic AI

The final morning session, chaired by Jacques van Helden, provided an opportunity to raise key questions regarding the future of the IFB in relation to AI. Several challenges were identified: scaling up and the cost of GPU resources, the choice between open models and commercial solutions, the need for reference datasets to objectively evaluate and compare models, and the growing importance of benchmarking.

In the shorter term, participants agreed on the need to pool expertise and models across projects, to develop a future training programme for the bioinformatics community, and to coordinate the IFB’s AI initiatives with those of other French and European stakeholders.