NLP @ IDSIA

Introduction
Our Research
Selected publications
Team Members
Group news
- 2026
- 2025
- 2024
- Past
Our Projects
How to find us
- Address
- Contact

Introduction

This website is an entry point for NLP research at IDSIA. The NLP group at IDSIA was established in 2019. Together with our host institute (IDSIA), we share a joint affiliation with the University of Applied Sciences and Arts of Southern Switzerland (SUPSI) and the Università della Svizzera italiana (USI).

The dual nature of IDSIA (basic research and technology transfer) allows us to perform state-of-the-art research, and at the same time requires us to collaborate with local and national companies in order to bring these technologies into practical use.

Follow us on Twitter/X: @idsianlp

Our Research

We combine an understanding of the nature of natural language (human language) with expertise in the most recent techniques in the field of Natural Language Processing (NLP), in particular transformer-based architectures (including Large Language Models).

We apply our expertise to basic research and to applied projects in collaboration with industry, in many cases funded by the Swiss Innovation Agency (InnoSuisse). Some selected examples of recent projects are listed below.

A specific area of research interest is biomedical text processing for different textual domains, such as the scientific literature, clinical reports, and social media. We are also working on applications of deep learning NLP models (LLMs) in the financial domain, in collaboration with the Swiss banking industry.

During the COVID-19 pandemic we performed several biomedical text mining activities in support of COVID-19 research, in particular:

Processing biomedical literature about COVID-19.
Monitoring Twitter conversations about COVID-19.
Collaborating on a repository of COVID-19 literature, with classification into clinically relevant categories and translations into Spanish.

Selected publications

Follow this link for the full list of publications. Below you can find a few selected publications.

Kanjirangat, V., Samardzic, T., Dolamic, L. and Rinaldi, F. Tokenization and Representation Biases in Multilingual Models on Dialectal NLP Tasks. EMNLP 2025. http://arxiv.org/abs/2509.20045 (SAC award)
Giovanni Profeta, Joseph Cornelius, Fabio Rinaldi. Enhancing the study of historical figures through AI-powered interactive data visualizations. DH2025 - Digital Heritage International Congress, Siena 2025. https://doi.org/10.2312/dh.20253150
Cornelius J, Detering H, Lithgow-Serrano O, Agosti D, Rinaldi F, Waterhouse R (2025) From literature to biodiversity data: mining arthropod organismal traits with machine learning. Biodiversity Data Journal 13: e153070. DOI: 10.3897/BDJ.13.e153070
Joseph Cornelius, Oscar Lithgow-Serrano, Sandra Mitrovic, Ljiljana Dolamic, and Fabio Rinaldi. 2024. BUST: Benchmark for the evaluation of detectors of LLM-Generated Text. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 8029–8057, Mexico City, Mexico. Association for Computational Linguistics. doi: 10.18653/v1/2024.naacl-long.444 https://github.com/IDSIA-NLP/BUST
Anastassia Shaitarova, Jamil Zaghir, Alberto Lavelli, Michael Krauthammer, Fabio Rinaldi. Exploring the Latest Highlights in Medical Natural Language Processing across Multiple Languages: A Survey. IMIA Yearbook of Medical Informatics 32(01):230-243, December 2023. doi: 10.1055/s-0043-1768726
Vani Kanjirangat, Tanja Samardžić, Ljiljana Dolamic, Fabio Rinaldi (2023). Optimizing the Size of Subword Vocabularies in Dialect Classification. VarDial 2023, ACL 2023. doi: 10.18653/v1/2023.vardial-1.2
Kanjirangat, V., Samardzic, T., Rinaldi, F., Dolamic, L. (2022). Early Guessing for Dialect Identification. To appear in Findings of the 2022 Conference on Empirical Methods in Natural Language Processing.
Lenz Furrer, Joseph Cornelius, Fabio Rinaldi. Parallel sequence tagging for concept recognition. BMC Bioinformatics volume 22, Article number: 623 (2021). doi: 10.1186/s12859-021-04511-y
Roberto Zanoli, Alberto Lavelli, Theresa Löffler, Nicolas Andres Perez Gonzalez, Fabio Rinaldi. An annotated dataset for extracting gene-melanoma relations from scientific literature. Journal of Biomedical Semantics, volume 13, Article number: 2 (2022). doi: 10.1186/s13326-021-00251-3

Team Members

Researchers

Research Assistants

Behnaz Rezaeifar

Associated group members

Nico Colic

Former members and temporary visitors

Group news

2026

[2026-01-25 Sun] The new TaxoMine project (provisional name) will start in March! Two positions available.

2025

[2025-11-07 Fri] Our EMNLP 2025 paper on Tokenizer Fairness received a SAC award! (Senior Area Chair award)
[2025-09-26 Fri] New SNF project approved: we will continue our successful collaboration with Robert Waterhouse on taxonomic-based information extraction from the scientific literature.
[2025-09-22 Mon] Tanja Samardžić and Samuel Corecco will join the group soon.
[2025-07-22 Tue] Press release about the QUADRATIC project: https://www.supsi.ch/it/l-ia-a-servizio-della-farmacosorveglianza
[2025-04-30 Wed] We are featured in a new IDSIA promotional video.
[2025-02-10 Mon] Dr. Rinaldi and Prof. Rizzoli (IDSIA director) interviewed about DeepSeek: https://www.supsi.ch/en/deepseek-parliamone

2024

[2024-11-20 Wed] The InnoSuisse project AutoDischarge has been approved!!!
[2024-09-28 Sat] The SNF project M2P2 has been approved!!! We are looking forward to a collaboration with Prof. Michael Krauthammer (UZH) and Prof. J.L. Raisaro (CHUV) to increase the impact of modern NLP technologies in the Swiss health sector!
[2024-09-16 Mon] Invited presentation at the "Giornata della democrazia", Locarno.
[2024-03-01 Fri] Our article on evaluating detectors of LLM-generated text has been accepted at NAACL 2024! "BUST: Benchmark for the evaluation of detectors of LLM-Generated Text", by Joseph Cornelius, Oscar Lithgow-Serrano, Sandra Mitrovic, Ljiljana Dolamic, and Fabio Rinaldi.
[2024-02-01 Thu] During 2024 our participation in the Swiss AI initiative will be a core focus of our activity. The Swiss AI initiative is a Swiss-wide consortium to develop innovative AI applications using the new powerful infrastructure Alps, provided by the Swiss National Supercomputing Centre.

Past

See here.

Our Projects

We execute several technology transfer projects in collaboration with Swiss companies, with the aim of bringing the benefits of advanced NLP technologies into an industrial context.

We also have a few pure research projects, exploratory in nature. Our main research interest is NLP applications in the health area. Check in particular the SNF-funded projects QUADRATIC and M2P2.

Below you can find some representative examples of the projects we are involved in. This is not an exhaustive list, partly because contractual reasons prevent us from mentioning some projects.

Active (selected active projects, as of Sep 2025)

SNF/M2P2

Medical, Multilingual and Privacy-Preserving Natural Language Processing in the clinical domain (M2P2)

AutoDischarge

Semi-automated generation of discharge summaries (AutoDischarge).

Swiss AI initiative (2024)

Coordination of IDSIA activities in relation to the Swiss AI Initiative

The Swiss National Supercomputing Centre (CSCS) is performing a major upgrade of its infrastructure. The new Alps infrastructure, which will be capable of supporting the development of innovative AI applications, such as Large Language Models, will become available early next year, and the Swiss academic community is organizing itself to make use of it. Working groups are being formed across Switzerland to deal with different potential applications (from the development of a foundational model to specific applications in science, education, medicine, etc.). The purpose of this project is to coordinate IDSIA's participation in the Swiss AI initiative.

Recent

SNF/QUADRATIC (2024)

NLP in support of Pharmacovigilance: QUality Adverse Drug Reaction AcTIve Control (QUADRATIC)

Mini-MUSE (2023-2024)

AI-based visualization methods to explore digitized publications

WRSD (2022-2024)

Risk identification and prevention of stress-related disorders in the workplace (Identificazione del Rischio e Prevenzione di Disordini dovuti allo Stress nell’ambiente lavorativo).

https://sites.supsi.ch/meditech/progetti/Wrsd.html

Brisk.AI (2023-2024)

This is a small project in collaboration with Dr. Yalbi Itzel Balderas-Martínez of the National Institute of Respiratory Diseases (INER) in Mexico, which aims to use AI techniques to produce translated and simplified versions of scientific literature for educational purposes.

Previous projects

A list of all current and previous projects can be found here.

How to find us

We are based at the Dalle Molle Institute for Artificial Intelligence (IDSIA), in Lugano, Switzerland.

Address

Click here to find our location on a map

Dalle Molle Institute for Artificial Intelligence Research /
Istituto Dalle Molle di studi sull’intelligenza artificiale (IDSIA)
IDSIA USI-SUPSI

Polo universitario Lugano - Campus Est
Via la Santa 1
CH-6962 Lugano - Viganello

Contact

Dr. Fabio Rinaldi
E-Mail: fabio AT idsia.ch
Tel: +41 (0)79 300 67 71
Skype: fabio.rinaldi.uzh