PROJECTS

Our Projects

We execute several technology transfer projects in collaboration with Swiss companies, with the aim of bringing the benefits of advanced NLP technologies into an industrial context.

We also have a few pure research projects, exploratory in nature. Our main research interest is NLP applications in the health area. See in particular the SNF-funded projects QUADRATIC and M2P2.

Below you can find some representative examples of the projects we are involved in. This is not an exhaustive list, partly because contractual reasons prevent us from mentioning some projects.

Active (as of Oct 2024)

SNF/M2P2

Medical, Multilingual and Privacy-Preserving Natural Language Processing in the clinical domain (M2P2)

SNF/QUADRATIC (2024)

NLP in support of Pharmacovigilance: QUality Adverse Drug Reaction AcTIve Control (QUADRATIC)

Project in collaboration with EOC.

https://data.snf.ch/grants/grant/220564

Swiss AI initiative (2024)

Coordination of IDSIA activities in relation to the Swiss AI Initiative

The National Supercomputing Center (CSCS) is performing a major upgrade of its infrastructure. The new ALPS infrastructure, which will be capable of supporting the development of innovative AI applications, such as Large Language Models, will become available early next year, and the Swiss academic community is organizing itself to make use of it. Working groups are being formed across Switzerland to deal with different potential applications (from the development of a foundational model to specific applications in science, education, medicine, etc). The purpose of this project is to coordinate IDSIA's participation in the Swiss AI initiative.

WRSD (2022-2024)

Risk identification and prevention of stress-related disorders in the workplace (original title: Identificazione del Rischio e Prevenzione di Disordini dovuti allo Stress nell’ambiente lavorativo).

https://sites.supsi.ch/meditech/progetti/Wrsd.html

Brisk.AI (2023-2024)

This is a small project in collaboration with Dr. Yalbi Itzel Balderas-Martínez of the National Institute of Respiratory Diseases (INER) in Mexico, aiming to use AI techniques to produce translated and simplified versions of scientific literature for educational purposes.

Finished

PREGAMUS (2023)

This is a small project in collaboration with Dr. Jin-Dong Kim of the Database Center for Life Sciences (Japan), aiming to use AI techniques to produce translated and simplified versions of scientific literature for educational purposes.

ELDI (2023)

This project is partly a continuation of the INCdid project. The goal of this project is to develop methods for dialect identification of small samples of text (e.g. social media posts, short messages), focusing specifically on variants of Arabic.

DAG-MTSM (2023)

  • People: Joseph, Oscar, Fabio, Sandra

This project is in part a continuation of MisInfoCOV, in that it aims to develop and assess techniques for the detection of misinformation. As public awareness of the power of Large Language Models has greatly increased, so have the opportunities to create false and misleading information. An additional goal of this project is to develop and assess techniques capable of identifying text generated by LLMs.

ArthroTraitMine (2021-2023)

This research project proposes to leverage the power of approaches from next-generation text mining with artificial intelligence and machine learning methods, and apply this to kick-start the construction of a comprehensive resource of trait data across Arthropoda. Methods developed for text analytics and natural language processing of biomedical literature will be leveraged to achieve three major goals: to develop a codebase with informatics workflows that collate and assess biological trait data; to build the first large-scale standardised database that collates arthropod trait data from a wide range of sources; and to develop an arthropod trait ontology to power both mining efforts and future research through large-scale quantitative analyses of trait data.

Past

MisInfoCOV (2021-2022)

  • People: Joseph, Oscar, Fabio

In recent years we have witnessed an enormous amount of fake or misleading information disseminated through social media. During the current COVID-19 pandemic the problem has been particularly noticeable. Wrong and misleading information can spread extremely rapidly, potentially causing serious harm, a problem which has been termed an infodemic. In this project, we aim to investigate the state of the art and to establish baselines for two research questions, namely: "Identification of misinformation in social media" and "Identification of stance and sentiment of the public towards public policies and controversial statements."

BERGAMOS (2021-2022)

The project BERGAMOS (Biomedical Entry Repository for General Annotations that are Machine-readable, Open and Searchable) is funded by a "Bridging Grant" of SERI for collaborations with East Asian countries. In particular we collaborate with the Database Center for Life Sciences (DBCLS) based in Kashiwa, Chiba prefecture, Japan.

In this project, we register our entity recognition pipeline OGER as an annotation service for PubAnnotation, an online repository for annotations of biomedical literature developed at DBCLS. Through the proposed work, biomedical researchers will be able to have their collections of PubMed articles on PubAnnotation annotated automatically by OGER. Furthermore, this will facilitate compatibility of PubAnnotation with other annotation services similar to OGER. Ultimately, this foundational work will allow us to make PubAnnotation a standard repository where researchers can easily obtain annotations to fuel their machine learning algorithms and evaluate them.

INCdid (2021-2022)

The goal of this project is to develop methods for dialect identification of small samples of text (e.g. social media posts, short messages) under various circumstances, focusing especially on noisy text and language similarity.

Social Media Mining for health (2020-2021)

  • People: Joseph, Fabio

Social media platforms offer extensive information about the development of the COVID-19 pandemic and the current state of public health. In recent years, the Natural Language Processing community has developed a variety of methods to extract health-related information from posts on social media platforms. In order for these techniques to be used by a broad public, they must be aggregated and presented in a user-friendly way. We have aggregated ten methods to analyze tweets related to the COVID-19 pandemic, and present interactive visualizations of the results on our online platform, the COVID-19 Twitter Monitor.

Mining patient insights in social media conversations (2019-2021)

  • People: Joseph, Fabio

We have established a collaboration with Roche in the area of social media and web monitoring, to harness patient insights for the novel and transformative concept of patient-centered drug development. We contribute advanced Information Extraction components to help leverage these insights to increase the efficacy and efficiency of the company’s R&D.

SwissMADE (2017-2021)

  • People: Nico, Fabio

SwissMADE stands for "Swiss Monitoring of Adverse Drug Reactions". The full title of the project is "Automated detection of adverse drug events from older inpatients’ electronic medical records using structured data mining and natural language processing."

This project is part of the National Research Programme (NRP) 74, "Smarter Health Care". It is a collaboration with five Swiss hospitals. The goal is to use NLP techniques and data mining to extract useful information from electronic medical records.

LifeLike/BOOST (2020-2021)

  • People: Oscar, Denis, Fabio

SkillGym (https://www.skillgym.com/) is a computer-based training system that enables in-role and prospective leaders to develop their communication skills by presenting them with realistic simulations of workplace situations. SkillGym walks the end user through a sequence of videos related to a specific management situation by showing a rich set of alternatives as text boxes. SkillGym also provides extensive feedback, which enables users to review a conversation step by step, and learn the implications of their behavior at each step.

Feedback from SkillGym users praises its engaging training environment. To make simulations even more realistic, our goal is to move from the existing point-and-click interface to a voice-based interface. Achieving this goal requires cutting-edge natural language understanding to interpret the user input in the context of the ongoing flow of the simulated interaction. Our proposed solution is to carry out feature extraction based on the output of a commodity speech-to-text engine so that a dialog state tracker can select the next video based on the user input. Notably, the user must be guided through textual hints to ensure that she provides input that is coherent with the training goals of SkillGym. Moreover, the dialog state tracker must handle all situations where the user input is not aligned with the training goals (e.g. off-topic comments, disambiguation).
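The selection step described above can be illustrated with a deliberately simple sketch. It is not the project's implementation: the keyword-overlap features, the `next_video` function, the `min_overlap` threshold, and the `FALLBACK` state are all hypothetical stand-ins, and the transcript is assumed to arrive as plain text from whatever speech-to-text engine is used.

```python
import re
from typing import Dict, Set

# Hypothetical fallback state: stay on the current step and show a textual hint.
FALLBACK = "hint"


def extract_features(transcript: str) -> Set[str]:
    """Turn speech-to-text output into a set of lower-cased word tokens."""
    return set(re.findall(r"[a-z']+", transcript.lower()))


def next_video(transcript: str,
               options: Dict[str, Set[str]],
               min_overlap: int = 2) -> str:
    """Pick the follow-up video whose keywords best match the user input.

    `options` maps a video id to its (hypothetical) keyword set.
    Input that matches no option well enough is treated as off-topic
    or ambiguous and routed to the fallback state.
    """
    features = extract_features(transcript)
    best_id, best_score = FALLBACK, 0
    for video_id, keywords in options.items():
        score = len(features & keywords)
        if score > best_score:
            best_id, best_score = video_id, score
    return best_id if best_score >= min_overlap else FALLBACK


# Hypothetical options for one step of a simulated conversation.
options = {
    "empathise": {"understand", "feel", "sorry"},
    "confront": {"deadline", "missed", "unacceptable"},
}
on_topic = next_video("I understand how you feel about this", options)
off_topic = next_video("What is the weather today", options)
```

A real tracker would use richer features and keep track of the conversation history; the point here is only the control flow, in which poorly matching input leads to a hint instead of a video.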

StageAI (2019-2020)

  • People: Denis, Sandra, Vani, Fabio

In this project, we focus on conversational recommender systems, which, in contrast to traditional recommenders, allow users to specify their preferences through a sequence of dynamically customized interactions. In particular, we seek to improve the online recommendation platform of Stagend (stagend.com), which aims to find the most suitable performer ("an item") for a particular event specified by an event organizer ("a user"). In the first phase, an adaptive Bayesian approach was used to sequentially update the model given each new piece of information, e.g. a performer's answer to an organizer's question. However, in a real-time setting, delayed or incomplete interactions (e.g. a missing reply) can hamper the system's efficiency.
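As a rough illustration of sequential Bayesian updating, the sketch below maintains a discrete belief over candidate performers and revises it with Bayes' rule each time an answer arrives. The actual model used in the project is not described here, so the discrete formulation, the `bayes_update` function, the performer names, and the likelihood values are all assumptions made for the example.

```python
from typing import Dict


def bayes_update(prior: Dict[str, float],
                 likelihood: Dict[str, float]) -> Dict[str, float]:
    """Return the posterior over performers after one new observation.

    prior[p]      : current probability that performer p is the best match
    likelihood[p] : probability of the observed answer if p were the best match
    """
    unnormalised = {p: prior[p] * likelihood.get(p, 0.0) for p in prior}
    total = sum(unnormalised.values())
    if total == 0.0:
        # The observation is impossible under every hypothesis: keep the prior.
        return dict(prior)
    return {p: w / total for p, w in unnormalised.items()}


# Hypothetical example: three performers and a uniform prior.
belief = {"band_a": 1 / 3, "band_b": 1 / 3, "dj_c": 1 / 3}

# Hypothetical likelihoods of the answer "yes, we can play outdoors".
belief = bayes_update(belief, {"band_a": 0.9, "band_b": 0.5, "dj_c": 0.1})

# A missing reply carries no information, so no update is applied for it.
```

In this formulation a delayed or missing reply simply leaves the belief unchanged, which is why such interactions slow down convergence towards a recommendation.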

To overcome this issue, and to avoid placing an unnecessary burden on the performer (when the answer is already available in the performer's biography or in conversations about previous events), we investigate ways of enhancing the Bayesian approach with NLP methods. Specifically, we adopt a BERT-based question-answering approach that either provides a confident automated answer based on the existing information or signals uncertainty, and thus the need to contact the performer. Additionally, given that Stagend operates in multilingual markets, we benchmark multilingual models such as multilingual BERT and XLM-RoBERTa and compare them with separate language models for each of the target languages (DE + Swiss DE challenge, FR, IT, EN).
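The "answer automatically or contact the performer" decision can be sketched as a confidence gate on the output of a question-answering model. The `QAResult` type, the `decide` function, and the 0.8 threshold are hypothetical; the sketch assumes only that the model returns an answer span together with a confidence score in [0, 1].

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QAResult:
    """Hypothetical container for the output of a question-answering model."""
    answer: str
    score: float  # model confidence in [0, 1]


def decide(result: Optional[QAResult], threshold: float = 0.8) -> str:
    """Use the automated answer only when the model is confident enough.

    Returns "auto: <answer>" for a confident, non-empty answer and
    "ask_performer" otherwise (no result, empty answer, or low score).
    """
    if result is not None and result.answer.strip() and result.score >= threshold:
        return f"auto: {result.answer.strip()}"
    return "ask_performer"


# Hypothetical model outputs for two organizer questions.
confident = decide(QAResult("Yes, up to 4 hours", 0.93))
uncertain = decide(QAResult("maybe", 0.40))
```

The threshold trades off the performer's workload against the risk of passing a wrong automated answer to the organizer, so in practice it would be tuned on held-out conversations.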

TalentScout (2020)

  • People: Claudio, Fabio

In a collaborative project with a major pharma company, we explored named entity recognition (NER) strategies applied to job/resume mining tasks. We leveraged advanced NER approaches to identify job titles, organization names, and geographical locations, which are essential to job mining tasks such as recruiting, tracking job candidates, and job recommendation. This process is currently based on the manual analysis of hundreds of CVs, often with no relevance for a specific position or a profile.

Despite the existence of many commercial providers of similar services, there are no publicly available datasets to evaluate the advertised algorithms. Existing pre-trained NER models, such as the spaCy and Stanford NER models, were trained on blogs, news and media. Their performance drops significantly when they are applied to sentences taken from resumes, since titles, locations and organization names in a resume are often written in the manner of a heading.

Our approach outperforms pre-trained models by a significant margin. Our NER models have been integrated in a prototype system which demonstrates a more dynamic and flexible data analysis compared to baseline commercial solutions.

Previous projects

Projects conducted by Dr. Fabio Rinaldi before he joined IDSIA can be found here: http://www.ontogene.org/

In particular, the last of these projects (MelanoBase) continued to generate output well into 2022; see, for example, this screencast published by the Swiss Institute of Bioinformatics [2022-08-24 Wed].

Author: Fabio Rinaldi

Created: 2024-10-24 Thu 19:57
