Tanja Samardzic

Short Bio

Tanja Samardžić is a senior researcher in the IDSIA NLP Group, working on the structure of text data and multilingual text processing. She holds a PhD in computational linguistics from the University of Geneva, where she studied in the Computational Learning and Computational Linguistics (CLCL) group. Before joining IDSIA in 2025, she was Head of the Text Group and an alternating lab director of the Language and Space Lab at the University of Zurich (2013–2024), a Visiting Scholar at the University of Cambridge (2024), and a Visiting Researcher at the IT University of Copenhagen (2022). Her research is driven by the idea that establishing scientific facts about information density in text data is a path towards language modelling that is fairer across languages, more robust, and more sustainable.

External links

Recent publications

  • Olga Pelloni, Rob van der Goot, Peter Ranacher, Ivan Vulic, and Tanja Samardžić. 2025. Subword symmetry in natural languages. Royal Society Open Science, 12:250295. http://doi.org/10.1098/rsos.250295
  • Zachary William Hopton, Yves Scherrer, and Tanja Samardžić. 2025. Functional Lexicon in Subword Tokenization. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 7839–7853, Albuquerque, New Mexico. Association for Computational Linguistics.
  • Tanja Samardžić, Ximena Gutierrez, Christian Bentz, Steven Moran, and Olga Pelloni. 2024. A Measure for Transparent Comparison of Linguistic Diversity in Multilingual NLP Data Sets. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 3367–3382, Mexico City, Mexico. Association for Computational Linguistics.
  • Ximena Gutierrez-Vasques, Christian Bentz, and Tanja Samardžić. 2023. Languages Through the Looking Glass of BPE Compression. Computational Linguistics, 49(4):943–1001.
  • Olga Pelloni, Anastassia Shaitarova, and Tanja Samardžić. 2022. Subword Evenness (SuE) as a Predictor of Cross-lingual Transfer to Low-resource Languages. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 7428–7445, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.

Contact

E-Mail: tanja.samardzic AT supsi.ch

NLP group page

Author: Tanja Samardzic

Created: 2025-09-26 Fri 16:32