Shenbin Qian's Homepage

Shenbin Qian

Postdoc Researcher

University of Oslo

Biography

I am a postdoc researcher at the Language Technology Group, University of Oslo, as part of the Marie Skłodowska-Curie Actions (MSCA) programme DSTrain. My current research focuses on the evaluation and explainability of large language models (LLMs). I earned my PhD in Natural Language Processing (NLP) at the University of Surrey, where I mainly worked on the evaluation of machine translation for user-generated content using deep learning methods. During my doctoral studies, I also contributed to projects in diverse machine learning fields, including machine translation evaluation, information retrieval, text-to-image generation, and video understanding.

As an undergraduate, I trained as a linguist in language and translation studies. I later taught myself programming and developed an interest in NLP after joining a technology company in 2017. I began applying statistical and deep learning methods to language and vision data during my MSc programme at the University of Exeter. I am now interested in investigating different aspects of LLMs and their application to various NLP subdomains, including LLM explainability and fairness and the evaluation of LLM multilinguality.

Interests

Multilingual Evaluation
LLM Evaluation
LLM Explainability
Sentiment Analysis
Machine Translation
Information Retrieval
Sequence Labeling
Multimodal Generation

Education

PhD in Machine Translation and Natural Language Processing, 2025

University of Surrey, UK

MSc in Applied Data Science and Statistics, 2022

University of Exeter, UK

MA in Translation and Interpreting, 2017

Xihua University, China

BA in English, 2014

Zhijiang College, Zhejiang University of Technology, China

Recent News

[11/07/25] 🎉 Secured an MSCA postdoc fellowship on LLM evaluation at the University of Oslo.
[08/07/25] 🥳 Our paper ALOPE: Adaptive Layer Optimization for Translation Quality Estimation using Large Language Models has been accepted by COLM 2025. Congrats to the team!
[26/04/25] 🥳 Our paper NEAR2: A Nested Embedding Approach to Efficient Product Retrieval and Ranking has been published at SIGIR-e-Com'25. Congratulations to the team!
[March 25] 🎉 Successfully defended PhD thesis: Evaluating Machine Translation of Emotion-loaded Chinese User-generated Content
[Jan 25] Started a new research project on evaluating video understanding models, funded by RNIB.
[30/10/24] Gave an invited talk on "Evaluating MT of Chinese User-generated Content" at a research seminar hosted by @STINGSwansea

Recent Publications

Adaptive Layer Optimization for Translation Quality Estimation using Large Language Models

Archchana Sindhujan, Shenbin Qian, Chan Chi Chun Matthew, Diptesh Kanojia, Constantin Orasan

Accepted by the Second Conference on Language Modeling (COLM 2025)

NEAR2: A Nested Embedding Approach to Efficient Product Retrieval and Ranking

Shenbin Qian, Diptesh Kanojia, Samarth Agrawal, Hadeel Saadany, Swapnil Bhosale, Constantin Orasan, Zhe Wu

SIGIR-e-Com'25: SIGIR Workshop on eCommerce

Automatically Generating Chinese Homophone Words to Probe Machine Translation Estimation Systems

Shenbin Qian, Constantin Orasan, Diptesh Kanojia, Félix do Carmo

Proceedings of the Tenth Workshop on Noisy and User-generated Text (W-NUT 2025)

What do Large Language Models Need for Machine Translation Evaluation?

Shenbin Qian, Archchana Sindhujan, Minnie Kabra, Diptesh Kanojia, Constantin Orăsan, Tharindu Ranasinghe, Fred Blain

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024)

A Multi-task Learning Framework for Evaluating Machine Translation of Emotion-loaded User-generated Content

Shenbin Qian, Constantin Orăsan, Diptesh Kanojia, Félix do Carmo

Proceedings of the Ninth Conference on Machine Translation (WMT24)

Using character-level models for efficient abbreviation and long-form detection

*Leonardo Zilio, *Shenbin Qian, Diptesh Kanojia, and Constantin Orasan

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Research Projects

Evaluating Video Understanding Models

Jan - Sep 2025 | Funded by RNIB

This project aims to evaluate the textual descriptions generated by state-of-the-art video understanding models. These descriptions are used to synthesize audio that describes scenes in TV series for visually impaired people. The project is led by Prof Sabine Braun and funded by the Royal National Institute of Blind People (RNIB) in the UK, in collaboration with the BBC. I am responsible for processing video data, finding and running state-of-the-art video understanding models, and helping design prompts to generate audio descriptions for human evaluation.

E-commerce Product Retrieval

Jan 2023 - Dec 2024 | Funded by eBay

This project aimed to develop efficient and effective language models and build lightweight contextual character-based embeddings for product retrieval, with the goal of improving search accuracy and relevance on eBay's e-commerce platforms. The project was funded by eBay and led by Dr Diptesh Kanojia. I was one of the main investigators, developing algorithms based on eBay's e-commerce datasets to achieve efficient and effective product retrieval and ranking.

Post-editing Translations of User-generated Content

Jan - Dec 2023 | Funded by EAMT

I led this student project, funded by the European Association for Machine Translation (EAMT), to collect post-edited reference translations from professional translators for evaluating machine translation of emotion-loaded user-generated content. I used this dataset to train automatic machine translation evaluation systems during my PhD. The dataset is publicly available in the Surrey NLP GitHub repository.

Using NLP to Help People Access Justice

Mar - Jun 2022 | Funded by University of Surrey

This project investigated ways in which NLP technologies can help improve an award-winning tool developed by Monaco Solicitors, a firm of solicitors, that enables users to better understand and exercise their legal rights free of charge. The project was funded by the University of Surrey and led by Prof Constantin Orasan. As the main investigator, I used topic modeling to explore the topics of letters from the firm's customers.

Professional Activities

Reviewer/Program Committee Member

  • ACL ARR reviewer since June 2024
  • COLING 2022, 2024, 2025
  • MT Summit 2025
  • Computer Speech & Language 2022, 2024

Work Experience

Research Assistant in Natural Language Processing

University of Surrey | Jul - Dec 2024

Trained language models on eBay's e-commerce data to enhance product retrieval and ranking efficiency.

Resource Engineer & Project Manager

Lancoo Group, China | Jan 2017 - Dec 2020

Established and maintained annotated-feature databases for rule-based parsing systems, achieving accuracies of 99%, 98%, and 97% for English words, phrases, and sentence patterns, respectively.

Recent Honors & Awards

EAMT Award for 2022 Sponsorship of Student Activities

Jan 2023

As part of its commitment to promote research, development and awareness about translation technologies, the European Association for Machine Translation (EAMT) funded MT-related activities led by students during 2023.

Winner of the MSc Project Award

Apr 2022

Awarded to the student with the best project for the year on a Mathematics MSc Programme at the University of Exeter.

Global Excellence Scholarship

Apr 2020

Selected based on academic merit and the ability to demonstrate academic ambition and future career goals at the University of Exeter.

Prize for the "Best Team of the Year"

Jan 2020

Awarded by Lancoo Group for outstanding team performance and project management.

Key Collaborators

Constantin Orasan

University of Surrey

Professor of Language and Translation Technologies

Diptesh Kanojia

University of Surrey

Senior Lecturer in People-Centred AI

Yves Scherrer

University of Oslo

Associate Professor in Natural Language Processing

Félix do Carmo

University of Surrey

Senior Lecturer in Translation and Natural Language Processing

Leonardo Zilio

Université catholique de Louvain

Assistant Professor in Translation Studies

Sui He

University of Swansea

Senior Lecturer in AI-Aided Intercultural Communication

Tharindu Ranasinghe

Lancaster University

Lecturer in Security and Protection Science

Archchana Sindhujan

University of Surrey

PhD in Multilingual Evaluation

Xinran Liu

University of Surrey

PhD in Dance Motion Generation