@federetyk federetyk commented Jan 6, 2026

closes #24

Description

Added the Job Title Similarity dataset (Avature/Job-Title-Similarity) as a new ranking task in WorkRB. This task evaluates a model's ability to rank job titles by semantic similarity to a query job title. The dataset includes 11 languages (en, de, es, fr, it, ja, ko, nl, pl, pt, zh) with ~105 queries and ~2,500 corpus job titles per language.

Changes:

  • Added new task group RankingTaskGroup.JOBSIM in src/workrb/types.py
  • Added support for Chinese (zh), Japanese (ja), and Korean (ko) languages in the Language enum in src/workrb/tasks/abstract/ranking_base.py
  • Created JobTitleSimilarityRanking task class in src/workrb/tasks/ranking/job_similarity.py
  • Exported the new task in src/workrb/tasks/__init__.py and src/workrb/tasks/ranking/__init__.py
  • Added tests for the new task in tests/test_task_loading.py

Task characteristics:

  • Task type: Ranking
  • Label type: Multi-label (each query has multiple relevant corpus documents)
  • Query/Target input type: Job titles
  • Default evaluation metrics: MAP, MRR, RP@5, RP@10 (as used by Zbib et al., 2022)
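
As a rough illustration of the default metrics listed above, the sketch below computes average precision, reciprocal rank, and RP@k for a single query; MAP and MRR are the means of the first two over all queries. The function names are hypothetical and are not WorkRB's API. RP@k is assumed here to follow the common definition (relevant hits in the top k, divided by min(k, number of relevant documents)); the definition used in the task code may differ.

```python
from typing import Sequence, Set


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision values at each rank where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def r_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Relevant hits in the top k, normalized by min(k, number of relevant items)."""
    if not relevant or k <= 0:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / min(k, len(relevant))


# Toy ranking with placeholder document ids (assumed, not dataset values).
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
scores = {
    "AP": average_precision(ranked, relevant),
    "RR": reciprocal_rank(ranked, relevant),
    "RP@2": r_precision_at_k(ranked, relevant, 2),
}
```

Because each query has several relevant titles, normalizing by min(k, R) keeps RP@k from penalizing queries that have fewer than k relevant documents.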

References:

  • Zbib et al. (2022): "Learning Job Titles Similarity from Noisy Skill Labels" (link)
  • Deniz et al. (2024): "Combined Unsupervised and Contrastive Learning for Multilingual Job Recommendations" (link)

Checklist

  • Added new tests for new functionality
  • Tested locally with example tasks
  • Code follows project style guidelines
  • Documentation updated
  • No new warnings introduced

@Mattdl Mattdl self-requested a review January 6, 2026 18:08

@Mattdl Mattdl left a comment

Thanks @federetyk, super valuable contribution!

Code looks good to me. Would just add some clarifications in the JobSimilarity task, as it is the first task of its kind (see review).

"""
Job Title Similarity ranking task based on Zbib et al. (2022) and Deniz et al. (2024).
Predict similar job titles from the datasets presented in the aforementioned papers.

Would be great to give a bit more context here on the JobTitleSimilarityTask.

  1. Provide a link to the Hugging Face repo, and briefly describe how it is used (corpus and query sets) and which languages it covers.
  2. Explain how this task differs from JobNormalization: here each job title has multiple similar job titles (and all others are deemed non-similar), whereas JobNormalization maps each title to a single best-matching canonical job title.
  3. Give an example of a query and labels.
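
To make the distinction in point 2 concrete, a minimal sketch of the two label shapes could look like the following. The titles are invented for illustration and are not taken from the dataset, and the variable and function names are hypothetical rather than part of WorkRB.

```python
from typing import Dict, Set

# Similarity (multi-label): one query maps to a SET of relevant corpus titles.
# Made-up example titles, not dataset entries.
similarity_labels: Dict[str, Set[str]] = {
    "software engineer": {"software developer", "programmer"},
}

# Normalization (single-label): one query maps to ONE canonical title.
normalization_labels: Dict[str, str] = {
    "sr. software eng.": "software engineer",
}


def is_relevant_similarity(query: str, candidate: str) -> bool:
    """A candidate is relevant if it is in the query's set of similar titles."""
    return candidate in similarity_labels.get(query, set())


def is_relevant_normalization(query: str, candidate: str) -> bool:
    """A candidate is relevant only if it is the single canonical title."""
    return normalization_labels.get(query) == candidate
```

With set-valued labels, ranking metrics that reward retrieving several relevant items (such as MAP or RP@k) become meaningful, which is not the case when there is exactly one target per query.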

Additionally, to create visibility, I would add your dataset as an entry in the README.md table of datasets (along with nb targets x nb queries).

@federetyk

> Code looks good to me. Would just add some clarifications in the JobSimilarity task, as it is the first task of its kind (see review).

@Mattdl Thanks for the review! I have updated the docstring with the requested context and added the entry to the README table as suggested.

@Mattdl Mattdl left a comment

Looks great, ready to merge.

@Mattdl Mattdl merged commit d0f9fdd into techwolf-ai:main Jan 9, 2026
2 checks passed
Development

Successfully merging this pull request may close these issues.

[FEATURE] Add Job Title Similarity ranking task
