Hebrew is considered a low-resource language in AI. It is morphologically rich, with a sophisticated root-and-pattern system: words are built from roots, and prefixes, suffixes, and infixes are attached to change meaning and tense or to form plurals, among other things. A single root can therefore yield many legitimate word forms, which makes conventional tokenization techniques, designed for morphologically simpler languages, far less effective. As a result, current language models may struggle to interpret and process Hebrew's subtleties correctly, underscoring the need for benchmarks that account for these particular linguistic characteristics.
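To make the tokenization issue concrete, the short sketch below runs a general-purpose subword tokenizer (GPT-2's, chosen here only as a stand-in for tokenizers trained mostly on English) over several Hebrew words that share the root k-t-v ("write"); the word list and tokenizer choice are illustrative assumptions, not part of the leaderboard itself.

```python
from transformers import AutoTokenizer

# Illustrative sketch: a tokenizer trained mostly on English text (GPT-2's,
# used here only as an example) tends to shatter related Hebrew forms into
# many byte-level pieces, even though they all derive from the root k-t-v.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# "he wrote", "article", "letter", "correspondence" -- all from the same root.
for word in ["כתב", "כתבה", "מכתב", "התכתבות"]:
    pieces = tokenizer.tokenize(word)
    print(f"{word!r}: {len(pieces)} tokens -> {pieces}")
```

In practice, such fragmentation inflates sequence lengths and hides the shared root structure from the model, which is part of why Hebrew-aware evaluation matters.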
LLM research in Hebrew therefore needs specialized benchmarks that address the linguistic peculiarities and subtleties of the language. A new Hugging Face initiative aims to fill this gap with an open LLM leaderboard dedicated to Hebrew. Designed to assess and improve Hebrew language models, the leaderboard offers robust evaluation metrics on language-specific tasks and encourages open, community-driven improvement of generative language models in Hebrew.
The Hugging Face team built the leaderboard on the Demo Leaderboard template, drawing inspiration from the Open LLM Leaderboard. Submitted models are automatically deployed on Hugging Face's Inference Endpoints and evaluated through API requests managed by the lighteval library. The environment setup was the only complicated part of the implementation; the rest of the code worked as intended.
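For illustration, a deployed endpoint can be queried from Python through the huggingface_hub client. The sketch below is not the leaderboard's own evaluation code (which lighteval manages); the endpoint URL, token, and prompt are placeholders.

```python
from huggingface_hub import InferenceClient

# Minimal sketch of querying a deployed Inference Endpoint; the URL and token
# are placeholders, and an evaluation harness would issue calls like this
# once per test example.
client = InferenceClient(
    model="https://<your-endpoint>.endpoints.huggingface.cloud",  # hypothetical endpoint URL
    token="hf_...",  # your Hugging Face access token
)

# Greedy decoding of a single Hebrew prompt:
# "Question: What is the capital of Israel? Answer:"
completion = client.text_generation(
    "שאלה: מהי בירת ישראל?\nתשובה:",
    max_new_tokens=32,
    do_sample=False,
)
print(completion)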
The Hugging Face team has created four key datasets to evaluate language models on their comprehension and production of Hebrew, independent of their performance in other languages. These benchmarks use a few-shot prompt format, ensuring the models can adapt and respond appropriately even with little context (a minimal prompt sketch follows the task list below). The four tasks are:
Hebrew Question Answering: This task assesses a model's ability to understand a passage and accurately retrieve answers from it, with particular emphasis on understanding and processing information presented in Hebrew. The model's grasp of Hebrew syntax and semantics is evaluated through straightforward question-and-answer formats.
Sentiment Accuracy: This benchmark tests the model's capacity to identify and interpret sentiment in Hebrew text. It evaluates how accurately the model uses linguistic cues to classify statements as positive, negative, or neutral.
The Winograd Schema Challenge: This task assesses the model's handling of contextual ambiguity and pronoun resolution in Hebrew. It measures the model's ability to resolve pronouns correctly in tricky sentences using common sense and logical reasoning.
Translation: This test evaluates the model's ability to translate between Hebrew and English. It measures linguistic accuracy, fluency, and the capacity to preserve meaning across languages, probing the model's proficiency in multilingual tasks.
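To give a sense of the few-shot format mentioned above, the sketch below assembles a toy sentiment prompt; the Hebrew example sentences and label names are invented for illustration and are not drawn from the leaderboard's datasets.

```python
# Minimal sketch of a few-shot sentiment prompt in the spirit described above;
# the demonstrations and labels are made up for illustration only.
few_shot_examples = [
    ("האוכל במסעדה היה מצוין", "חיובי"),   # "The food at the restaurant was excellent" -> positive
    ("השירות היה איטי ומאכזב", "שלילי"),   # "The service was slow and disappointing" -> negative
]
query = "הסרט היה משעמם מאוד"               # "The movie was very boring"

def build_prompt(examples, sentence):
    """Concatenate labeled demonstrations followed by the unlabeled query."""
    lines = []
    for text, label in examples:
        lines.append(f"משפט: {text}\nרגש: {label}")   # "Sentence: ... / Sentiment: ..."
    lines.append(f"משפט: {sentence}\nרגש:")           # the model completes the final label
    return "\n\n".join(lines)

print(build_prompt(few_shot_examples, query))
```

The model is scored on how it completes the final label line, which is what lets the benchmark probe comprehension with only a handful of in-context demonstrations.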
The team believes the new leaderboard will serve as more than a measuring tool, inspiring the Israeli tech community to identify and close gaps in Hebrew language technology research. By offering thorough, targeted evaluations, they hope to encourage the creation of models that are both linguistically capable and culturally aware, opening the door to innovations that respect the richness of the Hebrew language.
Dhanshree Shenwai is a Computer Science Engineer with solid experience in FinTech companies spanning the Financial, Cards & Payments, and Banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements in today's evolving world and about making everyone's life easier.