25 Jun


The Emergence Of Data Design to create highly granular, conversational & refined data for language model fine-tuning.

Recent research and development have highlighted the emergence of Data Design in model training and fine-tuning processes.

The defining shift is that models are not necessarily trained to imbue them with knowledge, which would augment the knowledge-intensive nature of the model.

Rather, they are trained to change the behaviour of the model, teaching the model new behaviour.

Can Conversation Designers Excel As Data Designers?

There have been many discussions on the future of conversation designers…and an idea came to mind…many of these datasets require human involvement in terms of annotation and oversight.

And these datasets hold key elements of dialog, reasoning and chains of thought…

So, the question that has been lingering in the back of my mind for the last couple of days is: is this not a task well suited to conversation designers?

Especially in getting the conversational and thought process topology of the data right?

Allow me to explain. I have been talking much about a data strategy needing to consist of the Four D’s: data discovery, data design, data development and data delivery.

Data delivery has been discussed much, considering RAG and other delivery strategies.

Data Discovery has also been addressed to some degree, for instance XO Platform’s Intent Discovery. However, there is still much to do in finding new development opportunities…

Coming to Data Design…in this article I discuss three recent studies which focus on teaching language models (both large and small) certain behaviours. The aim is not necessarily to imbue the model with specific world knowledge, but rather to improve the behaviour and abilities of the model.

These abilities can include self-correction, reasoning, improved contextual understanding over both short and long contexts, and more…


