Gavagai participates as a stakeholder in a VINNOVA-funded project on creating data resources for language technology research.
During the project, Gavagai will develop an open-source software kit that enables language models to be trained on data that is not made public. This is useful when a data resource cannot be shared freely but its owner is willing to let researchers build models on top of it. Such owners could be, for example, media companies with previously published material, or libraries and archives with collections they want to make available for research but cannot share freely.
Read more about the project here.