FiNERweb: Datasets and Artifacts for Scalable Multilingual Named Entity Recognition
Intermediate | Jonas Golde, Patrick Haller et al. | Dec 15 | arXiv
FiNERweb is a new dataset pipeline for training models to recognize named entities, such as people and places, across 91 languages and 25 writing systems.
#multilingual NER #named entity recognition #LLM supervision