Abstract
Evaluations of tools (information retrieval systems, machine learning, speech recognition, machine translation, automatic data acquisition, etc.) are organized annually through evaluation campaigns (TREC, ELRA, ESTER, IWSLT, etc.). Building an ad hoc evaluation corpus for these campaigns is a complex task that is currently done manually, at high cost. Indeed, such a corpus is highly specialized, answering an application need in a precise context, and automating its construction is a challenge whose solution would significantly ease the organization of these campaigns. As a contribution to this challenge, we propose, in the context of multimedia information retrieval, a multilevel approach for extending a small application-specific corpus into a much larger one. The extension is based on detecting intersections between the two corpora in terms of lemmas bearing the same grammatical label, i.e., extracting a list of appropriate terminology. For this purpose we use several tools, both internal and external to our laboratory, and evaluate them in order to preserve consistency and coherence with the original corpus.
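The core operation described above, intersecting two corpora on lemmas that share the same grammatical label, can be sketched as follows. This is a minimal illustration under the assumption that both corpora have already been lemmatized and POS-tagged (e.g., by one of the tools mentioned) and are represented as lists of (lemma, POS) pairs; the tag set and the sample data are hypothetical.

```python
def corpus_intersection(small_corpus, large_corpus):
    """Return the (lemma, POS) pairs shared by both corpora.

    A lemma counts as shared only when it occurs with the same
    grammatical label in both corpora, which filters out lemmas
    used in different syntactic roles.
    """
    return set(small_corpus) & set(large_corpus)


# Hypothetical tagged corpora: (lemma, POS-label) pairs.
small = [("retrieval", "NOUN"), ("index", "VERB"), ("corpus", "NOUN")]
large = [("corpus", "NOUN"), ("index", "NOUN"), ("retrieval", "NOUN")]

shared = corpus_intersection(small, large)
# "index" is excluded: it carries different POS labels in the two corpora.
print(sorted(shared))  # → [('corpus', 'NOUN'), ('retrieval', 'NOUN')]
```

In practice the shared (lemma, POS) pairs would serve as the terminology list anchoring the extension, so that documents added from the large corpus stay lexically coherent with the original application corpus.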