2008/4 Information documentaire: les outils du futur

Ergonomic Minding of Media Collections

Personal or professional collections of media, such as photos, music, movies and home videos, tend to grow quickly, mainly because it is so easy to collect large amounts of digital content with a variety of capture devices (digital cameras, mobile phones, camcorders) or over the Internet. However, the management (i.e., organization and search) of those multimedia collections lags far behind the ease of content creation.

In spite of advances in content-based retrieval and automatic multimedia indexing, multimedia content management is still difficult. In personal photography, for example, the usual procedure is to place items into folders, often organized by date or period, possibly by event, and loosely annotated. As a result, images and events are difficult to find, and search becomes a frustrating or even painful operation because the tools for browsing personal image collections do not match the user’s needs. In some cases, it is possible to organize a collection by accurate date/time or location/place (thanks to the EXIF and GPS data recorded by modern capture devices). However, this is still limited, and it remains desirable to search using the multimedia content itself.

The current challenge of multimedia information systems is thus to design and provide professional or non-professional users with new interactive tools that:

– improve the browsing experience when accessing both personal and professional collections;

– make search easier and more natural than the folder-style layout;

– allow images and videos to be found easily, rapidly, and accurately.

Media management can be significantly improved by combining the current results of low-level content abstraction techniques (e.g., color, contours, texture) and high-level content abstraction techniques (e.g., image classification, face recognition) with information obtained by minding a particular collection, and by developing novel browsing interfaces guided by the user’s personal preferences.

The Ergonomic Minding of Media Collections thus faces the following challenges:

– extract relevant information from the media content for efficient indexing, search and retrieval. More precisely, the following technologies should be investigated: (i) robust extraction of invariant visual descriptors for image classification and object recognition, and (ii) accurate and reliable unconstrained face detection and recognition, as the human face is recognized to be an important semantic cue in visual content.

– develop novel multimedia minding strategies and techniques to optimally prepare and enrich the collection content for constructing new user interaction models.

– propose, and validate with users, new interaction models for image/video search and browsing based on clustering or topology-preserving dimensionality reduction and projection techniques. The main goal is to offer the user global and precise access to the multimedia collection at minimal cost.

In an example scenario, the user navigates within a flat 2-dimensional arrangement of his/her media collection, organized by clusters. Each cluster corresponds to a given search criterion (query) and is represented by a statistical visual summary of the query result. One main challenge is to make the available data interoperable at all levels. Thus, the construction of clusters would use and combine EXIF information (e.g., date/time, location), metadata (e.g., tags, events) and/or content (e.g., dominant color, people occurrence). In a complementary faceted-like approach, if the current facet combination isolates a subset of the collection, then it is possible to further refine the search by seamlessly navigating within the cluster-based representation of this subset.
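The combination of facets and clusters in this scenario can be illustrated with a minimal sketch. It is not the authors' system: the `MediaItem` record, its fields, and the helpers `filter_by_facets` and `cluster_by` are hypothetical names introduced here, and the clustering is reduced to grouping by a single key.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class MediaItem:
    """Hypothetical record mixing the three kinds of data named in the text."""
    name: str
    taken: datetime                           # EXIF-style capture time
    tags: set = field(default_factory=set)    # user metadata (tags, events)
    dominant_color: str = "unknown"           # content-derived descriptor


def filter_by_facets(items, **facets):
    """Keep the items that match every requested facet value."""
    def matches(item):
        for key, wanted in facets.items():
            value = getattr(item, key)
            if isinstance(value, set):        # set-valued facet: membership test
                if wanted not in value:
                    return False
            elif value != wanted:             # scalar facet: equality test
                return False
        return True
    return [item for item in items if matches(item)]


def cluster_by(items, key):
    """Group item names into clusters according to a key function."""
    clusters = defaultdict(list)
    for item in items:
        clusters[key(item)].append(item.name)
    return dict(clusters)


collection = [
    MediaItem("beach.jpg", datetime(2007, 7, 14), {"holiday"}, "blue"),
    MediaItem("dunes.jpg", datetime(2007, 7, 15), {"holiday"}, "yellow"),
    MediaItem("office.jpg", datetime(2008, 1, 9), {"work"}, "grey"),
    MediaItem("lake.jpg", datetime(2008, 8, 2), {"holiday"}, "blue"),
]

# The facet combination isolates a subset, which is then browsed by cluster.
holiday = filter_by_facets(collection, tags="holiday")
print(cluster_by(holiday, key=lambda item: item.taken.year))
print(cluster_by(holiday, key=lambda item: item.dominant_color))
```

The same subset can be re-clustered along another key (year, then dominant color) without repeating the facet query, which is the kind of refinement the scenario describes.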

Examples of Existing Systems

As examples of relevant directions in the development of the Ergonomic Minding of Media Collections methodology, we detail two applications illustrating different perspectives.

Collection Guiding

The Collection Guide proposes an alternative approach to many current information management systems, which are centered on the notion of a query. This is true on the Web (with all classical Web search engines) and for digital libraries. In the multimedia domain, available commercial applications offer rather simple management services, whereas research prototypes also focus on responding to queries. The notion of browsing comes as a complement or as an alternative to query-based operations in several possible contexts.

In the most general case, multimedia browsing is designed to supplement search operations. This comes from the fact that multimedia querying systems largely demonstrate their capabilities using the Query-by-Example (QBE) scenario, which hardly corresponds to any realistic usage scenario. Multimedia search systems are mostly based on content similarity. Hence, to fulfill an information need, the user must express it in terms of relevant and non-relevant examples. The question then arises of how to find the initial examples themselves. Researchers have therefore investigated new tools and protocols for the discovery of relevant examples. These tools often take the form of browsing interfaces whose aim is to help the user explore the information space in order to locate the sought items.

In Marchand-Maillet and Bruno (2005), the principle of Collection Guiding is introduced. Given a collection of images, a path traversing the complete multimedia collection is automatically created so as to “guide” the visit of the collection. For that purpose, image inter-similarity is computed and the path is created via a Travelling Salesman tour of the collection. The aim is to provide the user with a base exploration strategy based on a minimal variation of content at every step. This implicitly provides a dimension reduction method from a high-dimensional feature space to a linear ordering. The Collection Guide methodology also provides several multidimensional arrangements and is therefore directly related to information visualization.
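A minimal sketch of such a guided path is given here. It swaps in a greedy nearest-neighbour heuristic, which only approximates a Travelling Salesman tour and is not necessarily the solver used by the Collection Guide. The function name `guided_path`, the Euclidean distance, and the toy 2-dimensional feature vectors are assumptions made for illustration.

```python
import math


def guided_path(features, start=0):
    """Order items so that each step moves to the most similar unvisited item.

    `features` is a list of equal-length numeric tuples (one per image).
    This is a greedy nearest-neighbour approximation of a Travelling
    Salesman path, not an exact tour.
    """
    if not features:
        return []
    unvisited = set(range(len(features)))
    unvisited.remove(start)
    path = [start]
    while unvisited:
        last = features[path[-1]]
        # Smallest distance first; ties are broken by the lower index.
        nxt = min(unvisited, key=lambda i: (math.dist(last, features[i]), i))
        unvisited.remove(nxt)
        path.append(nxt)
    return path


# Toy descriptors: two visually distinct groups of images.
features = [(0.0, 0.0), (5.0, 5.0), (1.0, 0.0), (5.0, 6.0), (1.0, 1.0)]
print(guided_path(features))  # [0, 2, 4, 1, 3]
```

The returned list is the linear ordering mentioned above: neighbouring positions in the path hold similar items, so the high-dimensional feature space is reduced to a single browsing dimension.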

The figure on page 23 shows an example contrasting the classical unordered grid-based display with a 3D display strategy exploiting content similarity and clustering. Similar images are automatically arranged around their representative image in a planet metaphor. The user may thus obtain a global overview of the collection (interplanet arrangement) or visit a specific subset of the collection (i.e., visit a specific “solar system”).
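The grouping behind the planet metaphor can be sketched as follows, under strong assumptions: the representative images are taken as given (the text does not say how they are selected), similarity is plain Euclidean distance, and the function name `group_around_representatives` is hypothetical. The 3D layout itself is not covered.

```python
import math


def group_around_representatives(features, representatives):
    """Attach every image to its most similar representative image.

    `features` is a list of numeric tuples; `representatives` is a list of
    indices into `features`. Returns {representative index: [member indices]}.
    """
    systems = {rep: [] for rep in representatives}
    for idx, feat in enumerate(features):
        if idx in systems:          # a representative is the centre, not a member
            continue
        nearest = min(
            representatives,
            key=lambda rep: (math.dist(feat, features[rep]), rep),
        )
        systems[nearest].append(idx)
    return systems


features = [(0.0, 0.0), (5.0, 5.0), (1.0, 0.0), (5.0, 6.0), (1.0, 1.0)]
print(group_around_representatives(features, representatives=[0, 1]))
# {0: [2, 4], 1: [3]}
```

Showing only the dictionary keys corresponds to the global overview, while expanding one key corresponds to visiting a single “solar system”.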

It is now clear that browsing is necessary to closely adapt information inspection and retrieval to the specific user’s needs. There is no doubt that future information systems will include this emerging aspect as a complement to the currently dominant search operations.

Google Portrait

In 2007, S. Marcel et al. proposed Google Portrait to retrieve and browse images from the Internet containing only one particular object of interest: the human face. The goal is to filter the images provided by a standard image retrieval system with a face detector and to present portraits as a result instead of the complete image.

Image search starts with a text query. The Google Image engine is used to retrieve images matching the query. Each image URL is extracted from the Google Image result page, then images are processed in parallel. This processing includes download and face finding. Images with detected faces are presented on a new result page listing face portraits together with a confidence value and direct links to the image URL and to the source page. The result page is a table with 5 columns and as many rows as there are images with detected faces. The first column contains image close-ups (“portraits”), the second column contains a confidence value for the likelihood that the portrait is a face, the third column contains the size of the original image, and the last two columns contain links to the original image and to its website.
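The processing chain just described can be sketched as below. This is an illustration under stated assumptions, not the Google Portrait implementation: the downloader and the face detector are passed in as hypothetical callables (`fetch_image`, `detect_faces`), an image is modelled as a plain dictionary, and when several faces are found the sketch keeps only the most confident one so that each image yields one row. The example uses in-memory stand-ins instead of real network access.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class PortraitRow:
    """One row of the 5-column result table."""
    portrait: tuple      # bounding box (x, y, width, height) of the close-up
    confidence: float    # detector confidence that the portrait is a face
    image_size: tuple    # (width, height) of the original image
    image_url: str       # link to the original image
    page_url: str        # link to the source page


def process_hit(hit, fetch_image, detect_faces):
    """Download one image and run face finding; return a row or None."""
    image_url, page_url = hit
    image = fetch_image(image_url)
    if image is None:                       # download failed
        return None
    faces = detect_faces(image)             # list of (box, confidence) pairs
    if not faces:                           # no face: the image is filtered out
        return None
    box, confidence = max(faces, key=lambda face: face[1])
    return PortraitRow(box, confidence, image["size"], image_url, page_url)


def build_result_table(hits, fetch_image, detect_faces, workers=4):
    """Process all search hits in parallel and keep images with faces."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = pool.map(lambda hit: process_hit(hit, fetch_image, detect_faces), hits)
        return [row for row in rows if row is not None]


# In-memory stand-ins for the Web and for a face detector (assumptions).
FAKE_WEB = {
    "http://example.org/a.jpg": {"size": (640, 480), "faces": [((10, 20, 64, 64), 0.93)]},
    "http://example.org/b.jpg": {"size": (800, 600), "faces": []},
}


def fake_fetch(url):
    return FAKE_WEB.get(url)


def fake_detect(image):
    return image["faces"]


hits = [
    ("http://example.org/a.jpg", "http://example.org/a.html"),
    ("http://example.org/b.jpg", "http://example.org/b.html"),
    ("http://example.org/missing.jpg", "http://example.org/c.html"),
]
for row in build_result_table(hits, fake_fetch, fake_detect):
    print(row)
```

Passing the downloader and detector as parameters keeps the sketch independent of any particular image library or detection model.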

Google Portrait includes a module for manual annotation. Portraits (detected faces) are very likely to correspond to the query, but there is no guarantee, as Google Portrait uses a face detection system, not a face recognition system. It is therefore possible to edit a result and change the tag of the portrait (the name of the person). Tags are saved in a database, which can then be populated collaboratively.

Google Portrait was released on November 27, 2006. Later, in spring 2007, Google Image provided an “unofficial” face-finding search mode. Google Image face finding has been directly available in Advanced Image Search since at least June 24, 2007. At nearly the same time, probably during summer 2007, Microsoft also added face detection to its Live Search. Comparing the performance of these services with that of Google Portrait is impractical, since the online face detectors of both Google and Microsoft (companies whose computing facilities are beyond comparison) return results on images that have already been batch-processed. Google Portrait, by contrast, performs live face detection on images downloaded on the fly from the Internet.


The Ergonomic Minding of Media Collections implicitly acknowledges the limits of current information access systems and paves the way for new solutions and challenges in the Multimedia Information Retrieval and Management community. The emphasis is placed on shifting the focus from the traditional content-processing and indexing viewpoint to a knowledge and data minding approach complemented by a strong involvement of users in the construction of interactive systems.

We believe that such a joint data-processing and user-centric approach will demonstrate that the strong involvement of users, as a source of semantic information captured efficiently through dedicated interfaces, is a robust and scalable solution to the problem of high-level management of multimedia information.

Various European and Swiss research projects in the area of Multimedia Information Retrieval and Management are currently active. Additionally, a project initiative on the Ergonomic Minding of Media Collections is currently being set up to target the development of these new generations of interactive multimedia management systems and to encourage their transfer to usable commercial solutions.


– S. Marchand-Maillet and E. Bruno, “Collection Guiding: A new framework for handling large multimedia collection”, International Workshop on Audio-Visual Content and Information Visualization in Digital Libraries (AVIVDiLib’05), 2005.

– S. Marcel, P. Abbet and M. Guillemot, “Google Portrait”, Idiap Communication, Idiap-Com-07-2007, 2007.


Sébastien Marcel

Idiap Research Institute, Centre du Parc, Rue Marconi 19, CP 592, 1920 Martigny, Switzerland, phone: +41 (27) 721 77 27


Stéphane Marchand-Maillet

Viper group – CUI – University of Geneva, Battelle Building A, 7, Route de Drize, 1227 Carouge, Switzerland, phone: +41 (22) 379 01 54


Building collections of media (photos, music, films, etc.) is very simple today. Managing and handling large quantities of data, which quickly become too voluminous, nevertheless remains as difficult as ever. Media management can be considerably improved by combining low-level content abstraction techniques (e.g., colors, contours, surfaces) with high-level techniques (e.g., image classification, face selection).

Whereas classical “browsing” depends on precise search inputs, Collection Guiding reverses the search process: collections are computerized and multidimensional views are reduced to a linear presentation along which the “browser” can move. Users thus become familiar with a simplified portrait of the complete collection and can decide in which direction, or in which space, to continue their search. Such browsing systems, which can be adapted to each individual’s needs, will increasingly replace rigid search systems in the future.

Google Portrait proceeds differently: the system sets aside all other components in favor of a single search criterion, the human face. Images are filtered by a face detection program (not to be confused with a face recognition program), so that only the portrait, and not the image as a whole, is included in the search results.

Microsoft has since followed with face detection in “Live Search”. Overall, the management of multimedia objects is shifting from classical content processing with indexing towards an approach that favors knowledge management and data minding and that involves the user more closely in the construction of interactive systems. In Europe and in Switzerland, numerous projects are currently under way in the field of multimedia content retrieval and organization.

Nowadays it is easy to create digital media collections (photos, music, films, etc.). Managing and handling the rapidly growing, and therefore unmanageable, volumes of data is however becoming ever more difficult. Media management can be improved considerably when simple image description techniques (e.g., colors, contours, surface) are used together with correspondingly sophisticated techniques (e.g., image classification, face detection).

Whereas conventional “browsing” depends on precise search inputs, “Collection Guiding” reverses the search process: collections are computerized and multidimensional views are broken down into a linear representation along which the person browsing can move. The user thus gets to know a simplified image of the entire collection and can decide in which direction, or in which space, to search further. Browsing systems of this kind, which can adapt to the wishes of the individual, will presumably push rigid search systems into the background in the near future.

Google Portrait takes a different route: the system disregards all other search elements except for a single search criterion, the human face. Images are filtered by a face detection program (not to be confused with a face recognition program); instead of the whole image, only the portrait is included in the search result. Microsoft has in the meantime followed suit with face detection in “Live Search”.

Overall, multimedia management shows a shift from traditional content processing with indexing towards a new approach that puts knowledge management and data minding in the foreground and thereby involves the user more strongly than before in the construction of interactive systems. Numerous projects on searching for and organizing multimedia content are currently running in Switzerland and in Europe.