Query log analysis in the context of Information Retrieval for children

Posted on May 24, 2013 by Sergio Duarte

by Sergio Duarte Torres, Djoerd Hiemstra and Pavel Serdyukov.

In this paper we analyze queries and sessions intended to satisfy children’s information needs using a large-scale query log. The aim of this analysis is twofold: i) to identify differences between such queries and sessions and general queries and sessions; ii) to enhance the query log by including annotations of queries, sessions, and actions for future research on information retrieval for children. We found statistically significant differences between general-purpose queries and queries seeking content intended for children. We show that our findings are consistent with previous studies on the physical behavior of children using Web search engines. Read the poster paper.

An analysis of queries intended to search information for children

Posted on May 24, 2013 by Sergio Duarte

by Sergio Duarte Torres, Djoerd Hiemstra and Pavel Serdyukov.

The majority of children and teenagers are active users of the Internet for education and entertainment purposes; developing children’s abilities to find and understand information is therefore key to their development as young adults. However, children’s ability to use the Internet is severely hampered by the lack of appropriate search tools. Most Information Retrieval (IR) systems are designed for adults: they return information in a form that is unsuitable for children. The aim of this presentation is twofold: (i) to introduce the research lines and outcome of the PuppyIR project, which is aimed at providing an infrastructure and framework for developing child-focused information services; and (ii) to explore the outcome of our research on understanding the search behavior of children on the Internet and provide a brief description of query recommendation mechanisms tailored to young users. Read the paper.

Wikipedia entity retrieval for Dutch and Spanish

Posted on May 24, 2013 by Sergio Duarte

by Gosse Bouma and Sergio Duarte Torres.

We developed two systems (for Dutch and Spanish) for the GikiCLEF task, in which Wikipedia pages have to be found that match a description in natural language. We concentrated on linguistic analysis of the query, both for mapping the question onto the most relevant Wikipedia categories and for extracting additional constraints that matching pages have to satisfy. In addition, for Spanish we experimented with query expansion to improve the recall of the IR process. In both the Dutch and Spanish systems we tried to incorporate additional knowledge sources (WordNet, Yago, DbPedia) for better question analysis and retrieval results. The Dutch system obtained a GikiCLEF score of 2.5 (7th overall and 7th for Dutch). The Spanish system was still under development at the time of the official evaluation, and performed poorly. We show that the completed system would have performed well at the 2009 task. Read the paper.

Information Retrieval for Children: Search behavior and Solutions

Posted on May 15, 2013 by Sergio Duarte

Seminar given at GREYC (University of Caen Lower Normandy).

The majority of children and teenagers are active users of the Internet for education and entertainment purposes; developing children’s abilities to find and understand information is therefore key to their development as young adults. However, children’s ability to use the Internet is severely hampered by the lack of appropriate search tools. Most Information Retrieval (IR) systems are designed for adults: they return information in a form that is unsuitable for children. The aim of this presentation is twofold: (i) to introduce the research lines and outcome of the PuppyIR project, which is aimed at providing an infrastructure and framework for developing child-focused information services; and (ii) to explore the outcome of our research on understanding the search behavior of children on the Internet and provide a brief description of query recommendation mechanisms tailored to young users. Get the slides.

Visual Exploration of Health Information for Children

Posted on April 28, 2013 by Sergio Duarte

by Frans van der Sluis, Sergio Duarte Torres, Djoerd Hiemstra, Betsy van Dijk and Frea Kruisinga.

Children experience several difficulties retrieving information using current Information Retrieval (IR) systems. In particular, children struggle to find the right keywords to construct queries given their lack of domain knowledge. This problem is even more critical in the case of the specialized health domain. In this work we present a novel method to address this problem using a cross-media search interface in which the textual data is searched through visual images. This solution aims to solve the recall and recognition problem, which is salient for health information, by replacing the need for a vocabulary with the easy task of recognising the different body parts. Read the paper.

Workshop at COMMIT: Planning your international career

Posted on April 28, 2013 by Sergio Duarte

Workshop organized by Peter Apers and Iddo Bante. During this workshop we will share experience and instruments to facilitate the next step in your career at an international level. Talks will be given on Horizon 2020, ERC Grants, ICT Labs, and working for international R&D labs. We will also address your requests for instruments for your next step.
Presentations by: Iddo Bante, managing director at the CTIT at the University of Twente, and Sergio Duarte Torres.

A Novel Image Encryption Scheme Based on a Generalized Chinese Remainder Theorem

Posted on April 28, 2013 by Sergio Duarte

by Sergio Duarte Torres, David Becerra Romero, Luis Niño and Yoan Pinzon.

In this paper, a novel method for image encryption based on a Generalized Chinese Remainder Theorem (GCRT) is presented. The proposed method is based on the work developed by Jagannathan et al. Some modifications are proposed in order to increase the method’s encryption quality and its robustness against attacks. Specifically, the inclusion of vectors to reduce the segment pixel space and a Generalized Chinese Remainder Theorem (GCRT) algorithm are proposed. These vectors are generated randomly, which allows them to be used as private keys alongside the unrestricted key values generated by the GCRT algorithm. An analysis of a system in which the RGB channels are encrypted independently is also performed. Experiments were carried out to validate the proposed model, obtaining very promising results. Read the paper.
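For readers unfamiliar with the underlying idea, here is a minimal sketch of the classical Chinese Remainder Theorem applied to a single pixel value. It is not the generalized scheme from the paper: the moduli, the function names `to_residues` and `crt_reconstruct`, and the sample pixel are all assumptions chosen for illustration. It only shows that a value can be split into residues and recovered exactly when the moduli are pairwise coprime and their product exceeds the value range.

```python
from math import gcd, prod


def to_residues(value, moduli):
    """Split a value into its residues, one per modulus."""
    return [value % m for m in moduli]


def crt_reconstruct(residues, moduli):
    """Solve x = r_i (mod m_i) for pairwise coprime moduli.

    Returns the unique solution in the range [0, prod(moduli)).
    """
    for i in range(len(moduli)):
        for j in range(i + 1, len(moduli)):
            if gcd(moduli[i], moduli[j]) != 1:
                raise ValueError("moduli must be pairwise coprime")
    total = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        # pow(partial, -1, m) is the modular inverse (Python 3.8+).
        x += r * partial * pow(partial, -1, m)
    return x % total


# Hypothetical moduli: pairwise coprime, product 385 covers 8-bit pixels.
MODULI = [5, 7, 11]
pixel = 200
shares = to_residues(pixel, MODULI)
recovered = crt_reconstruct(shares, MODULI)
print(shares, recovered)
```

In an encryption setting the moduli would play the role of secret key material; here they are fixed and public purely to keep the example short.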

A Model for Resource Assignment to Transit Routes in Bogota Transportation System Transmilenio

Posted on April 28, 2013 by Sergio Duarte

by Sergio Duarte Torres, David Becerra Romero and Luis Niño.

In this work, a model based on genetic algorithms, queueing theory and graph theory for route planning in a mass transportation system is presented. The most important features of the proposed approach are: i) the modeling of the Americas line in the mass transportation system Transmilenio in Bogota; ii) data preprocessing using graph theory to characterize the shortest routes between all possible combinations of source and destination stations; iii) the optimization of travel time by route assignment using genetic algorithms; and iv) the simulation of events using the Poisson and Erlang distributions, corresponding to bus arrivals at specific stations and to users’ waiting times. An experimental methodology was developed to validate the proposed approach. Read the paper (in Spanish).
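Two of the ingredients named above can be sketched briefly: shortest routes between all station pairs, and Poisson-distributed arrivals. The sketch below uses the Floyd-Warshall algorithm and exponential inter-arrival times; whether the paper uses these exact techniques is not stated, and the station graph, travel times, arrival rate, and function names are hypothetical.

```python
import random

INF = float("inf")


def all_pairs_shortest(n, edges):
    """Floyd-Warshall over an undirected graph with n stations.

    edges is a list of (station_a, station_b, travel_minutes).
    """
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def poisson_arrivals(rate, horizon, rng):
    """Arrival times of a Poisson process with the given rate.

    Inter-arrival gaps are exponential; arrivals after horizon are dropped.
    """
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return times
        times.append(t)


# Hypothetical four-station line with one slower direct link.
EDGES = [(0, 1, 4), (1, 2, 3), (2, 3, 5), (0, 3, 15)]
dist = all_pairs_shortest(4, EDGES)
arrivals = poisson_arrivals(rate=0.2, horizon=60.0, rng=random.Random(42))
print(dist[0][3], len(arrivals))
```

A precomputed distance table of this kind is the sort of input a genetic algorithm could consult when scoring candidate route assignments, without recomputing paths for every individual.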

A novel ab-initio genetic-based approach for protein folding prediction

Posted on April 28, 2013 by Sergio Duarte

by Sergio Duarte Torres, David Becerra, Luis Niño and Yoan Pinzon.

In this paper, a model based on genetic algorithms for protein folding prediction is proposed. The most important features of the proposed approach are: i) heuristic secondary structure information is used in the initialization of the genetic algorithm; ii) an enhanced 3D spatial representation called cube-octahedron is used, and an expansion technique is proposed in order to reduce the computational complexity and spatial constraints; iii) data preprocessing of geometric features is used to characterize the cube-octahedron, with twelve basic vectors defining the nodes. Additionally, biological information (torsion angles, bond angles and secondary structure conformations) was pre-processed through an analysis of all possible combinations of the basic vectors that satisfy the biological constraints defined by the spatial representation; and iv) hashing techniques were used to improve the computational efficiency. The pre-processed information was stored in hash tables, which are used intensively by the genetic algorithm. Experiments were carried out to validate the proposed model, obtaining very promising results. Read the paper.
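The idea of precomputing geometric features into a hash table can be illustrated with a small sketch. It assumes that the twelve basic vectors are the twelve vertices of a cuboctahedron centered at the origin, i.e. all permutations of (±1, ±1, 0); the paper may define them differently. The names `cuboctahedron_vectors`, `angle_deg`, and `ANGLE_TABLE` are hypothetical.

```python
import math
from itertools import product


def cuboctahedron_vectors():
    """Return the 12 vertices of a cuboctahedron: permutations of (+-1, +-1, 0)."""
    vectors = set()
    for zero_axis in range(3):
        for a, b in product((-1, 1), repeat=2):
            v = [a, b]
            v.insert(zero_axis, 0)  # place the zero coordinate on one axis
            vectors.add(tuple(v))
    return sorted(vectors)


def angle_deg(u, v):
    """Angle between two basic vectors, rounded to whole degrees."""
    dot = sum(x * y for x, y in zip(u, v))
    cosine = dot / 2.0  # every basic vector has squared length 2
    cosine = max(-1.0, min(1.0, cosine))
    return round(math.degrees(math.acos(cosine)))


VECTORS = cuboctahedron_vectors()

# Hash table keyed by a pair of vectors, so a search loop can look up
# the angle between consecutive moves instead of recomputing it.
ANGLE_TABLE = {(u, v): angle_deg(u, v) for u in VECTORS for v in VECTORS}

print(len(VECTORS), sorted(set(ANGLE_TABLE.values())))
```

Because the set of basic vectors is small and fixed, the table has only 144 entries, and a fitness function that is evaluated many times can filter out disallowed angles with a dictionary lookup.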
