

Papers | Communications


DIAS, Zanoni; ROCHA, Anderson; and GOLDENSTEIN, Siome.

  • Paper: Image Phylogeny by Minimal Spanning Trees

  • Experimental setup used in the paper.

  • Nowadays, digital content is widespread and also easily redistributable, either lawfully or unlawfully. Images and other digital content can also mutate as they spread out. For example, after images are posted on the internet, other users can copy, resize and/or re-encode them and then repost their versions, thereby generating similar but not identical copies. While it is straightforward to detect exact image duplicates, this is not the case for slightly modified versions. In the last decade, some researchers have successfully focused on the design and deployment of near-duplicate detection and recognition systems to identify the cohabiting versions of a given document in the wild. Those efforts notwithstanding, only recently have there been the first attempts to go beyond the detection of near-duplicates to find the structure of evolution within a set of images. In this paper, we tackle and formally define the problem of identifying these image relationships within a set of near-duplicate images, which we call the Image Phylogeny Tree (IPT), due to its natural analogy with biological systems. The mechanism of building IPTs aims at finding the structure of transformations, and their parameters if necessary, among a near-duplicate image set, and has immediate applications in security and law enforcement, forensics, copyright enforcement, and news tracking services. We devise a method for calculating an asymmetric dissimilarity matrix from a set of near-duplicate images and formally introduce an efficient algorithm to build IPTs from such a matrix. We validate our approach with more than 625,000 test cases, including both synthetic and real data, and show that when using an appropriate dissimilarity function we can obtain good IPT reconstruction even when some pieces of information are missing. We also evaluate our solution when there is more than one near-duplicate set in the pool of analysis and compare it to other recent related approaches in the literature.

  • Please send us an e-mail for the download link.
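
As a rough illustration of how a tree can be built from an asymmetric dissimilarity matrix, here is a minimal sketch of a greedy, Kruskal-style heuristic for directed trees. It is an assumption made for illustration and is not claimed to be the exact algorithm from the paper. The function name build_phylogeny_tree and the convention that d[i][j] is the cost of deriving image j from image i are hypothetical.

```python
def build_phylogeny_tree(d):
    """Greedy Kruskal-style sketch (illustrative, not the paper's code).

    d[i][j] is assumed to be the dissimilarity of deriving image j from
    image i; the matrix may be asymmetric. Returns a parent list in which
    parent[v] == v marks the root of the reconstructed tree.
    """
    n = len(d)
    parent = list(range(n))  # parent[v] == v: v has no parent yet
    comp = list(range(n))    # union-find forest over connected pieces

    def find(x):
        # Find the representative of x's piece, with path halving.
        while comp[x] != x:
            comp[x] = comp[comp[x]]
            x = comp[x]
        return x

    # All directed edges i -> j, cheapest first.
    edges = sorted((d[i][j], i, j)
                   for i in range(n) for j in range(n) if i != j)

    added = 0
    for _, i, j in edges:
        if added == n - 1:
            break
        # Accept i -> j only if j is still the root of its own piece
        # and i lies in a different piece (so no cycle is created).
        if parent[j] == j and find(i) != find(j):
            parent[j] = i
            comp[find(j)] = find(i)
            added += 1
    return parent


# Toy matrix (made up): image 0 is the original, 1 derives from 0, 2 from 1.
toy = [[0, 1, 5],
       [4, 0, 1],
       [9, 6, 0]]
tree = build_phylogeny_tree(toy)
```

Because an edge is accepted only when its target is still a root, every merged piece keeps exactly one root, and the result is a single directed tree once n - 1 edges have been accepted.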


SABOIA, Priscila; CARVALHO, Tiago; and ROCHA, Anderson.

  • Paper: Eye specular highlights telltales for digital forensics: a machine learning approach

  • Database with 120 real pictures used for the experiments in the paper above.
  • 60 pictures are genuine (without any tampering) and 60 pictures were obtained from compositions. Each image contains from 2 to 6 people, genuine or composed. All of them have a resolution of 2048 x 1536 pixels, and their focal distances are unknown. In some pictures, eyelids occlude the eyes, but the specular highlights are visible in all pictures (no occlusion).
  • The forensic technique referred to above needs the elliptical limbus and the specular highlight of each eye in the pictures. These points were obtained from manual marks made beforehand with Inkscape. From these marks, an auxiliary program computed the points of interest. This process was adopted to obtain the technique's input points more quickly. However, it is important to note that these points could also be obtained directly by segmentation techniques able to separate them from the other regions, or even by an expert.
  • Therefore, the pictures in the database are marked as follows: for each image, the 1-pixel-wide border of an ellipse was manually fit in green to the limbus of each eye. Similarly, each specular highlight was localized by the 1-pixel-wide red border of a rectangular area that contains it. Finally, for each person, the 3-pixel-wide blue border of a rectangular area was drawn around his or her face.

  • Please send us an e-mail for the download link.

614 MB
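
The colour-coded marks described above (green limbus ellipses, red highlight rectangles, blue face rectangles) could be located with a simple connected-component scan. The following is a minimal sketch under stated assumptions: the marker colours are taken to be pure RGB values, which is not confirmed here and may not survive lossy compression, and the image is assumed to be already decoded into a list of rows of (r, g, b) tuples. The function name marker_boxes is hypothetical.

```python
from collections import deque

# Assumed marker colours (pure RGB); the real files may differ.
GREEN = (0, 255, 0)  # limbus ellipse border
RED = (255, 0, 0)    # specular highlight rectangle border
BLUE = (0, 0, 255)   # face rectangle border


def marker_boxes(pixels, color):
    """Return one (left, top, right, bottom) box per 8-connected group
    of pixels that exactly match `color`.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or pixels[y][x] != color:
                continue
            # Breadth-first flood fill over matching neighbours.
            seen[y][x] = True
            queue = deque([(x, y)])
            left = right = x
            top = bottom = y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for ny in range(max(cy - 1, 0), min(cy + 2, height)):
                    for nx in range(max(cx - 1, 0), min(cx + 2, width)):
                        if not seen[ny][nx] and pixels[ny][nx] == color:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes
```

Calling marker_boxes(pixels, GREEN) would then give one bounding box per marked limbus, and likewise for RED and BLUE. If the marker colours are not exact in the files, the equality test would need to be replaced by a colour-distance threshold.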


PEIXOTO, Bruno; MICHELASSI, Carolina; and ROCHA, Anderson.

  • Paper: Face Liveness Detection under bad illumination conditions

  • Data set comprising 640 real faces and 1,920 LCD spoofs. The spoofs were recaptured from images of the Yale Face Database B displayed on three LCD monitors: an LG Flatron L196WTQ Wide 19'', a CTL 171Lx 17'' TFT, and a DELL Inspiron 1545 notebook.
  • The cameras used were a Kodak C813 (8.2 megapixels) and a Samsung Omnia i900 (5 megapixels). The images are cropped and face-centered.
  • The images are in grayscale, 64x64 pixels in resolution.

  • For the download link, please contact me by e-mail.

650 MB



ROCHA, Anderson; HAUAGGE, Daniel C.; WAINER, Jacques; GOLDENSTEIN, Siome.

  • Data set of 2,633 fruit and vegetable images collected at our local fruit and vegetable distribution center (CEASA). The data set comprises 15 different categories and is presented in the paper: Automatic fruit and vegetable classification from images.

  • We have used a Canon PowerShot P1 camera, at a resolution of 1,024x768 pixels.
  • In case of problems, please contact me by e-mail.

132 MB



ROCHA, Anderson; ALMEIDA, Jurandy; TORRES, Ricardo; GOLDENSTEIN, Siome.

  • A 200,000-image subset of the one-million-image database used in the qualitative experiments described in the paper Image Retrieval Using Semantic Information Regions.

  • We provide only the web locations of the images. If you want the downloaded images or the entire one-million-image database, contact me by e-mail.

15 MB



ROCHA, Anderson; ALMEIDA, Jurandy; TORRES, Ricardo; GOLDENSTEIN, Siome.

  • Corel Photo Gallery and Darmstadt ETH data set selection used for the quantitative experiments in the paper: Image Retrieval Using Semantic Information Regions.

10 MB





ROCHA, Anderson; ALMEIDA, Jurandy; TORRES, Ricardo; NASCIMENTO, Mário; GOLDENSTEIN, Siome.

  • 3,462 FreeFoto images used in the quantitative experiments described in the paper Efficient and Flexible Cluster-and-Search for CBIR.

  • In case of problems, please contact me by e-mail.

293 MB





ROCHA, Anderson. How to make Importa Fácil Ciência (a little) easier.