# Last edited on 2001-09-20 17:54:24 by stolfi

DATASET DESCRIPTION

  Fragment preparation

    This directory contains the input data for the test "ceramic-3".

    The test objects were five unglazed ceramic tiles, each about
    5 cm × 20 cm × 7 mm. On the flat side, each tile had a bevel all
    around the edge, about 2 mm wide and 1 mm deep. The tiles had
    apparently been baked to the "biscuit" stage (the stage just
    below vitrification, where the piece will no longer soften in
    water, but can still absorb it).

    The tiles were placed face down on a paper sheet, resting upon a
    cement surface, and were hit repeatedly with a steel bar. The
    material was harder than common brick, but softer than typical
    ceramic tiles, so fracturing generated many small crumbs and
    dust. The edges were rather delicate, so fragments had to be
    handled with care in order not to damage the fracture edges.

  Fragment scanning

    We recovered 112 pieces large enough to handle, ranging from
    ~5 cm to ~0.5 cm in diameter. These were numbered with pencil
    and scanned directly on an ordinary flatbed document scanner
    (UMAX model UC630 Maxcolor, driven by Adobe Photoshop running on
    a Macintosh).

    The pieces were scanned in four batches of about 30 pieces each,
    in no particular order; see the files
    "batches/fragments-{a,b,c,d}.pgm". Individual pieces were later
    isolated from these images with the PZSplit program.

    The fragments were placed, flat side down, in random position
    and orientation. To maximize contrast, the flat face of each
    piece was lightly rubbed with chalk, and a black velvet cloth
    was used in lieu of the scanner's cover.

    The scanner was set to grayscale mode, 300 dpi (its nominal
    maximum). The images were saved in TIFF format and later
    converted to 8-bit PGM files.

  Reassembled objects

    After scanning, the pieces were manually reassembled into the
    original tiles. Each tile was wrapped in transparent plastic to
    reduce the risk of further abrasion, and scanned again; see file
    "tiles/assembled.pgm".
    Unfortunately, it did not occur to us to scan the original tiles
    before breaking them up.

  File history

    These data are the same fragments and images as "ceramic-2", but
    processed with a corrected and improved filtering program
    (PZFilter of 20/aug/1999).

    These files were originally stored in
    euler:/n/lac3/hcgl/tests/ceramic-3.

INPUT DATA FILES

  Directory "batches"

    The "batches" directory contains unprocessed images of multiple
    fragments, as obtained from the scanner:

        Bytes  File
      -------  ---------------
      6035867  fragments-a.pgm
      5357307  fragments-b.pgm
      5113563  fragments-c.pgm
      5958513  fragments-d.pgm
      -------  ---------------

  Directory "fragments"

    The "fragments" directory contains images and extracted outlines
    of individual pieces. Each numbered sub-directory
    "fragments/0000" through "fragments/0111" contains data for one
    piece of the puzzle. The directory number should match the
    number penciled on the piece itself.

    The files in each sub-directory are:

      File       Produced by         Contents
      ---------  ------------------  ---------------------------------------------
      image.pgm  PZSplit             Grayscale image of the piece.
      r000.flc   PZBoundary          The piece's raw outline, from image.pgm.
      f000.flc   PZFilter            Same as r000.flc, but centered at (0,0).
      f000.lbl   PZFilter            Sample labels for f000.flc.
      f000.ps    PZDraw              Plot of f000.flc.
      ---------  ------------------  ---------------------------------------------

  Directory "multiscale"

    The "multiscale" directory contains multiple versions of the
    outline curves in the "fragments" directory, smoothed with
    PZFilter at multiple resolution scales NNN (001, 002, 004, ...
    128, 256). Note that the coarsest versions are not available
    for the smallest pieces.

      File       Produced by         Contents
      ---------  ------------------  ---------------------------------------------
      fNNN.flc   PZFilter            The piece's outline, filtered to scale NNN.
      fNNN.lbl   PZFilter            The numeric label of each sample.
      fNNN.flv   PZComputeVelAcc     The velocity vector at each sample point.
      fNNN.fla   PZComputeVelAcc     The acceleration vector at each sample point.
      fNNN.fcv   PZComputeCurvature  The outline's curvature at each sample point.
      fNNN.cvc   PZEncodeCurvature   Curvature values encoded as letters 'z-a0A-Z'.
      fNNN.ps    PZDraw              Postscript plot of fNNN.flc.
      ---------  ------------------  ---------------------------------------------

    Corresponding points in two different versions of the same
    fragment outline can be identified by having the same ".lbl"
    value.

  Directory "nonfractal"

    This directory contains one file "fNNN-str.seg" for each scale
    NNN, which specifies the segments of each fragment that are too
    smooth to be considered by the shape matching programs. (These
    are likely to be outer edges anyway.)

  Directory "pairs"

    The "pairs" directory contains reference solutions, compiled by
    hand. These files are used to evaluate the precision and recall
    of our algorithms.

    Each of the main files contains a list of `candidates' --- pairs
    of matching fragment outline segments --- in the ".can" format
    (see below).

    There were 209 `true' candidates, i.e. pairs of outline segments
    which were indeed adjacent in the original tiles. Among these,
    we identified a subset of 195 `recognizable' candidates: those
    which, in our judgement, could possibly be recognized as such by
    a person looking at the two outline segments. The remaining 14
    candidates were considered too dissimilar in shape (due to
    material loss) to be recognized as such.

      File                        Contents
      --------------------------  ---------------------------------------
      f000-t.can                  The 195 "recognizable" true candidates.
      f000-t-can.dgr              LaTeX adjacency graph of the same.
      f000-u.can                  The 14 "unrecognizable" true candidates.
      f000-u-can.dgr              LaTeX adjacency graph of the same.
      adj-graph.tex               LaTeX file to display the graphs.
      adj-graph.make              Makefile for the above.
      f004-tr-0000??-dr-f.eps     Postscript plots of some true candidates.
      --------------------------  ---------------------------------------

  Directory "tiles"

    The "tiles" directory contains images of the manually
    reassembled tiles:

        Bytes  File                    Contents
      -------  ----------------------  ---------------------------------------
      7821075  assembled.pgm           The five tiles, manually reassembled.
      2169088  assembled.jpg           Lossy version of assembled.pgm.
       191631  assembled-small.pgm     A reduced version of assembled.pgm.
       389473  assembled-small.eps     Postscript version of the same.
       211468  assembled-detail.pgm    Full-res detail of assembled.pgm.
       430004  assembled-detail.eps    Postscript version of the same.
         1717  assembled.ctrs          Center of each piece in "assembled.pgm".
      -------  ----------------------  ---------------------------------------

FILE FORMATS

  General information

    Data files produced by our programs start with a line of the
    form

      begin PZXxxx.T (format of YY-MM-DD)

    where "Xxxx" identifies the file type, and "YY-MM-DD" specifies
    a particular version of the format.

    Following the "begin" line are some comments, identified by "|"
    in column 1, usually written by the program(s) that produced the
    file.

    After the comments there are some file-specific parameters; two
    common ones are "samples = NNN" (the number of sample points in
    the file), and "unit = N.NNNN" (a scaling factor for the sample
    data).

    After the parameters come the data samples, one sample per
    line. The number of coordinates in each sample depends on the
    file type. To save I/O time, each sample coordinate is usually
    written as an integer that is implicitly scaled by the "unit"
    parameter. So, for example, if "unit = 0.001", then the
    coordinate "9542" actually means "9.542".

    The data file is closed by a line of the form "end PZXxxx.T".

  File-specific information

    The names [PZXxxx] in brackets are the program modules where the
    contents and/or the file format are defined.

    .flc
        Coordinates (X,Y,Z) of sample points along the boundary of a
        fragment. The Z coordinate is always 0 in these samples.
        The unit of measure is the scanner's pixel (1/300 of an
        inch, about 0.0847 mm) times the "unit" parameter. Thus
        "unit = 0.001" means the unit is actually about
        0.0000847 mm. [PZLR3Chain,PZFilter]

    .lbl
        A numeric `label' attached to each sample point. The
        PZFilter program tries to preserve labels when filtering and
        resampling the outline, so that a point labeled, say, 0.65
        in the filtered curve corresponds to some point between
        samples labeled 0.64 and 0.67 in the original curve. The
        labels are stored in the file as integer multiples of the
        "unit" parameter. [PZLRChain,PZFilter]

    .flv
        Tangent direction (velocity vector) of the curve at each of
        the sample points in the corresponding ".flc" file. The unit
        of measure is the "unit" parameter times a scanner pixel.
        The curve is approximately parametrized by arc length, so
        the length of the vector should be close to 1. The Z
        component of the velocity is always 0.
        [PZLR3Chain,PZComputeVelAcc]

    .fla
        Acceleration vector of the curve at each of the sample
        points in the corresponding ".flc" file. The unit of measure
        is the "unit" parameter times a scanner pixel. The vector
        should be approximately normal to the curve. The Z component
        of the acceleration is always 0.
        [PZLR3Chain,PZComputeVelAcc]

    .fcv
        Estimated numerical curvature of the outline at each sample
        point. The unit is pixel^{-1}, times the "unit" parameter.
        [PZLRChain,PZComputeCurvature]

    .cvc
        Estimated curvature of the outline at each sample point,
        scaled and compressed to the range [-26..+26] and encoded as
        a letter: z-a for [-26..-1], 0 for 0, A-Z for [+1..+26]. The
        "sigma" parameter defines the compression function.
        [PZSymbolChain,PZEncodeCurvature]

    .seg
        List of fragment outline segments. The "segments" parameter
        is the number of entries.
        Each segment is described by one line with five fields: the
        fragment number, the number of samples in the whole outline,
        the index of the first sample belonging to the segment, the
        number of samples in the segment, and the reading direction
        (`+' for increasing sample indices, `-' for the opposite
        direction).

    .can
        List of `candidates'. A `candidate' is a pair of outline
        segments that are claimed to have been adjacent in the
        original object. The parameter "candidates" defines the
        number of candidates in the file.

        Each candidate is specified by a line with either 13 or 19
        fields. The first 10 fields specify the two outline
        segments, 5 fields each, as in the ".seg" file. After that
        there are 3 fields for the candidate: a `mismatch' measure
        (often 0, meaning `not available'), the number of steps
        averaged between the two segments, and the number of samples
        that were considered `matched' (often 0, meaning `not
        available').

        After those 13 fields, there may be an optional description
        of a proposed pairing of the samples between the two
        segments. The pairing is a sequence of pairs of sample
        indices, one in each segment, that increase at most by one
        at each step --- on only one segment, or on both at the same
        time.

        The pairing, if present, is defined by 6 additional fields
        --- the last 5 enclosed in parentheses --- giving: the
        number of pairs in the pairing; the indices of the first and
        last samples of the first segment; ditto for the second
        segment; and a string of characters that provides a
        graphical description of the pairing, using "/" for a step
        only on segment 1, "\" for a step only on segment 2, and "|"
        for a step on both segments simultaneously.
        [PZCandidate,PZMatch,PZMapCands,PZRefineCands]

    .ctrs
        This file has no headers. The first two lines specify the
        dimensions "width = NNN" and "height = NNN" of the image
        showing the manually reassembled fragments.
        After that comes one line for each fragment, giving the
        fragment number (0000 to 0112) and the H and V coordinates
        of its approximate center in that image.

    .ps
        Postscript (printer-ready, not encapsulated) plot of the
        outline. Grid lines, when shown, are 50(??) scanner pixels
        (4.23 mm) apart. [PZDraw,PZDrawCand,PSPlot]
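
    As an illustration of the layout described under "General
    information", here is a minimal Python sketch of a reader for
    the numeric file types (".flc", ".lbl", ".flv", ".fla", ".fcv").
    It is not one of the PZ programs. The function name
    "read_pz_file", the constant "MM_PER_PIXEL", and the example file
    contents are made up for this sketch. It also assumes that
    parameters appear as "name = value" lines, that samples are
    whitespace-separated numbers, and that blank lines can be
    skipped; none of this has been checked against the actual files.

```python
# Sketch of a reader for the generic "begin ... end" data file layout.
# Assumptions (not verified against the real files): parameters are
# "name = value" lines, samples are whitespace-separated numbers, and
# blank lines may be skipped. Letter-coded ".cvc" files are not handled.

MM_PER_PIXEL = 25.4 / 300  # the scanner was set to 300 dpi


def read_pz_file(lines):
    """Parse an iterable of text lines; return (file_type, params, samples).

    Each sample is a tuple of floats already multiplied by "unit".
    """
    it = iter(lines)
    header = next(it, "").split()
    if len(header) < 2 or header[0] != "begin":
        raise ValueError("file does not start with a 'begin' line")
    file_type = header[1]
    params, raw_samples, closed = {}, [], False
    for raw in it:
        if raw.startswith("|"):  # comment: "|" in column 1
            continue
        line = raw.strip()
        if not line:
            continue
        if line.split()[0] == "end":
            if line.split()[1:2] != [file_type]:
                raise ValueError("'end' line does not match 'begin' line")
            closed = True
            break
        if "=" in line:  # parameter line, e.g. "unit = 0.001"
            name, value = line.split("=", 1)
            params[name.strip()] = value.strip()
        else:  # sample line: one sample, one or more coordinates
            raw_samples.append(tuple(float(tok) for tok in line.split()))
    if not closed:
        raise ValueError("missing 'end' line")
    if "samples" in params and int(params["samples"]) != len(raw_samples):
        raise ValueError("sample count does not match 'samples' parameter")
    unit = float(params.get("unit", "1"))
    samples = [tuple(c * unit for c in s) for s in raw_samples]
    return file_type, params, samples


# Hypothetical file contents, invented for illustration only.
EXAMPLE = """\
begin PZLR3Chain.T (format of 99-08-20)
| written by a hypothetical program
samples = 2
unit = 0.001
9542 -1200 0
9601 -1187 0
end PZLR3Chain.T
"""

file_type, params, samples = read_pz_file(EXAMPLE.splitlines())
for x, y, z in samples:
    # .flc coordinates are in scanner pixels; convert to millimetres.
    print(f"({x:.3f}, {y:.3f}) px = "
          f"({x * MM_PER_PIXEL:.4f}, {y * MM_PER_PIXEL:.4f}) mm")
```

    The reader applies the "unit" scaling once, at load time, so
    that later code works directly in scanner pixels; multiplying by
    25.4/300 then gives millimetres.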