# Last edited on 2008-03-10 15:33:19 by stolfi

IMAGE LIBRARY

  This directory subtree contains some "important" images --- mostly
  photos, logos, maps, cartoons, scanned book covers --- that are used
  in "important" webpages and such.

  For each image we keep the original source image (with as little
  manipulation as possible), versions of the same in several sizes and
  formats, processing parameters, documentation, etc.

  The directory and file structure is maintained by hand with the help
  of several shell scripts, as shown below.

  This schema has become too complicated, and should be replaced by
  Makefiles --- one in each image set directory, that import general
  Makefiles.

  We should convert everything to ".png" instead of ".gif".

DIRECTORY STRUCTURE

  A typical image is represented by a "standard image set", consisting
  of the following files:

    orig.*             Original image before cropping/scaling.
    p-raw.ppm.Z        Image cropped to final aspect, before color correction.
    p-[01]-raw.ppm.Z   Ditto, for transparent-background images.
    p-raw.jpg          Alternative to "p-raw.ppm.Z" (100% quality).
    p-[01]-raw.jpg     Alternatives to "p-[01]-raw.ppm.Z" (100% quality).
    p-cal.ppm.Z        Calibration color scale for "p-raw.ppm.Z".
    p.ppm.Z            Base image, with correct aspect and linear color scale.
    p-[01].ppm.Z       Ditto, for transparent-background images.
    p.parms            Color correction parameters for "p-raw".
    p.sizes            List of GIF image sizes to generate (NNNxNNN).
    p-NNNxNNN.gif      Image "p.ppm" scaled to NNN by NNN and converted to GIF.
    p.gif              A link to the "p-*.gif" at the default viewing size.
    p-icon.gif         A link to the "p-*.gif" at the default icon size.
    p.comments         Description of subject, date, photographer/author.
    p.html-inc         HTML snippet for the index page.

  Items marked "?" are optional, "!" are mandatory, "^" are
  alternatives to the preceding line.

  Except for "orig.*", all image files in the same set have exactly
  the same aspect ratio, usually a simple fraction like 2:3, 7:5, etc.
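  The aspect-ratio rule can be checked mechanically against a "p.sizes"
  file. The sketch below is in Bourne shell rather than the csh used
  elsewhere in this file; the function names are hypothetical, and it
  assumes (this is not stated above) that "p.sizes" holds one NNNxNNN
  entry per line, treating blank lines and lines starting with "#" as
  comments. Because it compares W*DEN with H*NUM exactly, any size
  that was rounded to the nearest pixel will also be reported.

```sh
#!/bin/sh
# Sketch: verify that every WxH size has exactly the aspect ratio NUM:DEN.
# Hypothetical helpers; p.sizes format (one WxH per line) is assumed.

# size_has_aspect WxH NUM DEN
#   exit 0 if W*DEN == H*NUM, 1 if not, 2 if the entry is not NNNxNNN.
size_has_aspect() {
  case $1 in *x*) ;; *) return 2 ;; esac
  w=${1%%x*}                       # part before the first "x"
  h=${1#*x}                        # part after the first "x"
  case $w in ''|*[!0-9]*) return 2 ;; esac
  case $h in ''|*[!0-9]*) return 2 ;; esac
  [ $((w * $3)) -eq $((h * $2)) ]
}

# check_sizes_file FILE NUM DEN
#   prints each offending entry; exit status 1 if any entry is bad.
check_sizes_file() {
  rc=0
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in ''|'#'*) continue ;; esac   # skip blanks and comments
    if ! size_has_aspect "$line" "$2" "$3"; then
      echo "$1: bad size '$line' for aspect $2:$3"
      rc=1
    fi
  done < "$1"
  return $rc
}
```

  For a portrait set one would call it with NUM=5 and DEN=7, e.g.
  "check_sizes_file SOME-SET/p.sizes 5 7", where SOME-SET stands for
  any directory under "photos/people".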

  Either "p-raw.ppm.Z" alone is present, or both "p-0-raw.ppm.Z" and
  "p-1-raw.ppm.Z" are present, or none of them is present.

  For each "XXX" in "p", "p-0", or "p-1":

    If "XXX-raw.ppm.Z" is present, it is assumed to be the official
    image, and "p.parms" must be present too. The file "XXX.ppm.Z"
    will be the result of running "XXX-raw.ppm.Z" through
    "ColorCorrect" with parameters "p.parms".

    If "XXX-raw.ppm.Z" is missing but "XXX.ppm.Z" is present, the
    latter is assumed to be already color-corrected (linear scale).

    The file "XXX-raw.jpg" (with 100% quality factor) is an
    alternative to "XXX-raw.ppm.Z".

  The file "p-cal.ppm.Z", if present, contains an image of some
  reference object (usually a gray scale) in the same calibration as
  "p-raw.ppm.Z".

  The file "p.parms" is created manually, either by trial and error or
  by analysis of "p-raw.ppm.Z" and "p-cal.ppm.Z".

  In any case, either "p.ppm.Z" alone must be present, or both
  "p-0.ppm.Z" and "p-1.ppm.Z" must be present. The latter are assumed
  to be the black- and white-background versions of a GIF image
  (usually a synthetic one) with partially transparent background.

  All other versions of the image are derived automatically from
  "p.ppm.Z" (or "p-0.ppm.Z" and "p-1.ppm.Z"), maintaining its aspect
  ratio.

  The file "orig.*" is the original image --- the raw scanner output,
  or an image provided by someone else, in any format. It gets turned
  into "p-raw.ppm.Z" by semi-manual cropping and/or scaling (usually
  by the script "convert-and-crooriginal-images").

SPECIAL IMAGE SETS

  The image sets in "photos/people" are mug shots for homepages and
  departmental photo galleries. They should all have aspect ratio 5:7,
  and the GIF series must include the official sizes listed in
  "photos/people/PORTRAIT.sizes".

  The image sets in "figures/covers" are scans of book covers, also
  for departmental pages.
  They too should have aspect ratio 5:7, and the GIF series must
  include the official sizes listed in "photos/people/STANDARD.sizes".

  The following directories do NOT contain standard image sets:

    movies     Animated images.
    textures   A library of texture files.
    misc       Miscellaneous images not in standard sets.
    icons      Components for HTML pages (bullets, buttons, etc.).
    tools      Shell scripts and related files.
    JUNK       Apparently useless images, to be discarded.
    temp       Download area for scanned images.

CHECKING AND (RE)GENERATING STANDARD IMAGE SETS

  Compressing all ".ppm" files:

    set topdirs = ( ./ photos/events-ic/./ )

    find ${topdirs} -name '*.ppm' -print \
      | sed -e 's:^[.][/]::' -e 's:[/][.][/]:/:' \
      | egrep -v '^(JUNK|special|misc|textures|temp|tools|movies|synthetic|icons)/' \
      > .ppm-images
    wc .ppm-images
    if ( ! ( -z .ppm-images ) ) compress -f `cat .ppm-images`

  Finding all image files in directories which should be standard
  image sets:

    find-image-files ${topdirs} \
      | sed -e 's:^[.][/]::' -e 's:[/][.][/]:/:' \
      | egrep -v '^(JUNK|misc|textures|tests|temp|tools|movies|synthetic|icons)/' \
      | sort | uniq \
      > .images
    wc .images

  Finding all directories which should be standard image sets:

    cat .images \
      | grep '/' \
      | sed -e 's:/[^/]*$::' \
      | sort | uniq \
      > .image-sets
    wc .image-sets

  Finding directories which are not in the above list:

    find ${topdirs} -type d \
      | sed -e 's:^[.][/]::' -e 's:[/][.][/]:/:' \
      | egrep -v '^(JUNK|misc|textures|tests|temp|tools|synthetic|movies|icons)/' \
      | sort | uniq \
      > .dirs
    bool 1-2 .dirs .image-sets

  Listing the files of each category:

    cat .images \
      | egrep -i '/orig[.](gif|tif|tiff|jpg|jpeg|png|ppm|pgm|pbm|bmp|tga)$' \
      > .files-original

    cat .images \
      | egrep '/p(|-[01])-raw[.](ppm[.]Z|jpg)$' \
      > .files-raw

    cat .images \
      | egrep '/p(|-[01])[.]ppm[.]Z$' \
      > .files-base

    foreach f ( p.gif p-icon.gif )
      echo .files-$f
      cat .images \
        | egrep '/'"${f}"'$' \
        > .files-${f}
    end

    foreach f ( p.parms p.comments p.html-inc p.sizes )
      echo .files-$f
      find `cat .image-sets` -name "$f" \
        | sed -e 's:^[.][/]::' -e 's:[/][.][/]:/:' \
        | egrep -v '^(JUNK|misc|textures|tests|temp|tools|synthetic|movies|icons)/' \
        | sort | uniq \
        > .files-${f}
    end

  For each category, listing the image sets that have and lack files
  of that category:

    foreach f ( original raw base p.parms p.gif p-icon.gif p.comments p.html-inc p.sizes )
      echo ".have-$f .lack-$f"
      cat .files-$f \
        | sed -e 's:/[^/]*$::' \
        | sort | uniq \
        > .have-$f
      bool 1-2 .image-sets .have-$f \
        > .lack-$f
    end

  Listing non-standard files in directories which should have the
  standard structure:

    cat .images \
      | egrep -v '/orig[.](gif|tif|tiff|jpg|jpeg|png|ppm|pgm|pbm|bmp|tga)$' \
      | egrep -v '/p(|-[01])-raw[.](ppm[.]Z|jpg)$' \
      | egrep -v '/p-cal[.]ppm[.]Z$' \
      | egrep -v '/p(|-[01])[.]ppm[.]Z$' \
      | egrep -v '/p[.](gif|sizes|comments|html-inc|parms)$' \
      | egrep -v '/p-[0-9]*x[0-9]*[.]gif$' \
      | egrep -v '/p-icon[.]gif$' \
      > .images-spurious
    wc .images-spurious; cat .images-spurious

  Listing image sets which MAY need cropping/scaling. Those are the
  sets which have an "orig.*" file but a missing or obsolete
  "XXX-raw.ppm.Z" or "XXX-raw.jpg" file, where "XXX" is either "p", or
  "p-0" and "p-1". Note that sometimes, especially for synthetic
  images, the base image "XXX.ppm.Z" is generated from the "orig.*"
  file without going through the "XXX-raw.*" version.

    set out = ".needs-raw"
    /bin/rm -f ${out}
    foreach f ( `cat .have-original` )
      echo "checking $f"
      set origs = ( `cd $f && echo orig.*` )
      if ( ( -r $f/p-raw.ppm.Z ) || \
           ( -r $f/p-0-raw.ppm.Z ) || ( -r $f/p-1-raw.ppm.Z ) || \
           ( -r $f/p-raw.jpg ) || \
           ( -r $f/p-0-raw.jpg ) || ( -r $f/p-1-raw.jpg ) \
         ) then
        foreach raw ( `cd $f && echo p-raw.* p-[01]-raw.*` )
          set remake = `cd $f && check-dependencies ${raw} ${origs}`
          if ( ${remake} ) echo $f >> ${out}
        end
      else
        echo $f >> ${out}
      endif
    end
    wc ${out}; cat ${out}

  Listing image sets that need color correction. These are sets with
  a "XXX-raw.ppm.Z" or "XXX-raw.jpg" file but a missing or obsolete
  "XXX.ppm.Z" file.

    set out = ".needs-correct"
    /bin/rm -f ${out}
    foreach f ( `cat .have-raw` )
      echo "checking $f"
      set raws = ( `cd $f && echo p*-raw.*` )
      if ( ( -r $f/p.ppm.Z ) || ( -r $f/p-0.ppm.Z ) || ( -r $f/p-1.ppm.Z ) ) then
        foreach base ( `cd $f && echo p.ppm* p-[01].ppm*` )
          set remake = `cd $f && check-dependencies ${base} ${raws}`
          if ( ${remake} ) echo $f >> ${out}
        end
      else
        echo $f >> ${out}
      endif
    end
    wc ${out}; cat ${out}

  Listing image sets that apparently need "p.parms":

    set out = ".needs-p.parms"
    bool 1-2 .needs-correct .have-p.parms > ${out}
    wc ${out}; cat ${out}

IMAGE CAPTIONS AND DIRECTORY-BASED INDICES

  Collecting and concatenating all caption files:

    tools/bin/collect-all-p-comments ${topdirs} > ALL-COMMENTS.txt

  The file ALL-COMMENTS.txt can then be edited with Emacs.

  Redistributing the captions to the image set directories:

    tools/bin/distribute-p-comments < ALL-COMMENTS.txt

  Forcing creation of all index sets (the leading "./" must be
  stripped before the exclusion filter, or the "^(JUNK|...)/" pattern
  never matches):

    find ${topdirs} -name 'TITLE.txt' -print \
      | sed \
          -e 's:/TITLE[.]txt::g' \
          -e 's:^[.][/]::g' \
          -e 's:[/][.][/]:/:g' \
      | egrep -v '^(JUNK|misc|textures|tests|temp|tools|movies|synthetic|icons)/' \
      > .indexed-dirs
    cat .indexed-dirs
    do-index `cat .indexed-dirs`

SUBJECT INDEX

  The subject index is created from the files "p.comments" in all
  image directories. We exclude image directories whose names end with
  "N" (e.g. at the request of the people featured therein).
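  The exclusion rule can be illustrated as a shell filter over a list
  of "p.comments" paths. This is only a sketch: the commands below
  apply the rule with "egrep -v -e '[N]/p[.]'" directly to
  ALL-COMMENTS.txt, whose exact format is not described here, and both
  function names are hypothetical.

```sh
#!/bin/sh
# Sketch of the "directory name ends in N" exclusion rule.
# Hypothetical helpers; input is assumed to be one file path per line.

# is_excluded_set DIR -> exit 0 if the image set directory name ends in "N"
is_excluded_set() {
  d=${1%/}                 # ignore a single trailing slash
  case $d in
    *N) return 0 ;;
    *)  return 1 ;;
  esac
}

# filter_index_paths < paths -> copies only paths whose directory is not excluded
filter_index_paths() {
  while IFS= read -r path; do
    is_excluded_set "$(dirname -- "$path")" || printf '%s\n' "$path"
  done
}
```

  Like the egrep pattern, this only looks at the directory that
  directly contains the "p.*" file, so an "N" at the end of a parent
  directory name does not exclude the sets inside it.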

  Checking and updating the index normalization table:

    cat ALL-COMMENTS.txt \
      | egrep -v -e '[N]/p[.]' \
      | tools/bin/extract-index-entries -v unmarked=1 \
      | sort | uniq \
      > .all-entries.txt

    cat .all-entries.txt \
      | map-field \
          -v table=NORMAL-KEYS.tbl \
          -v inField=1 -v outField=1 \
          -v forgiving=1 \
      | gawk '/[{}]/{print $1;}' \
      | sort | uniq \
      > .all-keys.txt

  Generating the raw index:

    cat ALL-COMMENTS.txt \
      | egrep -v -e '[N]/p[.]' \
      | tools/bin/extract-index-entries -v unmarked=0 \
      | sort | uniq \
      > .raw-index.tbl

  Generating the subject index:

    cat .raw-index.tbl \
      | tools/bin/make-subject-index

TO DO

  Color-correct images that need it.

  Put back the "orig" versions of images when they are not too big.

  Reorganize everything so as to have only one "orig.png" image with
  arbitrary type and scale; a cached (fully reconstructible) cropped
  and corrected "p-base.png" image; and a bunch of derived
  "p-NNNxNNN.jpg" files (no GIFs!). For that, we must wait until we
  have a fast machine with an "xv" that understands PNGs. We must also
  make ColorCorrect faster.
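  A cached "p-base.png" with derived "p-NNNxNNN.jpg" files needs a rule
  for deciding when a file is out of date. The make-like sketch below
  is a hypothetical stand-in, NOT the actual "check-dependencies" tool
  used above, though it prints "1" or "0" so that it could be used the
  same way in "if ( ${remake} )". It assumes the sizes are still listed
  one per line in "p.sizes", and it relies on the "-nt" test operator,
  which bash, dash and ksh support but POSIX does not require.

```sh
#!/bin/sh
# Make-like staleness check for the proposed p-base.png / p-NNNxNNN.jpg layout.
# Hypothetical helpers, not the real check-dependencies.

# needs_remake TARGET SOURCE...
#   prints 1 if TARGET is missing or older than any existing SOURCE, else 0.
needs_remake() {
  target=$1; shift
  if [ ! -e "$target" ]; then echo 1; return; fi
  for src in "$@"; do
    if [ "$src" -nt "$target" ]; then echo 1; return; fi
  done
  echo 0
}

# list_stale_derived SETDIR
#   prints each SETDIR/p-WxH.jpg (sizes taken from SETDIR/p.sizes)
#   that is missing or older than SETDIR/p-base.png.
list_stale_derived() {
  while IFS= read -r size || [ -n "$size" ]; do
    case $size in ''|'#'*) continue ;; esac   # skip blanks and comments
    out="$1/p-$size.jpg"
    if [ "$(needs_remake "$out" "$1/p-base.png")" = 1 ]; then
      printf '%s\n' "$out"
    fi
  done < "$1/p.sizes"
}
```

  The same "needs_remake" call could decide whether "p-base.png" itself
  must be rebuilt from "orig.png" and "p.parms".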