# Last edited on 2008-03-10 15:33:19 by stolfi

IMAGE LIBRARY 

  This directory subtree contains some "important" images --- mostly photos,
  logos, maps, cartoons, scanned book covers --- that are used in
  "important" webpages and such.  
  
  For each image we keep the original source image (with as little 
  manipulation as possible), versions of the same in several sizes
  and formats, processing parameters, documentation, etc.
  
  The directory and file structure is maintained by hand with the help
  of several shell scripts, as shown below. This schema has become too
  complicated and should be replaced by Makefiles --- one in each
  image set directory, each importing a set of general Makefiles.
  
  We should convert everything to ".png" instead of ".gif".
  
DIRECTORY STRUCTURE 

  A typical image is represented by a "standard image set",
  consisting of the following files:

    orig.*            Original image before cropping/scaling. 
    p-raw.ppm.Z       Image cropped to final aspect, before color correction.
    p-[01]-raw.ppm.Z  Ditto, for transparent-background images.
    p-raw.jpg         Alternative to p-raw.ppm.Z (100% quality).
    p-[01]-raw.jpg    Alternatives to p-[01]-raw.ppm.Z (100% quality).
    p-cal.ppm.Z       Calibration color scale for "p-raw.ppm.Z".
    p.ppm.Z           Base image, with correct aspect and linear color scale.
    p-[01].ppm.Z      Ditto, for transparent-background images.
    p.parms           Color correction parameters for p-raw
    p.sizes           List of GIF image sizes to generate (NNNxNNN).
    p-NNNxNNN.gif     Image p.ppm scaled to size NNN by NNN and converted to GIF.
    p.gif             A link to the p-*.gif at the default viewing size.
    p-icon.gif        A link to the p-*.gif at the default icon size.
    p.comments        Description of subject, date, photographer/author.
    p.html-inc        HTML snippet for index page.

  Items marked "?" are optional, "!" are mandatory, "^" are alternatives
  to the preceding line.  
  
  Except for "orig.*", all image files in the same set have exactly
  the same aspect ratio, usually a simple fraction like 2:3, 7:5, etc.
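
  The relation between "p.sizes", the "p-NNNxNNN.gif" names, and the
  common aspect ratio can be illustrated with a small Python sketch.
  It is not one of the scripts in "tools"; the function names are
  hypothetical, and it assumes that "p.sizes" holds
  whitespace-separated entries of the form WIDTHxHEIGHT.

```python
import re
from fractions import Fraction

# Assumed entry format in "p.sizes": WIDTHxHEIGHT, e.g. "100x140".
SIZE_RE = re.compile(r"^(\d+)x(\d+)$")

def gif_names(sizes_text):
    """Map the text of a p.sizes file to the expected scaled GIF names."""
    names = []
    for token in sizes_text.split():
        m = SIZE_RE.match(token)
        if m is None:
            raise ValueError("bad size entry: %r" % token)
        # Keep the digits exactly as written in the file.
        names.append("p-%sx%s.gif" % (m.group(1), m.group(2)))
    return names

def aspect(width, height):
    """Aspect ratio width:height as a reduced fraction."""
    return Fraction(width, height)
```

  For example, gif_names("100x140 50x70") gives "p-100x140.gif" and
  "p-50x70.gif", and aspect(100, 140) reduces to 5/7.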
  
  Either "p-raw.ppm.Z" alone is present, or both "p-0-raw.ppm.Z" and
  "p-1-raw.ppm.Z" are present, or none of them is present.

  For each "XXX" in "p", "p-0", or "p-1":
  
    If "XXX-raw.ppm.Z" is present, it is assumed to be the official
    image, and "p.parms" must be present too. The file "XXX.ppm.Z"
    will be the result of running "XXX-raw.ppm.Z" through "ColorCorrect"
    with parameters "p.parms".
  
    If "XXX-raw.ppm.Z" is missing but "XXX.ppm.Z" is present, the
    latter is assumed to be already color-corrected (linear scale).
  
    The file "XXX-raw.jpg" (with 100% quality factor) is 
    an alternative to "XXX-raw.ppm.Z".

  The file "p-cal.ppm.Z", if present, contains an image of some
  reference object (usually a gray scale) in the same calibration as
  "p-raw.ppm.Z".
  
  The file "p.parms" is created manually, either by trial and error or
  by analysis of "p-raw.ppm.Z" and "p-cal.ppm.Z".
    
  In any case, either "p.ppm.Z" alone must be present, or both
  "p-0.ppm.Z" and "p-1.ppm.Z" must be present. The latter are assumed
  to be the black- and white-background versions of a GIF image
  (usually a synthetic one) with partially transparent background. All
  other versions of the image are derived automatically from "p.ppm.Z"
  (or "p-0.ppm.Z" and "p-1.ppm.Z"), maintaining its aspect ratio.
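
  The presence rules above can be summarized as a checker over the
  file names of one complete image set. This is only a Python sketch
  of the rules as stated in this file, not an existing tool; the
  function name and the wording of the messages are made up for
  illustration.

```python
def check_image_set(files):
    """Return a list of problems found in one image set's file names."""
    files = set(files)
    problems = []

    def has_raw(stem):
        # "XXX-raw.jpg" is accepted as an alternative to "XXX-raw.ppm.Z".
        return (stem + "-raw.ppm.Z") in files or (stem + "-raw.jpg") in files

    raw_single = has_raw("p")
    raw_pair = (has_raw("p-0"), has_raw("p-1"))
    if any(raw_pair) and not all(raw_pair):
        problems.append("p-0-raw and p-1-raw must come together")
    if raw_single and any(raw_pair):
        problems.append("p-raw must not be mixed with p-[01]-raw")
    if (raw_single or any(raw_pair)) and "p.parms" not in files:
        problems.append("raw image present but p.parms missing")

    base_single = "p.ppm.Z" in files
    base_pair = ("p-0.ppm.Z" in files, "p-1.ppm.Z" in files)
    if any(base_pair) and not all(base_pair):
        problems.append("p-0.ppm.Z and p-1.ppm.Z must come together")
    if base_single and any(base_pair):
        problems.append("p.ppm.Z must not be mixed with p-[01].ppm.Z")
    if not base_single and not any(base_pair):
        problems.append("no base image (p.ppm.Z or p-[01].ppm.Z)")
    return problems
```

  An empty result means that the set satisfies the rules; a set with
  "p-raw.jpg" and "p.ppm.Z" but no "p.parms" yields one message.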

  The file "orig.*" is the original image --- the raw scanner
  output, or an image provided by someone else, in any format. It gets
  turned into "p-raw.ppm.Z" by semi-manual cropping and/or scaling
  (usually by the script "convert-and-crooriginal-images").

SPECIAL IMAGE SETS
  
  The image sets in "photos/people" are mug shots for homepages and
  departmental photo galleries. They should all have aspect ratio 5:7,
  and the GIF series must include the official sizes listed in
  "photos/people/PORTRAIT.sizes".
  
  The image sets in "figures/covers" are scans of book covers, also for
  departmental pages. They too should have aspect ratio 5:7,
  and the GIF series must include the official sizes listed in
  "photos/people/STANDARD.sizes".
  
  The following directories do NOT contain standard image sets:

    movies          Animated images.

    textures        A library of texture files.

    misc            Miscellaneous images not in standard sets.

    icons           Components for HTML pages (bullets, buttons, etc.)
  
    tools           Shell scripts and related files. 

    JUNK            Apparently useless images, to be discarded.

    temp            Download area for scanned images.

CHECKING AND (RE)GENERATING STANDARD IMAGE SETS

  Compressing all ".ppm" files:

    set topdirs = ( ./ photos/events-ic/./ )
    
    find ${topdirs} -name '*.ppm' -print \
      | sed -e 's:^[.][/]::' -e 's:[/][.][/]:/:' \
      | egrep -v '^(JUNK|special|misc|textures|temp|tools|movies|synthetic|icons)/' \
      > .ppm-images
    wc .ppm-images
    
    if ( ! ( -z .ppm-images ) ) compress -f `cat .ppm-images`

  Finding all image files in directories which should be standard image sets:

    find-image-files ${topdirs} \
      | sed -e 's:^[.][/]::' -e 's:[/][.][/]:/:' \
      | egrep -v '^(JUNK|misc|textures|tests|temp|tools|movies|synthetic|icons)/' \
      | sort | uniq \
      > .images
    wc .images

  Finding all directories which should be standard image sets:

    cat .images \
      | grep '/' \
      | sed -e 's:/[^/]*$::' \
      | sort | uniq \
      > .image-sets
    wc .image-sets

  Find directories which are not in the above list:
  
    find ${topdirs} -type d \
      | sed -e 's:^[.][/]::' -e 's:[/][.][/]:/:' \
      | egrep -v '^(JUNK|misc|textures|tests|temp|tools|synthetic|movies|icons)/' \
      | sort | uniq \
      > .dirs
    bool 1-2 .dirs .image-sets

  List files of each category:

    cat .images \
      | egrep -i '/orig[.](gif|tif|tiff|jpg|jpeg|png|ppm|pgm|pbm|bmp|tga)$' \
      > .files-original
      
    cat .images \
      | egrep '/p(|-[01])-raw[.](ppm[.]Z|jpg)$' \
      > .files-raw
      
    cat .images \
      | egrep '/p(|-[01])[.]ppm[.]Z$' \
      > .files-base

    foreach f ( p.gif p-icon.gif )
      echo .files-$f
      cat .images \
        | egrep '/'"${f}"'$' \
        > .files-${f}
    end
    
    foreach f ( p.parms p.comments p.html-inc p.sizes )
      echo .files-$f
      find  `cat .image-sets` -name "$f" \
        | sed -e 's:^[.][/]::' -e 's:[/][.][/]:/:' \
        | egrep -v '^(JUNK|misc|textures|tests|temp|tools|synthetic|movies|icons)/' \
        | sort | uniq \
        > .files-${f}
    end

  For each category, list image sets that have and lack files of that category:

    foreach f ( original raw base  p.parms p.gif p-icon.gif p.comments p.html-inc p.sizes )
      echo ".have-$f .lack-$f"
      cat .files-$f \
        | sed -e 's:/[^/]*$::' \
        | sort | uniq \
        > .have-$f
      bool 1-2 .image-sets .have-$f \
        > .lack-$f
    end
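
  The have/lack computation is a plain set difference. The Python
  sketch below mirrors it, assuming that "bool 1-2 A B" prints the
  lines of A that do not occur in B (this is inferred from its use
  here, not from the tool's documentation). The function name is
  hypothetical.

```python
def have_and_lack(image_sets, category_files):
    """Split image sets into those that have a file of a category and
    those that lack one.

    image_sets     -- directory names, as in ".image-sets"
    category_files -- file paths of one category, as in ".files-XXX"
    """
    # Same effect as sed 's:/[^/]*$::' followed by sort | uniq.
    have = sorted({path.rsplit("/", 1)[0] for path in category_files})
    # Assumed meaning of "bool 1-2": lines of the first list not in the second.
    lack = sorted(set(image_sets) - set(have))
    return have, lack
```

  With image sets "a/x", "a/y", "b" and category files "a/x/p.gif",
  "b/p.gif", the sets "a/x" and "b" have the file and "a/y" lacks it.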

  Listing non-standard files in directories which should have standard structure:

    cat .images \
      | egrep -v '/orig[.](gif|tif|tiff|jpg|jpeg|png|ppm|pgm|pbm|bmp|tga)$' \
      | egrep -v '/p(|-[01])-raw[.](ppm[.]Z|jpg)$' \
      | egrep -v '/p-cal[.]ppm[.]Z$' \
      | egrep -v '/p(|-[01])[.]ppm[.]Z$' \
      | egrep -v '/p[.](gif|sizes|comments|html-inc|parms)$' \
      | egrep -v '/p-[0-9][0-9]*x[0-9][0-9]*[.]gif$' \
      | egrep -v '/p-icon[.]gif$' \
      > .images-spurious
    wc .images-spurious; cat .images-spurious
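
  The same filter can be written as a list of patterns, which makes
  the set of standard names easier to read. The Python sketch below
  follows the egrep chain above, with two deliberate differences that
  are choices of this example: the dot before "gif" is matched
  literally, and each size needs at least one digit. The names
  STANDARD_PATTERNS and spurious are hypothetical.

```python
import re

# One pattern per kind of file allowed in a standard image set.
STANDARD_PATTERNS = [re.compile(p) for p in (
    r"/orig\.(gif|tif|tiff|jpg|jpeg|png|ppm|pgm|pbm|bmp|tga)$",
    r"/p(|-[01])-raw\.(ppm\.Z|jpg)$",
    r"/p-cal\.ppm\.Z$",
    r"/p(|-[01])\.ppm\.Z$",
    r"/p\.(gif|sizes|comments|html-inc|parms)$",
    r"/p-[0-9]+x[0-9]+\.gif$",
    r"/p-icon\.gif$",
)]

def spurious(paths):
    """Return the paths that match none of the standard patterns."""
    return [p for p in paths
            if not any(rx.search(p) for rx in STANDARD_PATTERNS)]
```

  Matching is case-sensitive, as in the egrep chain, so a file named
  "orig.JPG" is reported as spurious.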

  Listing image sets which MAY need cropping/scaling. Those are the
  sets which have an "orig.*" file but a missing or obsolete
  "XXX-raw.ppm.Z" or "XXX-raw.jpg" file, where "XXX" is either "p" or
  both "p-0" and "p-1". Note that sometimes, especially for synthetic
  images, the base image "XXX.ppm.Z" is generated from the "orig.*"
  file without going through the "XXX-raw.*" version.
  
    set out = ".needs-raw"
    /bin/rm -f ${out}
    foreach f ( `cat .have-original` )
      echo "checking $f"
      set origs = ( `cd $f && echo orig.*` )
      if ( ( -r $f/p-raw.ppm.Z ) || \
           ( -r $f/p-0-raw.ppm.Z ) || ( -r $f/p-1-raw.ppm.Z ) || \
           ( -r $f/p-raw.jpg ) || \
           ( -r $f/p-0-raw.jpg ) || ( -r $f/p-1-raw.jpg ) \
         ) then 
        foreach raw ( `cd $f && echo p-raw.* p-[01]-raw.*` ) 
          set remake = `cd $f && check-dependencies ${raw} ${origs}`
          if ( ${remake} ) echo $f >> ${out}
        end
      else
        echo $f >> ${out}
      endif
    end
    wc ${out}; cat ${out}
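
  The notion of an "obsolete" file used here is the usual make-style
  one. The Python sketch below shows that test, assuming that
  "check-dependencies TARGET PREREQS..." reports true when the target
  is missing or older than some prerequisite; the actual script may
  behave differently. The function name is hypothetical.

```python
import os

def needs_remake(target, prerequisites):
    """True if target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    # Compare modification times, as make does.
    return any(os.path.getmtime(p) > target_time for p in prerequisites)
```

  For instance, needs_remake("p-raw.ppm.Z", ["orig.tif"]) is true when
  "orig.tif" was modified after "p-raw.ppm.Z" was last written.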

  Listing image sets that need color correction. These are sets with an
  "XXX-raw.ppm.Z" or "XXX-raw.jpg" file but a missing or obsolete
  "XXX.ppm.Z" file.
  
    set out = ".needs-correct"
    /bin/rm -f ${out}
    foreach f ( `cat .have-raw` )
      echo "checking $f"
      set raws = ( `cd $f && echo p*-raw.*` )
      if ( ( -r $f/p.ppm.Z ) || ( -r $f/p-0.ppm.Z ) || ( -r $f/p-1.ppm.Z ) ) then 
        foreach base ( `cd $f && echo p.ppm* p-[01].ppm*` ) 
          set remake = `cd $f && check-dependencies ${base} ${raws}`
          if ( ${remake} ) echo $f >> ${out}
        end
      else
        echo $f >> ${out}
      endif
    end
    wc ${out}; cat ${out}
    
  Listing image sets that apparently need "p.parms": 
  
    set out = ".needs-p.parms"
    bool 1-2 .needs-correct .have-p.parms > ${out}
    wc ${out}; cat ${out}

IMAGE CAPTIONS AND DIRECTORY-BASED INDICES

  Collecting and concatenating all caption files:
  
    tools/bin/collect-all-p-comments ${topdirs} > ALL-COMMENTS.txt

  The file ALL-COMMENTS.txt can be edited with Emacs.
  
  Redistributing the captions to the dataset directories:
  
    tools/bin/distribute-p-comments < ALL-COMMENTS.txt
    
  Forcing creation of all index sets:
  
    find ${topdirs} -name 'TITLE.txt' -print \
      | egrep -v '^([.]/)?(JUNK|misc|textures|tests|temp|tools|movies|synthetic|icons)/' \
      | sed \
          -e 's:/TITLE.txt::g' \
          -e 's:^[.][/]::g' \
          -e 's:[/][.][/]:/:g' \
      > .indexed-dirs
    cat .indexed-dirs
  
    do-index `cat .indexed-dirs`
    
SUBJECT INDEX

  The subject index is created from the files "p.comments" in all
  image directories. We exclude image directories whose names end
  with 'N' (e.g. at the request of people featured therein).
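
  The exclusion is done on lines of ALL-COMMENTS.txt. The Python
  sketch below applies the same test as the egrep filter used in the
  commands of this section: it drops any line containing "N/p.", that
  is, any line naming a "p.*" file inside a directory whose name ends
  with 'N'. The layout of ALL-COMMENTS.txt is not described here, so
  the sample lines in the usage note are assumptions.

```python
import re

# Same test as egrep -v -e '[N]/p[.]'.
EXCLUDED = re.compile(r"N/p\.")

def indexable(lines):
    """Keep only lines that do not refer to an excluded directory."""
    return [line for line in lines if not EXCLUDED.search(line)]
```

  A line mentioning "photos/people/annN/p.comments" would be dropped,
  while one mentioning "photos/people/bob/p.comments" would be kept.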
  
  Checking and updating the index normalization table:
  
    cat ALL-COMMENTS.txt \
      | egrep -v -e '[N]/p[.]' \
      | tools/bin/extract-index-entries -v unmarked=1 \
      | sort | uniq \
      > .all-entries.txt
      
    cat .all-entries.txt \
      | map-field \
          -v table=NORMAL-KEYS.tbl \
          -v inField=1 -v outField=1 \
          -v forgiving=1 \
      | gawk '/[{}]/{print $1;}' \
      | sort | uniq \
      > .all-keys.txt

  Generating the raw index:
  
    cat ALL-COMMENTS.txt \
      | egrep -v -e '[N]/p[.]' \
      | tools/bin/extract-index-entries -v unmarked=0 \
      | sort | uniq \
      > .raw-index.tbl
      
  Generating the subject index: 
  
    cat .raw-index.tbl \
      | tools/bin/make-subject-index 

TO DO 

   Color-correct images that need it.
   
   Put back the "orig" versions of images when they are not too big.
   
   Reorganize everything so as to have only one "orig.png" image with
   arbitrary type and scale; a cached (fully reconstructible) cropped
   and corrected "p-base.png" image; and a bunch of derived
   "p-NNNxNNN.jpg" files (no GIFs!). For that, we must wait until we have
   a fast machine with an "xv" that understands PNGs. We must also make
   ColorCorrect faster.