# Last edited on 2003-12-31 16:33:01 by stolfi

# Analyzing Reid's First Language List

INTRO

  This directory records an attempt to analyze the evolution of
  Richard J. Reid's list of computer languages used in introductory
  programming courses. The main analysis was done in Sep. 2001, with
  a small update in Dec. 2003.

DATA

  For the Sep/2001 summary, the raw lists were obtained from the
  postings by Richard Reid himself. All those lists, plus a couple
  of newer ones, are now available at

    http://www.csee.wvu.edu/~vanscoy/reid.htm

DATA CLEANUP

  The following list versions were considered (the last one in the
  Dec/2003 update only):

    set vers = ( 1995-04 1999-08 2001-02 2002-02 )

  The raw lists were cleaned up in several ways, resulting in files
  "{VERS}-all.tbl", with the following format:

    LANG FRAC VARIANT {SCHOOL}

  where the fields are

    LANG     generic language (e.g. "Pascal", "Modula");
    FRAC     fraction of time/students devoted to LANG;
    VARIANT  "-" or variant of LANG (e.g. "ObjPascal", "Modula-2");
    SCHOOL   school's name.

  Schools that were listed by Reid as using 2 or more languages were
  split into separate records with the appropriate {FRAC} fields.
  The generic language names were inserted by hand.

  For 2002-02, three lists were generated:

    2002-02-cfm.tbl  Data updated or confirmed between 2001 and 2002
    2002-02-unc.tbl  Data from 2001 that was not confirmed/updated
    2002-02-all.tbl  Union of the above.

  The script {extract-univs} was used to extract the university names:

    foreach f ( ${vers} )
      echo $f
      cat ${f}-all.tbl | extract-univs | sort -u > .uns-$f
    end
    cat .uns-????-?? | sort -u > .uns-all

  The script {fix-school-names} was generated by manually editing
  this list, and was then applied to each file, so as to ensure that
  the same spelling and form of the name was used in all lists.

  Checking that "new" schools are not typos:

    cat .uns-{1995-04,1999-08,2001-02} | sort -u > .uns-old
    bool 1-2 .uns-2002-02 .uns-old

  The output must be compared manually to .uns-old, looking for
  near-duplicates.
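The manual comparison could be helped along by a small script. The following is only a sketch, not one of the tools used here: it assumes that "bool 1-2" prints the lines of the first file that are absent from the second, and it uses Python's difflib to flag similar-looking names. The function names and the 0.85 cutoff are hypothetical choices.

```python
import difflib

def new_names(current, old):
    """Names in `current` that do not occur in `old`
    (the assumed meaning of `bool 1-2 current old`)."""
    return sorted(set(current) - set(old))

def near_duplicates(candidates, known, cutoff=0.85):
    """Map each candidate name to the known names that resemble it closely."""
    report = {}
    for name in candidates:
        close = difflib.get_close_matches(name, known, n=3, cutoff=cutoff)
        if close:
            report[name] = close
    return report

# Tiny illustrative lists standing in for .uns-old and .uns-2002-02.
old = ["Brunel U.", "DePaul U.", "Stanford U."]
cur = ["Brunei U.", "DePaul U.", "DePauw U.", "Stanford U."]

fresh = new_names(cur, old)            # names that first appear in 2002-02
suspects = near_duplicates(fresh, old) # new names that resemble old ones

for name, close in suspects.items():
    print(name, "resembles", ", ".join(close))
```

A flagged pair is only a candidate typo; each one still has to be checked by hand.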
  Beware that "Brunei U." and "Brunel U." both exist, and ditto for
  "DePaul U." and "DePauw U.".

TALLYING

  Tallying the tabulated data:

    foreach f ( ${vers} )
      echo ${f}-all.tbl
      cat ${f}-all.tbl | tally-langs | sort -b +0 -1 +2 -3gr > ${f}-all.tally
    end
    cat 2002-02-cfm.tbl | tally-langs | sort -b +0 -1 +2 -3gr > 2002-02-cfm.tally
    cat 2002-02-unc.tbl | tally-langs | sort -b +0 -1 +2 -3gr > 2002-02-unc.tally

    1995-04-all.tally
    1999-08-all.tally
    2001-02-all.tally
    2002-02-all.tally
    2001-02-common.tbl
    1995-04-common.tbl
    1999-08-common.tbl
    2001-02-common.tally
    1995-04-common.tally
    1999-08-common.tally
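The {tally-langs} script itself is not reproduced in these notes. As a rough illustration of what such a tally might compute, the sketch below sums the FRAC field per (LANG, VARIANT) pair and orders the result by language name, then by decreasing total, which is what the sort keys above suggest. The output layout, the treatment of braces around the school name, and the school names in the sample are all assumptions.

```python
from collections import defaultdict

def parse_record(line):
    """Split one 'LANG FRAC VARIANT {SCHOOL}' record into its four fields.
    Braces around the school name are stripped if present (assumed layout)."""
    lang, frac, variant, school = line.split(None, 3)
    return lang, float(frac), variant, school.strip().strip("{}")

def tally(lines):
    """Sum FRAC per (LANG, VARIANT), ordered by LANG, then by decreasing total."""
    totals = defaultdict(float)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        lang, frac, variant, _school = parse_record(line)
        totals[(lang, variant)] += frac
    rows = [(lang, variant, total) for (lang, variant), total in totals.items()]
    return sorted(rows, key=lambda r: (r[0], -r[2]))

# Hypothetical records; "Sample College" is split between two languages.
sample = [
    "Pascal 1.00 - {Example U.}",
    "Pascal 0.50 ObjPascal {Sample College}",
    "Modula 0.50 Modula-2 {Sample College}",
    "Pascal 1.00 - {Another U.}",
]

result = tally(sample)
for lang, variant, total in result:
    print(f"{lang:<10} {variant:<12} {total:6.2f}")
```

Because a school using several languages contributes only its FRAC share to each, the totals add up to the number of schools rather than the number of records.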