### ====================================================================
###  @Awk-file{
###     author          = "Nelson H. F. Beebe",
###     version         = "1.00",
###     date            = "09 October 1996",
###     time            = "15:57:06 MDT",
###     filename        = "journal-toc.awk",
###     address         = "Center for Scientific Computing
###                        Department of Mathematics
###                        University of Utah
###                        Salt Lake City, UT 84112
###                        USA",
###     telephone       = "+1 801 581 5254",
###     FAX             = "+1 801 581 4148",
###     URL             = "http://www.math.utah.edu/~beebe",
###     checksum        = "25092 977 3357 26493",
###     email           = "beebe@math.utah.edu (Internet)",
###     codetable       = "ISO/ASCII",
###     keywords        = "BibTeX, bibliography, HTML, journal table of
###                        contents",
###     supported       = "yes",
###     docstring       = "Create a journal cover table of contents from
###                        @Article{...} entries in a journal BibTeX
###                        .bib file for checking the bibliography
###                        database against the actual journal covers.
###                        The output can be either plain text, or HTML.
###
###                        Usage:
###                            bibclean -max-width 0 BibTeX-file(s) | \
###                                bibsort -byvolume | \
###                                awk -f journal-toc.awk \
###                                    [-v HTML=nnn] [-v INDENT=nnn] \
###                                    [-v BIBFILEURL=url] >foo.toc
###
###                            or if the bibliography is already sorted
###                            by volume,
###
###                            bibclean -max-width 0 BibTeX-file(s) | \
###                                awk -f journal-toc.awk \
###                                    [-v HTML=nnn] [-v INDENT=nnn] \
###                                    [-v BIBFILEURL=url] >foo.toc
###
###                        A non-zero value of the command-line option,
###                        HTML=nnn, results in HTML output instead of
###                        the default plain ASCII text (corresponding
###                        to HTML=0).
###
###                        The INDENT=nnn command-line option specifies
###                        the number of blanks to indent each logical
###                        level of HTML.  The default is INDENT=4.
###                        INDENT=0 suppresses indentation.  The INDENT
###                        option has no effect when the default HTML=0
###                        (plain text output) option is in effect.
###
###                        When HTML output is selected, the
###                        BIBFILEURL=url command-line option provides a
###                        way to request hypertext links from table of
###                        contents page numbers to the complete BibTeX
###                        entry for the article.  These links are
###                        created by appending a sharp (#) and the
###                        citation label to the BIBFILEURL value, which
###                        conforms with the practice of
###                        bibtex-to-html.awk.
###
###                        The HTML output form may be useful as a more
###                        compact representation of journal article
###                        bibliography data than the original BibTeX
###                        file provides.  Of course, the
###                        table-of-contents format provides less
###                        information, and is considerably more
###                        troublesome for a computer program to parse.
###
###                        When URL key values are provided, they will
###                        be used to create hypertext links around
###                        article titles.  This supports journals that
###                        provide article contents on the World-Wide
###                        Web.
###
###                        For parsing simplicity, this program requires
###                        that BibTeX
###
###                            key = "value"
###
###                        and
###
###                            @String{name = "value"}
###
###                        specifications be entirely contained on
###                        single lines, which is readily provided by
###                        the `bibclean -max-width 0' filter.  It also
###                        requires that bibliography entries begin and
###                        end at the start of a line, and that
###                        quotation marks, rather than balanced braces,
###                        delimit string values.  This is a
###                        conventional format that again can be
###                        guaranteed by bibclean.
###
###                        This program requires `new' awk, as described
###                        in the book
###
###                            Alfred V. Aho, Brian W. Kernighan, and
###                            Peter J. Weinberger,
###                            ``The AWK Programming Language'',
###                            Addison-Wesley (1988), ISBN
###                            0-201-07981-X,
###
###                        such as provided by programs named (GNU)
###                        gawk, nawk, and recent AT&T awk.
###
###                        The checksum field above contains a CRC-16
###                        checksum as the first value, followed by the
###                        equivalent of the standard UNIX wc (word
###                        count) utility output of lines, words, and
###                        characters.  This is produced by Robert
###                        Solovay's checksum utility.",
###  }
### ====================================================================

BEGIN                                           { initialize() }

/^ *@ *[Ss][Tt][Rr][Ii][Nn][Gg] *{/             { do_String(); next }

/^ *@ *[Pp][Rr][Ee][Aa][Mm][Bb][Ll][Ee]/        { next }

/^ *@ *[Aa][Rr][Tt][Ii][Cc][Ll][Ee]/            { do_Article(); next }

/^ *@/                                          { do_Other(); next }

/^ *author *= *\"/                              { do_author(); next }

/^ *journal *= */                               { do_journal(); next }

/^ *volume *= *\"/                              { do_volume(); next }

/^ *number *= *\"/                              { do_number(); next }

/^ *year *= *\"/                                { do_year(); next }

/^ *month *= */                                 { do_month(); next }

/^ *title *= *\"/                               { do_title(); next }

/^ *pages *= *\"/                               { do_pages(); next }

/^ *URL *= *\"/                                 { do_URL(); next }

/^ *} *$/                                       { if (In_Article) do_end_entry(); next }

END                                             { terminate() }

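# Illustrative example (not part of the original program; the entry
# below is invented for illustration).  Assuming input that has already
# been normalized by `bibclean -max-width 0', the pattern rules above
# would dispatch each input line roughly as follows:
#
#     @String{j-FOO = "Journal of Foo"}           -> do_String()
#     @Article{Doe:1996:XYZ,                      -> do_Article()
#       author =       "Jane Doe and John Roe",   -> do_author()
#       title =        "An Example Title",        -> do_title()
#       journal =      j-FOO,                     -> do_journal()
#       volume =       "1",                       -> do_volume()
#       number =       "2",                       -> do_number()
#       pages =        "3--4",                    -> do_pages()
#       month =        jul,                       -> do_month()
#       year =         "1996",                    -> do_year()
#     }                                           -> do_end_entry()
#
# Lines that match none of the rules (for example, keys other than
# those listed) fall through without any action, because the program
# has no default rule.  The closing brace triggers do_end_entry() only
# while In_Article is nonzero, so entries of other types are skipped.
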
########################################################################
# NB: The programming conventions for variables in this program are:   #
#       UPPERCASE               global constants and user options      #
#       Initialuppercase        global variables                       #
#       lowercase               local variables                        #
# Any deviation is an error!                                           #
########################################################################


function do_Article()
{
    In_Article = 1

    Citation_label = $0
    sub(/^[^\{]*{/,"",Citation_label)
    sub(/ *, *$/,"",Citation_label)

    Author = ""
    Title = ""
    Journal = ""
    Volume = ""
    Number = ""
    Month = ""
    Year = ""
    Pages = ""
    Url = ""
}


function do_author()
{
    Author = TeX_to_HTML(get_value($0))
}


function do_end_entry( k,n,parts)
{
    n = split(Author,parts," and ")
    if (Last_number != Number)
        do_new_issue()
    for (k = 1; k < n; ++k)
        print_toc_line(parts[k] " and", "", "")
    Title_prefix = html_begin_title()
    Title_suffix = html_end_title()
    if (html_length(Title) <= (MAX_TITLE_CHARS + MIN_LEADERS)) # complete title fits on line
        print_toc_line(parts[n], Title, html_begin_pages() Pages html_end_pages())
    else                        # need to split long title over multiple lines
        do_long_title(parts[n], Title, html_begin_pages() Pages html_end_pages())
}


function do_journal()
{
    if ($0 ~ /[=] *"/)          # have journal = "quoted journal name",
        Journal = get_value($0)
    else                        # have journal = journal-abbreviation,
    {
        Journal = get_abbrev($0)
        if (Journal in String)  # replace abbrev by its expansion
            Journal = String[Journal]
    }
    gsub(/\\-/,"",Journal)      # remove discretionary hyphens
}


function do_long_title(author,title,pages, last_title,n)
{
    title = trim(title)                 # discard leading and trailing space
    while (length(title) > 0)
    {
        n = html_breakpoint(title,MAX_TITLE_CHARS+MIN_LEADERS)
        last_title = substr(title,1,n)
        title = substr(title,n+1)
        sub(/^ +/,"",title)             # discard any leading space
        print_toc_line(author, last_title, (length(title) == 0) ? pages : "")
        author = ""
    }
}


function do_month( k,n,parts)
{
    Month = ($0 ~ /[=] *"/) ? get_value($0) : get_abbrev($0)
    gsub(/[\"]/,"",Month)
    gsub(/ *# *\\slash *# */," / ",Month)
    gsub(/ *# *-+ *# */," / ",Month)
    n = split(Month,parts," */ *")
    Month = ""
    for (k = 1; k <= n; ++k)
        Month = Month ((k > 1) ? " / " : "") \
            ((parts[k] in Month_expansion) ? Month_expansion[parts[k]] : parts[k])
}


function do_new_issue()
{
    Last_number = Number
    if (HTML)
    {
        if (Last_volume != Volume)
        {
            Last_volume = Volume
            print_line(prefix(2) "<BR>")
        }
        html_end_toc()
        html_begin_issue()
        print_line(prefix(2) Journal "<BR>")
    }
    else
    {
        print_line("")
        print_line(Journal)
    }

    print_line(strip_html(vol_no_month_year()))

    if (HTML)
    {
        html_end_issue()
        html_toc_entry()
        html_begin_toc()
    }
    else
        print_line("")
}


function do_number()
{
    Number = get_value($0)
}


function do_Other()
{
    In_Article = 0
}


function do_pages()
{
    Pages = get_value($0)
    sub(/--[?][?]/,"",Pages)
}


function do_String()
{
    sub(/^[^\{]*\{/,"",$0)      # discard up to and including open brace
    sub(/\} *$/,"",$0)          # discard from optional whitespace and trailing brace to end of line
    String[get_key($0)] = get_value($0)
}


function do_title()
{
    Title = TeX_to_HTML(get_value($0))
}
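

# Worked examples for the field handlers above (illustrative only; the
# input lines are invented and are not part of the original program).
# They assume the one-line key = "value" form that bibclean produces,
# and they rely on get_key(), get_value() and get_abbrev() as defined
# elsewhere in this file:
#
#     @String{j-FOO = "Journal of Foo"}
#         do_String() strips the text through the open brace and the
#         trailing close brace, then stores
#         String["j-FOO"] = "Journal of Foo"
#
#     journal = j-FOO,
#         do_journal() sees no quoted value, extracts the abbreviation
#         j-FOO, and replaces it by its expansion, "Journal of Foo"
#
#     month = jul,
#         do_month() extracts jul and expands it through
#         Month_expansion[] to "July"
#
#     month = jan # "\slash " # feb,
#         do_month() removes the quotation marks, rewrites the
#         `# \slash #' concatenation as " / ", and expands each part,
#         giving "January / February"
#
#     pages = "123--??",
#         do_pages() drops the unknown final page marker, giving "123"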


function do_URL( parts)
{
    Url = get_value($0)
    split(Url,parts,"[,;]")     # in case we have multiple URLs
    Url = trim(parts[1])
}


function do_volume()
{
    Volume = get_value($0)
}


function do_year()
{
    Year = get_value($0)
}


function get_abbrev(s)
{   # return abbrev from ``key = abbrev,''
    sub(/^[^=]*= */,"",s)       # discard text up to start of non-blank value
    sub(/ *,? *$/,"",s)         # discard trailing optional whitespace,
                                # optional comma, and optional space
    return (s)
}


function get_key(s)
{   # return key from ``key = "value",''
    sub(/^ */,"",s)             # discard leading space
    sub(/ *=.*$/,"",s)          # discard everything after key

    return (s)
}


function get_value(s)
{   # return value from ``key = "value",''
    sub(/^[^\"]*\" */,"",s)     # discard text up to start of non-blank value
    sub(/ *\",? *$/,"",s)       # discard trailing optional whitespace, quote,
                                # optional comma, and optional space
    return (s)
}


function html_accents(s)
{
    if (index(s,"\\") > 0)      # important optimization
    {
        # Convert common lower-case accented letters according to the
        # table on p. 169 of Peter Flynn's ``The World Wide Web
        # Handbook'', International Thomson Computer Press, 1995, ISBN
        # 1-85032-205-8.  The official table of ISO Latin 1 SGML
        # entities used in HTML can be found in the file
        # /usr/local/lib/html-check/lib/ISOlat1.sgml (your path
        # may differ).

        gsub(/{\\\`a}/,   "\\&agrave;", s)
        gsub(/{\\'a}/,    "\\&aacute;", s)
        gsub(/{\\[\^]a}/, "\\&acirc;",  s)
        gsub(/{\\~a}/,    "\\&atilde;", s)
        gsub(/{\\\"a}/,   "\\&auml;",   s)
        gsub(/{\\aa}/,    "\\&aring;",  s)
        gsub(/{\\ae}/,    "\\&aelig;",  s)

        gsub(/{\\c{c}}/,  "\\&ccedil;", s)

        gsub(/{\\\`e}/,   "\\&egrave;", s)
        gsub(/{\\'e}/,    "\\&eacute;", s)
        gsub(/{\\[\^]e}/, "\\&ecirc;",  s)
        gsub(/{\\\"e}/,   "\\&euml;",   s)

        gsub(/{\\\`i}/,   "\\&igrave;", s)
        gsub(/{\\'i}/,    "\\&iacute;", s)
        gsub(/{\\[\^]i}/, "\\&icirc;",  s)
        gsub(/{\\\"i}/,   "\\&iuml;",   s)

        # ignore eth and thorn

        gsub(/{\\~n}/,    "\\&ntilde;", s)

        gsub(/{\\\`o}/,   "\\&ograve;", s)
        gsub(/{\\'o}/,    "\\&oacute;", s)
        gsub(/{\\[\^]o}/, "\\&ocirc;",  s)
        gsub(/{\\~o}/,    "\\&otilde;", s)
        gsub(/{\\\"o}/,   "\\&ouml;",   s)
        gsub(/{\\o}/,     "\\&oslash;", s)

        gsub(/{\\\`u}/,   "\\&ugrave;", s)
        gsub(/{\\'u}/,    "\\&uacute;", s)
        gsub(/{\\[\^]u}/, "\\&ucirc;",  s)
        gsub(/{\\\"u}/,   "\\&uuml;",   s)

        gsub(/{\\'y}/,    "\\&yacute;", s)
        gsub(/{\\\"y}/,   "\\&yuml;",   s)

        # Now do the same for upper-case accents

        gsub(/{\\\`A}/,   "\\&Agrave;", s)
        gsub(/{\\'A}/,    "\\&Aacute;", s)
        gsub(/{\\[\^]A}/, "\\&Acirc;",  s)
        gsub(/{\\~A}/,    "\\&Atilde;", s)
        gsub(/{\\\"A}/,   "\\&Auml;",   s)
        gsub(/{\\AA}/,    "\\&Aring;",  s)
        gsub(/{\\AE}/,    "\\&AElig;",  s)

        gsub(/{\\c{C}}/,  "\\&Ccedil;", s)

        gsub(/{\\\`E}/,   "\\&Egrave;", s)
        gsub(/{\\'E}/,    "\\&Eacute;", s)
        gsub(/{\\[\^]E}/, "\\&Ecirc;",  s)
        gsub(/{\\\"E}/,   "\\&Euml;",   s)

        gsub(/{\\\`I}/,   "\\&Igrave;", s)
        gsub(/{\\'I}/,    "\\&Iacute;", s)
        gsub(/{\\[\^]I}/, "\\&Icirc;",  s)
        gsub(/{\\\"I}/,   "\\&Iuml;",   s)

        # ignore eth and thorn

        gsub(/{\\~N}/,    "\\&Ntilde;", s)

        gsub(/{\\\`O}/,   "\\&Ograve;", s)
        gsub(/{\\'O}/,    "\\&Oacute;", s)
        gsub(/{\\[\^]O}/, "\\&Ocirc;",  s)
        gsub(/{\\~O}/,    "\\&Otilde;", s)
        gsub(/{\\\"O}/,   "\\&Ouml;",   s)
        gsub(/{\\O}/,     "\\&Oslash;", s)

        gsub(/{\\\`U}/,   "\\&Ugrave;", s)
        gsub(/{\\'U}/,    "\\&Uacute;", s)
        gsub(/{\\[\^]U}/, "\\&Ucirc;",  s)
        gsub(/{\\\"U}/,   "\\&Uuml;",   s)

        gsub(/{\\'Y}/,    "\\&Yacute;", s)

        gsub(/{\\ss}/,    "\\&szlig;",  s)

        # Others not mentioned in Flynn's book
        gsub(/{\\'\\i}/,  "\\&iacute;", s)
        gsub(/{\\'\\j}/,  "j",          s)
    }
    return (s)
}


function html_begin_issue()
{
    print_line("")
    print_line(prefix(2) "<HR>")
    print_line("")
    print_line(prefix(2) "<H1>")
    print_line(prefix(3) "<A NAME=\"" html_label() "\">")
}


function html_begin_pages()
{
    return ((HTML && (BIBFILEURL != "")) ? ("<A HREF=\"" BIBFILEURL "#" Citation_label "\">") : "")
}


function html_begin_pre()
{
    In_PRE = 1
    print_line("<PRE>")
}


function html_begin_title()
{
    return ((HTML && (Url != "")) ? ("<A HREF=\"" Url "\">") : "")
}


function html_begin_toc()
{
    html_end_toc()
    html_begin_pre()
}


function html_body( k)
{
    for (k = 1; k <= BodyLines; ++k)
        print Body[k]
}

function html_breakpoint(title,maxlength, break_after,k)
{
    # Return the largest character position in title AFTER which we
    # can break the title across lines, without exceeding maxlength
    # visible characters.
    if (html_length(title) > maxlength) # then need to split title across lines
    {
        # In the presence of HTML markup, the initialization of
        # k here is complicated, because we need to advance it
        # until html_length(title) is at least maxlength,
        # without invoking the expensive html_length() function
        # too frequently.  The need to split the title makes the
        # alternative of delayed insertion of HTML markup much
        # more complicated.
        break_after = 0
        for (k = min(maxlength,length(title)); k < length(title); ++k)
        {
            if (substr(title,k+1,1) == " ")
            {                   # could break after position k
                if (html_length(substr(title,1,k)) <= maxlength)
                    break_after = k
                else            # advanced too far, retreat back to last break_after
                    break
            }
        }
        if (break_after == 0)   # no breakpoint found by forward scan
        {                       # so switch to backward scan
            for (k = min(maxlength,length(title)) - 1; \
                (k > 0) && (substr(title,k+1,1) != " "); --k)
                ;               # find space at which to break title
            if (k < 1)          # no break point found
                k = length(title) # so must print entire string
        }
        else
            k = break_after
    }
    else                        # title fits on one line
        k = length(title)
    return (k)
}



function html_end_issue()
{
    print_line(prefix(3) "</A>")
    print_line(prefix(2) "</H1>")
}


function html_end_pages()
{
    return ((HTML && (BIBFILEURL != "")) ? "</A>" : "")
}


function html_end_pre()
{
    if (In_PRE)
    {
        print_line("</PRE>")
        In_PRE = 0
    }
}


function html_end_title()
{
    return ((HTML && (Url != "")) ? "</A>" : "")
}


function html_end_toc()
{
    html_end_pre()
}


function html_fonts(s, arg,control_word,k,level,n,open_brace)
{
    open_brace = index(s,"{")
    if (open_brace > 0)                 # important optimization
    {
        level = 1
        for (k = open_brace + 1; (level != 0) && (k <= length(s)); ++k)
        {
            if (substr(s,k,1) == "{")
                level++
            else if (substr(s,k,1) == "}")
                level--
        }

        # {...} is now found at open_brace ... (k-1)
        for (control_word in Font_decl_map)     # look for {\xxx ...}
        {
            if (substr(s,open_brace+1,length(control_word)+1) ~ \
                ("\\" control_word "[^A-Za-z]"))
            {
                n = open_brace + 1 + length(control_word)
                arg = trim(substr(s,n,k - n))
                if (Font_decl_map[control_word] == "toupper") # arg -> ARG
                    arg = toupper(arg)
                else if (Font_decl_map[control_word] != "") # arg -> <TAG>arg</TAG>
                    arg = "<" Font_decl_map[control_word] ">" arg "</" Font_decl_map[control_word] ">"
                return (substr(s,1,open_brace-1) arg html_fonts(substr(s,k)))
            }
        }
        for (control_word in Font_cmd_map)      # look for \xxx{...}
        {
            if (substr(s,open_brace - length(control_word),length(control_word)) ~ \
                ("\\" control_word))
            {
                n = open_brace + 1
                arg = trim(substr(s,n,k - n))
                if (Font_cmd_map[control_word] == "toupper") # arg -> ARG
                    arg = toupper(arg)
                else if (Font_cmd_map[control_word] != "") # arg -> <TAG>arg</TAG>
                    arg = "<" Font_cmd_map[control_word] ">" arg "</" Font_cmd_map[control_word] ">"
                n = open_brace - length(control_word) - 1
                return (substr(s,1,n) arg html_fonts(substr(s,k)))
            }
        }
    }
    return (s)
}


function html_header()
{
    USER = ENVIRON["USER"]
    if (USER == "")
        USER = ENVIRON["LOGNAME"]
    if (USER == "")
        USER = "????"
    "hostname" | getline HOSTNAME
    "date" | getline DATE
    ("ypcat passwd | grep '^" USER ":' | awk -F: '{print $5}'") | getline PERSONAL_NAME
    if (PERSONAL_NAME == "")
        ("grep '^" USER ":' /etc/passwd | awk -F: '{print $5}'") | getline PERSONAL_NAME


    print "<!-- WARNING: Do NOT edit this file. It was converted from -->"
    print "<!-- BibTeX format to HTML by journal-toc.awk version " VERSION_NUMBER " " VERSION_DATE " -->"
    print "<!-- on " DATE " -->"
    print "<!-- for " PERSONAL_NAME " (" USER "@" HOSTNAME ") -->"
    print ""
    print ""
    print "<!DOCTYPE HTML public \"-//IETF//DTD HTML//EN\">"
    print ""
    print "<HTML>"
    print prefix(1) "<HEAD>"
    print prefix(2) "<TITLE>"
    print prefix(3) Journal
    print prefix(2) "</TITLE>"
    print prefix(2) "<LINK REV=\"made\" HREF=\"mailto:" USER "@" HOSTNAME "\">"
    print prefix(1) "</HEAD>"
    print ""
    print prefix(1) "<BODY>"
}


function html_label( label)
{
    label = Volume "(" Number "):" Month ":" Year
    gsub(/[^A-Za-z0-9():,;.\/\-]/,"",label)
    return (label)
}


function html_length(s)
{   # Return visible length of s, ignoring any HTML markup
    if (HTML)
    {
        gsub(/<\/?[^>]*>/,"",s)         # remove SGML tags
        gsub(/&[A-Za-z0-9]+;/,"",s)     # remove SGML entities
    }
    return (length(s))
}


function html_toc()
{
    print prefix(2) "<H1>"
    print prefix(3) "Table of contents for issues of " Journal
    print prefix(2) "</H1>"
    print HTML_TOC
}


function html_toc_entry()
{
    HTML_TOC = HTML_TOC " <A HREF=\"#" html_label() "\">"
    HTML_TOC = HTML_TOC vol_no_month_year()
    HTML_TOC = HTML_TOC "</A><BR>" "\n"
}


function html_trailer()
{
    html_end_pre()
    print prefix(1) "</BODY>"
    print "</HTML>"
}


function initialize()
{
    # NB: Update these when the program changes
    VERSION_DATE = "[09-Oct-1996]"
    VERSION_NUMBER = "1.00"

    HTML = (HTML == "") ? 0 : (0 + HTML)

    if (INDENT == "")
        INDENT = 4

    if (HTML == 0)
        INDENT = 0      # indentation suppressed in ASCII mode

    LEADERS = " . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ."

    MAX_TITLE_CHARS = 36        # 36 produces a 79-char output line when there is
                                # just an initial page number.  If this is
                                # increased, the LEADERS string may need to be
                                # lengthened.

    MIN_LEADERS = 4             # Minimum number of characters from LEADERS
                                # required when leaders are used.  The total
                                # number of characters that can appear in a
                                # title line is MAX_TITLE_CHARS + MIN_LEADERS.
                                # Leaders are omitted when the title length is
                                # between MAX_TITLE_CHARS and this sum.

    MIN_LEADERS_SPACE = "    "  # must be at least MIN_LEADERS characters long

    Month_expansion["jan"] = "January"
    Month_expansion["feb"] = "February"
    Month_expansion["mar"] = "March"
    Month_expansion["apr"] = "April"
    Month_expansion["may"] = "May"
    Month_expansion["jun"] = "June"
    Month_expansion["jul"] = "July"
    Month_expansion["aug"] = "August"
    Month_expansion["sep"] = "September"
    Month_expansion["oct"] = "October"
    Month_expansion["nov"] = "November"
    Month_expansion["dec"] = "December"

    Font_cmd_map["\\emph"] = "EM"
    Font_cmd_map["\\textbf"] = "B"
    Font_cmd_map["\\textit"] = "I"
    Font_cmd_map["\\textmd"] = ""
    Font_cmd_map["\\textrm"] = ""
    Font_cmd_map["\\textsc"] = "toupper"
    Font_cmd_map["\\textsl"] = "I"
    Font_cmd_map["\\texttt"] = "TT"     # was "t"; TT matches Font_decl_map["\\tt"]
    Font_cmd_map["\\textup"] = ""

    Font_decl_map["\\bf"] = "B"
    Font_decl_map["\\em"] = "EM"
    Font_decl_map["\\it"] = "I"
    Font_decl_map["\\rm"] = ""
    Font_decl_map["\\sc"] = "toupper"
    Font_decl_map["\\sf"] = ""
    Font_decl_map["\\tt"] = "TT"
    Font_decl_map["\\itshape"] = "I"
    Font_decl_map["\\upshape"] = ""
    Font_decl_map["\\slshape"] = "I"
    Font_decl_map["\\scshape"] = "toupper"
    Font_decl_map["\\mdseries"] = ""
    Font_decl_map["\\bfseries"] = "B"
    Font_decl_map["\\rmfamily"] = ""
    Font_decl_map["\\sffamily"] = ""
    Font_decl_map["\\ttfamily"] = "TT"
}

function min(a,b)
{
    return (a < b) ? a : b
}


function prefix(level)
{
    # Return a prefix of up to 60 blanks

    if (In_PRE)
        return ("")
    else
        return (substr("                                                            ", \
            1, INDENT * level))
}


function print_line(line)
{
    if (HTML)           # must buffer in memory so that we can accumulate TOC
        Body[++BodyLines] = line
    else
        print line
}


function print_toc_line(author,title,pages, extra,leaders,n,t)
{
    # When we have a multiline title, the hypertext link goes only
    # on the first line.  A multiline hypertext link looks awful
    # because of long underlines under the leading indentation.

    if (pages == "")    # then no leaders needed in title lines other than last one
        t = sprintf("%31s %s%s%s", author, Title_prefix, title, Title_suffix)
    else                # last title line, with page number
    {
        n = html_length(title)          # potentially expensive
        extra = n % 2                   # extra space for aligned leader dots
        if (n <= MAX_TITLE_CHARS)       # then need leaders
            leaders = substr(LEADERS, 1, MAX_TITLE_CHARS + MIN_LEADERS - extra - \
                             min(MAX_TITLE_CHARS,n))
        else                            # title (almost) fills line, so no leaders
            leaders = substr(MIN_LEADERS_SPACE,1, \
                             (MAX_TITLE_CHARS + MIN_LEADERS - extra - n))
        t = sprintf("%31s %s%s%s%s%s %4s", \
                    author, Title_prefix, title, Title_suffix, \
                    (extra ? " " : ""), leaders, pages)
    }

    Title_prefix = ""   # forget any hypertext
    Title_suffix = ""   # link material

    # Efficiency note: an earlier version accumulated the body in a
    # single scalar like this: "Body = Body t".  Profiling revealed
    # this statement as the major hot spot, and the change to array
    # storage made the program more than twice as fast.  This
    # suggests that awk might benefit from an optimization of
    # "s = s t" that uses realloc() instead of malloc().
    if (HTML)
        Body[++BodyLines] = t
    else
        print t
}


function protect_SGML_characters(s)
{
    gsub(/&/,"\\&amp;",s)       # NB: this one MUST be first
    gsub(/</,"\\&lt;",s)
    gsub(/>/,"\\&gt;",s)
    gsub(/\"/,"\\&quot;",s)
    return (s)
}


function strip_braces(s, k)
{ # strip non-backslashed braces from s and return the result

    return (strip_char(strip_char(s,"{"),"}"))
}


function strip_char(s,c, k)
{ # strip non-backslashed instances of c from s, and return the result
    k = index(s,c)
    if (k > 0)          # then found the character
    {
        if (substr(s,k-1,1) != "\\") # then not backslashed char
            s = substr(s,1,k-1) strip_char(substr(s,k+1),c) # so remove it (recursively)
        else            # preserve backslashed char, and process the rest
            s = substr(s,1,k) strip_char(substr(s,k+1),c)
    }
    return (s)
}


function strip_html(s)
{
    gsub(/<\/?[^>]*>/,"",s)
    return (s)
}


function terminate()
{
    if (HTML)
    {
        html_end_pre()

        HTML = 0        # NB: stop line buffering
        html_header()
        html_toc()
        html_body()
        html_trailer()
    }
}


function TeX_to_HTML(s, k,n,parts)
{
    # First convert the four SGML reserved characters to SGML entities
    if (HTML)
    {
        gsub(/>/, "\\&gt;", s)
        gsub(/</, "\\&lt;", s)
        gsub(/"/, "\\&quot;", s)
    }

    gsub(/[$][$]/,"$$$",s)      # change display math to triple dollars for split
    n = split(s,parts,/[$]/)    # split into non-math (odd) and math (even) parts

    s = ""
    for (k = 1; k <= n; ++k)    # unbrace non-math part, leaving math mode intact
        s = s ((k > 1) ? "$" : "") \
            ((k % 2) ? strip_braces(TeX_to_HTML_nonmath(parts[k])) : \
             TeX_to_HTML_math(parts[k]))

    gsub(/[$][$][$]/,"$$",s)    # restore display math

    return (s)
}


function TeX_to_HTML_math(s)
{
    # Mostly a dummy for now, but HTML 3 could support some math translation

    gsub(/\\&/,"\\&amp;",s)     # reduce TeX ampersands to SGML entities

    return (s)
}


function TeX_to_HTML_nonmath(s)
{
    if (index(s,"\\") > 0)      # important optimization
    {
        gsub(/\\slash +/,"/",s)         # replace TeX slashes with conventional ones
        gsub(/ *\\emdash +/," --- ",s)  # replace BibNet emdashes with conventional ones
        gsub(/\\%/,"%",s)               # reduce TeX percents to conventional ones
        gsub(/\\[$]/,"$",s)             # reduce TeX dollars to conventional ones
        gsub(/\\#/,"#",s)               # reduce TeX sharps to conventional ones

        if (HTML)                       # translate TeX markup to HTML
        {
            gsub(/\\&/,"\\&amp;",s)     # reduce TeX ampersands to SGML entities
            s = html_accents(s)
            s = html_fonts(s)
        }
        else            # plain ASCII text output: discard all TeX markup
        {
            gsub(/\\\&/, "\\&", s)      # reduce TeX ampersands to conventional ones

            gsub(/\\[a-z][a-z] +/,"",s) # remove TeX font changes
            gsub(/\\[^A-Za-z]/,"",s)    # remove remaining TeX control symbols
        }
    }
    return (s)
}


function trim(s)
{
    gsub(/^[ \t]+/,"",s)
    gsub(/[ \t]+$/,"",s)
    return (s)
}


function vol_no_month_year()
{
    return ("Volume " wrap(Volume) ", Number " wrap(Number) ", " wrap(Month) ", " wrap(Year))
}


function wrap(value)
{
    return (HTML ? ("<STRONG>" value "</STRONG>") : value)
}