nanpa.sed revision 1.2
# $NetBSD: nanpa.sed,v 1.2 2006/12/25 18:39:48 wiz Exp $
#
# Parse HTML tables output by
#   http://docs.nanpa.com/cgi-bin/npa_reports/nanpa
# Specifically, for each HTML table row (TR),
# print the <TD> elements separated by colons.
#
# This could break on HTML comments.
#
:top
# Strip ^Ms
# (the original has a literal carriage return between the slashes;
# \r as written here needs a sed that supports that escape, e.g. GNU sed)
s/\r//g
# Join all lines with unterminated HTML tags
/<[^>]*$/{
    N
    b top
}
# Replace all </TR> with EOL tag
s;</[Tt][Rr]>;$;g
# Join lines with only <TR>.
/<[Tt][Rr][^>]*>$/{
    N
    s/\n//g
    b top
}
# Also, join all lines starting with <TR>.
/<[TtRr][^>]*>[^$]*$/{
    N
    s/\n//g
    b top
}
# Remove EOL markers
s/\$$//
# Remove lines not starting with <TR>
/<[Tt][Rr][^>]*>/!d
# Replace all <TD> with colon
s/[[:blank:]]*<TD[^>]*> */:/g
# Strip all HTML tags
s/<[^>]*>//g
# Handle HTML characters
s/&nbsp;/ /g
# Compress spaces/tabs
s/[[:blank:]][[:blank:]]*/ /g
# Strip leading colons
s/^://
# Strip leading/trailing whitespace
s/^ //
s/ $//
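#
# Usage sketch (not part of the original script; the input file name
# report.html is hypothetical):
#
#   sed -f nanpa.sed report.html
#
# Assuming a row that arrives on a single line, such as
#
#   <TR><TD>201</TD><TD>NJ</TD></TR>
#
# the rules above replace </TR> with the EOL marker, turn each <TD>
# into a colon, strip the remaining tags and the leading colon, and
# should print
#
#   201:NJ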