Version: 4.0.2
HTML: Parsing Library
The html library provides functions to read HTML documents and
structures to represent them.
Reads (X)HTML from a port, producing an html instance.
Reads HTML from a port, producing an xexpr compatible with the
xml library (which defines content?).
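As a minimal sketch of the reader in use, the following module parses a small document from a string port with read-xhtml and inspects the result. The input string and the classify helper are made up for illustration, and the predicates html?, head?, and body? are assumed to be the ones generated by the structure definitions listed in this document.

```scheme
#lang scheme
;; Prefix the html bindings so they do not clash with the base language.
(require (prefix-in h: html))

;; Illustrative input; any input port carrying XHTML would do.
(define in
  (open-input-string
   "<html><head><title>Hi</title></head><body><p>Text</p></body></html>"))

;; read-xhtml consumes the port and returns an html structure.
(define doc (h:read-xhtml in))

;; classify: html-content -> symbol
;; Hypothetical helper that names the kind of a top-level child.
(define (classify c)
  (cond [(h:head? c) 'head]
        [(h:body? c) 'body]
        [else 'other]))

;; Check the type of the result, then look at its immediate children.
(h:html? doc)
(map classify (h:html-full-content doc))
```

The same call should work on a file port, for example one opened with open-input-file, since the reader only requires an input port.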
1 Example
(module html-example scheme

  ; Some of the symbols in html and xml conflict with each other
  ; and with the scheme language, so we prefix them to avoid
  ; namespace conflicts.
  (require (prefix-in h: html)
           (prefix-in x: xml))

  (define an-html
    (h:read-xhtml
     (open-input-string
      (string-append
       "<html><head><title>My title</title></head><body>"
       "<p>Hello world</p><p><b>Testing</b>!</p>"
       "</body></html>"))))

  ; extract-pcdata: html-content -> (listof string)
  ; Pulls out the pcdata strings from some-content.
  (define (extract-pcdata some-content)
    (cond [(x:pcdata? some-content)
           (list (x:pcdata-string some-content))]
          [(x:entity? some-content)
           (list)]
          [else
           (extract-pcdata-from-element some-content)]))

  ; extract-pcdata-from-element: html-element -> (listof string)
  ; Pulls out the pcdata strings from an-html-element.
  (define (extract-pcdata-from-element an-html-element)
    (match an-html-element
      [(struct h:html-full (content))
       (apply append (map extract-pcdata content))]

      [(struct h:html-element (attributes))
       '()]))

  (printf "~s~n" (extract-pcdata an-html)))

> (require 'html-example)
("My title" "Hello world" "Testing" "!")
2 HTML Structures
pcdata, entity, and attribute are defined
in the xml documentation.
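An html-content is either an html-element, a pcdata, or an entity.
An html-element has one field:
attributes : (listof attribute)
Any of the structures below inherits from html-element.
An html-full extends html-element with one field:
content : (listof html-content)
Any html tag that may include content also inherits from
html-full without adding any additional fields.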
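Because every tag structure inherits from html-element, and every tag with content inherits from html-full, a traversal can be written against those two structures alone. The sketch below assumes the accessors html-element-attributes and html-full-content that follow from the field names above; count-elements and attribute-names are hypothetical helpers, and the input document is made up for illustration.

```scheme
#lang scheme
(require (prefix-in h: html)
         (prefix-in x: xml))

(define doc
  (h:read-xhtml
   (open-input-string
    (string-append
     "<html><head><title>T</title></head>"
     "<body class=\"main\"><p>Hi<br /></p></body></html>"))))

;; count-elements: html-content -> number
;; Counts the html-element instances reachable from some-content.
(define (count-elements some-content)
  (cond [(h:html-full? some-content)
         ;; This element plus everything nested in its content.
         (+ 1 (apply + (map count-elements
                            (h:html-full-content some-content))))]
        [(h:html-element? some-content) 1] ; a tag without content
        [else 0]))                         ; pcdata or entity

;; attribute-names: html-element -> (listof symbol)
;; Lists the names of the attributes attached to an element.
(define (attribute-names an-element)
  (map x:attribute-name (h:html-element-attributes an-element)))

(count-elements doc)
(attribute-names doc)
```

Testing html-full? before html-element? matters here: an html-full instance also satisfies html-element?, so the more specific predicate has to come first.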
A html is
(make-html (listof attribute) (listof Contents-of-html))
A Contents-of-html is either
A div is
(make-div (listof attribute) (listof G2))
A center is
(make-center (listof attribute) (listof G2))
A blockquote is
(make-blockquote (listof attribute) (listof G2))
An ins is
(make-ins (listof attribute) (listof G2))
A del is
(make-del (listof attribute) (listof G2))
A dd is
(make-dd (listof attribute) (listof G2))
A li is
(make-li (listof attribute) (listof G2))
A th is
(make-th (listof attribute) (listof G2))
A td is
(make-td (listof attribute) (listof G2))
An iframe is
(make-iframe (listof attribute) (listof G2))
A noframes is
(make-noframes (listof attribute) (listof G2))
A noscript is
(make-noscript (listof attribute) (listof G2))
A style is
(make-style (listof attribute) (listof pcdata))
A script is
(make-script (listof attribute) (listof pcdata))
A basefont is
(make-basefont (listof attribute))
A br is
(make-br (listof attribute))
An area is
(make-area (listof attribute))
An alink is
(make-alink (listof attribute))
An img is
(make-img (listof attribute))
A param is
(make-param (listof attribute))
A hr is
(make-hr (listof attribute))
An input is
(make-input (listof attribute))
A col is
(make-col (listof attribute))
An isindex is
(make-isindex (listof attribute))
A base is
(make-base (listof attribute))
A meta is
(make-meta (listof attribute))
An option is
(make-option (listof attribute) (listof pcdata))
A textarea is
(make-textarea (listof attribute) (listof pcdata))
A title is
(make-title (listof attribute) (listof pcdata))
A head is
(make-head (listof attribute) (listof Contents-of-head))
A Contents-of-head is either
A tr is
(make-tr (listof attribute) (listof Contents-of-tr))
A Contents-of-tr is either
A colgroup is
(make-colgroup (listof attribute) (listof col))
A thead is
(make-thead (listof attribute) (listof tr))
A tfoot is
(make-tfoot (listof attribute) (listof tr))
A tbody is
(make-tbody (listof attribute) (listof tr))
A tt is
(make-tt (listof attribute) (listof G5))
An i is
(make-i (listof attribute) (listof G5))
A b is
(make-b (listof attribute) (listof G5))
A u is
(make-u (listof attribute) (listof G5))
A s is
(make-s (listof attribute) (listof G5))
A strike is
(make-strike (listof attribute) (listof G5))
A big is
(make-big (listof attribute) (listof G5))
A small is
(make-small (listof attribute) (listof G5))
An em is
(make-em (listof attribute) (listof G5))
A strong is
(make-strong (listof attribute) (listof G5))
A dfn is
(make-dfn (listof attribute) (listof G5))
A code is
(make-code (listof attribute) (listof G5))
A samp is
(make-samp (listof attribute) (listof G5))
A kbd is
(make-kbd (listof attribute) (listof G5))
A var is
(make-var (listof attribute) (listof G5))
A cite is
(make-cite (listof attribute) (listof G5))
An abbr is
(make-abbr (listof attribute) (listof G5))
An acronym is
(make-acronym (listof attribute) (listof G5))
A sub is
(make-sub (listof attribute) (listof G5))
A sup is
(make-sup (listof attribute) (listof G5))
A span is
(make-span (listof attribute) (listof G5))
A bdo is
(make-bdo (listof attribute) (listof G5))
A font is
(make-font (listof attribute) (listof G5))
A p is
(make-p (listof attribute) (listof G5))
A h1 is
(make-h1 (listof attribute) (listof G5))
A h2 is
(make-h2 (listof attribute) (listof G5))
A h3 is
(make-h3 (listof attribute) (listof G5))
A h4 is
(make-h4 (listof attribute) (listof G5))
A h5 is
(make-h5 (listof attribute) (listof G5))
A h6 is
(make-h6 (listof attribute) (listof G5))
A q is
(make-q (listof attribute) (listof G5))
A dt is
(make-dt (listof attribute) (listof G5))
A legend is
(make-legend (listof attribute) (listof G5))
A caption is
(make-caption (listof attribute) (listof G5))
A table is
(make-table (listof attribute) (listof Contents-of-table))
A Contents-of-table is either
A button is
(make-button (listof attribute) (listof G4))
A fieldset is
(make-fieldset (listof attribute) (listof Contents-of-fieldset))
A Contents-of-fieldset is either
An optgroup is
(make-optgroup (listof attribute) (listof option))
A select is
(make-select (listof attribute) (listof Contents-of-select))
A Contents-of-select is either
A label is
(make-label (listof attribute) (listof G6))
A form is
(make-form (listof attribute) (listof G3))
An ol is
(make-ol (listof attribute) (listof li))
An ul is
(make-ul (listof attribute) (listof li))
A dir is
(make-dir (listof attribute) (listof li))
A menu is
(make-menu (listof attribute) (listof li))
A dl is
(make-dl (listof attribute) (listof Contents-of-dl))
A Contents-of-dl is either
A pre is
(make-pre (listof attribute) (listof Contents-of-pre))
A Contents-of-pre is either
An object is
(make-object (listof attribute) (listof Contents-of-object-applet))
An applet is
(make-applet (listof attribute) (listof Contents-of-object-applet))
A Contents-of-object-applet is either
A map is
(make-map (listof attribute) (listof Contents-of-map))
A Contents-of-map is either
An a is
(make-a (listof attribute) (listof Contents-of-a))
A Contents-of-a is either
An address is
(make-address (listof attribute) (listof Contents-of-address))
A Contents-of-address is either
A body is
(make-body (listof attribute) (listof Contents-of-body))
A Contents-of-body is either
A G12 is either
A G11 is either
A G10 is either
A G9 is either
A G8 is either
A G7 is either
A G6 is either
A G5 is either
A G4 is either
A G3 is either
A G2 is either
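The constructors listed in this section can also be applied directly, without going through the reader. The following sketch builds the equivalent of <p>Hello <b>world</b></p> by hand. It assumes that make-pcdata from the xml library takes a start location, a stop location, and a string, and that #f is acceptable for the two locations; the text helper is hypothetical.

```scheme
#lang scheme
(require (prefix-in h: html)
         (prefix-in x: xml))

;; text: string -> pcdata
;; Hypothetical helper; #f stands in for the unknown source locations.
(define (text s)
  (x:make-pcdata #f #f s))

;; A p with no attributes whose content is a pcdata followed by a b.
(define a-p
  (h:make-p '()
            (list (text "Hello ")
                  (h:make-b '() (list (text "world"))))))

(h:p? a-p)          ; a-p was built with make-p
(h:html-full? a-p)  ; p inherits from html-full

;; The first item of the content is the pcdata created above.
(x:pcdata-string (car (h:html-full-content a-p)))
```

Structures built this way have the same shape as the ones returned by the reader, so functions such as extract-pcdata from the example section should apply to them unchanged.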