Paste: html extraction by example

Author: refaktor
Mode: rebol
Date: Fri, 27 May 2011 15:38:44
-----------
USER MARKS WHICH VALUES TO EXTRACT
<html><title>TITLE</title><body><h2>{HEADING}</h2><p>{COMPANY}<p>{ADDRESS}</html>
-----------
AN HTML PAGE TO EXTRACT FROM
<html><title>car providers</title><h2>Fast cars</h2><p>Honda</p><p>Japan</html>
-----------
LOADED TEMPLATE TREE
{[root [] [html [] [head [] [title [] [text [value "TITLE"]]]] [body [] [h2 [] [text [value "{HEADING}"]]] [p [] [text [value "{COMPANY}"]]] [p [] [text [value "{ADDRESS}"]]]]]]}
-----------
LOADED SAMPLE TREE
{[root [] [html [] [head [] [title [] [text [value "car providers"]]]] [body [] [h2 [] [text [value "Fast cars"]]] [p [] [text [value "Honda"]]] [p [] [text [value "Japan"]]]]]]}
-----------
FIXED TEMPLATE HTML:
<html>
    <head>
        <title>TITLE</title>
        </head>
    <body>
        <h2>{HEADING}</h2>
        <p>{COMPANY}</p>
        <p>{ADDRESS}</p>
        </body>
    </html>

-----------
FIXED SAMPLE HTML:
<html>
    <head>
        <title>car providers</title>
        </head>
    <body>
        <h2>Fast cars</h2>
        <p>Honda</p>
        <p>Japan</p>
        </body>
    </html>

-----------
BUILD PATH FROM TEMPLATE
[in "root" in "html" in "head" in "title" out _ out _ in "body" in "h2" found "HEADING" out _ in "p" found "COMPANY" out _ in "p" found "ADDRESS" out _ out _ out _ out _]
-----------
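The path built from the template can be reproduced by a depth-first walk of the template tree. What follows is a minimal Python sketch, not the author's Rebol code (the paste shows only program output). The tuple-based tree encoding and the names PLACEHOLDER, TEMPLATE and build_path are assumptions made for illustration. The walk emits "in" when it enters an element, "out" when it leaves it, and "found" for a text node whose whole value is a {NAME} marker.

```python
import re

# Assumed encoding: an element is (name, [children]); a text node is ("text", "value").
PLACEHOLDER = re.compile(r"\{(\w+)\}")

TEMPLATE = ("root", [("html", [
    ("head", [("title", [("text", "TITLE")])]),
    ("body", [
        ("h2", [("text", "{HEADING}")]),
        ("p", [("text", "{COMPANY}")]),
        ("p", [("text", "{ADDRESS}")]),
    ]),
])])


def build_path(node, path=None):
    """Depth-first walk that records in/out/found tokens as (op, arg) pairs."""
    path = [] if path is None else path
    name, payload = node
    if name == "text":
        marker = PLACEHOLDER.fullmatch(payload)
        if marker:  # unmarked text such as "TITLE" produces no token
            path.append(("found", marker.group(1)))
        return path
    path.append(("in", name))
    for child in payload:
        build_path(child, path)
    path.append(("out", None))
    return path


if __name__ == "__main__":
    for op, arg in build_path(TEMPLATE):
        print(op, "_" if arg is None else arg)
```

Because "TITLE" is written without braces in the template, the walk enters and leaves the title element without emitting a "found" token, which matches the path shown above.
-----------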
OPTIMIZE PATH TO TEXT: HEADING
[in "root" in "html" in "head" out _ in "body" in "h2" found "HEADING"]
-----------
OPTIMIZE PATH TO TEXT: COMPANY
[in "root" in "html" in "head" out _ in "body" in "h2" found "HEADING" out _ in "p" found "COMPANY"]
-----------
OPTIMIZE PATH TO TEXT: ADDRESS
[in "root" in "html" in "head" out _ in "body" in "h2" found "HEADING" out _ in "p" found "COMPANY" out _ in "p" found "ADDRESS"]
-----------
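Comparing the three optimized paths with the full path suggests two steps: cut the path right after the target's "found" token, then drop every "in" that is immediately followed by "out" (an element entered and left with nothing marked inside). A single pass of that rule reproduces the three paths listed above. Whether the original Rebol code works this way is an assumption, and FULL_PATH and optimize_path are hypothetical names. A Python sketch:

```python
# The full template path, written as (op, arg) pairs; "out _" becomes ("out", None).
FULL_PATH = [
    ("in", "root"), ("in", "html"), ("in", "head"), ("in", "title"),
    ("out", None), ("out", None),
    ("in", "body"), ("in", "h2"), ("found", "HEADING"), ("out", None),
    ("in", "p"), ("found", "COMPANY"), ("out", None),
    ("in", "p"), ("found", "ADDRESS"), ("out", None),
    ("out", None), ("out", None), ("out", None),
]


def optimize_path(path, target):
    """Truncate after the target marker, then drop adjacent in/out pairs once."""
    cut = path[: path.index(("found", target)) + 1]
    result = []
    i = 0
    while i < len(cut):
        enters = cut[i][0] == "in"
        leaves_next = i + 1 < len(cut) and cut[i + 1][0] == "out"
        if enters and leaves_next:
            i += 2  # skip an element that is entered and left immediately
            continue
        result.append(cut[i])
        i += 1
    return result


if __name__ == "__main__":
    for target in ("HEADING", "COMPANY", "ADDRESS"):
        print(target, optimize_path(FULL_PATH, target))
```

The pass is deliberately not repeated: removing the title pair leaves "in head" directly followed by "out", and the optimized paths above keep that pair.
-----------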
EXTRACT VALUES FROM SAMPLE BASED ON TEMPLATE:
["HEADING" "Fast cars" "COMPANY" "Honda" "ADDRESS" "Honda"]
>>
