Paste: html extraction by example
Author: refaktor
Mode: rebol
Date: Fri, 27 May 2011 15:38:44
-----------
USER MARKS WHAT VALUES TO EXTRACT
<html><title>TITLE</title><body><h2>{HEADING}</h2><p>{COMPANY}<p>{ADDRESS}</html>
-----------
AN HTML PAGE TO EXTRACT FROM
<html><title>car providers</title><h2>Fast cars</h2><p>Honda</p><p>Japan</html>
-----------
LOADED TEMPLATE TREE
{[root [] [html [] [head [] [title [] [text [value "TITLE"]]]] [body [] [h2 [] [text [value "{HEADING}"]]] [p [] [text [value "{COMPANY}"]]] [p [] [text [value "{ADDRESS}"]]]]]]}
-----------
LOADED SAMPLE TREE
{[root [] [html [] [head [] [title [] [text [value "car providers"]]]] [body [] [h2 [] [text [value "Fast cars"]]] [p [] [text [value "Honda"]]] [p [] [text [value "Japan"]]]]]]}
-----------
FIXED TEMPLATE HTML:
<html>
<head>
<title>TITLE</title>
</head>
<body>
<h2>{HEADING}</h2>
<p>{COMPANY}</p>
<p>{ADDRESS}</p>
</body>
</html>
-----------
FIXED SAMPLE HTML:
<html>
<head>
<title>car providers</title>
</head>
<body>
<h2>Fast cars</h2>
<p>Honda</p>
<p>Japan</p>
</body>
</html>
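The fixed HTML above maps directly onto the nested tree shape shown under LOADED SAMPLE TREE. The following is a minimal Python sketch of that loading step, not the paste's REBOL code. It assumes the input has already been repaired (every tag closed, no void elements), and the names `TreeBuilder` and `load_tree` are hypothetical. Each node is a list `[tag, attrs, child, ...]`, and a text node is `["text", {"value": ...}]`.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Builds a nested [tag, attrs, child, ...] tree from well-formed HTML."""

    def __init__(self):
        super().__init__()
        self.root = ["root", {}]
        self.stack = [self.root]  # chain of currently open elements

    def handle_starttag(self, tag, attrs):
        node = [tag, dict(attrs)]
        self.stack[-1].append(node)  # attach to the innermost open element
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Assumes well-formed input: the closing tag matches the open element.
        if len(self.stack) > 1 and self.stack[-1][0] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace between tags
            self.stack[-1].append(["text", {"value": text}])


def load_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


FIXED_SAMPLE = (
    "<html><head><title>car providers</title></head>"
    "<body><h2>Fast cars</h2><p>Honda</p><p>Japan</p></body></html>"
)
sample_tree = load_tree(FIXED_SAMPLE)
```

Repairing the sloppy input (adding `<head>`, closing `<p>`) is a separate step that this sketch does not attempt.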
-----------
BUILD PATH FROM TEMPLATE
[in "root" in "html" in "head" in "title" out _ out _ in "body" in "h2" found "HEADING" out _ in "p" found "COMPANY" out _ in "p" found "ADDRESS" out _ out _ out _ out _]
-----------
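The BUILD PATH listing reads like a depth-first walk of the template tree: `in` on entering an element, `found` for a text node whose whole value is a `{NAME}` marker, and `out` on leaving. The Python sketch below reproduces that listing under those assumptions. It is not the paste's REBOL code; `build_path` and the `(op, arg)` tuple encoding are hypothetical, and `out _` is written as `("out", None)`.

```python
import re

MARK = re.compile(r"^\{(\w+)\}$")  # a text value that is exactly {NAME}


def build_path(node):
    """Depth-first walk emitting ("in", tag), ("found", name), ("out", None)."""
    tag, _attrs, *children = node
    if tag == "text":
        match = MARK.match(node[1]["value"])
        # Unmarked text (such as TITLE) contributes no step at all.
        return [("found", match.group(1))] if match else []
    path = [("in", tag)]
    for child in children:
        path += build_path(child)
    path.append(("out", None))
    return path


TEMPLATE = [
    "root", {},
    ["html", {},
     ["head", {}, ["title", {}, ["text", {"value": "TITLE"}]]],
     ["body", {},
      ["h2", {}, ["text", {"value": "{HEADING}"}]],
      ["p", {}, ["text", {"value": "{COMPANY}"}]],
      ["p", {}, ["text", {"value": "{ADDRESS}"}]]]],
]
full_path = build_path(TEMPLATE)
```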
OPTIMIZE PATH TO TEXT: HEADING
[in "root" in "html" in "head" out _ in "body" in "h2" found "HEADING"]
-----------
OPTIMIZE PATH TO TEXT: COMPANY
[in "root" in "html" in "head" out _ in "body" in "h2" found "HEADING" out _ in "p" found "COMPANY"]
-----------
OPTIMIZE PATH TO TEXT: ADDRESS
[in "root" in "html" in "head" out _ in "body" in "h2" found "HEADING" out _ in "p" found "COMPANY" out _ in "p" found "ADDRESS"]
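The paste does not show the optimization rule itself. One rule that reproduces all three optimized paths above is to cut the full path right after the target's `found` step, then make a single left-to-right pass that drops every `in` immediately followed by `out` (an element entered and left with nothing in between). The Python sketch below implements that guess; `optimize_path` is a hypothetical name and the `(op, arg)` tuples are an assumed encoding of the REBOL block.

```python
FULL_PATH = [
    ("in", "root"), ("in", "html"), ("in", "head"), ("in", "title"),
    ("out", None), ("out", None), ("in", "body"), ("in", "h2"),
    ("found", "HEADING"), ("out", None), ("in", "p"), ("found", "COMPANY"),
    ("out", None), ("in", "p"), ("found", "ADDRESS"),
    ("out", None), ("out", None), ("out", None), ("out", None),
]


def optimize_path(path, target):
    """Truncate at the target, then drop empty in/out pairs in one pass."""
    end = path.index(("found", target)) + 1
    trimmed = path[:end]
    result = []
    i = 0
    while i < len(trimmed):
        step = trimmed[i]
        nxt = trimmed[i + 1] if i + 1 < len(trimmed) else None
        if step[0] == "in" and nxt is not None and nxt[0] == "out":
            i += 2  # skip the "in X" and its immediate "out"
            continue
        result.append(step)
        i += 1
    return result


heading_path = optimize_path(FULL_PATH, "HEADING")
company_path = optimize_path(FULL_PATH, "COMPANY")
address_path = optimize_path(FULL_PATH, "ADDRESS")
```

The pass is deliberately not repeated: `in "title" out _` disappears, but the resulting `in "head" out _` stays, which matches the listings.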
-----------
EXTRACT VALUES FROM SAMPLE BASED ON TEMPLATE:
["HEADING" "Fast cars" "COMPANY" "Honda" "ADDRESS" "Honda"]
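Note that the result above reports "Honda" for ADDRESS, although the second `<p>` of the sample holds "Japan". The Python sketch below replays an optimized path against the sample tree with a per-level cursor, so each `in` resumes after the sibling visited previously. Under that assumption ADDRESS resolves to "Japan". This is one possible replay strategy, not the paste's REBOL code; `extract` and the tuple encoding are hypothetical.

```python
SAMPLE = [
    "root", {},
    ["html", {},
     ["head", {}, ["title", {}, ["text", {"value": "car providers"}]]],
     ["body", {},
      ["h2", {}, ["text", {"value": "Fast cars"}]],
      ["p", {}, ["text", {"value": "Honda"}]],
      ["p", {}, ["text", {"value": "Japan"}]]]],
]

ADDRESS_PATH = [
    ("in", "root"), ("in", "html"), ("in", "head"), ("out", None),
    ("in", "body"), ("in", "h2"), ("found", "HEADING"), ("out", None),
    ("in", "p"), ("found", "COMPANY"), ("out", None),
    ("in", "p"), ("found", "ADDRESS"),
]


def extract(tree, path):
    """Replay a path on a tree, collecting the text at each 'found' step."""
    holder = ["", {}, tree]  # synthetic parent so that "in root" can match
    stack = [[holder, 2]]    # frames of [node, index of next child to try]
    values = {}
    for op, arg in path:
        node, start = stack[-1]
        if op == "in":
            i = start
            while i < len(node) and node[i][0] != arg:
                i += 1
            if i == len(node):
                raise LookupError(f"no <{arg}> left under <{node[0]}>")
            stack[-1][1] = i + 1  # the next "in" resumes after this sibling
            stack.append([node[i], 2])
        elif op == "out":
            stack.pop()
        elif op == "found":
            texts = [c[1]["value"] for c in node[2:] if c[0] == "text"]
            values[arg] = texts[0] if texts else None
    return values


result = extract(SAMPLE, ADDRESS_PATH)
```

Because the ADDRESS path passes through the other two markers, one replay collects all three values.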
>>