Paste: extractor

Author: refaktor
Mode: rebol
Date: Wed, 29 Jun 2011 17:17:53
-----------
BUILD PATH FROM TEMPLATE
[in "root" in "declaration" out _ in "html" in "head" in "title" out _ in "meta" out _ in "meta" out _ in "meta" out _ in "link" out _ in "base"
 out _ in "link" out _ out _ in "body" in "noscript" in "img" out _ out _ in "div" in "div" in "div" in "ul" in "a" out _ in "a" out _ out _ in
"ul" in "li" in "a" out _ in "div" in "ul" in "li" in "a" out _ out _ in "li" in "a" out _ out _ in "li" in "a" out _ out _ in "li" in "a" out _
 out _ in "li" in "a" out _ out _ out _ out _ out _ in "li" in "a" out _ in "div" in "ul" in "li" in "a" out _ out _ in "li" in "a" out _ out _
in "li" in "a" out _ out _ out _ out _ out _ in "li" in "a" out _ in "div" in "ul" in "li" in "a" out _ out _ in "li" in "a" out _ out _ in "li"
 in "a" out _ out _ in "li" in "a" out _ out _ in "li" in "a" out _ out _ in "li" in "a" out _ out _ out _ out _ out _ in "li" in "a" out _ in "
div" in "ul" in "li" in "a" in "span" out _ out _ out _ in "li" in "a" out _ out _ in "li" in "a" out _ out _ in "li" in "a" out _ out _ in "li"
 in "a" out _ out _ in "li" in "a" out _ out _ out _ out _ out _ in "li" in "a" out _ in "div" in "ul" out _ out _ out _ in "li" in "a" out _ in
 "div" in "ul" in "li" in "a" out _ out _ in "li" in "a" out _ out _ in "li" in "a" out _ out _ in "li" in "a" out _ out _ in "li" in "a" out _
out _ out _ out _ out _ out _ out _ in "div" in "a" out _ out _ out _ out _ in "div" in "div" in "div" in "ul" in "li" in "a" out _ out _ in "li
" in "a" out _ out _ in "li" in "a" out _ out _ out _ out _ in "a" out _ in "div" in "div" out _ in "div" in "form" in "div" in "input" out _ ou
t _ in "select" in "option" out _ out _ in "input" out _ in "input" out _ in "input" out _ in "input" out _ in "input" out _ in "input" out _ in
 "input" out _ in "input" out _ out _ out _ in "div" out _ out _ in "div" in "a" out _ out _ in "div" in "strong" out _ in "span" in "a" out _ o
ut _ in "a" out _ in "a" out _ out _ out _ in "div" in "div" in "span" out _ in "a" out _ in "span" out _ in "a" out _ out _ in "div" in "strong
" out _ in "span" in "a" out _ in "span" out _ in "a" out _ in "span" out _ in "a" out _ in "span" out _ in "a" out _ in "span" out _ in "a" out
 _ out _ out _ out _ in "div" in "div" in "div" in "div" in "a" out _ in "a" out _ in "a" out _ in "a" out _ in "a" out _ in "a" out _ in "span"
 out _ out _ out _ in "div" in "span" in "a" out _ out _ in "a" in "img" out _ out _ in "ul" in "li" in "a" out _ out _ in "li" in "a" out _ out
 _ in "li" in "a" out _ out _ in "li" in "a" out _ out _ in "li" in "a" out _ out _ in "li" in "a" out _ out _ in "li" in "a" out _ out _ in "li
" in "a" out _ out _ out _ out _ out _ in "div" in "div" in "div" in "h2" found "PRODUCT_TITLE" out _ in "div" out _ in "table" in "tbody" in "t
r" in "th" out _ in "td" in "a" out _ in "a" out _ out _ out _ in "tr" in "th" out _ in "td" out _ out _ in "tr" in "th" out _ in "td" in "div"
out _ out _ out _ in "tr" in "th" out _ in "td" out _ out _ in "tr" in "th" out _ in "td" out _ out _ in "tr" in "th" out _ in "td" in "div" in
"p" in "a" out _ out _ in "p" in "span" out _ out _ out _ out _ out _ out _ out _ in "div" in "div" in "div" in "a" out _ in "a" out _ in "a" ou
t _ out _ in "div" in "a" in "img" out _ out _ in "div" out _ out _ out _ out _ out _ out _ in "div" in "div" in "div" in "div" out _ in "a" out
 _ in "div" in "div" out _ in "span" out _ out _ in "div" out _ in "div" out _ in "a" out _ in "a" in "sup" out _ out _ in "a" out _ in "div" in
 "div" in "a" out _ out _ in "a" in "span" out _ out _ in "a" out _ out _ out _ out _ in "a" in "span" out _ out _ out _ in "div" in "a" in "img
" out _ out _ in "div" in "a" out _ out _ in "div" in "div" in "span" out _ in "span" out _ in "span" out _ out _ in "div" in "span" out _ in "s
pan" out _ in "span" out _ in "div" in "ul" in "li" out _ in "li" out _ out _ out _ out _ out _ out _ out _ in "div" in "div" in "span" out _ in
 "span" out _ out _ in "div" in "div" in "div" in "div" in "div" out _ out _ in "table" in "tbody" in "tr" in "td" in "span" out _ out _ in "td"
 in "span" out _ out _ in "td" in "span" out _ out _ out _ in "tr" in "td" in "span" out _ out _ in "td" in "span" out _ out _ out _ out _ out _
 out _ in "div" in "div" in "div" out _ out _ in "table" in "tbody" in "tr" in "td" in "span" out _ out _ in "td" out _ out _ in "tr" in "td" in
 "span" out _ out _ in "td" found "DELIVERY_DETAIL" out _ out _ out _ out _ in "div" in "div" out _ out _ in "p" in "br" out _ in "br" out _ in
"br" out _ out _ in "div" in "p" in "strong" out _ in "br" out _ in "br" out _ in "br" out _ in "br" out _ in "br" out _ in "br" out _ in "br" o
ut _ in "br" out _ in "br" out _ in "br" out _ in "br" out _ in "br" out _ out _ out _ out _ out _ in "div" in "div" in "br" out _ in "br" out _
 in "br" out _ in "br" out _ in "br" out _ in "a" out _ in "span" out _ in "br" out _ in "a" in "img" out _ out _ out _ in "div" in "h3" out _ i
n "table" in "tbody" in "tr" in "td" out _ in "td" out _ out _ in "tr" in "td" out _ in "td" found "WHAT_WE_SELL" out _ out _ in "tr" in "td" ou
t _ in "td" out _ out _ in "tr" in "td" out _ in "td" found "NUM_OF_EMPLOYEES" out _ out _ out _ out _ out _ in "div" in "h3" out _ in "table" i
n "tbody" in "tr" in "td" out _ in "td" in "br" out _ in "br" out _ in "br" out _ out _ out _ in "tr" in "td" out _ in "td" out _ out _ in "tr"
in "td" out _ in "td" out _ out _ out _ out _ out _ in "div" in "h3" out _ in "table" in "tbody" in "tr" in "td" out _ in "td" out _ out _ in "t
r" in "td" out _ in "td" out _ out _ in "tr" in "td" out _ in "td" out _ out _ in "tr" in "td" out _ in "td" out _ out _ in "tr" in "td" out _ i
n "td" out _ out _ in "tr" in "td" out _ in "td" out _ out _ in "tr" in "td" out _ in "td" out _ out _ out _ out _ out _ in "div" in "h3" out _
in "table" in "tbody" in "tr" in "td" in "a" out _ out _ out _ in "tr" in "td" in "span" out _ out _ out _ out _ out _ in "div" in "div" in "a"
out _ in "h3" out _ in "div" in "div" in "span" out _ in "input" out _ in "input" out _ in "input" out _ in "table" in "tbody" in "tr" in "th" i
n "span" out _ out _ in "td" in "div" in "input" out _ in "span" out _ in "br" out _ in "div" out _ out _ out _ out _ in "tr" in "th" in "span"
out _ out _ in "td" in "div" in "input" out _ in "span" out _ in "br" out _ in "div" out _ out _ in "a" in "font" out _ out _ out _ out _ in "tr
" in "th" in "span" out _ out _ in "td" in "div" in "input" out _ in "input" out _ in "a" in "img" out _ in "font" out _ out _ in "div" out _ ou
t _ out _ out _ in "tr" in "th" in "span" out _ out _ in "td" in "div" in "div" out _ out _ out _ out _ out _ out _ in "form" in "input" out _ i
n "input" out _ in "input" out _ in "input" out _ in "input" out _ in "textarea" out _ in "div" in "div" out _ out _ in "div" in "div" out _ out
 _ in "div" in "a" out _ out _ out _ out _ out _ out _ out _ out _ out _ out _ out _ in "div" in "div" in "div" out _ in "div" in "form" in "inp
ut" out _ in "input" out _ in "input" out _ in "input" out _ in "input" out _ in "input" out _ in "div" in "div" in "div" in "span" in "span" ou
t _ out _ in "input" out _ in "span" out _ in "div" in "div" out _ out _ out _ out _ in "div" in "div" in "span" in "span" out _ out _ in "input
" out _ in "span" in "a" out _ out _ in "div" in "div" out _ out _ out _ out _ in "div" in "div" in "span" in "span" out _ out _ in "input" out
_ in "input" out _ in "img" out _ in "a" in "font" out _ out _ in "div" in "div" out _ out _ out _ out _ in "div" in "div" out _ out _ out _ in
"span" out _ in "div" in "div" in "a" in "img" out _ out _ in "div" in "div" in "a" out _ in "a" out _ in "a" out _ out _ out _ out _ out _ in "
div" in "span" in "span" out _ out _ in "textarea" out _ in "div" in "div" out _ out _ out _ in "div" in "div" out _ out _ in "a" out _ out _ ou
t _ out _ out _ in "form" in "input" out _ in "input" out _ in "input" out _ in "input" out _ in "input" out _ out _ in "div" out _ in "input" o
ut _ in "div" in "div" out _ in "div" in "a" in "img" out _ out _ in "a" out _ in "div" in "span" out _ in "span" out _ out _ out _ in "div" in
"a" in "img" out _ out _ in "a" out _ in "div" in "span" out _ in "span" out _ out _ out _ in "div" in "a" in "img" out _ out _ in "a" out _ in
"div" out _ out _ in "a" out _ in "div" out _ out _ in "div" in "div" in "div" out _ in "div" in "div" in "div" in "div" in "a" in "img" out _ o
ut _ out _ in "div" in "div" in "a" out _ out _ in "div" out _ in "div" out _ out _ out _ in "div" in "div" in "a" in "img" out _ out _ out _ in
 "div" in "div" in "a" out _ out _ in "div" out _ in "div" out _ out _ out _ in "div" in "div" in "a" in "img" out _ out _ out _ in "div" in "di
v" in "a" out _ out _ in "div" out _ in "div" out _ out _ out _ in "div" out _ in "div" in "div" in "a" in "img" out _ out _ out _ in "div" in "
div" in "a" out _ out _ in "div" out _ in "div" out _ out _ out _ in "div" in "div" in "a" in "img" out _ out _ out _ in "div" in "div" in "a" o
ut _ out _ in "div" out _ in "div" out _ out _ out _ in "div" in "div" in "a" in "img" out _ out _ out _ in "div" in "div" in "a" out _ out _ in
 "div" out _ in "div" out _ out _ out _ in "div" out _ in "div" in "div" in "a" in "img" out _ out _ out _ in "div" in "div" in "a" out _ out _
in "div" out _ in "div" out _ out _ out _ in "div" in "div" in "a" in "img" out _ out _ out _ in "div" in "div" in "a" out _ out _ in "div" out
_ in "div" out _ out _ out _ in "div" in "div" in "a" in "img" out _ out _ out _ in "div" in "div" in "a" out _ out _ in "div" out _ in "div" ou
t _ out _ out _ in "div" out _ in "div" in "a" out _ out _ in "div" out _ out _ out _ out _ out _ in "div" out _ in "div" out _ in "div" out _ i
n "br" out _ in "div" in "form" in "div" in "span" out _ in "input" out _ in "input" out _ in "span" out _ out _ in "input" out _ in "input" out
 _ in "input" out _ in "input" out _ in "input" out _ in "input" out _ out _ in "div" in "strong" out _ in "span" in "a" out _ in "a" out _ in "
a" out _ in "a" out _ out _ out _ out _ in "div" in "img" out _ in "span" in "a" out _ out _ out _ out _ in "div" in "div" in "a" out _ in "h3"
out _ in "div" in "div" in "p" out _ in "input" out _ in "table" in "tbody" in "tr" in "th" in "span" out _ out _ in "td" in "div" in "input" ou
t _ in "span" out _ in "br" out _ in "div" out _ out _ out _ out _ in "tr" in "th" in "span" out _ out _ in "td" in "div" in "input" out _ in "s
pan" out _ in "br" out _ in "div" out _ out _ in "a" in "font" out _ out _ out _ out _ in "tr" in "th" in "span" out _ out _ in "td" in "div" in
 "input" out _ in "input" out _ in "img" out _ in "a" in "font" out _ out _ in "div" out _ out _ out _ out _ in "tr" in "th" in "span" out _ out
 _ in "td" in "div" in "div" out _ out _ out _ out _ out _ out _ in "form" in "input" out _ in "input" out _ in "input" out _ in "input" out _ i
n "input" out _ in "input" out _ in "textarea" out _ in "textarea" out _ in "div" in "div" out _ out _ in "div" in "div" out _ out _ in "span" i
n "a" out _ out _ out _ out _ in "div" out _ out _ in "div" out _ in "iframe" out _ out _ out _ in "form" in "input" out _ in "input" out _ in "
input" out _ in "input" out _ in "input" out _ out _ in "input" out _ in "input" out _ in "div" in "a" out _ out _ in "div" in "div" in "h4" in
"span" out _ in "a" out _ out _ in "div" out _ out _ out _ in "div" in "a" out _ in "a" out _ in "a" out _ out _ in "div" in "p" in "a" in "stro
ng" out _ out _ in "a" in "strong" out _ out _ in "a" in "strong" out _ out _ in "br" out _ in "a" out _ in "a" out _ in "a" out _ in "a" out _
in "a" out _ in "a" out _ in "a" out _ in "a" out _ out _ in "br" out _ in "p" in "a" out _ in "a" out _ in "a" out _ in "a" out _ in "a" out _
in "a" out _ in "a" out _ in "a" out _ in "a" out _ in "a" out _ out _ in "p" in "a" out _ in "a" out _ in "a" out _ in "a" out _ in "a" out _ i
n "a" out _ in "a" out _ in "a" out _ in "a" out _ in "a" out _ out _ in "p" in "a" out _ in "a" out _ in "a" out _ in "a" out _ out _ in "p" in
 "a" out _ in "span" out _ out _ out _ in "comment" out _ in "div" out _ out _ out _ out _ out _]
-----------
OPTIMIZE PATH TO TARGET:
[in "root" skip "declaration" in "html" skip "head" in "body" skip "noscript" skip "div" in "div" skip "div" skip "div" in "div" skip "div" skip
 "div" in "div" skip "div" in "div" skip "div" in "div" skip "div" in "div" skip "h3" in "table" in "tbody" skip "tr" in "tr" skip "td" in "td"
found "WHAT_WE_SELL"]
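
The step from the full in/out trace to the shorter path above can be sketched roughly as follows. This is a Python illustration, not the extractor's own Rebol code: the name `path_to_target` and the tuple encoding of the trace are assumptions. The idea it shows is that open ancestors stay as `in` steps, while every sibling that was already closed before the target was reached collapses into a single `skip` step.

```python
def path_to_target(trace, target):
    """Reduce an in/out/found trace to an in/skip path ending at `target`.

    `trace` is a list of (op, arg) tuples, e.g. ("in", "div"), ("out", None),
    ("found", "WHAT_WE_SELL"). This encoding is hypothetical.
    """
    # Each stack entry is (tag, tags of children that were opened and closed).
    stack = []
    for op, arg in trace:
        if op == "in":
            stack.append((arg, []))
        elif op == "out":
            tag, _ = stack.pop()
            if stack:
                # Remember the closed child on its parent.
                stack[-1][1].append(tag)
        elif op == "found" and arg == target:
            path = []
            for tag, closed in stack:
                path.append(("in", tag))
                path.extend(("skip", t) for t in closed)
            path.append(("found", arg))
            return path
    return None  # target marker never seen


if __name__ == "__main__":
    demo = [
        ("in", "root"), ("in", "declaration"), ("out", None),
        ("in", "html"), ("in", "head"), ("in", "title"), ("out", None), ("out", None),
        ("in", "body"), ("in", "div"), ("out", None),
        ("in", "div"), ("found", "X"),
    ]
    for step in path_to_target(demo, "X"):
        print(*step)
```

Note that the children of a skipped element (here `title` inside `head`) disappear entirely, which is why the optimized path is so much shorter than the template trace.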

==========================================
SECOND LEVEL OF PATH OPTIMIZER
(optimizes path for more robust extraction, not speed)
[in "root" seek-into [tag "html" skip 0] seek-into [tag "body" skip 0] seek-into [tag "div" skip 1] seek-into [tag "div" skip 2]
 seek-into [tag "div" skip 2] seek-into [tag "div" skip 1] seek-into [tag "div" skip 1] seek-into [tag "div" skip 1]
 seek-into [tag "table" skip 0] in "tbody" seek-into [tag "tr" skip 1] seek-into [tag "td" skip 1] found "WHAT_WE_SELL"]
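
Comparing the two paths suggests a simple rewrite rule: a run of `skip` steps followed by `in T` becomes `seek-into [tag T skip N]`, where N counts only the skipped siblings whose tag is also T, and an `in` with no preceding `skip` is left alone. The Python sketch below encodes that reading. The rule is inferred from the two dumps, and the function name `to_seek_path` and the tuple encoding are assumptions, so treat it as an illustration rather than the optimizer's real logic.

```python
def to_seek_path(path):
    """Rewrite an in/skip path into seek-into steps.

    `path` is a list of (op, tag) tuples. The result mixes ("in", tag),
    ("found", name) and ("seek-into", tag, n) tuples. Hypothetical encoding.
    """
    result = []
    skipped = []  # tags skipped since the last descent
    for op, arg in path:
        if op == "skip":
            skipped.append(arg)
        elif op == "in" and skipped:
            # Only same-tag siblings count; others (e.g. "noscript") are ignored.
            result.append(("seek-into", arg, skipped.count(arg)))
            skipped = []
        else:
            result.append((op, arg))
            skipped = []
    return result


if __name__ == "__main__":
    level1 = [("in", "body"), ("skip", "noscript"), ("skip", "div"),
              ("in", "div"), ("found", "WHAT_WE_SELL")]
    print(to_seek_path(level1))
```

Because unrelated siblings no longer appear in the path, a page that adds or drops such an element before the target would still match under this reading, which fits the "more robust extraction" remark above.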

###########################################
TEST ON THE SAMPLES
FINDING: **WHAT_WE_SELL**
-----------
A HTML PAGE TO EXTRACT FROM
-----------
LOADED SAMPLE TREE
-----------
FIXED SAMPLE HTML:
-----------
EXTRACT VALUES FROM SAMPLE BASED ON PATH:
##ERROR: This path doesn't exist in this tree ! (1)
-----------
EXTRACT VALUES FROM SAMPLE BASED ON LVL2 PATH:
["WHAT_WE_SELL" {Inflatables,Lightbox,Handbag,Printed clothes,Advertising products,Cap,shoes  }]
==============
-----------
A HTML PAGE TO EXTRACT FROM
-----------
LOADED SAMPLE TREE
-----------
FIXED SAMPLE HTML:
-----------
EXTRACT VALUES FROM SAMPLE BASED ON PATH:
##ERROR: This path doesn't exist in this tree ! (1)
-----------
EXTRACT VALUES FROM SAMPLE BASED ON LVL2 PATH:
["WHAT_WE_SELL" {Leather Racing Suit,Motorbike Leather Jacket,Biker Leather Gloves,Textile Motorbike Pants,Cordura Garments,Motocross Jacket,Saddle Bags,Motorcycle Boots,Leather Fashion Garments,Biker Chaps,Cordura Trousers,Textile Racing Jacket,Biker Jacket,Fashion Jacket,Racing Jacket,Gothic Wears,Denim Garments,Motocross,Bavarian Garments,Fashion Wears  }]
==============
-----------
A HTML PAGE TO EXTRACT FROM
-----------
LOADED SAMPLE TREE
-----------
FIXED SAMPLE HTML:
-----------
EXTRACT VALUES FROM SAMPLE BASED ON PATH:
##ERROR: This path doesn't exist in this tree ! (1)
-----------
EXTRACT VALUES FROM SAMPLE BASED ON LVL2 PATH:
["WHAT_WE_SELL" {Inflatables,Lightbox,Handbag,Printed clothes,Advertising products,Cap,shoes  }]
==============
-----------
A HTML PAGE TO EXTRACT FROM
-----------
LOADED SAMPLE TREE
-----------
FIXED SAMPLE HTML:
-----------
EXTRACT VALUES FROM SAMPLE BASED ON PATH:
##ERROR: This path doesn't exist in this tree ! (1)
-----------
EXTRACT VALUES FROM SAMPLE BASED ON LVL2 PATH:
["WHAT_WE_SELL" {Inflatables,Lightbox,Handbag,Printed clothes,Advertising products,Cap,shoes  }]
==============
>>
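
In every sample above the exact path fails while the LVL2 path succeeds. One plausible explanation, sketched here in Python under assumed semantics, is that the exact path requires each sibling to appear with the expected tag in the expected position, whereas `seek-into` only looks for the (N+1)-th child with the given tag. The node layout `(tag, text, children)`, the function names, and the tiny sample tree are all hypothetical. The real extractor may behave differently.

```python
def follow_strict(root, path):
    """Follow an in/skip path; every step must match the next sibling exactly."""
    node, siblings, pos = None, [root], 0
    for op, arg in path:
        if op == "found":
            return node[1]
        if pos >= len(siblings) or siblings[pos][0] != arg:
            return None  # corresponds to "This path doesn't exist in this tree"
        if op == "skip":
            pos += 1
        else:  # "in": descend into the matching child
            node = siblings[pos]
            siblings, pos = node[2], 0
    return None


def follow_seek(root, path):
    """Follow a seek-into path; only children with the requested tag are counted."""
    node, siblings = None, [root]
    for step in path:
        if step[0] == "found":
            return node[1]
        tag = step[1]
        n = step[2] if step[0] == "seek-into" else 0
        matches = [child for child in siblings if child[0] == tag]
        if n >= len(matches):
            return None
        node = matches[n]
        siblings = node[2]
    return None


# A made-up sample that lacks "declaration" and has an extra "script" element.
sample = ("root", "", [
    ("html", "", [
        ("head", "", []),
        ("body", "", [
            ("script", "", []),
            ("div", "menu", []),
            ("div", "target text", []),
        ]),
    ]),
])

strict_path = [("in", "root"), ("skip", "declaration"), ("in", "html"),
               ("skip", "head"), ("in", "body"), ("skip", "div"),
               ("in", "div"), ("found", "X")]
seek_path = [("in", "root"), ("seek-into", "html", 0), ("seek-into", "body", 0),
             ("seek-into", "div", 1), ("found", "X")]

if __name__ == "__main__":
    print(follow_strict(sample, strict_path))
    print(follow_seek(sample, seek_path))
```

In this sketch the strict walk gives up at the first missing sibling, while the tag-counting walk still reaches the second `div`.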
