----------- USER MARKS WHAT VALUES TO EXTRACT
{COMPANY}
{ADDRESS}
----------- LOADED TEMPLATE TREE
{[root [] [html [] [head [] [title [] [text [value "TITLE"]]]] [body [] [h2 [] [text [value "{HEADING}"]]] [p [] [text [value "{COMPANY}"]]] [p [] [text [value "{ADDRESS}"]]]]]]}
----------- FIXED TEMPLATE HTML:
{COMPANY}
{ADDRESS}
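The BUILD PATH and OPTIMIZE PATH steps logged next can be approximated with a short sketch. This is not the tool's own code: the helpers `el`, `text`, `build_path` and `optimize` are hypothetical names, the tree is modelled as nested Python lists in the same shape as the loaded template tree above, and the rule "forget everything inside an element once it is closed" is an assumption inferred from the logged paths.

```python
import re

def el(tag, *kids):
    """Element node in the log's shape: [tag, attrs, child, ...]."""
    return [tag, [], *kids]

def text(value):
    """Text node in the log's shape: [text [value "..."]]."""
    return ["text", ["value", value]]

def build_path(node):
    """Flatten a tree into in/out/found steps (depth-first)."""
    if node[0] == "text":
        # Only text that is exactly a {NAME} marker yields a 'found' step.
        marker = re.fullmatch(r"\{(\w+)\}", node[1][1].strip())
        return [("found", marker.group(1))] if marker else []
    steps = [("in", node[0])]
    for kid in node[2:]:
        steps += build_path(kid)
    steps.append(("out",))
    return steps

def optimize(path, target):
    """Keep only the steps needed to reach `target` (assumed rule)."""
    kept = []
    open_at = []  # positions in `kept` of the still-open 'in' steps
    for step in path:
        if step[0] == "in":
            open_at.append(len(kept))
            kept.append(step)
        elif step[0] == "out":
            start = open_at.pop()
            del kept[start + 1:]  # drop the closed element's interior
            kept.append(step)
        elif step == ("found", target):
            kept.append(step)
            return kept  # nothing after the target matters
    raise LookupError(f"marker {target!r} not in path")

template = el("root", el("html",
    el("head", el("title", text("TITLE"))),
    el("body",
       el("h2", text("{HEADING}")),
       el("p", text("{COMPANY}")),
       el("p", text("{ADDRESS}")))))

full_path = build_path(template)
address_path = optimize(full_path, "ADDRESS")
print(address_path)
```

Under these assumptions `address_path` has the same step sequence as the optimized ADDRESS path in the log, with `("out",)` standing in for `out _`.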
----------- BUILD PATH FROM TEMPLATE
root
  html
    head
      title
        text : TITLE
    body
      h2
        text : {HEADING}
      p
        text : {COMPANY}
      p
        text : {ADDRESS}
[in "root" in "html" in "head" in "title" out _ out _ in "body" in "h2" found "HEADING" out _ in "p" found "COMPANY" out _ in "p" found "ADDRESS " out _ out _ out _ out _]
----------- OPTIMIZE PATH TO TEXT: ADDRESS
[in "root" in "html" in "head" out _ in "body" in "h2" out _ in "p" out _ in "p" found "ADDRESS"]
==========================================
SECOND LEVEL OF PATH OPTIMIZER AND WALKER
(optimizes path for more robust extraction, not speed)
[in "root" in "html" seek-next 0 in "body" seek-next 1 in "p" found "ADDRESS"]
###########################################
TEST ON THE SAMPLES
----------- A HTML PAGE TO EXTRACT FROM
Honda
Japan
----------- LOADED SAMPLE TREE
{[root [] [html [] [head [] [title [] [text [value "car providers"]]]] [body [] [h2 [] [text [value "Fast cars"]]] [p [] [text [value "Honda"]]] [p [] [text [value "Japan"]]]]]]}
----------- FIXED SAMPLE HTML:
Honda
Japan
----------- EXTRACT VALUES FROM SAMPLE BASED ON PATH:
[in "root" in "html" in "head" out _ in "body" in "h2" out _ in "p" out _ in "p" found "ADDRESS"]
["ADDRESS" "Japan"]
ADDRESS Japan
----------- EXTRACT VALUES FROM SAMPLE BASED ON LVL2 PATH:
[in "root" in "html" seek-next 0 in "body" seek-next 1 in "p" found "ADDRESS"]
["ADDRESS" "Japan"]
==============
----------- A HTML PAGE TO EXTRACT FROM
Honda
Japan
----------- LOADED SAMPLE TREE
{[root [] [html [] [head [] [title [] [text [value "car providers"]]]] [body [] [div [] [text [value "ALERT"]]] [h2 [] [text [value "Fast cars"] ]] [p [] [text [value "Honda"]]] [p [] [text [value "Japan"]]]]]]}
----------- FIXED SAMPLE HTML:
Honda
Japan
----------- EXTRACT VALUES FROM SAMPLE BASED ON PATH:
[in "root" in "html" in "head" out _ in "body" in "h2" out _ in "p" out _ in "p" found "ADDRESS"]
##ERROR: This path doesn't exist in this tree !
----------- EXTRACT VALUES FROM SAMPLE BASED ON LVL2 PATH:
[in "root" in "html" seek-next 0 in "body" seek-next 1 in "p" found "ADDRESS"]
["ADDRESS" "Japan"]
==============
----------- A HTML PAGE TO EXTRACT FROM
Honda
Japan
Additional
----------- LOADED SAMPLE TREE
{[root [] [html [] [head [] [title [] [text [value "car providers"]]]] [body [] [div [] [text [value "ALERT"]]] [h2 [] [text [value "Fast cars"] ]] [p [] [text [value "Honda"]]] [p [] [text [value "Japan"]]] [p [] [text [value "Additional"]]]]]]}
----------- FIXED SAMPLE HTML:
Honda
Japan
Additional
----------- EXTRACT VALUES FROM SAMPLE BASED ON PATH:
[in "root" in "html" in "head" out _ in "body" in "h2" out _ in "p" out _ in "p" found "ADDRESS"]
##ERROR: This path doesn't exist in this tree !
----------- EXTRACT VALUES FROM SAMPLE BASED ON LVL2 PATH:
[in "root" in "html" seek-next 0 in "body" seek-next 1 in "p" found "ADDRESS"]
["ADDRESS" "Japan"]
==============
----------- A HTML PAGE TO EXTRACT FROM
ALERT
Honda
Japan
Additional
More add
----------- LOADED SAMPLE TREE
{[root [] [html [] [head [] [title [] [text [value "car providers"]]]] [body [] [div [] [p [] [text [value "ALERT"]]]] [h2 [] [text [value "Fast cars"]]] [table []] [p [] [text [value "Honda"]]] [p [] [text [value "Japan"]]] [p [] [text [value "Additional"]]] [p [] [text [value "More add "]]]]]]}
----------- FIXED SAMPLE HTML:
ALERT
Honda
Japan
Additional
More add
----------- EXTRACT VALUES FROM SAMPLE BASED ON PATH:
[in "root" in "html" in "head" out _ in "body" in "h2" out _ in "p" out _ in "p" found "ADDRESS"]
##ERROR: This path doesn't exist in this tree !
----------- EXTRACT VALUES FROM SAMPLE BASED ON LVL2 PATH:
[in "root" in "html" seek-next 0 in "body" seek-next 1 in "p" found "ADDRESS"]
["ADDRESS" "Japan"]
==============
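The difference between the two walkers in the runs above can be illustrated with a minimal sketch. It is an interpretation of the log, not the tool's implementation: `walk`, `el` and `text` are hypothetical names, `in` is assumed to require that the very next unvisited child has the given tag, and `seek-next N in "tag"` is assumed to skip N matching direct children and enter the one after them.

```python
def el(tag, *kids):
    """Element node in the log's shape: [tag, attrs, child, ...]."""
    return [tag, [], *kids]

def text(value):
    """Text node in the log's shape: [text [value "..."]]."""
    return ["text", ["value", value]]

def walk(tree, path):
    """Follow `path` through `tree`; return (marker, text) or raise LookupError."""
    # Each frame is [node, index of its next unvisited child].
    stack = [[el("#document", tree), 0]]
    for step in path:
        node, pos = stack[-1]
        kids = node[2:]
        if step[0] == "in":
            # Strict step (assumed): the next child must be this tag.
            if pos >= len(kids) or kids[pos][0] != step[1]:
                raise LookupError("This path doesn't exist in this tree")
            stack[-1][1] = pos + 1
            stack.append([kids[pos], 0])
        elif step[0] == "seek-next":
            # Tolerant step (assumed): skip `skip` matching siblings.
            skip, tag = step[1], step[2]
            hits = [i for i in range(pos, len(kids)) if kids[i][0] == tag]
            if len(hits) <= skip:
                raise LookupError("This path doesn't exist in this tree")
            stack[-1][1] = hits[skip] + 1
            stack.append([kids[hits[skip]], 0])
        elif step[0] == "out":
            stack.pop()
        elif step[0] == "found":
            for kid in kids:
                if kid[0] == "text":
                    return step[1], kid[1][1]
            raise LookupError("no text at the end of the path")
    raise LookupError("path has no 'found' step")

STRICT = [("in", "root"), ("in", "html"), ("in", "head"), ("out",),
          ("in", "body"), ("in", "h2"), ("out",), ("in", "p"), ("out",),
          ("in", "p"), ("found", "ADDRESS")]
LVL2 = [("in", "root"), ("in", "html"), ("seek-next", 0, "body"),
        ("seek-next", 1, "p"), ("found", "ADDRESS")]

head = el("head", el("title", text("car providers")))
first = el("root", el("html", head, el("body",
    el("h2", text("Fast cars")),
    el("p", text("Honda")), el("p", text("Japan")))))
last = el("root", el("html", head, el("body",
    el("div", el("p", text("ALERT"))), el("h2", text("Fast cars")),
    el("table"), el("p", text("Honda")), el("p", text("Japan")),
    el("p", text("Additional")), el("p", text("More add ")))))

print(walk(first, STRICT), walk(first, LVL2), walk(last, LVL2))
```

With these assumptions the strict path succeeds only on the first sample and raises `LookupError` on the last one, while the level 2 path returns `("ADDRESS", "Japan")` for both, because only direct children of `body` are counted and the `p` nested inside `div` is ignored.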