----------- USER MARKS WHAT VALUES TO EXTRACT
TITLE

{HEADING}

{COMPANY}

{ADDRESS}

----------- LOADED TEMPLATE TREE
{[root [] [html [] [head [] [title [] [text [value "TITLE"]]]] [body [] [h2 [] [text [value "{HEADING}"]]] [p [] [text [value "{COMPANY}"]]] [p [] [text [value "{ADDRESS}"]]]]]]}

----------- FIXED TEMPLATE HTML:
TITLE

{HEADING}

{COMPANY}

{ADDRESS}
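
The path-building and path-optimizing steps logged next can be sketched roughly as follows. This is a minimal Python sketch, not the tool's own code: the tuple encoding of the tree and the names `build_path` and `optimize_path` are assumptions made for illustration. It emits `in`/`out` steps while walking the template depth-first, records `found` for each `{NAME}` placeholder, and then collapses every fully visited subtree to a bare `in`/`out` pair up to the requested placeholder.

```python
import re

# Assumed encoding: an element is (tag, [children]); a text node is ("text", "string").
TEMPLATE = ("root", [("html", [
    ("head", [("title", [("text", "TITLE")])]),
    ("body", [
        ("h2", [("text", "{HEADING}")]),
        ("p", [("text", "{COMPANY}")]),
        ("p", [("text", "{ADDRESS}")]),
    ]),
])])


def build_path(node):
    """Depth-first walk that emits in/found/out steps for one element."""
    tag, children = node
    ops = [("in", tag)]
    for child in children:
        if child[0] == "text":
            # Only text of the form {NAME} is treated as a marked value.
            match = re.fullmatch(r"\{(\w+)\}", child[1].strip())
            if match:
                ops.append(("found", match.group(1)))
        else:
            ops.extend(build_path(child))
    ops.append(("out", None))
    return ops


def optimize_path(ops, target):
    """Cut the path at `target` and shrink finished subtrees to in/out pairs."""
    stack = [[]]  # one list of steps per element that is still open
    for op, arg in ops:
        if op == "in":
            stack.append([("in", arg)])
        elif op == "out":
            frame = stack.pop()
            # Keep only the entry step of the closed subtree, then leave it.
            stack[-1].extend([frame[0], ("out", None)])
        elif op == "found" and arg == target:
            stack[-1].append(("found", arg))
            return [step for frame in stack for step in frame]
    raise ValueError(f"placeholder {target!r} not present in path")


full = build_path(TEMPLATE)
short = optimize_path(full, "ADDRESS")
```

With this encoding, `short` keeps the `in "head"` / `out` pair because the sibling position of `body` still matters to a strict walker, while the descent into `title` is dropped.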

----------- BUILD PATH FROM TEMPLATE
root
  html
    head
      title
        text : TITLE
    body
      h2
        text : {HEADING}
      p
        text : {COMPANY}
      p
        text : {ADDRESS}
[in "root" in "html" in "head" in "title" out _ out _ in "body" in "h2" found "HEADING" out _ in "p" found "COMPANY" out _ in "p" found "ADDRESS " out _ out _ out _ out _]

----------- OPTIMIZE PATH TO TEXT: ADDRESS
[in "root" in "html" in "head" out _ in "body" in "h2" out _ in "p" out _ in "p" found "ADDRESS"]

==========================================
SECOND LEVEL OF PATH OPTIMIZER AND WALKER
(optimizes path for more robust extraction, not speed)
[in "root" in "html" seek-next 0 in "body" seek-next 1 in "p" found "ADDRESS"]

###########################################
TEST ON THE SAMPLES

----------- A HTML PAGE TO EXTRACT FROM
car providers

Fast cars

Honda

Japan

----------- LOADED SAMPLE TREE
{[root [] [html [] [head [] [title [] [text [value "car providers"]]]] [body [] [h2 [] [text [value "Fast cars"]]] [p [] [text [value "Honda"]]] [p [] [text [value "Japan"]]]]]]}

----------- FIXED SAMPLE HTML:
car providers

Fast cars

Honda

Japan

----------- EXTRACT VALUES FROM SAMPLE BASED ON PATH:
[in "root" in "html" in "head" out _ in "body" in "h2" out _ in "p" out _ in "p" found "ADDRESS"]
["ADDRESS" "Japan"]
ADDRESS Japan

----------- EXTRACT VALUES FROM SAMPLE BASED ON LVL2 PATH:
[in "root" in "html" seek-next 0 in "body" seek-next 1 in "p" found "ADDRESS"]
["ADDRESS" "Japan"]

==============

----------- A HTML PAGE TO EXTRACT FROM
car providers
ALERT

Fast cars

Honda

Japan

----------- LOADED SAMPLE TREE
{[root [] [html [] [head [] [title [] [text [value "car providers"]]]] [body [] [div [] [text [value "ALERT"]]] [h2 [] [text [value "Fast cars"] ]] [p [] [text [value "Honda"]]] [p [] [text [value "Japan"]]]]]]}

----------- FIXED SAMPLE HTML:
car providers

ALERT

Fast cars

Honda

Japan

----------- EXTRACT VALUES FROM SAMPLE BASED ON PATH:
[in "root" in "html" in "head" out _ in "body" in "h2" out _ in "p" out _ in "p" found "ADDRESS"]
##ERROR: This path doesn't exist in this tree !

----------- EXTRACT VALUES FROM SAMPLE BASED ON LVL2 PATH:
[in "root" in "html" seek-next 0 in "body" seek-next 1 in "p" found "ADDRESS"]
["ADDRESS" "Japan"]

==============

----------- A HTML PAGE TO EXTRACT FROM
car providers
ALERT

Fast cars

Honda

Japan

Additional

----------- LOADED SAMPLE TREE
{[root [] [html [] [head [] [title [] [text [value "car providers"]]]] [body [] [div [] [text [value "ALERT"]]] [h2 [] [text [value "Fast cars"] ]] [p [] [text [value "Honda"]]] [p [] [text [value "Japan"]]] [p [] [text [value "Additional"]]]]]]}

----------- FIXED SAMPLE HTML:
car providers
ALERT

Fast cars

Honda

Japan

Additional

----------- EXTRACT VALUES FROM SAMPLE BASED ON PATH:
[in "root" in "html" in "head" out _ in "body" in "h2" out _ in "p" out _ in "p" found "ADDRESS"]
##ERROR: This path doesn't exist in this tree !

----------- EXTRACT VALUES FROM SAMPLE BASED ON LVL2 PATH:
[in "root" in "html" seek-next 0 in "body" seek-next 1 in "p" found "ADDRESS"]
["ADDRESS" "Japan"]

==============

----------- A HTML PAGE TO EXTRACT FROM
car providers

ALERT

Fast cars

some stuff

Honda

Japan

Additio nal

More add

----------- LOADED SAMPLE TREE
{[root [] [html [] [head [] [title [] [text [value "car providers"]]]] [body [] [div [] [p [] [text [value "ALERT"]]]] [h2 [] [text [value "Fast cars"]]] [table []] [p [] [text [value "Honda"]]] [p [] [text [value "Japan"]]] [p [] [text [value "Additional"]]] [p [] [text [value "More add "]]]]]]}

----------- FIXED SAMPLE HTML:
car providers

ALERT

Fast cars

Honda

Japan

Additional

More add

----------- EXTRACT VALUES FROM SAMPLE BASED ON PATH:
[in "root" in "html" in "head" out _ in "body" in "h2" out _ in "p" out _ in "p" found "ADDRESS"]
##ERROR: This path doesn't exist in this tree !

----------- EXTRACT VALUES FROM SAMPLE BASED ON LVL2 PATH:
[in "root" in "html" seek-next 0 in "body" seek-next 1 in "p" found "ADDRESS"]
["ADDRESS" "Japan"]

==============
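
For reference, one way the second-level optimizer could derive the LVL2 path from the optimized strict path is sketched below. The rule used here is an assumption inferred from the two paths in this log: each `in X` / `out` pair is a skipped sibling, and the next real descent becomes `seek-next N`, where N counts the skipped siblings that share its tag. The name `to_lvl2` and the tuple encoding of steps are hypothetical.

```python
# Optimized strict path as logged above; ("out", None) stands for `out _`.
STRICT_PATH = [("in", "root"), ("in", "html"), ("in", "head"), ("out", None),
               ("in", "body"), ("in", "h2"), ("out", None), ("in", "p"),
               ("out", None), ("in", "p"), ("found", "ADDRESS")]


def to_lvl2(ops):
    """Turn skipped siblings into seek steps; expects an optimized path."""
    result = []
    skipped = []  # tags of siblings passed over at the current level
    i = 0
    while i < len(ops):
        op, arg = ops[i]
        if op == "in" and i + 1 < len(ops) and ops[i + 1][0] == "out":
            # Entered and left immediately: this sibling is only skipped.
            skipped.append(arg)
            i += 2
            continue
        if op == "in":
            if skipped:
                # ("seek", n, tag) stands for `seek-next n in "tag"`.
                result.append(("seek", skipped.count(arg), arg))
            else:
                result.append(("in", arg))
            skipped = []  # a new level starts below the entered element
        elif op == "found":
            result.append(("found", arg))
        i += 1
    return result


lvl2 = to_lvl2(STRICT_PATH)
for step in lvl2:
    print(step)
```

For the path above this yields `in "root"`, `in "html"`, `seek-next 0 in "body"`, `seek-next 1 in "p"`, `found "ADDRESS"`. The seek steps no longer depend on which other tags surround the target, only on how many same-tag siblings precede it, which matches the behaviour seen on the four samples.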