Scrape and Follow Tutorial
This tutorial shows you how to use Visual Pipe-Fu to grab the source of a web page, parse it for a link, follow that link, and grab the source of the linked page.
Start by clicking the webscrape icon in the toolbar to add a webscrape tool to the Tool Chain. Next, type 'http://www.google.com' in the URL entry field and hit the run button. The screen should look like Figure 1. The text in the lower right of the screen is the source for Google's home page.
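For readers who want to see what the webscrape step amounts to outside veefu, here is a rough sketch using only Python's standard library. It is an illustration, not how Visual Pipe-Fu works internally; the name `fetch_source` is hypothetical, and falling back to UTF-8 when the server declares no charset is an assumption.

```python
import urllib.request


def fetch_source(url: str, timeout: float = 10.0) -> str:
    """Return the source of `url` as text, roughly what a webscrape tool displays.

    `fetch_source` is a hypothetical helper, not part of Visual Pipe-Fu.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset the server declares; fall back to UTF-8 (an assumption).
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch_source("http://www.google.com")` needs network access and should return text much like the lower right panel in Figure 1, although the exact HTML depends on what Google serves at the time.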

Note: the http:// in the URL is not redundant, and if you omit it, you will get a connection error. The http prefix distinguishes a web address from one describing an FTP site, an intranet site, a hard drive location, or any other phylum of cyberspace.
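To see why that prefix matters, Python's standard `urllib.parse.urlparse` splits an address into parts, and its `scheme` field is what tells a program which kind of resource the rest of the string names. The addresses below are made up for illustration.

```python
from urllib.parse import urlparse

# Made-up example addresses; the scheme at the front says what kind each one is.
addresses = [
    "http://www.google.com",
    "ftp://ftp.example.com/pub/readme.txt",
    "file:///C:/temp/page.html",
    "www.google.com",  # no scheme: nothing marks this as a web address
]

schemes = {address: urlparse(address).scheme for address in addresses}
for address, scheme in schemes.items():
    print(f"{address!r:42} scheme={scheme!r}")
```

The last address comes back with an empty scheme, so a client has no way of knowing it should speak HTTP to it.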
Figure 1
Click the match tool icon to add a match tool to the Tool Chain. Now, to pull a URL out of that source, we need to match a pattern like 'http://something...something...'. Check the 'Regular Expression' check box as in Figure 2 and set 'Return match as:' to 'Expression Only'. Regular expressions are a kind of pattern-matching argot that can identify anything from very simple to very complicated strings. Even a cursory explanation of regular expressions is beyond the scope of this tutorial, but we'll dissect the one used here, which is:
http://.+?\s
The http:// is the beginning of a web address. The .+ means match one or more of any character (in most regex engines, any character except a line break). It's a bit like the * wildcard used in file searches, except that there must be at least one character between the two end points of the match. The second end point is described by the \s, which means any whitespace character, i.e., a space, tab, or line break. The ? makes the .+ non-greedy: it matches as few characters as possible and stops at the first whitespace instead of the last, so you get the shortest match rather than the biggest one.
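Python's `re` module understands the same pattern, so it is a handy place to experiment. The sketch below runs `http://.+?\s` and its greedy cousin `http://.+\s` against a small made-up HTML snippet (not Google's real markup) to show what the `?` changes. Veefu's own regex engine may differ in details.

```python
import re

# A made-up snippet shaped like the links on a home page.
html = ('<a href="http://images.google.com/imghp?hl=en&tab=wi" class=q>Images</a> '
        '<a href="http://maps.google.com/maps" class=q>Maps</a>')

lazy = re.search(r"http://.+?\s", html)   # non-greedy: stop at the first whitespace
greedy = re.search(r"http://.+\s", html)  # greedy: run on to the last whitespace

print(repr(lazy.group()))
print(repr(greedy.group()))
```

The lazy match is `http://images.google.com/imghp?hl=en&tab=wi" `, stray quote and trailing space included, while the greedy match swallows everything up to the space after the second link.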

Hit the run button. You should see the first address in the page in the lower right text panel.
Figure 2
You may have noticed a quote character at the end of the extracted URL in Figure 2. That could be a problem. Someone really good at regular expressions could probably fix that with a fancier regex, but there is a more straightforward way to solve it with veefu: translate the " into a space. Click the translate tool icon to add a translate tool to the Tool Chain. The screen should look like Figure 3. Put a " in the 'Replace' column of the first row and a space character in the 'With' column. Hit the run button. You should see the address in the lower right text panel without the quote.
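In Python terms, the translate tool's Replace/With table behaves much like `str.maketrans` plus `str.translate`, which map single characters to replacements. The sketch assumes each table row replaces one character; `translate_rows` is a hypothetical name. It also shows the fancier-regex route mentioned above, using a character class that refuses to match quotes or whitespace in the first place.

```python
import re


def translate_rows(text: str, rows: list[tuple[str, str]]) -> str:
    """Apply (replace, with) pairs like rows of the translate tool's table.

    Hypothetical helper: assumes each 'Replace' entry is a single character.
    """
    return text.translate(str.maketrans(dict(rows)))


match = 'http://images.google.com/imghp?hl=en&tab=wi" '
cleaned = translate_rows(match, [('"', " ")])
print(repr(cleaned))          # the quote is now a space; trailing spaces remain
print(repr(cleaned.strip()))  # strip them if the next tool is picky

# Alternative: a pattern that stops before a quote or whitespace.
html = '<a href="http://images.google.com/imghp?hl=en&tab=wi" class=q>Images</a>'
print(re.search(r'http://[^\s"]+', html).group())
```

The translate approach keeps the regex simple at the cost of leftover spaces; the character-class approach produces a clean URL directly but is harder to read at a glance.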
Figure 3
Figure 4
Now all we need to do is feed that address into another webscrape tool. Click the webscrape tool icon again, but this time we're not entering an explicit address; we need one passed in from the previous tool in the chain. Set the "Pipe To:" radio button to Parameters as in Figure 4. Setting that switch makes the strings passed in from the previous tool available as an array whose elements are referenced with the & character. Put &1 in the text field under "URL" to tell veefu to insert the first string passed in on standard input into this field before the tool runs. Hit the run button and compare the address with the text that now appears in the lower right text panel. The "<title>Google Image Search" text in that panel seems to correspond with the "imghp" in the address, but you can open the link directly in a browser to verify for yourself.
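Putting the steps together, the whole chain can be sketched in Python as below. The `&1` handling is a guess at the behavior described above (the Nth string piped in replaces `&N`); `substitute_params` and `scrape_and_follow` are hypothetical names, and the `fetch` argument stands in for the webscrape tool so the sketch can run without a network connection. Stripping the leftover spaces before following the link is also an assumption.

```python
import re
from typing import Callable, Optional


def substitute_params(template: str, params: list[str]) -> str:
    """Replace &1, &2, ... in `template` with params[0], params[1], ... (hypothetical)."""
    return re.sub(r"&(\d+)", lambda m: params[int(m.group(1)) - 1], template)


def scrape_and_follow(start_url: str, fetch: Callable[[str], str]) -> Optional[str]:
    source = fetch(start_url)                   # tool 1: webscrape
    found = re.search(r"http://.+?\s", source)  # tool 2: match, non-greedy
    if found is None:
        return None
    # tool 3: translate " into a space, then drop the leftover spaces (assumption)
    link = found.group().translate(str.maketrans('"', " ")).strip()
    return fetch(substitute_params("&1", [link]))  # tool 4: webscrape with URL = &1


# Fake pages so the chain can run offline; the markup is made up.
pages = {
    "http://www.google.com":
        '<a href="http://images.google.com/imghp?hl=en&tab=wi" class=q>Images</a> ',
    "http://images.google.com/imghp?hl=en&tab=wi":
        "<html><title>Google Image Search</title></html>",
}
print(scrape_and_follow("http://www.google.com", pages.__getitem__))
```

To run it against live pages, pass any function that takes a URL and returns the page source, for example one built on `urllib.request.urlopen`.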
And that's all there is to it. You can, of course, make bigger, better, more sophisticated scripts to do bigger, better, more sophisticated things by adding more tools, capturing bigger pages, and using more elaborate regular expressions.