How to create a website spider
Creating a spider that generates a list of URLs for a given domain is very easy to do.
A spider simply visits each page in the domain, finds all the links, and then visits those in turn.
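That visit-and-collect loop can be sketched in plain Ruby. This is only an illustration, using a hypothetical in-memory "site" (a hash of paths to HTML) instead of real HTTP requests:

```ruby
# Hypothetical in-memory site: path => HTML body
SITE = {
  '/'      => '<a href="/about">About</a> <a href="/blog">Blog</a>',
  '/about' => '<a href="/">Home</a>',
  '/blog'  => '<a href="/about">About</a>'
}

def crawl(start)
  visited = []
  queue   = [start]

  until queue.empty?
    url = queue.shift
    next if visited.include?(url)   # don't visit a page twice

    visited << url
    # Extract every href on the page and queue it for a visit
    links = SITE.fetch(url, '').scan(/href="([^"]+)"/).flatten
    queue.concat(links)
  end

  visited
end

crawl('/')  # => ["/", "/about", "/blog"]
```

A real spider does the same thing, except the page bodies come over HTTP and link extraction uses a proper HTML parser rather than a regex.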
With help from a gem called Spidr, we can achieve this in a few lines of code.
```ruby
require 'spidr'

urls = []

Spidr.site('http://www.layer22.com/') do |spider|
  # Collect every URL the spider visits
  spider.every_url do |url|
    urls << url
  end
end
```
Fetching page content is also easy:
```ruby
require 'spidr'

contents = []

Spidr.site('http://www.layer22.com/') do |spider|
  # Collect the raw body of every page the spider visits
  spider.every_page do |page|
    contents << page.body
  end
end
```
What if we are only interested in paragraphs?
```ruby
require 'spidr'

paragraphs = []

Spidr.site('http://www.layer22.com/') do |spider|
  spider.every_page do |page|
    # Skip non-HTML responses (images, stylesheets, PDFs, etc.)
    next unless page.content_type =~ %r{text/html}

    paragraphs << page.doc.search('p').map(&:text)
  end
end
```
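One thing to note: because each page's `map(&:text)` result is appended whole, `paragraphs` ends up as one array of texts per page rather than a single flat list. A quick sketch with made-up data shows the shape and how to flatten it:

```ruby
# Hypothetical result after crawling two pages:
paragraphs = [
  ['Welcome to the site.', 'We write about Ruby.'],  # page 1
  ['Contact us by email.']                           # page 2
]

# Flatten into one list of paragraph texts:
all_texts = paragraphs.flatten
# => ["Welcome to the site.", "We write about Ruby.", "Contact us by email."]
```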
Last modified: 27-Aug-23