Class RDig::UrlFilters::VisitedUrlFilter
In: lib/rdig/url_filters.rb
Parent: Object

Keeps track of all URLs visited during a crawl, to avoid indexing pages more than once. Implemented as a thread-safe singleton, because it has to be shared between all crawler threads.

Methods

apply   new  

Included Modules

MonitorMixin Singleton

Public Class methods

    # File lib/rdig/url_filters.rb, line 69
    def initialize
      @visited_urls = Set.new
      # super runs MonitorMixin's initializer, which sets up the monitor
      super
    end

Public Instance methods

Returns the document if its URL has not been visited yet, nil otherwise. The same call records the URL as visited.

    # File lib/rdig/url_filters.rb, line 76
    def apply(document)
      synchronize do
        # Set#add? returns nil when the URL is already in the set
        @visited_urls.add?(document.uri.to_s) ? document : nil
      end
    end
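
The following is a minimal, self-contained sketch of how a filter like this behaves, not RDig's own code. `DemoVisitedUrlFilter` and `Doc` are hypothetical stand-ins introduced only for illustration: the first mirrors the two methods listed above, the second is a placeholder for a document object that responds to `uri`. Because of `Singleton`, the filter is obtained through `instance` rather than `new`.

```ruby
require 'set'
require 'monitor'
require 'singleton'
require 'uri'

# Hypothetical stand-in for a crawled document; only #uri is needed here.
Doc = Struct.new(:uri)

# Hypothetical stand-in that mirrors the filter documented above.
class DemoVisitedUrlFilter
  include MonitorMixin
  include Singleton

  def initialize
    @visited_urls = Set.new
    super # lets MonitorMixin set up its monitor
  end

  def apply(document)
    synchronize do
      @visited_urls.add?(document.uri.to_s) ? document : nil
    end
  end
end

filter = DemoVisitedUrlFilter.instance
page = Doc.new(URI.parse('http://example.com/index.html'))

first  = filter.apply(page) # URL not seen before, so the document is returned
second = filter.apply(page) # same URL again, so the result is nil

# Four threads race over the same 50 URLs; each URL should be accepted once.
urls = (1..50).map { |i| "http://example.com/page#{i}" }
threads = 4.times.map do
  Thread.new { urls.count { |u| filter.apply(Doc.new(URI.parse(u))) } }
end
accepted = threads.sum(&:value)
```

Since every thread goes through the same instance and `apply` runs inside `synchronize`, the check and the insertion happen as one step, so no URL should be accepted by two threads.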
