Class RDig::ETagFilter
In: lib/rdig/crawler.rb
Parent: Object

Checks the ETag headers of fetched documents against the set of ETags of the documents already indexed. This is meant to help against double-indexing documents that can be reached via different URLs (think host.com/ and host.com/index.html). Documents without an ETag are allowed to pass through.

Methods

apply   new  

Included Modules

MonitorMixin

Public Class methods

    # File lib/rdig/crawler.rb, line 95
    def initialize
      @etags = Set.new  # ETags of the documents seen so far
      super             # lets MonitorMixin set up its lock
    end

Public Instance methods

    # File lib/rdig/crawler.rb, line 100
    def apply(document)
      # Documents without an ETag pass through unfiltered.
      return document unless document.respond_to?(:etag) && document.etag
      synchronize do
        # Set#add? returns nil when the ETag is already in the set.
        @etags.add?(document.etag) ? document : nil
      end
    end
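
The behaviour of apply can be illustrated with a small standalone sketch. It does not load RDig: ETagFilterSketch is a copy of the logic in the listings above, and Doc is a hypothetical stand-in for RDig's document class, assumed only to expose an etag reader.

```ruby
require 'set'
require 'monitor'

# Standalone copy of the filter logic shown above, so the example
# runs without RDig installed. The class name is an assumption.
class ETagFilterSketch
  include MonitorMixin

  def initialize
    @etags = Set.new
    super
  end

  def apply(document)
    return document unless document.respond_to?(:etag) && document.etag
    synchronize do
      @etags.add?(document.etag) ? document : nil
    end
  end
end

# Hypothetical document type; only the etag reader matters here.
Doc = Struct.new(:url, :etag)

filter = ETagFilterSketch.new
first  = Doc.new('http://host.com/', '"abc123"')
second = Doc.new('http://host.com/index.html', '"abc123"')
no_tag = Doc.new('http://host.com/other', nil)

# The first document passes, the second is dropped because its ETag
# was already seen, and the document without an ETag passes through.
results = [first, second, no_tag].map { |doc| filter.apply(doc) }
```

Returning nil for a duplicate presumably lets the caller drop the document from further processing. Note that a document without an ETag is never recorded, so it passes no matter how often it is seen.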
