Are license tags common in web pages?

    Ben Bildstein talks about his attempts to determine if license tags are common on web pages. This seems to me like a perfect use of PlanetLab, where downloading a few million web pages and analysing them isn't hard. For example, a little while ago I downloaded over a million web pages in a few days.

    Ben's problem seems easier than the parking analysis, though, as I presume that he doesn't need to actually store the downloaded pages. If a simple regexp check of the content is sufficient, then storage (which is the slow bit) goes away as an issue.
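
    As a rough sketch of what such a check might look like: the snippet below scans each page as it arrives and keeps only two counters, so nothing is written to disk. The pattern is my own guess at what counts as a "license tag" (a rel="license" attribute, or a link to a Creative Commons license URL), and the names has_license_tag and count_license_tags are made up for illustration. Ben may well be looking for something different.

```python
import re

# Assumption: a "license tag" is either a rel="license" attribute or a
# link to a Creative Commons license URL. This pattern is illustrative only.
LICENSE_RE = re.compile(
    r"""\brel\s*=\s*["']?[^"'>]*\blicense\b"""
    r"""|creativecommons\.org/licenses/""",
    re.IGNORECASE,
)


def has_license_tag(html: str) -> bool:
    """Return True if the page content matches the assumed license pattern."""
    return LICENSE_RE.search(html) is not None


def count_license_tags(pages):
    """Count matching pages from an iterable of HTML strings.

    Pages are consumed one at a time and then dropped, so only the two
    counters are kept in memory and nothing is stored.
    """
    tagged = 0
    total = 0
    for html in pages:
        total += 1
        if has_license_tag(html):
            tagged += 1
    return tagged, total
```

    Feeding count_license_tags a generator that yields each page body as it is downloaded would give the fraction of tagged pages without ever saving a page. A regexp like this will miss some cases and wrongly match others (for example, the pattern appearing in page text rather than markup), so it is only a first approximation.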

posted at: 02:30 | path: /research | permanent link to this entry
