Are license tags common in web pages?

    Ben Bildstein talks about his attempts to determine whether license tags are common on web pages. This seems like a perfect use of PlanetLab to me: downloading a few million web pages there and analysing them isn't hard. For example, a little while ago I downloaded over a million web pages in a few days.

    Ben's problem seems easier than the parking analysis though, as I presume that he doesn't need to actually store the downloaded pages. If a simple regexp check of the content is sufficient, then storage (which is the slow bit) goes away as an issue.
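To make the "regexp check without storage" idea concrete, here is a minimal sketch in Python. It is not Ben's method: the pattern (a rel="license" attribute or a Creative Commons license URL), the 1 MB read limit, and the names `LICENSE_RE`, `has_license_tag`, `fetch` and `count_licensed` are all assumptions made for illustration. Each page is checked in memory and then discarded, so only two counters are kept.

```python
import re
from typing import Callable, Iterable
from urllib.request import urlopen

# Assumed definition of a "license tag": a rel attribute containing the
# word "license", or a link to a Creative Commons license URL.
LICENSE_RE = re.compile(
    r"""\brel\s*=\s*["']?[^"'>]*\blicense\b|creativecommons\.org/licenses/""",
    re.IGNORECASE,
)


def has_license_tag(html: str) -> bool:
    """Return True if the page text matches the assumed license pattern."""
    return LICENSE_RE.search(html) is not None


def fetch(url: str, limit: int = 1_000_000) -> str:
    """Download at most `limit` bytes of a page and decode it leniently."""
    with urlopen(url, timeout=10) as resp:
        return resp.read(limit).decode("utf-8", errors="replace")


def count_licensed(
    urls: Iterable[str], fetcher: Callable[[str], str] = fetch
) -> tuple[int, int]:
    """Return (pages with a license tag, pages fetched successfully).

    Nothing is written to disk: each page is checked and then dropped.
    """
    tagged = total = 0
    for url in urls:
        try:
            html = fetcher(url)
        except (OSError, ValueError):
            continue  # unreachable or malformed URL: skip it
        total += 1
        tagged += has_license_tag(html)
    return tagged, total
```

Calling `count_licensed(urls)` on a list of URLs returns the two counts; passing a different `fetcher` lets the check run against canned pages instead of the network. A crude pattern like this will produce some false positives and negatives, so it is only a starting point.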

    Tags for this post: research(S)

posted at: 02:30 | path: /research


