The Internet is a strange place

    As mentioned previously, I've been downloading web pages over HTTP as part of my survey of Internet mail servers, in order to detect domain parking behaviour. I should have thought a bit harder about that code though, because the implementation is a bit naive. Specifically, the code downloads the source of the web page (to RAM), then base64 encodes it (to RAM), and finally writes it to the log file. That means there are a little more than two copies of a given page's source in RAM before the operation is complete, since the base64 output is about a third larger than its input. However, it hadn't occurred to me that sites such as http://sixela.com/ would exist. That URL results in an endless stream of the word "blah". It took me three worker deaths before I figured out what the problem was, mainly because when workers use too much RAM their slice is killed, and often the log files are lost.
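    For what it's worth, here is a minimal sketch of one way to avoid the problem. It is not the code my workers actually run. It reads the page in small chunks, base64 encodes each chunk as it goes, and gives up after a fixed number of bytes. The function name fetch_capped_b64, the MAX_BYTES cap and the chunk size are all made up for illustration.

```python
import base64
import io

# Hypothetical cap on how much of any one page we keep (an assumption,
# not a value from the survey code).
MAX_BYTES = 1024 * 1024


def fetch_capped_b64(stream, out, max_bytes=MAX_BYTES, chunk_size=3 * 1024):
    """Copy at most max_bytes from stream to out as base64, chunk by chunk.

    stream is any object with read(n) returning bytes; out is any object
    with write(bytes). Returns (bytes_read, truncated).
    """
    total = 0
    truncated = False
    leftover = b""
    while True:
        remaining = max_bytes - total
        if remaining <= 0:
            # Cap reached: probe one byte to see whether more was coming.
            truncated = bool(stream.read(1))
            break
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            break  # end of the page
        total += len(chunk)
        # Only encode whole 3-byte groups so the pieces concatenate into
        # valid base64; carry the remainder over to the next round.
        data = leftover + chunk
        usable = len(data) - len(data) % 3
        out.write(base64.b64encode(data[:usable]))
        leftover = data[usable:]
    # Encode whatever is left (this is where any padding ends up).
    out.write(base64.b64encode(leftover))
    return total, truncated


if __name__ == "__main__":
    page = io.BytesIO(b"blah" * 10)
    log = io.BytesIO()
    print(fetch_capped_b64(page, log, max_bytes=16))
    print(log.getvalue())
```

    The response object returned by urllib.request.urlopen() has a read(n) method, so it should be usable as the stream here, and a timeout on the request is probably a good idea as well. With this approach only about one chunk of the page is in RAM at any time, and an endless page simply gets cut off at the cap and flagged as truncated.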

    So the moral of this tale? Don't trust the Internets.

    Tags for this post: research(S)

posted at: 07:14 | path: /research | permanent link to this entry





    Michael Carden

    "...It took me three worker deaths ..."

    Here I was envisioning a cadre of Google Slave Interns that you'd had put to death as you whiled away the minutes pondering the problem.

    "Yep," I thought, "those rumours out of MS about Google chewing up recruits are all true."

    --
    MC



