As mentioned previously, I've been downloading HTTP pages as part of my survey of Internet mail servers in order to detect domain parking behaviour. I should have thought a bit harder about that code though, because the implementation is a bit naive. Specifically, the code downloads the source of the web page (to RAM), then base64 encodes it (to RAM), and finally writes it to the log file. That means there are a little more than two copies of a given page's source in RAM before the operation is complete (base64 output is about a third larger than its input). However, it hadn't occurred to me that sites such as http://sixela.com/ would exist. That URL results in an endless stream of the word "blah". It took three worker deaths before I figured out what the problem was, mainly because when workers use too much RAM their slice is killed, and often the log files are lost.
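For illustration, here is a minimal sketch of one way to sidestep the problem: read the page in small chunks, base64 encode each chunk as it arrives, and give up after a fixed number of bytes. This is not the survey code itself. The function name `log_page_capped`, the 1 MB cap, and the chunk size are all assumptions made for the example, and Python is simply the language picked for the sketch.

```python
import base64
import io

# Hypothetical limits, chosen only for this sketch.
CHUNK_SIZE = 3 * 1024      # bytes requested per read
MAX_BYTES = 1024 * 1024    # stop reading a page after this many bytes


def log_page_capped(source, log, max_bytes=MAX_BYTES, chunk_size=CHUNK_SIZE):
    """Copy at most max_bytes from source into log as one base64 line.

    source: binary file-like object with read(n), e.g. an HTTP response.
    log:    binary file-like object with write(), e.g. an open log file.
    Returns (bytes_read, truncated).
    """
    total = 0
    pending = b""  # holds 0-2 leftover bytes between chunks
    while total < max_bytes:
        chunk = source.read(min(chunk_size, max_bytes - total))
        if not chunk:
            break  # end of page
        total += len(chunk)
        pending += chunk
        # Only encode a multiple of 3 bytes at a time, so that no '='
        # padding ends up in the middle of the output.
        usable = len(pending) - (len(pending) % 3)
        log.write(base64.b64encode(pending[:usable]))
        pending = pending[usable:]
    # Encode whatever is left; padding is fine at the very end.
    log.write(base64.b64encode(pending))
    log.write(b"\n")
    # If we stopped at the cap, peek one byte to see if more was coming.
    truncated = total >= max_bytes and bool(source.read(1))
    return total, truncated


# Stand-in for an "endless" page, so the example runs without a network.
endless = io.BytesIO(b"blah" * 1000)
out = io.BytesIO()
result = log_page_capped(endless, out, max_bytes=100, chunk_size=30)
```

With this shape, memory use is bounded by the chunk size rather than by the size of the page, and a site that never stops sending data costs at most `max_bytes` of download. The same function should work on a real response, for example the object returned by `urllib.request.urlopen(url, timeout=10)`, since that also exposes `read(n)`; a timeout is still worth setting for servers that trickle data slowly.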
So the moral of this tale? Don't trust the Internets.
Tags for this post: research
posted at: 07:14 | path: /research
