Content here is by:
Michael Still
mikal@stillhq.com

Thu, 10 Jul 2008



Dealing with remote HTTP servers with buggy chunking implementations

    HTTP 1.1 implements chunking (chunked transfer encoding) as a way for servers to tell clients how much content is left in a response, one chunk at a time, which lets a server send more than one piece of content over a single HTTP connection without knowing the total length up front. Unfortunately for me, the site I was trying to access has a buggy chunking implementation, and that causes the somewhat fragile Python urllib2 code to throw an exception:

      Traceback (most recent call last):
        File "./mythingie.py", line 55, in ?
          xml = remote.readlines()
        File "/usr/lib/python2.4/socket.py", line 382, in readlines
          line = self.readline()
        File "/usr/lib/python2.4/socket.py", line 332, in readline
          data = self._sock.recv(self._rbufsize)
        File "/usr/lib/python2.4/httplib.py", line 460, in read
          return self._read_chunked(amt)
        File "/usr/lib/python2.4/httplib.py", line 499, in _read_chunked
          chunk_left = int(line, 16)
      ValueError: invalid literal for int(): 
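
    To make the failure concrete, here is a minimal sketch of how a chunked body is decoded. It is written for Python 3 and is not the httplib code itself; decode_chunked and the sample byte strings are made up for illustration. Each chunk starts with its size in hex on a line of its own, and the empty literal in the traceback above suggests httplib read an empty line where it expected that size.

```python
def decode_chunked(raw):
    """Decode a complete chunked HTTP body held in memory as bytes.

    Hypothetical helper for illustration only; real clients read
    incrementally from a socket and also handle trailers.
    """
    body = b""
    pos = 0
    while True:
        # Each chunk starts with "<hex size>[;extensions]\r\n".
        eol = raw.index(b"\r\n", pos)
        size_field = raw[pos:eol].split(b";", 1)[0].strip()
        # This is the step that blows up on a buggy server: an empty
        # or non-hex size line raises ValueError.
        size = int(size_field.decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            # A zero-sized chunk marks the end of the body.
            break
        body += raw[pos:pos + size]
        # Skip the chunk data and the CRLF that follows it.
        pos += size + 2
    return body


# A well-formed body: two chunks, then the terminating zero chunk.
good = b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"
print(decode_chunked(good))  # b'hello world'

# A made-up broken body: an empty line where a chunk size should be.
bad = b"5\r\nhello\r\n\r\n"
try:
    decode_chunked(bad)
except ValueError as err:
    print("broken chunk header:", err)
```

    A strict parser like this has no sensible way to recover once the size line is garbage, which is roughly why the exception escapes all the way up to the caller.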
      


    I muttered about this earlier today, including finding the bug tracking the problem in pythonistan. However, finding the "will not fix" bug wasn't satisfying enough...

    It turns out you can just have urllib2 lie to the server about which HTTP version it speaks, and therefore turn off chunking, because a server is not allowed to send a chunked response to an HTTP 1.0 client. Here's my sample code for how to do that:

      import httplib
      import urllib2
      
      class HTTP10Connection(httplib.HTTPConnection):
        """HTTP10Connection -- an HTTP connection which is forced to ask for
           HTTP 1.0
        """
      
        # httplib puts this string on the request line it sends
        _http_vsn_str = 'HTTP/1.0'
      
      
      class HTTP10Handler(urllib2.HTTPHandler):
        """HTTP10Handler -- don't use HTTP 1.1"""
      
        def http_open(self, req):
          return self.do_open(HTTP10Connection, req)
      
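      # ...
      
      # feed is the URL to fetch, defined elsewhere
      request = urllib2.Request(feed)
      request.add_header('User-Agent', 'mythingie')
      # build_opener swaps the stock HTTPHandler for our HTTP 1.0 one
      opener = urllib2.build_opener(HTTP10Handler())
      
      remote = opener.open(request)
      content = remote.readlines()
      remote.close()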
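
    For anyone on Python 3, where httplib and urllib2 became http.client and urllib.request, the same trick might look something like the sketch below. It assumes http.client still reads the private _http_vsn_str attribute when it builds the request line, which is an implementation detail and could change. fetch_lines is a hypothetical helper name, not part of any library.

```python
import http.client
import urllib.request


class HTTP10Connection(http.client.HTTPConnection):
    """An HTTP connection that claims to speak HTTP 1.0."""

    # Private attribute of http.client.HTTPConnection (an assumption
    # about its internals); it ends up on the request line.
    _http_vsn_str = 'HTTP/1.0'


class HTTP10Handler(urllib.request.HTTPHandler):
    """Open http:// URLs through HTTP10Connection."""

    def http_open(self, req):
        return self.do_open(HTTP10Connection, req)


def fetch_lines(url):
    """Hypothetical helper: fetch url as an HTTP 1.0 client, return lines."""
    request = urllib.request.Request(url)
    request.add_header('User-Agent', 'mythingie')
    # Passing a handler instance replaces the default HTTPHandler.
    opener = urllib.request.build_opener(HTTP10Handler())
    with opener.open(request) as remote:
        return remote.readlines()
```

    Calling fetch_lines with the feed URL would then stand in for the last few lines of the Python 2 version. Note that this only covers http:// URLs; https:// requests would still go through the stock handler.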
      


    I hereby declare myself Michael Still, bringer of the gross python hacks.

    Tags for this post: python(S)

posted at: 22:27 | path: /python | permanent link to this entry


Blathering for Thursday, 10 July 2008

    15:30: Mikal shared: Iran hacks world media with Photoshop [Great Moments In Journalism]
      It's interesting to see that photoshopping news photos for effect isn't just reserved for Hollywood articles I don't care about. It's good to see people doing it get caught.

    18:45: Mikal shared: Petrol prices are going up: Yay!
      I wonder what percentage of Australian freight uses the rail network, and if rail is cost effective per tonne compared with trucking. Perhaps it's finally time to spend some money on a more modern rail network between the major cities?

    22:00: Mikal shared: Issue 1205: urllib fail to read URL contents, urllib2 crash Python - Python tracker
      I just hit this bug with a Python app that is trying to read a new site in Australia. It's quite annoying, and seems to be because of a server bug. Then again, it seems like urllib2 is quite vulnerable to remote servers causing it to throw an exception. Annoying.



    Tags for this post: blather(S)

posted at: 22:00 | path: /blather | permanent link to this entry


The Stainless Steel Rat Sings The Blues




    ISBN: 0533405012
    LibraryThing
    The underlying premise of this book is weak (a criminal forced into a band in order to find a stolen item), but as I've said in the past, the Stainless Steel Rat books are fun, and not really intended to make you a better person. This one is along those lines too -- it's an enjoyable light read, with a much better plot twist than the other Stainless Steel Rat books I've read. I liked it, even with the weak premise.

    Tags for this post: book(S) Harry_Harrison(S)


posted at: 05:16 | path: /book/Harry_Harrison | permanent link to this entry