Dealing with remote HTTP servers with buggy chunking implementations
HTTP 1.1 implements chunking as a way for servers to tell clients how much content is left in a given response, which lets more than one piece of content be sent over a single HTTP connection. Unfortunately for me, the site I was trying to access has a buggy chunking implementation, and that causes the somewhat fragile Python urllib2 code to throw an exception:
    Traceback (most recent call last):
      File "./mythingie.py", line 55, in ?
        xml = remote.readlines()
      File "/usr/lib/python2.4/socket.py", line 382, in readlines
        line = self.readline()
      File "/usr/lib/python2.4/socket.py", line 332, in readline
        data = self._sock.recv(self._rbufsize)
      File "/usr/lib/python2.4/httplib.py", line 460, in read
        return self._read_chunked(amt)
      File "/usr/lib/python2.4/httplib.py", line 499, in _read_chunked
        chunk_left = int(line, 16)
    ValueError: invalid literal for int():
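The last frame of that traceback is httplib parsing a chunk-size line as a hexadecimal number. To show why a sloppy server trips it up, here is a minimal sketch of a chunked-body decoder. It is written for Python 3 and is not httplib's actual code; the function name read_chunked and the sample byte strings are made up for illustration.

```python
import io


def read_chunked(stream):
    """Decode an HTTP/1.1 chunked body from a binary file-like object.

    Hypothetical helper for illustration only; trailers are ignored.
    """
    body = b""
    while True:
        line = stream.readline()
        # A size line looks like b"1a\r\n", optionally with ";extension".
        size_field = line.split(b";", 1)[0].strip().decode("ascii")
        # Raises ValueError if the field is empty or not valid hex.
        chunk_size = int(size_field, 16)
        if chunk_size == 0:
            break  # a zero-length chunk marks the end of the body
        body += stream.read(chunk_size)
        stream.readline()  # consume the CRLF that follows the chunk data
    return body


# A well-formed body: two chunks, then the terminating zero-size chunk.
good = io.BytesIO(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n")
print(read_chunked(good))  # b'hello world'

# A malformed body: a blank line where a chunk size should be.
bad = io.BytesIO(b"5\r\nhello\r\n\r\n")
try:
    read_chunked(bad)
except ValueError as exc:
    print("ValueError:", exc)
```

The empty literal at the end of my traceback suggests the server sent something like the second case, though I haven't confirmed exactly what was on the wire.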
I muttered about this earlier today, and found the bug tracking the problem in pythonistan. However, finding that the bug was marked "will not fix" wasn't satisfying enough...
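It turns out you can just have urllib2 lie to the server about which HTTP version it speaks, and therefore turn off chunking. HTTP 1.0 has no chunked encoding, so a server replying to a 1.0 request should not chunk its response. Here's my sample code for how to do that: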
    import httplib
    import urllib2


    class HTTP10Connection(httplib.HTTPConnection):
        """HTTP10Connection -- a HTTP connection which is forced to ask for
        HTTP 1.0
        """

        # Version string that httplib puts on the request line
        _http_vsn_str = 'HTTP/1.0'


    class HTTP10Handler(urllib2.HTTPHandler):
        """HTTP10Handler -- don't use HTTP 1.1"""

        def http_open(self, req):
            return self.do_open(HTTP10Connection, req)

    # ...

    request = urllib2.Request(feed)
    request.add_header('User-Agent', 'mythingie')
    opener = urllib2.build_opener(HTTP10Handler())
    remote = opener.open(request)
    content = remote.readlines()
    remote.close()
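For anyone on Python 3, where httplib became http.client and urllib2 became urllib.request, the same hack can be sketched roughly as follows. This is a port under assumptions rather than something I ran against the buggy site: it relies on the private _http_vsn_str attribute still being what http.client puts on the request line, and fetch_http10 is a hypothetical wrapper name.

```python
import http.client
import urllib.request


class HTTP10Connection(http.client.HTTPConnection):
    """An HTTP connection which is forced to ask for HTTP 1.0."""

    # Private attribute; assumed to control the request-line version.
    _http_vsn_str = "HTTP/1.0"


class HTTP10Handler(urllib.request.HTTPHandler):
    """Open http:// URLs with HTTP10Connection instead of the default."""

    def http_open(self, req):
        return self.do_open(HTTP10Connection, req)


def fetch_http10(url, user_agent="mythingie"):
    """Fetch url with an HTTP/1.0 request line (hypothetical helper)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    opener = urllib.request.build_opener(HTTP10Handler())
    with opener.open(request) as remote:
        return remote.read()
```

Because HTTP10Handler subclasses HTTPHandler, build_opener uses it in place of the stock handler instead of installing both. Only plain http:// URLs are covered; https:// would need a matching pair built on HTTPSConnection and HTTPSHandler.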
I hereby declare myself Michael Still, bringer of the gross python hacks.