stillhq.com : Mikal, a geek from Canberra living in Silicon Valley http://www.stillhq.com The life, times, travel and software of Michael Still en Copyright (c) Michael Still 2000 - 2006 blosxom simplerss20 v20050208hh 180 http://blogs.law.harvard.edu/tech/rss Python effective TLD library bug fix /python/etld Tue, 24 Nov 2009 13:57:00 PST Some cool people commented on bugs in the etld library in <a href="http://www.stillhq.com/python/etld/000002.html">the previous post about it</a>. I've taken the opportunity to fix the bug, and a new release is now available at <a href="http://www.stillhq.com/python/etld/etld.py">http://www.stillhq.com/python/etld/etld.py</a>. If you've got specific examples of domains which either didn't work previously, or don't work now, let me know. I want to add unit tests to this code ASAP. <br/><br/><i>Tags for this post: python(<a href="http://www.stillhq.com/python"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/etld/000003&tag=python&format=.png" border="0" alt="S"></a>) etld(<a href="http://www.stillhq.com/etld"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/etld/000003&tag=etld&format=.png" border="0" alt="S"></a>) </i><br/><i>Related posts: <a href="http://www.stillhq.com/python/etld/000002.html">Python effective TLD library update</a>; <a href="http://www.stillhq.com/python/etld/000001.html">Python effective TLD library</a></i> <a href="http://www.stillhq.com/python/etld/000003.commentform.html">Comment</a> http://www.stillhq.com/python/etld/000003.html http://www.stillhq.com/python/etld/000003.html Python effective TLD library update /python/etld Sun, 01 Nov 2009 09:45:00 PST The effective TLD library is now being used for a couple of projects of mine, but I've had some trouble with it being almost unusably slow. I ended up waking up this morning with the revelation that the problem is that I use regexps to match domain names, but the failure of a match occurs at the end of a string. 
That means that the FSA has to scan the entire string before it gets to decide that it isn't a match. That's expensive. <br/><br/> I ran some tests on tweaks to try and fix this. Without any changes, scanning 1,000 semi-random domain names took 6.941666 seconds. I then tweaked the implementation to reverse the strings it was scanning, and that halved the run time of the test to 3.212203 seconds. That's a big improvement, but still way too slow. The next thing I tried was adding buckets of rules on top of those reverse matches... In other words, the code now assumes that anything after the last dot is some form of TLD approximation, and only executes rules which also have that string after the last dot. This was a massive improvement, with 1,000 domains taking only 0.026120 seconds. <br/><br/> I've updated the code at <a href="http://www.stillhq.com/python/etld/etld.py">http://www.stillhq.com/python/etld/etld.py</a>. <br/><br/><i>Tags for this post: python(<a href="http://www.stillhq.com/python"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/etld/000002&tag=python&format=.png" border="0" alt="S"></a>) etld(<a href="http://www.stillhq.com/etld"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/etld/000002&tag=etld&format=.png" border="0" alt="S"></a>) </i><br/><i>Related posts: <a href="http://www.stillhq.com/python/etld/000003.html">Python effective TLD library bug fix</a>; <a href="http://www.stillhq.com/python/etld/000001.html">Python effective TLD library</a>; <a href="http://www.stillhq.com/diary/001005.html">Apple's Safari javascript implementation</a>; <a href="http://www.stillhq.com/diary/toys/000038.html">Thinkpad x41 tablet PCMCIA IO</a>; <a href="http://www.stillhq.com/mysql/000004.html">MySQL Tech Talks</a>; <a href="http://www.stillhq.com/mysql/000009.html">Is there any way to access the match text in MySQL rlike selects?</a></i> <a href="http://www.stillhq.com/python/etld/000002.commentform.html">Comment</a> 
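A minimal sketch of the bucketing idea from the update post above, written for Python 3 rather than the Python 2 used elsewhere on this page. The rule list and the names `build_buckets` and `match_rule` are hypothetical and are not taken from etld.py; the real rules come from Mozilla's list. Each rule is compiled to a regexp anchored at the end of the host name, and rules are grouped by the label after their last dot so that a lookup only runs the regexps in one bucket.

```python
import re
from collections import defaultdict

# Hypothetical sample rules, for illustration only.
RULES = ['com', 'com.au', 'edu.au', 'co.uk']


def build_buckets(rules):
    """Group compiled rules by the label after their last dot."""
    buckets = defaultdict(list)
    # Longer rules first, so the most specific rule in a bucket wins.
    for rule in sorted(rules, key=len, reverse=True):
        pattern = re.compile(r'(?:^|\.)' + re.escape(rule) + r'$')
        buckets[rule.rsplit('.', 1)[-1]].append((rule, pattern))
    return buckets


def match_rule(hostname, buckets):
    """Return the matching rule for hostname, or None if no rule applies."""
    last_label = hostname.rsplit('.', 1)[-1]
    # Only rules sharing the host's final label are ever executed.
    for rule, pattern in buckets.get(last_label, ()):
        if pattern.search(hostname):
            return rule
    return None


BUCKETS = build_buckets(RULES)
print(match_rule('www.example.com.au', BUCKETS))
print(match_rule('www.example.com', BUCKETS))
```

A host name whose final label has no bucket is rejected without running any regexp at all, which is where most of the saving would come from on a list of semi-random domains.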
http://www.stillhq.com/python/etld/000002.html http://www.stillhq.com/python/etld/000002.html Python effective TLD library /python/etld Mon, 26 Oct 2009 06:42:00 PST I had a need recently for a library which would take a host name and return the domain-specific portion of the name, and the effective TLD being used. "Effective TLD" is a term coined by the Mozilla project for something which acts like a TLD. For example, .com is a TLD and has domains allocated under it. However, .au is a TLD with no domains under it. The effective TLDs for the .au domain are things like .com.au and .edu.au. Whilst there are libraries for other languages, I couldn't find anything for python. <br/><br/> I therefore wrote one. It's very simple, and not optimal. For example, I could do most of the processing with a single regexp if python supported more than 100 match groups in a regexp, but it doesn't. I'm sure I'll end up revisiting this code sometime in the future. Additionally, the code ended up being much easier to write than I expected, mainly because the Mozilla project has gone to the trouble of building a list of rules to determine the effective TLD of a host name. This is awesome, because it saved me heaps and heaps of work. <br/><br/> The code is at <a href="http://www.stillhq.com/python/etld/etld.py">http://www.stillhq.com/python/etld/etld.py</a> if you're interested. 
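To make the "domain-specific portion plus effective TLD" split concrete, here is a small Python 3 sketch. It is not the code in etld.py: the `EFFECTIVE_TLDS` set is a hypothetical hand-picked subset of rules, `split_hostname` is an illustrative name, and wildcard and exception rules from the Mozilla list are ignored.

```python
# Hypothetical subset of effective TLD rules, for illustration only.
EFFECTIVE_TLDS = {'com', 'cz', 'com.au', 'edu.au'}


def split_hostname(hostname):
    """Return (registrable domain, effective TLD), or (None, None).

    The registrable domain is the effective TLD plus one more label.
    """
    labels = hostname.lower().rstrip('.').split('.')
    # Try candidate suffixes from longest to shortest.
    for i in range(len(labels)):
        suffix = '.'.join(labels[i:])
        if suffix in EFFECTIVE_TLDS:
            if i == 0:
                # The host name is itself an effective TLD.
                return None, suffix
            return '.'.join(labels[i - 1:]), suffix
    return None, None


print(split_hostname('www.stillhq.com'))
print(split_hostname('www.anu.edu.au'))
```

Checking the longest suffix first is what makes `edu.au` take precedence over any shorter rule that also matches.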
<br/><br/><i>Tags for this post: python(<a href="http://www.stillhq.com/python"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/etld/000001&tag=python&format=.png" border="0" alt="S"></a>) etld(<a href="http://www.stillhq.com/etld"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/etld/000001&tag=etld&format=.png" border="0" alt="S"></a>) </i><br/><i>Related posts: <a href="http://www.stillhq.com/python/etld/000003.html">Python effective TLD library bug fix</a>; <a href="http://www.stillhq.com/python/etld/000002.html">Python effective TLD library update</a></i> <a href="http://www.stillhq.com/python/etld/000001.commentform.html">Comment</a> http://www.stillhq.com/python/etld/000001.html http://www.stillhq.com/python/etld/000001.html Calculating a SSH host key with paramiko /python/paramiko Mon, 05 Jan 2009 16:28:00 PST I needed to compare a host key from something other than a known_hosts file with what paramiko reports as part of the SSH connection today. If you must know, the host keys for these machines are retrieved via an XMLRPC API... It turned out to be a lot easier than I thought. Here's how I produced the host key entry as it appears in that API (as well as in the known_hosts file): <br/><br/> <ul><pre>
#!/usr/bin/python

# A host key calculation example for Paramiko.
# Args:
#   1: hostname

import base64
import os
import paramiko
import socket
import sys

# Socket connection to remote host
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((sys.argv[1], 22))

# Build a SSH transport
t = paramiko.Transport(sock)
t.start_client()
key = t.get_remote_server_key()

print '%s %s' %(key.get_name(),
                base64.encodestring(key.__str__()).replace('\n', ''))

t.close()
sock.close()
</pre></ul> <br/><br/> Note that I could also have constructed a paramiko key object based on the output of the XMLRPC API and then compared those two objects, but I prefer the human readable strings. 
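The formatting step on its own can be sketched in Python 3, where `base64.encodestring()` no longer exists. This is an illustrative helper rather than anything from paramiko: `format_host_key` is a made-up name, and the key bytes below are a hypothetical stand-in for what a real key object would supply (with a recent paramiko I would expect `key.get_name()` and `key.asbytes()` to provide the two arguments, but treat that as an assumption to check).

```python
import base64


def format_host_key(key_name, key_blob):
    """Format a key as '<type> <base64 blob>', as in a known_hosts entry."""
    # b64encode() never inserts newlines, so no stripping is needed.
    encoded = base64.b64encode(key_blob).decode('ascii')
    return '%s %s' % (key_name, encoded)


# Hypothetical key bytes, for illustration only: just the length-prefixed
# type string that starts every ssh-rsa public key blob.
print(format_host_key('ssh-rsa', b'\x00\x00\x00\x07ssh-rsa'))
```

Comparing two keys then becomes a plain string comparison between this output and the entry from the other source.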
<br/><br/><i>Tags for this post: python(<a href="http://www.stillhq.com/python"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/paramiko/000005&tag=python&format=.png" border="0" alt="S"></a>) paramiko(<a href="http://www.stillhq.com/paramiko"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/paramiko/000005&tag=paramiko&format=.png" border="0" alt="S"></a>) </i><br/><i>Related posts: <a href="http://www.stillhq.com/research/000005.html">Dear Lazyweb: how do I check SSL keys for vulnerability?</a></i> <a href="http://www.stillhq.com/python/paramiko/000005.commentform.html">Comment</a> http://www.stillhq.com/python/paramiko/000005.html http://www.stillhq.com/python/paramiko/000005.html Killing a blocking thread in python? /python Wed, 10 Dec 2008 14:03:00 PST It seems that there is no way of killing a blocking thread in python? The standard way of implementing thread death seems to be to implement an exit() method on the class which is the thread, and then call that when you want the thread to die. However, if the run() method of the thread class is blocking when you call exit(), then the thread doesn't get killed. I can't find a way of killing these threads cleanly on Linux -- does anyone have any hints? <br/><br/><i>Tags for this post: python(<a href="http://www.stillhq.com/python"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/000007&tag=python&format=.png" border="0" alt="S"></a>) </i> <a href="http://www.stillhq.com/python/000007.commentform.html">Comment</a> http://www.stillhq.com/python/000007.html http://www.stillhq.com/python/000007.html Packet capture in python /python/pcapy Tue, 25 Nov 2008 10:22:00 PST I'm home sick with a cold today and got bored. I wanted to play with packet capture in python, and the documentation for <a href="http://oss.coresecurity.com/pcapy/doc/pt01.html">pcapy</a> is a little sparse. 
I therefore wrote this simple little sample script: <br/><br/> <ul><pre>
#!/usr/bin/python

# A simple example of how to use pcapy. This needs to be run as root.

import datetime
import gflags
import pcapy
import sys

FLAGS = gflags.FLAGS
gflags.DEFINE_string('i', 'eth1',
                     'The name of the interface to monitor')


def main(argv):
  # Parse flags
  try:
    argv = FLAGS(argv)
  except gflags.FlagsError, e:
    print FLAGS

  print 'Opening %s' % FLAGS.i

  # Arguments here are:
  #   device
  #   snaplen (maximum number of bytes to capture _per_packet_)
  #   promiscuous mode (1 for true)
  #   timeout (in milliseconds)
  cap = pcapy.open_live(FLAGS.i, 100, 1, 0)

  # Read packets -- header contains information about the data from pcap,
  # payload is the actual packet as a string
  (header, payload) = cap.next()
  while header:
    print ('%s: captured %d bytes, truncated to %d bytes'
           %(datetime.datetime.now(), header.getlen(), header.getcaplen()))

    (header, payload) = cap.next()


if __name__ == "__main__":
  main(sys.argv)
</pre></ul> <br/><br/> Which outputs something like this: <br/><br/> <ul><pre>
2008-11-25 10:09:53.308310: captured 98 bytes, truncated to 98 bytes
2008-11-25 10:09:53.308336: captured 66 bytes, truncated to 66 bytes
2008-11-25 10:09:53.315028: captured 66 bytes, truncated to 66 bytes
2008-11-25 10:09:53.316520: captured 130 bytes, truncated to 100 bytes
2008-11-25 10:09:53.317030: captured 450 bytes, truncated to 100 bytes
2008-11-25 10:09:53.324414: captured 124 bytes, truncated to 100 bytes
2008-11-25 10:09:53.327770: captured 114 bytes, truncated to 100 bytes
2008-11-25 10:09:53.328001: captured 210 bytes, truncated to 100 bytes
</pre></ul> <br/><br/> Next step, decode me some headers! 
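As a rough idea of what that header decoding might look like, here is a Python 3 sketch using only the standard library. It assumes the interface delivers Ethernet frames carrying IPv4 and that the payload is a bytes object; `decode_ethernet_ipv4` is a hypothetical helper name, not part of pcapy. The 100 byte snaplen used above is enough to cover both headers.

```python
import socket
import struct

ETHERTYPE_IPV4 = 0x0800


def decode_ethernet_ipv4(payload):
    """Decode the Ethernet and IPv4 headers at the start of a raw frame.

    Returns a dict of header fields, or None if the frame is too short
    or does not carry IPv4.
    """
    if len(payload) < 34:  # 14 bytes of Ethernet + 20 bytes of IPv4
        return None
    _dst_mac, _src_mac, ethertype = struct.unpack('!6s6sH', payload[:14])
    if ethertype != ETHERTYPE_IPV4:
        return None
    (ver_ihl, _tos, total_len, _ident, _frag, ttl, proto, _checksum,
     src, dst) = struct.unpack('!BBHHHBBH4s4s', payload[14:34])
    return {
        'version': ver_ihl >> 4,
        'header_len': (ver_ihl & 0x0F) * 4,  # IHL is in 32-bit words
        'total_len': total_len,
        'ttl': ttl,
        'protocol': proto,
        'src': socket.inet_ntoa(src),
        'dst': socket.inet_ntoa(dst),
    }
```

In the capture loop this would be called on each payload, skipping frames for which it returns None.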
<br/><br/><i>Tags for this post: python(<a href="http://www.stillhq.com/python"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/pcapy/000001&tag=python&format=.png" border="0" alt="S"></a>) pcapy(<a href="http://www.stillhq.com/pcapy"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/pcapy/000001&tag=pcapy&format=.png" border="0" alt="S"></a>) </i><br/><i>Related posts: <a href="http://www.stillhq.com/python/000003.html">Dear lazy web: writing to the win32 event log in Python</a></i> <a href="http://www.stillhq.com/python/pcapy/000001.commentform.html">Comment</a> http://www.stillhq.com/python/pcapy/000001.html http://www.stillhq.com/python/pcapy/000001.html Finding locking deadlocks in python /python Tue, 11 Nov 2008 15:46:00 PST I re-factored some code today, and in the process managed to create a lock deadlock for myself. In the end it turned out that an exception was being thrown while a lock was held, and adding a try / finally resolved the real underlying problem. However, in the process I ended up writing this little helper that I am sure will be useful in the future. <br/><br/> <ul><pre>
import gflags
import thread
import threading
import traceback
import logging

...

FLAGS = gflags.FLAGS
gflags.DEFINE_boolean('dumplocks', False,
                      'If true, dumps information about lock activity')

...

class LockHelper(object):
  """A wrapper which makes it easier to see what locks are doing."""

  def __init__(self):
    # One lock per helper. A class level attribute would be shared by
    # every LockHelper instance.
    self.lock = thread.allocate_lock()

  def acquire(self):
    if FLAGS.dumplocks:
      logging.info('%s acquiring lock' % threading.currentThread().getName())
      for s in traceback.extract_stack():
        logging.info(' Trace %s:%s [%s] %s' % s)
    self.lock.acquire()

  def release(self):
    if FLAGS.dumplocks:
      logging.info('%s releasing lock' % threading.currentThread().getName())
      for s in traceback.extract_stack():
        logging.info(' Trace %s:%s [%s] %s' % s)
    self.lock.release()
</pre></ul> <br/><br/> Now I can just use this helper in the place of thread.allocate_lock() when I want to see what is happening with locking. It saved me a lot of staring at random code today. <br/><br/><i>Tags for this post: python(<a href="http://www.stillhq.com/python"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/000006&tag=python&format=.png" border="0" alt="S"></a>) </i><br/><i>Related posts: <a href="http://www.stillhq.com/mysql/000010.html">Reducing the MySQL query lock timeout?</a>; <a href="http://www.stillhq.com/link/000031.html">Interesting technique for finding leaks in code</a></i> <a href="http://www.stillhq.com/python/000006.commentform.html">Comment</a> http://www.stillhq.com/python/000006.html http://www.stillhq.com/python/000006.html paramiko exec_command timeout /python/paramiko Sun, 05 Oct 2008 12:20:00 PST I have a paramiko program which sshs to a large number of machines, and sometimes it hits a machine where Channel.exec_command() doesn't return. I know this is a problem with the remote machine, because the same thing happens when I try to ssh to the machine from the command line. However, I don't have any way of determining which machines are broken beforehand. <br/><br/> Paramiko doesn't support a timeout for exec_command(), so I am looking for a generic way of running a function call with a timeout. I can see sample code which does this using threads, but that's pretty ugly. 
I can't use SIGALRM because I am not running on the main thread. <br/><br/> Can anyone think of a better way of doing this? <br/><br/><i>Tags for this post: python(<a href="http://www.stillhq.com/python"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/paramiko/000004&tag=python&format=.png" border="0" alt="S"></a>) paramiko(<a href="http://www.stillhq.com/paramiko"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/paramiko/000004&tag=paramiko&format=.png" border="0" alt="S"></a>) </i><br/><i>Related posts: <a href="http://www.stillhq.com/mysql/000010.html">Reducing the MySQL query lock timeout?</a></i> <a href="http://www.stillhq.com/python/paramiko/000004.commentform.html">Comment</a> http://www.stillhq.com/python/paramiko/000004.html http://www.stillhq.com/python/paramiko/000004.html Weird paramiko problem /python/paramiko Tue, 16 Sep 2008 11:41:00 PST I had a strange paramiko problem the other day. Sometimes executing a command through a channel (via the exec_command() call) would result in an exit code being returned, but no stdout or stderr. This was for a command I was absolutely sure always returns output, and it wasn't consistent -- I'd run batches of commands and about 10% of them would fail, but not always on the same machine and not always at the same time. I spent ages looking at my code, and the code for the command running at the other end of the channel. <br/><br/> Then it occurred to me that this seemed a lot like a race condition. I started looking at the code for the paramiko Channel class, and ended up deciding that the answer was to check that the eof_received member variable was true before trying to close the channel. <br/><br/> It turns out this just works. I've had my code running commands for a couple of days now and have had zero more instances of the "no output, but did exit" error. So, there you go. It's a shame that member variable doesn't have accessors and isn't documented though. 
I guess that makes my code a little more fragile than I would be happy with. <br/><br/><i>Tags for this post: python(<a href="http://www.stillhq.com/python"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/paramiko/000003&tag=python&format=.png" border="0" alt="S"></a>) paramiko(<a href="http://www.stillhq.com/paramiko"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/paramiko/000003&tag=paramiko&format=.png" border="0" alt="S"></a>) </i><br/><i>Related posts: <a href="http://www.stillhq.com/pngtools/000005.html">PNGtools 0.4</a>; <a href="http://www.stillhq.com/imagemagick/000008.html">Why Debian?</a>; <a href="http://www.stillhq.com/samba/000002.html">Samba and MacOS X 10.4 (Tiger)</a>; <a href="http://www.stillhq.com/diary/001004.html">Bad blog, bad bad blog</a>; <a href="http://www.stillhq.com/bike/000008.html">Mont 24 hour race</a>; <a href="http://www.stillhq.com/mythtv/000007.html">This is why I went to MythTV</a>; <a href="http://www.stillhq.com/imagemagick/000009.html">ImageMagick bug?</a>; <a href="http://www.stillhq.com/link/000053.html">All racehorses descended from 28 horses</a></i> <a href="http://www.stillhq.com/python/paramiko/000003.commentform.html">Comment</a> http://www.stillhq.com/python/paramiko/000003.html http://www.stillhq.com/python/paramiko/000003.html Executing a command with paramiko /python/paramiko Wed, 03 Sep 2008 15:11:00 PST I wanted to provide a simple example of how to execute a command with paramiko as well. This is quite similar to the scp example, but is nicer than executing a command in a shell because there isn't any requirement to do parsing to determine when the command has finished executing. <br/><br/> <ul><pre> #!/usr/bin/python # A simple command example for Paramiko. 
# Args: # 1: hostname # 2: username # 3: command to run import getpass import os import paramiko import socket import sys # Socket connection to remote host sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) sock.connect((sys.argv[1], 22)) # Build a SSH transport t = paramiko.Transport(sock) t.start_client() t.auth_password(sys.argv[2], getpass.getpass('Password: ')) # Start a cmd channel cmd_channel = t.open_session() cmd_channel.exec_command(sys.argv[3]) data = cmd_channel.recv(1024) while data: sys.stdout.write(data) data = cmd_channel.recv(1024) # Cleanup cmd_channel.close() t.close() sock.close() </pre></ul> <br/><br/><i>Tags for this post: python(<a href="http://www.stillhq.com/python"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/paramiko/000002&tag=python&format=.png" border="0" alt="S"></a>) paramiko(<a href="http://www.stillhq.com/paramiko"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/paramiko/000002&tag=paramiko&format=.png" border="0" alt="S"></a>) </i> <a href="http://www.stillhq.com/python/paramiko/000002.commentform.html">Comment</a> http://www.stillhq.com/python/paramiko/000002.html http://www.stillhq.com/python/paramiko/000002.html Implementing SCP with paramiko /python/paramiko Wed, 03 Sep 2008 13:28:00 PST Regular readers will note that I've been <a href="http://www.stillhq.com/blather/20080902.html">interested in how scp works</a> and <a href="http://www.stillhq.com/blather/20080903.html">paramiko</a> for the last couple of days. There are <a href="http://www.lag.net/pipermail/paramiko/2007-May/000489.html">previous examples of how to do scp with paramiko out there</a>, but the code isn't all on one page, you have to read through the mail thread and work it out from there. I figured I might save someone some time (possibly me!) and note a complete example of scp with paramiko... <br/><br/> <ul><pre> #!/usr/bin/python # A simple scp example for Paramiko. 
# Args: # 1: hostname # 2: username # 3: local filename # 4: remote filename import getpass import os import paramiko import socket import sys # Socket connection to remote host sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) sock.connect((sys.argv[1], 22)) # Build a SSH transport t = paramiko.Transport(sock) t.start_client() t.auth_password(sys.argv[2], getpass.getpass('Password: ')) # Start a scp channel scp_channel = t.open_session() f = file(sys.argv[3], 'rb') scp_channel.exec_command('scp -v -t %s\n' % '/'.join(sys.argv[4].split('/')[:-1])) scp_channel.send('C%s %d %s\n' %(oct(os.stat(sys.argv[3]).st_mode)[-4:], os.stat(sys.argv[3])[6], sys.argv[4].split('/')[-1])) scp_channel.sendall(f.read()) # Cleanup f.close() scp_channel.close() t.close() sock.close() </pre></ul> <br/><br/><i>Tags for this post: python(<a href="http://www.stillhq.com/python"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/paramiko/000001&tag=python&format=.png" border="0" alt="S"></a>) paramiko(<a href="http://www.stillhq.com/paramiko"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/paramiko/000001&tag=paramiko&format=.png" border="0" alt="S"></a>) </i> <a href="http://www.stillhq.com/python/paramiko/000001.commentform.html">Comment</a> http://www.stillhq.com/python/paramiko/000001.html http://www.stillhq.com/python/paramiko/000001.html SSL, X509, ASN.1 and certificate validity dates /python/tlslite Tue, 05 Aug 2008 15:53:00 PST I was curious about how SSL certificates store validity information (for example when a certificate expires), so I ended up reading <a href="http://www.ietf.org/rfc/rfc2459.txt">the X509 specification</a> (excitingly called "Internet X.509 Public Key Infrastructure Certificate and CRL Profile"), as well as <a href="http://www.obj-sys.com/asn1tutorial/node15.html">the ASN.1 information for UTCTimes</a>. This is all new to me, but I am sure lots of other people understand this. 
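For reference, the validity dates are usually stored as ASN.1 UTCTime strings of the form YYMMDDHHMMSSZ, and the X.509 profile says a two digit year of 50 or more means 19YY while anything lower means 20YY. A minimal Python 3 sketch of parsing that form follows. `parse_utctime` is an illustrative name and has nothing to do with the TLSlite patch, and it assumes the string has already been extracted from the certificate.

```python
from datetime import datetime, timezone


def parse_utctime(value):
    """Parse an ASN.1 UTCTime string of the form YYMMDDHHMMSSZ."""
    if len(value) != 13 or not value.endswith('Z') or not value[:12].isdigit():
        raise ValueError('unsupported UTCTime: %r' % value)
    yy = int(value[0:2])
    # X.509 pivot: 50-99 means 19YY, 00-49 means 20YY. This differs from
    # strptime's %y, so the year is handled by hand.
    year = 1900 + yy if yy >= 50 else 2000 + yy
    return datetime(year, int(value[2:4]), int(value[4:6]),
                    int(value[6:8]), int(value[8:10]), int(value[10:12]),
                    tzinfo=timezone.utc)


print(parse_utctime('080805155300Z'))
```

Forms without seconds or with a numeric UTC offset are rejected by this sketch rather than guessed at.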
<br/><br/> In the end it wasn't too hard, and now I have hacked support for displaying certificate validity into Python's TLSlite. The point of this post is mainly so I can find that documentation again if I need it, although I'll put the TLSlite patch online as soon as I have had a chance to test it a little better. <br/><br/><i>Tags for this post: python(<a href="http://www.stillhq.com/python"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/tlslite/000001&tag=python&format=.png" border="0" alt="S"></a>) tlslite(<a href="http://www.stillhq.com/tlslite"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/tlslite/000001&tag=tlslite&format=.png" border="0" alt="S"></a>) </i><br/><i>Related posts: <a href="http://www.stillhq.com/research/000005.html">Dear Lazyweb: how do I check SSL keys for vulnerability?</a>; <a href="http://www.stillhq.com/python/twisted/000001.html">Twisted Python and Jabber SSL</a>; <a href="http://www.stillhq.com/google/gtalk/000001.html">Getting Google Talk working with PyXMPP</a></i> <a href="http://www.stillhq.com/python/tlslite/000001.commentform.html">Comment</a> http://www.stillhq.com/python/tlslite/000001.html http://www.stillhq.com/python/tlslite/000001.html Dealing with remote HTTP servers with buggy chunking implementations /python Thu, 10 Jul 2008 22:27:00 PST HTTP 1.1 implements chunking as a way of servers telling clients how much content is left for a given request, which enables you to send more than one piece of content in a given HTTP connection. Unfortunately for me, the site I was trying to access has a buggy chunking implementation, and that causes the somewhat fragile python urllib2 code to throw an exception: <br/><br/> <ul><pre> Traceback (most recent call last): File "./mythingie.py", line 55, in ? 
xml = remote.readlines() File "/usr/lib/python2.4/socket.py", line 382, in readlines line = self.readline() File "/usr/lib/python2.4/socket.py", line 332, in readline data = self._sock.recv(self._rbufsize) File "/usr/lib/python2.4/httplib.py", line 460, in read return self._read_chunked(amt) File "/usr/lib/python2.4/httplib.py", line 499, in _read_chunked chunk_left = int(line, 16) ValueError: invalid literal for int(): </pre></ul> <br/><br/> <a href="http://www.stillhq.com/blather/20080710.html">I muttered about this earlier today</a>, including <a href="http://bugs.python.org/issue1205#">finding the bug tracking the problem in pythonistan</a>. However, finding the "will not fix" bug wasn't satisfying enough... <br/><br/> It turns out you can just have urllib2 lie to the server about what HTTP version it talks, and therefore turn off chunking. Here's my sample code for how to do that: <br/><br/> <ul><pre>
import httplib
import urllib2


class HTTP10Connection(httplib.HTTPConnection):
  """HTTP10Connection -- a HTTP connection which is forced to ask for HTTP
  1.0
  """

  _http_vsn_str = 'HTTP/1.0'


class HTTP10Handler(urllib2.HTTPHandler):
  """HTTP10Handler -- don't use HTTP 1.1"""

  def http_open(self, req):
    return self.do_open(HTTP10Connection, req)

# ...

request = urllib2.Request(feed)
request.add_header('User-Agent', 'mythingie')
opener = urllib2.build_opener(HTTP10Handler())

remote = opener.open(request)
content = remote.readlines()
remote.close()
</pre></ul> <br/><br/> I hereby declare myself Michael Still, bringer of the gross python hacks. 
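For context on the traceback, here is a small Python 3 sketch of what a chunked body decoder has to do. Each chunk starts with a line holding its size in hex, and a server which sends an empty or malformed size line triggers exactly the kind of `int(line, 16)` failure shown above. `decode_chunked` is an illustrative function working on a body already held in memory. It is not how httplib is structured, and it ignores trailers.

```python
def decode_chunked(body):
    """Decode an HTTP/1.1 chunked body held entirely in memory."""
    pos = 0
    pieces = []
    while True:
        eol = body.index(b'\r\n', pos)
        # The size line is hex, optionally followed by ';extensions'.
        size_line = body[pos:eol].split(b';', 1)[0]
        size = int(size_line, 16)  # raises ValueError on a bad size line
        pos = eol + 2
        if size == 0:
            break  # the zero length chunk marks the end of the body
        pieces.append(body[pos:pos + size])
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return b''.join(pieces)


print(decode_chunked(b'4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n'))
```

Asking for HTTP 1.0 sidesteps all of this because an HTTP 1.0 response is not chunked.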
<br/><br/><i>Tags for this post: python(<a href="http://www.stillhq.com/python"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/000005&tag=python&format=.png" border="0" alt="S"></a>) </i> <a href="http://www.stillhq.com/python/000005.commentform.html">Comment</a> http://www.stillhq.com/python/000005.html http://www.stillhq.com/python/000005.html Universal Feedparser and XML namespaces /python/feedparser Wed, 09 Jul 2008 05:22:00 PST I've always found <a href="http://www.feedparser.org/">python's Universal Feedparser</a> to be a bit hard to work with when using feeds with XML namespaces. Specifically, if you don't care about the stuff in the namespaces then you're fine, but if you want that data it gets a lot harder. <br/><br/> In the past I've had to do some gross hacks. For example this gem is from the <a href="http://www.stillhq.com/mythtv/mythnettv/">MythNetTV</a> code: <br/><br/> <ul><pre> # Modify the XML to work around namespace handling bugs in FeedParser lines = [] re_mediacontent = re.compile('(.*)&lt;media:content([^&gt;]*)/ *&gt;(.*)') for line in xmllines: m = re_mediacontent.match(line) count = 1 while m: line = '%s&lt;media:wannabe%d&gt;%s&lt;/media:wannabe%d&gt;%s' %(m.group(1), count, m.group(2), count, m.group(3)) m = re_mediacontent.match(line) count = count + 1 lines.append(line) # Parse the modified XML xml = ''.join(lines) parser = feedparser.parse(xml) </pre></ul> <br/><br/> Which is horrible, but works. This time around the problem is that I am having trouble getting to the gr:annotation tags in my <a href="http://www.google.com/reader/public/atom/user/09387883873401903052/state/com.google/broadcast">Google reader shared items feed</a>. How annoying. <br/><br/> In the case of the Google reader feed, the problem seems to be that the annotation is presented like this: <br/><br/> <ul><pre> &lt;gr:annotation&gt;&lt;content type="html"&gt;Awesome. 
Canberra has needed something better than buses between the towncenters for a while, and light rail seems like a great way to do it. I much prefer trains to buses, and catch a light rail service to work every day when I am in Mountain View. &lt;/content&gt;&lt;author gr:user-id="09387883873401903052" gr:profile-id="114835605728492647856"&gt;&lt;name&gt;mikal&lt;/name&gt; &lt;/author&gt;&lt;/gr:annotation&gt; </pre></ul> <br/><br/> Feedparser can only handle simple elements (not elements that contain other elements). Therefore, this gross hack is required to get this to parse correctly: <br/><br/> <ul><pre> simplify_re = re.compile('(.*)&lt;gr:annotation&gt;' '&lt;content type="html"&gt;(.*)&lt;/content&gt;' '&lt;author .*&gt;&lt;name&gt;.*&lt;/name&gt;&lt;/author&gt;' '&lt;/gr:annotation&gt;(.*)') new_lines = [] for line in lines: m = simplify_re.match(line) if m: new_lines.append('%s&lt;gr:annotation&gt;%s&lt;/gr:annotation&gt;%s' %(m.group(1), m.group(2), m.group(3))) else: new_lines.append(line) d = feedparser.parse(''.join(new_lines)) </pre></ul> <br/><br/> Gross, and fragile, but working. This is cool, because it now means that I can apply more logic in the shared links that end up in my <a href="http://www.stillhq.com/blather/">blather feed</a>. I'm thinking of something along the lines of only shared links with an annotation will end up in that feed, and the blather entry will include the annotation. Or something like that. 
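An alternative to rewriting the XML is to pull the annotations out with a namespace-aware parser and leave the rest to Feedparser. The Python 3 sketch below uses the standard library's ElementTree and is a different technique from the regexp hack above. The `gr` namespace URI is my assumption about what the feed declares and should be checked against the real feed, and `annotations` and `SAMPLE` are illustrative names.

```python
import xml.etree.ElementTree as ET

NS = {
    'atom': 'http://www.w3.org/2005/Atom',
    # Assumed namespace URI for the gr: prefix; check the feed's xmlns:gr.
    'gr': 'http://www.google.com/schemas/reader/atom/',
}

# A cut-down, hypothetical feed with one annotated entry.
SAMPLE = (
    '<feed xmlns="http://www.w3.org/2005/Atom" '
    'xmlns:gr="http://www.google.com/schemas/reader/atom/">'
    '<entry><title>Light rail</title>'
    '<gr:annotation><content type="html">Awesome.</content>'
    '<author><name>mikal</name></author></gr:annotation>'
    '</entry></feed>')


def annotations(xml_text):
    """Yield (entry title, annotation text) for each annotated entry."""
    root = ET.fromstring(xml_text)
    for entry in root.findall('atom:entry', NS):
        content = entry.find('gr:annotation/atom:content', NS)
        if content is not None:
            title = entry.findtext('atom:title', default='', namespaces=NS)
            yield title, (content.text or '').strip()


print(list(annotations(SAMPLE)))
```

Entries without an annotation are simply skipped, which matches the idea of only passing annotated shared links on to the blather feed.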
<br/><br/><i>Tags for this post: python(<a href="http://www.stillhq.com/python"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/feedparser/000001&tag=python&format=.png" border="0" alt="S"></a>) feedparser(<a href="http://www.stillhq.com/feedparser"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/feedparser/000001&tag=feedparser&format=.png" border="0" alt="S"></a>) </i><br/><i>Related posts: <a href="http://www.stillhq.com/diary/toys/000022.html">CVS digital cameras and handy cams</a></i> <a href="http://www.stillhq.com/python/feedparser/000001.commentform.html">Comment</a> http://www.stillhq.com/python/feedparser/000001.html http://www.stillhq.com/python/feedparser/000001.html Domain name lookup helper for python? /python Tue, 01 May 2007 21:00:00 PST Hi. I have a list of the domain portion of URLs which looks a bit like this: <br/><br/> <pre>
Whois lookup for fycnds.digitalpoimt.com
Whois lookup for wvgpzdea.digitalpoimt.com
Whois lookup for zhnsht.digitalpoimt.com
Whois lookup for frigo25.php5.cz
Whois lookup for handrovina.php5.cz
Whois lookup for blabota.php5.cz
Whois lookup for pctuzing.php5.cz
Whois lookup for viagraviagra.php5.cz
Whois lookup for poiu.php5.cz
Whois lookup for flasa.php5.cz
Whois lookup for yoy4.digitalpoimt.com
Whois lookup for hskly.digitalpoimt.com
Whois lookup for 2i0wjwbc.digitalpoimt.com
Whois lookup for harnhjc.digitalpoimt.com
Whois lookup for gqru.digitalpoimt.com
</pre> <br/><br/> I need some code which determines which portion of these hostnames is a whois-able domain name. My problem is that this doesn't seem all that simple to do -- some countries have a second layer of TLDs, and some do not. <br/><br/> Does anyone know of a python library, or failing that, a simple algorithm, which will do this for me? 
<br/><br/> (For those left wondering, I am trying to do some analysis of the spam I get on this blog, and for that I want to know if the whois information for a domain that left a suspect comment indicates anything suspicious.) <br/><br/><i>Tags for this post: python(<a href="http://www.stillhq.com/python"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/000004&tag=python&format=.png" border="0" alt="S"></a>) </i><br/><i>Related posts: <a href="http://www.stillhq.com/diary/001076.html">I think I've worked out the problem with the hotel network</a>; <a href="http://www.stillhq.com/diary/001014.html">Mikal, the massive domain squatter</a>; <a href="http://www.stillhq.com/diary/001106.html">Internet traffic</a>; <a href="http://www.stillhq.com/research/smtpsurveys_feb2010.html">Measuring the popularity of SMTP server implementations on the Internet</a>; <a href="http://www.stillhq.com/diary/001090.html">Satellite internet at Walmart</a>; <a href="http://www.stillhq.com/research/000001.html">Interesting paper: "YouTube Traffic Characterization: A View From the Edge"</a>; <a href="http://www.stillhq.com/diary/001078.html">The witty worm with Vern Paxson</a>; <a href="http://www.stillhq.com/google/000002.html">Why does every man and his dog put man pages online?</a>; <a href="http://www.stillhq.com/diary/000986.html">Sensis Australian search</a>; <a href="http://www.stillhq.com/research/smtpsurvey_methodology_feb2010.html">Methodology for my SMTP survey</a></i> <a href="http://www.stillhq.com/python/000004.commentform.html">Comment</a> http://www.stillhq.com/python/000004.html http://www.stillhq.com/python/000004.html Dear lazy web: writing to the win32 event log in Python /python Thu, 14 Dec 2006 23:01:00 PST Dear Lazy Web, <br/><br/> I have a need to be able to write to the MS Windows event log in Python. I must admit I don't know a lot about Python on Windows. Does anyone have a good short sample they would like to share? 
<br/><br/> Hugs and kisses,<br/> Mikal <br/><br/><i>Tags for this post: python(<a href="http://www.stillhq.com/python"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/000003&tag=python&format=.png" border="0" alt="S"></a>) </i><br/><i>Related posts: <a href="http://www.stillhq.com/diary/toys/000026.html">HP iPaq GPS FA256A</a>; <a href="http://www.stillhq.com/link/000079.html">A side by side comparison of MythTV and Windows Media Center </a>; <a href="http://www.stillhq.com/diary/001042.html">Gloat</a>; <a href="http://www.stillhq.com/dotnet/000059.html">Getting ASP.NET working on Windows XP Tablet PC edition</a>; <a href="http://www.stillhq.com/linux/ubuntu/000005.html">Nice touch</a>; <a href="http://www.stillhq.com/python/pcapy/000001.html">Packet capture in python</a>; <a href="http://www.stillhq.com/diary/toys/000027.html">I feel a little vindicated</a>; <a href="http://www.stillhq.com/mythtv/000004.html">On freely available guide data</a>; <a href="http://www.stillhq.com/link/000138.html">SQL Server is incompatible with Windows Vista?</a>; <a href="http://www.stillhq.com/vista/000001.html">Leon, get with the program</a>; <a href="http://www.stillhq.com/link/000134.html">Windows Vista, now with nagging</a>; <a href="http://www.stillhq.com/diary/001010.html">Ok, where does one buy PCs in the US?</a></i> <a href="http://www.stillhq.com/python/000003.commentform.html">Comment</a> http://www.stillhq.com/python/000003.html http://www.stillhq.com/python/000003.html Twisted conch /python/twisted Mon, 08 May 2006 09:55:00 PST It seems to me that every time I go to write some networking code in Python, the twisted guys have got there before me. Today's adventures involve twisted conch, which seems very cool. The documentation is a bit patchy though.
<br/><br/><i>Tags for this post: python(<a href="http://www.stillhq.com/python"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/twisted/000002&tag=python&format=.png" border="0" alt="S"></a>) twisted(<a href="http://www.stillhq.com/twisted"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/twisted/000002&tag=twisted&format=.png" border="0" alt="S"></a>) </i><br/><i>Related posts: <a href="http://www.stillhq.com/linux/000053.html">A ssh quickie</a>; <a href="http://www.stillhq.com/clusterssh/00001.html">clusterssh</a></i> <a href="http://www.stillhq.com/python/twisted/000002.commentform.html">Comment</a> http://www.stillhq.com/python/twisted/000002.html http://www.stillhq.com/python/twisted/000002.html Twisted Python and Jabber SSL /python/twisted Tue, 18 Apr 2006 22:07:00 PST Ok, so I thought it would be cool to be able to send Google Talk messages to my MythTV box. Can't be too hard to write a twisted python jabber client, can it? Well, after an hour of surfing, I give up. I have the simple jabber client example, but it totally doesn't work with the Google servers, I suspect because it doesn't do SSL. I can see one of the twisted.words maintainers filing bugs against the xish stuff too, which I suspect means it's going to be a while. <br/><br/> A little bit disappointing, methinks.
<br/><br/><i>Tags for this post: python(<a href="http://www.stillhq.com/python"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/twisted/000001&tag=python&format=.png" border="0" alt="S"></a>) twisted(<a href="http://www.stillhq.com/twisted"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/twisted/000001&tag=twisted&format=.png" border="0" alt="S"></a>) </i><br/><i>Related posts: <a href="http://www.stillhq.com/gtalkbot/000003.html">gtalkbot 1.1</a>; <a href="http://www.stillhq.com/mbot/000001.html">mbot: new hotness in Google Talk bots</a>; <a href="http://www.stillhq.com/google/gtalk/000001.html">Getting Google Talk working with PyXMPP</a>; <a href="http://www.stillhq.com/gtalkbot/000004.html">gtalkbot 1.2</a>; <a href="http://www.stillhq.com/gtalkbot/000001.html">mbot: new hotness in Google Talk bots</a>; <a href="http://www.stillhq.com/gtalkbot/000002.html">Renaming mbot to gtalkbot</a>; <a href="http://www.stillhq.com/blather/000001.html">Blather, an open source Twitter work-alike for Blosxom and Google Talk</a>; <a href="http://www.stillhq.com/link/000040.html">Worst timing evar!</a>; <a href="http://www.stillhq.com/mythtv/000009.html">A MythTV Jabber bot</a>; <a href="http://www.stillhq.com/mythtv/000017.html">MythTV talk at Google</a>; <a href="http://www.stillhq.com/mysql/000004.html">MySQL Tech Talks</a>; <a href="http://www.stillhq.com/link/000070.html">Seth Godin at Google</a>; <a href="http://www.stillhq.com/presentations/000018.html">Slack talk at SLUG</a>; <a href="http://www.stillhq.com/google/000001.html">Alternate queries on results pages making it easier for future evilness?</a>; <a href="http://www.stillhq.com/google/000006.html">Cool people I have met at work</a>; <a href="http://www.stillhq.com/python/tlslite/000001.html">SSL, X509, ASN.1 and certificate validity dates</a>; <a href="http://www.stillhq.com/diary/001032.html">Seth Godin</a>; <a href="http://www.stillhq.com/link/000032.html">Sydney Australia in Google Maps</a>; 
<a href="http://www.stillhq.com/mysql/000006.html">MySQL Camp</a>; <a href="http://www.stillhq.com/link/000132.html">Solar panel reflection effects in satellite imagery</a>; <a href="http://www.stillhq.com/diary/000995.html">My first keynote presentation</a></i> <a href="http://www.stillhq.com/python/twisted/000001.commentform.html">Comment</a> http://www.stillhq.com/python/twisted/000001.html http://www.stillhq.com/python/twisted/000001.html Python DNS modules /python Fri, 30 Dec 2005 17:07:00 PST My first python script involves doing some DNS lookups (for TXT records if that matters), and I am currently working through using the pydns module for this. Is this really the best DNS module to use for python though? For a start, it was last released in May 2002, and the documentation is somewhat sparse... <br/><br/><i>Tags for this post: python(<a href="http://www.stillhq.com/python"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/000002&tag=python&format=.png" border="0" alt="S"></a>) </i><br/><i>Related posts: <a href="http://www.stillhq.com/research/domain_access.html">Compendium of TLD domain access agreements</a>; <a href="http://www.stillhq.com/research/parked_domains.html">Parked domains</a>; <a href="http://www.stillhq.com/diary/001017.html">Talk about a support life cycle...</a>; <a href="http://www.stillhq.com/google/000004.html">What's happening with frozenchicken.com?</a></i> <a href="http://www.stillhq.com/python/000002.commentform.html">Comment</a> http://www.stillhq.com/python/000002.html http://www.stillhq.com/python/000002.html Example 2.1 from Dive Into Python /python/diveintopython Mon, 28 Nov 2005 11:16:00 PST I've just started working through <a href="http://www.diveintopython.org">Dive Into Python</a>, so I don't really have an opinion of the book yet. I did notice that Example 2.1 produces different output on my machine than the book shows...
<br/><br/> The example says I should get: <br/><br/> <ul><pre> server=mpilgrim;uid=sa;database=master;pwd=secret </pre></ul> <br/><br/> I get: <br/><br/> <ul><pre> pwd=secret;database=master;uid=sa;server=mpilgrim </pre></ul> <br/><br/> It's interesting that this is exactly the reverse of what the book says I should get. I have no idea why, as I can't read Python yet, but there you go. <br/><br/><i>Tags for this post: python(<a href="http://www.stillhq.com/python"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/diveintopython/000001&tag=python&format=.png" border="0" alt="S"></a>) diveintopython(<a href="http://www.stillhq.com/diveintopython"><img src="http://www.stillhq.com/tagicon.cgi?post=/python/diveintopython/000001&tag=diveintopython&format=.png" border="0" alt="S"></a>) </i> <a href="http://www.stillhq.com/python/diveintopython/000001.commentform.html">Comment</a> http://www.stillhq.com/python/diveintopython/000001.html http://www.stillhq.com/python/diveintopython/000001.html
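On the reversed Example 2.1 output above: a likely explanation is that the example builds its string by walking a dictionary, and Python dictionaries of that era made no promise about iteration order, so the order of the pairs could differ between machines and versions. (Since Python 3.7, dictionaries keep insertion order.) The following is a minimal sketch rather than the book's exact listing; the function name `build_connection_string` and the sorting step are additions for illustration, showing one way to get the same string everywhere.

```python
def build_connection_string(params):
    """Join key=value pairs with ';', sorted by key for a stable order."""
    # Sorting the items removes any dependence on dictionary iteration order.
    return ";".join("%s=%s" % (k, v) for k, v in sorted(params.items()))


params = {"server": "mpilgrim", "database": "master", "uid": "sa", "pwd": "secret"}
print(build_connection_string(params))
```

Because the keys are sorted, the result matches neither the book's order nor the reversed one, but it is the same on every run.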