Thu, 04 Dec 2008
I estimate (badly, I might add) that I currently use about 200 GB of Internet traffic a month on my DSL link. If I'm going to move back to Australia sometime, that's going to become a killer. Unfortunately, because my ISP doesn't bill for traffic here in the US, they don't appear to track my use at all. I think it might be time for me to do some tracking myself.
So, one of life's little questions. Do I use pcap to snarf traffic on the DSL, or use iptables' conntrack stuff in /proc? Just one more thing to ponder.
Tags for this post: blog( )
posted at: 11:34 | path: /diary | permanent link to this entry
Wed, 03 Dec 2008
09:33: Mikal shared: Exploding Offer Season
I hadn't heard of an exploding offer until I moved to the US. The other thing I would suggest doing if you're subject to one is to make sure that the recruiters at the other companies know that you've been made an exploding offer. They shouldn't view it as a threat (these things are quite common) and will quite often rearrange your interviews so that they can make you an offer before your other one explodes. Microsoft seems to be the big user of these exploding offers as best as I can tell.
09:33: Mikal shared: Swiss precision
It sounds like moving to Switzerland is about as much of a pain as it was moving to the US. Perhaps it's painful to move to any country?
09:33: Mikal shared: Register of penalty notices | NSW Food Authority
Scary. A list of restaurants in New South Wales which have been served with a penalty notice for violating food safety standards. I should remember to check this next time I eat out...
09:33: Mikal shared: Apple's completely unsurprising Black Friday deals appear on Australian site
Why is Apple running Thanksgiving sales in Australia? Will they be running a Queen's Birthday sale in the US soon?
09:33: Mikal shared: Notes on Hacking the Roku Netflix Player
This fellow has made some interesting progress on hacking the Roku Netflix player. I wonder if Roku have considered allowing a streaming frontend that either does UPnP or talks to MythTV directly?
09:33: Mikal shared: InstallMythBuntu - atv-bootloader - Google Code
An alternative to Roku's box as a MythTV frontend is the AppleTV, which does currently work. This page is the install instructions for MythBuntu on an AppleTV. Pity it's twice the price of the Roku box.
Tags for this post: blather( )
posted at: 09:33 | path: /blather | permanent link to this entry
Tue, 02 Dec 2008
Sun, 30 Nov 2008
Sat, 29 Nov 2008
Wed, 26 Nov 2008
Wow, this is a blast from the past. When I wrote the pngchunks command in 2003, I had never seen a 64 bit machine. I knew enough to check that an int was the right size, but not enough to just use the guaranteed-to-be-32-bit version from day 1. I'd pretty much forgotten about this code until I got pinged about this Debian bug. The bug reporter is entirely right: this was lame.
PNGtools 0.4 should be 64 bit safe. The pngchunks command works on my 64 bit machines at least.
Tags for this post: pngtools( )
posted at: 15:16 | path: /pngtools | permanent link to this entry
Tue, 25 Nov 2008
I'm home sick with a cold today and got bored. I wanted to play with packet capture in python, and the documentation for pcapy is a little sparse. I therefore wrote this simple little sample script:
#!/usr/bin/python

# A simple example of how to use pcapy. This needs to be run as root.

import datetime
import gflags
import pcapy
import sys

FLAGS = gflags.FLAGS
gflags.DEFINE_string('i', 'eth1',
                     'The name of the interface to monitor')


def main(argv):
  # Parse flags, and bail out with the usage message if they're bad
  try:
    argv = FLAGS(argv)
  except gflags.FlagsError, e:
    print e
    print FLAGS
    sys.exit(1)

  print 'Opening %s' % FLAGS.i

  # Arguments here are:
  #   device
  #   snaplen (maximum number of bytes to capture _per_packet_)
  #   promiscuous mode (1 for true)
  #   timeout (in milliseconds)
  cap = pcapy.open_live(FLAGS.i, 100, 1, 0)

  # Read packets -- header contains information about the data from pcap,
  # payload is the actual packet as a string
  (header, payload) = cap.next()
  while header:
    print ('%s: captured %d bytes, truncated to %d bytes'
           %(datetime.datetime.now(), header.getlen(), header.getcaplen()))
    (header, payload) = cap.next()


if __name__ == "__main__":
  main(sys.argv)
Which outputs something like this:
2008-11-25 10:09:53.308310: captured 98 bytes, truncated to 98 bytes
2008-11-25 10:09:53.308336: captured 66 bytes, truncated to 66 bytes
2008-11-25 10:09:53.315028: captured 66 bytes, truncated to 66 bytes
2008-11-25 10:09:53.316520: captured 130 bytes, truncated to 100 bytes
2008-11-25 10:09:53.317030: captured 450 bytes, truncated to 100 bytes
2008-11-25 10:09:53.324414: captured 124 bytes, truncated to 100 bytes
2008-11-25 10:09:53.327770: captured 114 bytes, truncated to 100 bytes
2008-11-25 10:09:53.328001: captured 210 bytes, truncated to 100 bytes
Next step, decode me some headers!
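A minimal sketch of what that decoding step might look like, using only the standard struct and socket modules. It assumes the capture is Ethernet framed with an IPv4 packet inside, the function names are made up for the example, and the sample frame is hand-built rather than a real capture. It is written for Python 3, so the payload is treated as bytes.

```python
import socket
import struct


def format_mac(raw):
  """Render six raw bytes as aa:bb:cc:dd:ee:ff."""
  return ':'.join('%02x' % b for b in bytearray(raw))


def decode_ethernet(payload):
  """Split an Ethernet frame into (dst_mac, src_mac, ethertype, rest)."""
  dst, src, ethertype = struct.unpack('!6s6sH', payload[:14])
  return (format_mac(dst), format_mac(src), ethertype, payload[14:])


def decode_ipv4(packet):
  """Pull the interesting fields out of the fixed 20 byte IPv4 header."""
  (version_ihl, tos, total_length, ident, frag, ttl, protocol, checksum,
   src, dst) = struct.unpack('!BBHHHBBH4s4s', packet[:20])
  return {
    'version': version_ihl >> 4,
    'header_length': (version_ihl & 0x0f) * 4,
    'total_length': total_length,
    'ttl': ttl,
    'protocol': protocol,
    'src': socket.inet_ntoa(src),
    'dst': socket.inet_ntoa(dst),
  }


# A hand-built sample frame (not a real capture): an Ethernet header, then
# the IPv4 header of a 40 byte TCP packet from 10.0.0.1 to 10.0.0.2.
SAMPLE = (b'\x00\x11\x22\x33\x44\x55' b'\x66\x77\x88\x99\xaa\xbb' b'\x08\x00'
          b'\x45\x00\x00\x28\x00\x00\x40\x00\x40\x06\x00\x00'
          b'\x0a\x00\x00\x01' b'\x0a\x00\x00\x02')

dst_mac, src_mac, ethertype, rest = decode_ethernet(SAMPLE)
if ethertype == 0x0800:  # IPv4
  ip = decode_ipv4(rest)
  print('%s -> %s, protocol %d, %d bytes'
        % (ip['src'], ip['dst'], ip['protocol'], ip['total_length']))
```

With a snaplen of 100 bytes as in the script above, both headers fit inside the captured part of every packet, so the truncation doesn't get in the way of this kind of decoding.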
Tags for this post: python( ) pcapy( )
posted at: 10:22 | path: /python/pcapy | permanent link to this entry
Mon, 24 Nov 2008
12:15: Mikal shared: SOCKS - Wikipedia, the free encyclopedia
Huh. I didn't realize there is a SOCKS proxy built into OpenSSH. Now if only there was a way to create new port forwards after the connection is opened.
14:52: The internets strike again. I am now assured in the comments to this post that you can in fact add a new port forward to an existing ssh connection. Next, can someone tell me how to get ssh to make me a cup of tea?
Tags for this post: blather( )
posted at: 14:52 | path: /blather | permanent link to this entry
Sat, 22 Nov 2008
Fri, 21 Nov 2008
09:45: Mikal shared: Buy One Dodge Ram, Get One Free [Deals]
You know the US auto industry is in trouble when they start offering buy one get one free deals on cars.
15:00: Mikal shared: Article about backyard chicken owners
I didn't realize that other people found chickens entertaining too. I figured it was just me. There is nothing more entertaining than throwing a mound of kitchen scraps into the coop and then watching the chickens argue over a banana peel. It's hard to explain... Perhaps when I move back to Australia I'll set up ChickenCam.
Tags for this post: blather( )
posted at: 15:00 | path: /blather | permanent link to this entry
Wed, 19 Nov 2008
Mon, 17 Nov 2008
ISBN: 0586058397 LibraryThing
I'm back to reading Foundation Series books actually written by Isaac Asimov. This one is the fourth in the Foundation Series if you count them in the order they were written, but is the second last in chronological terms. It's set 500 years after the failure of the first galactic empire, and follows the first Foundation's attempt to discover if the second Foundation still exists. Well, it's a bit more complicated than that, but I don't want to ruin it for you.
As an aside, the user interface described for the ship's computer is really cool. It's a bit like augmented reality, mixed with gesture control, mixed with a direct interface into the brain. I'm not saying I want one in my house, but it's cool that a book written in 1983 still has a user interface description which isn't dated, and still seems plausible.
This book has minor inconsistencies with the story presented in the Second Foundation Trilogy (Foundation's Fear, Foundation and Chaos and Foundation's Triumph), but I see that more as a failure in those followup authors than in this book. In fact, I've already complained elsewhere about how untrue to Asimov's vision some of those books are.
This is a good read, and I enjoyed it greatly.
Tags for this post: book( ) Isaac_Asimov( )
posted at: 18:40 | path: /book/Isaac_Asimov | permanent link to this entry
I've been using some simple procmail rules to automatically create folders for mailing lists for ages. Tony asked me for those rules today, so I figured I'd just put them online.
##########################################################################
# Mailman
:0:
* List-Id:.*<\/[^>]*
$MATCH
:0:
* List-Post: ]*
$MATCH
##########################################################################
# Majordomo lists (sometimes don't have <>'s around the address)
:0:
* X-Mailing-List:.*<\/[^>]*
$MATCH
:0:
* X-Mailing-List:.*\/.*
$MATCH
##########################################################################
# Ezmlm
:0:
* Mailing-List: .* \/[^ ;]*
$MATCH
##########################################################################
# I'm not sure what creates this one...
:0:
* X-Loop: \/.*
$MATCH
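To see what folder name a given set of headers would produce without running procmail, the same matches can be approximated in Python. This is a sketch under assumptions, not a procmail emulator: `folder_for` and the sample headers are made up, Python's `re` is not procmail's regex engine, a capturing group stands in for `\/` (which puts whatever matches after it into `$MATCH`), and the non-greedy `.*?` reflects my understanding that procmail matches as little as possible before `\/`. It mirrors the List-Id, X-Mailing-List, Mailing-List and X-Loop rules.

```python
import re

# (description, pattern) pairs in the same order as the recipes. The
# capturing group plays the role of procmail's \/ marker.
RULES = [
  ('Mailman List-Id', r'List-Id:.*?<([^>]*)'),
  ('Majordomo with <>', r'X-Mailing-List:.*?<([^>]*)'),
  ('Majordomo without <>', r'X-Mailing-List:.*?(.*)'),
  ('Ezmlm', r'Mailing-List: .*? ([^ ;]*)'),
  ('X-Loop', r'X-Loop: (.*)'),
]


def folder_for(headers):
  """Return the folder name the first matching rule would pick, or None."""
  for _, pattern in RULES:
    match = re.search(pattern, headers, re.IGNORECASE)
    if match:
      # Stripping whitespace is a convenience of this sketch.
      return match.group(1).strip()
  return None


print(folder_for('From: a@example.com\n'
                 'List-Id: Example list <list.example.org>\n'))
print(folder_for('Mailing-List: contact foo-help@example.org; run by ezmlm\n'))
print(folder_for('X-Mailing-List: foo@example.org\n'))
```

As with the recipes, order matters: the first rule that matches wins, so a message carrying both a List-Id and an X-Loop header is filed by its List-Id.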
Tags for this post: procmail( )
posted at: 14:59 | path: /procmail | permanent link to this entry
Sun, 16 Nov 2008
I'm looking for someone with solid MythTV experience and a good grasp of the English language to help me out with a project. All I can promise in return is glory, and that will be proportional to the eventual success of the project. If you're interested in spending some time (probably around 40 hours or so, spread over a couple of months) on such a project drop me a line.
Tags for this post: mythtv( )
posted at: 19:00 | path: /mythtv | permanent link to this entry
Sat, 15 Nov 2008
Fri, 14 Nov 2008
20:53: Memeomatic found a new meme: Stewart Smith: phrase from nearest book, Josh Stewart: random book content, Jon Oxer: Phrase from nearest book, Peter Lieverdink: blog meme #42, James Purser: The Meme of the Book, Michael Davies: Meme #42, Jeremy Visser: Book meme, Thomas Karpiniec: The Book Thing, Chris Neugebauer: Meme #42, Donna Benjamin: meme-ege, Simon Lyall: Economist plus meme
21:14: Memeomatic extended an existing meme: Chris Samuel: Book Meme
21:41: Memeomatic extended an existing meme: Book meme de jour
22:05: Memeomatic extended an existing meme: On a memeomatic
22:21: Memeomatic extended an existing meme: Kees Cook: phrase from nearest book meme
22:21: Memeomatic extended an existing meme: Mike Hommey: Book meme reloaded
22:29: Memeomatic extended an existing meme: Jacob Peddicord: Open Source Bridge
Tags for this post: blather( )
posted at: 22:29 | path: /blather | permanent link to this entry
I'm on vacation today, and so I had a bit more time than usual to just think. So, when Jeff posited a meme detector for planets, I wrote one. Except it's of course never just that simple. My initial implementation only took a few minutes to write, but sucked.
What I did was I wrote a script which scanned through the list of posts from the planet's RSS feed, and kept a tally of which sequences of words (let's call them sentences, even though they're not) appear in which posts. Then, if a sentence appears in more than four posts, and those posts are from at least two domains, we've found a meme.
That's actually a reasonable algorithm. Its big advantage is that it only has to take one pass through the posts, which means its order is linear -- O(n). Now, the problem with that algorithm is that there are small differences in some of the sentences (for example people mistype a sentence), and I ended up finding too many copies of the same meme.
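Stripped of the feed fetching, that single pass can be sketched as below. This is a toy version, not the real script: `find_shared_runs` and its parameters are names invented for the illustration, the posts are made up, the threshold is lowered to two posts so the example stays small, and the check that the posts come from more than one domain is left out.

```python
import re
from collections import defaultdict


def find_shared_runs(posts, run_length=5, min_posts=2):
  """Map each run of run_length words to the posts which contain it.

  posts is a dict of url -> text. Only runs which appear in at least
  min_posts different posts are returned.
  """
  seen = defaultdict(list)
  for url in sorted(posts):
    # Lower case the text and turn punctuation into spaces
    words = re.sub(r'[^\w]+', ' ', posts[url].lower()).split()
    for i in range(len(words) - run_length + 1):
      run = ' '.join(words[i:i + run_length])
      if url not in seen[run]:
        seen[run].append(url)
  return dict((run, urls) for run, urls in seen.items()
              if len(urls) >= min_posts)


posts = {
  'http://a.example/1': 'Grab the nearest book. Open it to page 56.',
  'http://b.example/7': 'OK: grab the nearest book, open it, and read.',
  'http://c.example/3': 'Nothing to see here, move along please.',
}
shared = find_shared_runs(posts)
for run in sorted(shared):
  print('"%s": %s' % (run, ', '.join(shared[run])))
```

Even this tiny example reports the same shared passage twice, once for each overlapping run of five words, which is the duplication problem in miniature.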
Here's some sample output from that version:
Found memes:
- "the cool book or the intellectual one pick the closest": Chris Neugebauer: Meme #42, Thomas Karpiniec: The Book Thing, Jeremy Visser: Book meme, Michael Davies: Meme #42, James Purser: The Meme of the Book, Jon Oxer: Phrase from nearest book, Josh Stewart: random book content, Stewart Smith: phrase from nearest book
- "find the fifth sentence post the text of the sentence": Donna Benjamin: meme-ege, Chris Neugebauer: Meme #42, Thomas Karpiniec: The Book Thing, Jeremy Visser: Book meme, Michael Davies: Meme #42, James Purser: The Meme of the Book, Peter Lieverdink: blog meme #42, Jon Oxer: Phrase from nearest book, Josh Stewart: random book content, Stewart Smith: phrase from nearest book
- "journal along with these instructions dont dig for your favorite": Jeremy Visser: Book meme, Michael Davies: Meme #42, James Purser: The Meme of the Book, Peter Lieverdink: blog meme #42, Jon Oxer: Phrase from nearest book, Josh Stewart: random book content, Stewart Smith: phrase from nearest book
- "grab the nearest book open it to page 56 find": Chris Neugebauer: Meme #42, Thomas Karpiniec: The Book Thing, Jeremy Visser: Book meme, Michael Davies: Meme #42, James Purser: The Meme of the Book, Peter Lieverdink: blog meme #42, Jon Oxer: Phrase from nearest book, Josh Stewart: random book content, Stewart Smith: phrase from nearest book
If you look at those you'll see that they're all the same meme, but the code found it four different ways. I need an algorithm which accurately finds the meme only once.
I should stop here and mention that I think this problem would be an excellent interview question. If you were going to ask the question in an interview you'd probably phrase it more as:
Given a list of strings, find substrings repeated between the strings, and return a list of the substrings and the strings containing them.
When the problem is phrased like that, I am sure that some folk think of an algorithm which compares each string with every other, looks for some sort of largest substring between the two, and then builds a table of those. However, the problem with that is that the order would be O(N^2), which is OK for a planet RSS feed, but wouldn't be so great if the set of strings you wanted to compare was something like every page on the Internet.
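For comparison, here is a sketch of that pairwise approach. It assumes the "largest substring" is measured in whole words, and it leans on `difflib.SequenceMatcher` from the Python standard library to find it. `pairwise_matches` and the sample strings are made up for the example. The loop over every pair of strings is what makes it O(N^2) in the number of strings.

```python
import difflib
import itertools


def pairwise_matches(strings, min_words=3):
  """Compare every pair of strings and return their longest shared word run.

  Returns a list of (shared_run, first_string, second_string) tuples for
  the pairs which share at least min_words consecutive words.
  """
  results = []
  for first, second in itertools.combinations(strings, 2):
    a = first.lower().split()
    b = second.lower().split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    if match.size >= min_words:
      results.append((' '.join(a[match.a:match.a + match.size]),
                      first, second))
  return results


strings = [
  'grab the nearest book and open it',
  'please grab the nearest book now',
  'something else entirely',
]
for run, first, second in pairwise_matches(strings):
  print('"%s" is shared by "%s" and "%s"' % (run, first, second))
```

Three strings means three comparisons, but a thousand strings means almost half a million, before counting the cost of each comparison.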
Anyway, I think it's possible to rescue my initial implementation by providing a final pass which checks if matches overlap and combines them if they do. For example, if the only difference between two detected memes is one post, then they're probably the same meme and can be combined.
That's an interesting problem in itself. It's easy to measure the difference in the list of matching posts for two memes, but that comparison is O(N^2), which I just said was a bad thing. However, this is a vacation day and I couldn't think of anything better, so that's what I ended up using. I guess I'll wait for a smart interview candidate to think of a better way for me.
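On its own, that combining pass might look something like the sketch below. It is a simplified illustration with made-up names and data, separate from the publishing logic: each detected sentence is folded into the first existing meme whose list of posts differs from its own by at most one post, and otherwise starts a new meme.

```python
def list_difference(l1, l2):
  """Count the items in l2 which are not in l1."""
  return len([item for item in l2 if item not in l1])


def merge_memes(candidates, max_difference=1):
  """Combine detected sentences which are shared by nearly the same posts.

  candidates is a list of (sentence, posts) tuples. Returns a list of
  (sentences, posts) tuples, one per combined meme.
  """
  memes = []
  for sentence, posts in candidates:
    for sentences, urls in memes:
      if list_difference(urls, posts) <= max_difference:
        sentences.append(sentence)
        for post in posts:
          if post not in urls:
            urls.append(post)
        break
    else:
      # No existing meme was close enough, so start a new one
      memes.append(([sentence], list(posts)))
  return memes


candidates = [
  ('grab the nearest book open', ['a', 'b', 'c']),
  ('the nearest book open it', ['a', 'b', 'c', 'd']),
  ('completely different run of words', ['x', 'y']),
]
for sentences, posts in merge_memes(candidates):
  print('%d sentence(s) shared by %s' % (len(sentences), ', '.join(posts)))
```

Each new sentence is checked against every meme found so far, and each check walks both lists of posts, which is where the quadratic cost comes from.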
You can see output from memeomatic in this blather post for today. The blather code I wrote a while ago makes it really easy to post messages to my site, which is why I've reused it here (you just call a method on a python module, and a pre-existing Rube Goldberg machine takes care of the rest).
My code:
import feedparser
import os
import re
import shelve
import sys
import unicodedata
import urlparse

_SENTENCE_LENGTH = 5


def Normalize(value):
  """Reduce a string to plain ASCII, dropping anything which won't convert."""
  normalized = unicodedata.normalize('NFKD', unicode(value))
  normalized = normalized.encode('ascii', 'ignore')
  return normalized


def ListDifference(l1, l2):
  """Count the items in l2 which are not in l1."""
  delta = []
  for l in l2:
    if l not in l1:
      delta.append(l)
  return len(delta)


plugins_dir = '%s/plugins' % os.getcwd()
print 'Appending %s to module path' % plugins_dir
sys.path.append(plugins_dir)
import blather

data = shelve.open('memes.slf', writeback=True)
# The list of feed URLs to scan needs to be added to the shelf by hand
data.setdefault('feeds', [])
data.setdefault('sentences', {})
data.setdefault('titles', {})
data.setdefault('content', {})
data.setdefault('content_orig', {})
data.setdefault('memes', [])

ds = blather.DataStore()
changed = False

# Scan feeds, looking for new posts. This just populates the database.
for feed in data['feeds']:
  print
  print 'Fetching %s' % feed
  d = feedparser.parse(feed)

  # Newest entries are first
  entries = d.entries
  entries.reverse()
  for ent in entries:
    print '  Considering %s' % ent.title
    data['titles'][ent.link] = ent.title
    content = Normalize(ent.description)
    data['content_orig'][ent.link] = content

    # Strip newlines, HTML tags and punctuation, then lower case
    content = ' '.join(content.split('\n'))
    content = re.sub('<[^>]*>', '', content)
    content = re.sub('[^\w]+', ' ', content)
    content = content.lower()
    data['content'][ent.link] = content

    # The + 1 makes sure the last run of words in the post is included
    words = content.split()
    for i in range(len(words) - _SENTENCE_LENGTH + 1):
      key = ' '.join(words[i:i + _SENTENCE_LENGTH])
      data['sentences'].setdefault(key, [])
      if not ent.link in data['sentences'][key]:
        data['sentences'][key].append(ent.link)

# Now we have a database of sentences and the posts which share them. What we
# really want is a collection of shared sentences that form a meme, and the
# posts which contain those sentences.
for sentence in data['sentences']:
  found = False

  # It's not a meme unless the sentence is shared by more than four posts.
  if len(data['sentences'][sentence]) > 4:
    domains = {}

    # It's possible that they're all from one domain...
    for url in data['sentences'][sentence]:
      domain = urlparse.urlparse(url).netloc
      domains[domain] = True

    # Try to find an existing meme which contains these posts. We iterate
    # over a copy because the list of memes is modified inside the loop.
    for (sentences, urls, published) in list(data['memes']):
      if not found and ListDifference(urls, data['sentences'][sentence]) < 2:
        data['memes'].remove((sentences, urls, published))
        if sentence not in sentences:
          sentences.append(sentence)

        new_titles = []
        for u in data['sentences'][sentence]:
          if not u in urls:
            urls.append(u)
            new_titles.append('<a href="%s">%s</a>'
                              %(u, data['titles'][u]))

        data['memes'].append((sentences, urls, published))
        found = True
        if published and new_titles:
          print 'Added posts to an existing meme'
          ds.AddMessage('Memeomatic extended an existing meme: %s'
                        % ', '.join(new_titles))
          changed = True

    if not found and len(domains) > 1:
      print ('Created a new meme for "%s" with %s'
             %(sentence, data['sentences'][sentence]))
      data['memes'].append(([sentence], data['sentences'][sentence], False))

# Publish new memes. Again, iterate over a copy of the list.
for meme in list(data['memes']):
  (sentences, urls, published) = meme
  if not published:
    titles = []
    for url in urls:
      titles.append('<a href="%s">%s</a>' %(url, data['titles'][url]))
    ds.AddMessage('Memeomatic found a new meme: %s' % ', '.join(titles))
    data['memes'].remove((sentences, urls, published))
    data['memes'].append((sentences, urls, True))
    print 'Published a new meme'
    changed = True

if changed:
  ds.Save()
data.close()
So there you go. I haven't set this as a cron job yet, as I want to babysit it for a while to make sure it's doing the right thing. I might one day get around to trusting it enough to just turn it on.
Tags for this post: meme( )
posted at: 22:04 | path: /meme | permanent link to this entry
It seems that planet is a bit too trusting with dates. For example, if you have a post with a date well into the future, then you can keep that post at the top of the planet output until that date comes around. It's interesting that no one has used that maliciously yet.
You can see an example of what I'm talking about at Planet Linux Australia, where some forward dated posts sit at the top of the page...
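One way an aggregator could defend against this is to never trust a date later than the moment it first saw the post. The sketch below illustrates that idea only, and is not how planet actually works: `effective_date`, `sort_posts`, the tuple layout and the sample posts are all made up.

```python
import datetime


def effective_date(claimed, first_seen):
  """Use the claimed date, unless it is later than when we first saw the post."""
  return min(claimed, first_seen)


def sort_posts(posts):
  """Sort (title, claimed, first_seen) tuples newest first, clamping dates."""
  return sorted(posts, key=lambda post: effective_date(post[1], post[2]),
                reverse=True)


posts = [
  ('Honest post from this morning',
   datetime.datetime(2008, 11, 14, 9, 0),
   datetime.datetime(2008, 11, 14, 9, 5)),
  ('Forward dated post',
   datetime.datetime(2009, 6, 1, 0, 0),
   datetime.datetime(2008, 11, 13, 8, 0)),
  ('Honest post from just now',
   datetime.datetime(2008, 11, 14, 21, 50),
   datetime.datetime(2008, 11, 14, 21, 55)),
]
for title, claimed, first_seen in sort_posts(posts):
  print('%s (claims %s)' % (title, claimed))
```

With the clamp in place the forward dated post sorts by the day it was first fetched, so it drops below the newer honest posts instead of sitting at the top for months.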
Tags for this post: blog( )
posted at: 21:55 | path: /diary | permanent link to this entry
I don't normally get involved in this whole meme thing, but I want to test memeomatic some more. So, here goes...
Instructions:
- Grab the nearest book.
- Open it to page 56.
- Find the fifth sentence.
- Post the text of the sentence in your journal along with these instructions.
- Don't dig for your favorite book, the cool book, or the intellectual one: pick the CLOSEST.
So, I'm currently reading A Darkness at Sethanon, which means it's close to hand. The sentence is "They are correct as written, Commander."
Tags for this post: meme( )
posted at: 21:29 | path: /meme | permanent link to this entry
Thu, 13 Nov 2008
Paul has thoughts on how to avoid Rudd's internet filter. I am left wondering why he doesn't just suggest Tor though. It's designed for exactly this sort of censorship, requires no account in another country, and is cross platform. The only catch is that Tor does block some traffic (for example BitTorrent), so you can't just use it for all your traffic.
Tags for this post: blog( )
posted at: 21:25 | path: /diary | permanent link to this entry
I got adventurous tonight, and whipped up some JavaScript which updates the sentence at the end of each post listing how many comments that post has. This means that the site is always up to date, even though all the HTML is static files on disk. It also means I can finally kill that silly hourly regenerate cron job.
Oh, and this is post 3,000 on this site.
Tags for this post: site( )
posted at: 19:55 | path: /site | permanent link to this entry