Content here is by Michael Still mikal@stillhq.com. All opinions are my own.


Wed, 17 May 2017



The Collapsing Empire




    ISBN: 076538888X
    LibraryThing
    This is a fun fast read, as is everything by Mr Scalzi. The basic premise here is that of a set of interdependent colonies that are about to lose their ability to trade with each other, and are therefore doomed. Oh, except they don't know that and are busy having petty trade wars instead. It isn't a super intellectual read, but it is fun and does leave me wanting to know what happens to the empire...

    Tags for this post: book john_scalzi
    Related posts: The Android's Dream; The Sagan Diary ; Agent to the Stars; The Ghost Brigades (2); The Human Division; Redshirts


posted at: 21:46 | path: /book/John_Scalzi | permanent link to this entry


Thu, 11 May 2017



Python3 venvs for people who are old and grumpy

    I've been using virtualenvwrapper to make venvs for python2 for probably six or so years. I know it, and understand it. Now some bad man (hi Ramon!) is making me do python3, and virtualenvwrapper just isn't a thing over there as best as I can tell.

    So how do I make a venv? It's really not too bad...

    First, install the dependencies:

      git clone https://github.com/yyuu/pyenv.git ~/.pyenv
      echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
      echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
      echo 'eval "$(pyenv init -)"' >> ~/.bashrc
      git clone https://github.com/yyuu/pyenv-virtualenv.git ~/.pyenv/plugins/pyenv-virtualenv
      source ~/.bashrc
      


    Now to make a venv, do something like this (in this case, infrasot is the name of the venv):

      mkdir -p ~/.virtualenvs/pyenv-infrasot
      cd ~/.virtualenvs/pyenv-infrasot
      pyenv virtualenv system infrasot
      


    You can see your installed venvs like this:

      $ pyenv versions
      * system (set by /home/user/.pyenv/version)
        infrasot
      


    Where system is the system installed python, and not a venv. To activate and deactivate the venv, do this:

      $ pyenv activate infrasot
      $ ... stuff you're doing ...
      $ pyenv deactivate
      


    I'll probably write wrappers at some point so that this looks like virtualenvwrapper, but it's good enough for now.
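
    For comparison, Python 3 also ships a venv module in its standard library, so a bare-bones environment can be made without pyenv at all. This is a minimal sketch and not what the steps above do: the make_venv helper and the directory layout are made up for illustration, and with_pip=False is only there to keep the example quick.

```python
import tempfile
import venv
from pathlib import Path


def make_venv(parent: Path, name: str) -> Path:
    """Create a venv called `name` under `parent` and return its path."""
    target = parent / name
    # with_pip=False skips bootstrapping pip, which keeps this fast;
    # pass with_pip=True if you want a venv you can pip install into.
    venv.create(target, with_pip=False)
    return target


# Hypothetical usage: build a throwaway venv in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = make_venv(Path(tmp), "infrasot")
    created = (env_dir / "pyvenv.cfg").exists()
```

    A venv made this way is activated by sourcing bin/activate inside it, rather than with pyenv activate.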

    Tags for this post: python venv virtualenvwrapper python3
    Related posts: Terrible pong; More coding club; Example 2.1 from Dive Into Python; Dealing with remote HTTP servers with buggy chunking implementations; On syncing with Google Contacts; mbot: new hotness in Google Talk bots

posted at: 21:20 | path: /python | permanent link to this entry


Sun, 07 May 2017



Things I read today: the best description I've seen of metadata routing in neutron

posted at: 17:52 | path: /openstack | permanent link to this entry


Tue, 04 Apr 2017



Light to Light, Day Three

    The third and final day of the Light to Light Walk at Ben Boyd National Park. This was a shorter (8 kms) easier walk. A nice way to finish the journey.






                         

    Tags for this post: events pictures 20170313 photo scouts bushwalk
    Related posts: Exploring the Jagungal; Light to Light, Day Two; Light to Light, Day One; Scout activity: orienteering at Mount Stranger; Potato Point

posted at: 17:42 | path: /events/pictures/20170313 | permanent link to this entry


Light to Light, Day Two

    Our second day walking the Light to Light walk in Ben Boyd National Park. This second day was about 10 kms and was on easier terrain than the first day. That said, probably a little less scenic than the first day too.






                 

    Tags for this post: events pictures 20170312 photo scouts bushwalk
    Related posts: Exploring the Jagungal; Light to Light, Day Three; Light to Light, Day One; Scout activity: orienteering at Mount Stranger; Potato Point

posted at: 16:59 | path: /events/pictures/20170312 | permanent link to this entry


Light to Light, Day One

    Macarthur Scouts took a group of teenagers down to Ben Boyd National Park on the weekend to do the Light to Light walk. The first day was 14 kms through lovely undulating terrain. This was the hardest day of the walk, but very rewarding and I think we all had fun.






                                           


    Tags for this post: events pictures 20170311 photo scouts bushwalk
    Related posts: Exploring the Jagungal; Light to Light, Day Three; Light to Light, Day Two; Scout activity: orienteering at Mount Stranger; Potato Point

posted at: 16:01 | path: /events/pictures/20170311 | permanent link to this entry


Thu, 02 Feb 2017



Nova vendordata deployment, an excessively detailed guide

    Nova presents configuration information to instances it starts via a mechanism called metadata. This metadata is made available via either a configdrive, or the metadata service. These mechanisms are widely used via helpers such as cloud-init to specify things like the root password the instance should use. There are three separate groups of people who need to be able to specify metadata for an instance.

    User provided data

    The user who booted the instance can pass metadata to the instance in several ways. For authentication keypairs, the keypairs functionality of the Nova APIs can be used to upload a key and then specify that key during the Nova boot API request. For less structured data, a small opaque blob of data may be passed via the user-data feature of the Nova API. Examples of such unstructured data would be the puppet role that the instance should use, or the HTTP address of a server to fetch post-boot configuration information from.

    Nova provided data

    Nova itself needs to pass information to the instance via its internal implementation of the metadata system. Such information includes the network configuration for the instance, as well as the requested hostname for the instance. This happens by default and requires no configuration by the user or deployer.

    Deployer provided data

    There is however a third type of data. It is possible that the deployer of OpenStack needs to pass data to an instance. It is also possible that this data is not known to the user starting the instance. An example might be a cryptographic token to be used to register the instance with Active Directory post boot -- the user starting the instance should not have access to Active Directory to create this token, but the Nova deployment might have permissions to generate the token on the user's behalf.

    Nova supports a mechanism to add "vendordata" to the metadata handed to instances. This is done by loading named modules, which must appear in the nova source code. We provide two such modules:

    • StaticJSON: a module which can include the contents of a static JSON file loaded from disk. This can be used for things which don't change between instances, such as the location of the corporate puppet server.
    • DynamicJSON: a module which will make a request to an external REST service to determine what metadata to add to an instance. This is how we recommend you generate things like Active Directory tokens which change per instance.


    Tell me more about DynamicJSON

    Having said all that, this post is about how to configure the DynamicJSON plugin, as I think it's the most interesting bit here.

    To use DynamicJSON, you configure it like this:

    • Add "DynamicJSON" to the vendordata_providers configuration option. This can also include "StaticJSON" if you'd like.
    • Specify the REST services to be contacted to generate metadata in the vendordata_dynamic_targets configuration option. There can be more than one of these, but note that they will be queried once per metadata request from the instance, which can mean a fair bit of traffic depending on your configuration and the configuration of the instance.


    The format for an entry in vendordata_dynamic_targets is like this:

    <name>@<url>
    


    Where name is a short string not including the '@' character, and where the URL can include a port number if so required. An example would be:

    testing@http://127.0.0.1:125
    


    Metadata fetched from this target will appear in the metadata service at a new file called vendor_data2.json, with a path (either in the metadata service URL or in the configdrive) like this:

    openstack/2016-10-06/vendor_data2.json
    


    For each dynamic target, there will be an entry in the JSON file named after that target. For example:

            {
                "testing": {
                    "value1": 1,
                    "value2": 2,
                    "value3": "three"
                }
            }
    


    Do not specify the same name more than once. If you do, we will ignore subsequent uses of a previously used name.
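
    To make the <name>@<url> format and the duplicate-name rule concrete, here is a minimal sketch of how such entries could be parsed. This is not nova's code: parse_targets is a name I made up, and the second URL in the usage is just an invented duplicate.

```python
from typing import Dict, Iterable


def parse_targets(entries: Iterable[str]) -> Dict[str, str]:
    """Map target names to URLs, keeping the first use of each name."""
    targets: Dict[str, str] = {}
    for entry in entries:
        # Split on the first '@' only: the name cannot contain '@',
        # but nothing stops the URL from containing one.
        name, sep, url = entry.partition("@")
        if not sep or not name or not url:
            raise ValueError("expected <name>@<url>, got %r" % entry)
        if name in targets:
            # Subsequent uses of a previously used name are ignored.
            continue
        targets[name] = url
    return targets


parsed = parse_targets([
    "testing@http://127.0.0.1:125",
    "testing@http://127.0.0.1:126",   # duplicate name, ignored
    "other@http://metadatathingie.example.com:8888",
])
```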

    The following data is passed to your REST service as a JSON encoded POST:

    • project-id: the UUID of the project that owns the instance
    • instance-id: the UUID of the instance
    • image-id: the UUID of the image used to boot this instance
    • user-data: as specified by the user at boot time
    • hostname: the hostname of the instance
    • metadata: as specified by the user at boot time
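
    As a rough idea of what sits on the other end of that POST, here is a minimal sketch of a service using only the Python standard library. It is not the sample service discussed below: the handler, the serve helper and the keys in the reply (registered_as, project) are all made up, and it does no authentication at all.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_vendordata(body: bytes) -> dict:
    """Turn the JSON POST body from nova into a vendordata dict."""
    request = json.loads(body.decode("utf-8"))
    # The request keys are the ones listed above; .get() tolerates any
    # that are absent. The reply keys are purely illustrative.
    return {
        "registered_as": request.get("hostname"),
        "project": request.get("project-id"),
    }


class VendordataHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        reply = build_vendordata(self.rfile.read(length))
        data = json.dumps(reply).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8888) -> None:
    """Run the sketch service until interrupted."""
    HTTPServer(("127.0.0.1", port), VendordataHandler).serve_forever()


# Exercise the request handling without starting a server.
sample = json.dumps({"hostname": "foo", "project-id": "p-123"}).encode()
reply = build_vendordata(sample)
```

    Whatever JSON the service returns is what ends up under the target's name in vendor_data2.json.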


    Deployment considerations

    Nova provides authentication to external metadata services in order to provide some level of certainty that the request came from nova. This is done by providing a service token with the request -- you can then just deploy your metadata service with the keystone authentication WSGI middleware. This is configured using the keystone authentication parameters in the vendordata_dynamic_auth configuration group.

    This behavior is optional, however: if you do not configure a service user, nova will not authenticate with the external metadata service.

    Deploying the sample vendordata service

    There is a sample vendordata service that is meant to model what a deployer would use for their custom metadata at http://github.com/mikalstill/vendordata. Deploying that service is relatively simple:

    $ git clone http://github.com/mikalstill/vendordata
    $ cd vendordata
    $ apt-get install virtualenvwrapper
    $ . /etc/bash_completion.d/virtualenvwrapper  # only needed if virtualenvwrapper wasn't already installed
    $ mkvirtualenv vendordata
    $ pip install -r requirements.txt
    


    We need to configure the keystone WSGI middleware to authenticate against the right keystone service. There is a sample configuration file in git, but it's configured to work with an openstack-ansible all in one install that I set up for my private testing, which probably isn't what you're using:

    [keystone_authtoken]
    insecure = False
    auth_plugin = password
    auth_url = http://172.29.236.100:35357
    auth_uri = http://172.29.236.100:5000
    project_domain_id = default
    user_domain_id = default
    project_name = service
    username = nova
    password = 5dff06ac0c43685de108cc799300ba36dfaf29e4
    region_name = RegionOne
    


    Per the README file in the vendordata sample repository, you can test the vendordata server in a stand alone manner by generating a token manually from keystone:

    $ curl -d @credentials.json -H "Content-Type: application/json" http://172.29.236.100:5000/v2.0/tokens > token.json
    $ token=$(python -c "import sys, json; print(json.load(sys.stdin)['access']['token']['id'])" < token.json)
    


    We then include that token in a test request to the vendordata service:

    curl -H "X-Auth-Token: $token" http://127.0.0.1:8888/
    


    Configuring nova to use the external metadata service

    Now we're ready to wire up the sample metadata service with nova. You do that by adding something like this to the nova.conf configuration file:

    [api]
    vendordata_providers=DynamicJSON
    vendordata_dynamic_targets=testing@http://metadatathingie.example.com:8888
    


    Where metadatathingie.example.com is the IP address or hostname of the server running the external metadata service. Now if we boot an instance like this:

    nova boot --image 2f6e96ca-9f58-4832-9136-21ed6c1e3b1f --flavor tempest1 --nic net-name=public --config-drive true foo
    


    We end up with a config drive which contains the information our external metadata service returned (in the example case, handy Carrie Fisher quotes):

    # cat openstack/latest/vendor_data2.json | python -m json.tool
    {
        "testing": {
            "carrie_says": "I really love the internet. They say chat-rooms are the trailer park of the internet but I find it amazing."
        }
    }
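
    Inside the instance, something then has to read that file. Here is a minimal sketch of doing so in Python, assuming the config drive is already mounted somewhere; read_vendordata and the fake drive built in a temporary directory are made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def read_vendordata(mount_point: Path, target: str) -> dict:
    """Return the entry for one dynamic target from vendor_data2.json."""
    path = mount_point / "openstack" / "latest" / "vendor_data2.json"
    with path.open() as f:
        document = json.load(f)
    # Each dynamic target gets its own top-level key; missing means {}.
    return document.get(target, {})


# Fake a config drive in a temp directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    latest = Path(tmp) / "openstack" / "latest"
    latest.mkdir(parents=True)
    (latest / "vendor_data2.json").write_text(
        json.dumps({"testing": {"value1": 1}}))
    entry = read_vendordata(Path(tmp), "testing")
    missing = read_vendordata(Path(tmp), "nope")
```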
    


    Tags for this post: openstack nova metadata vendordata configdrive cloud-init
    Related posts: Things I read today: the best description I've seen of metadata routing in neutron; One week of Nova Kilo specifications; How are we going with Nova Kilo specs after our review day?; Specs for Kilo; Juno nova mid-cycle meetup summary: nova-network to Neutron migration; Thoughts from the PTL

posted at: 19:49 | path: /openstack | permanent link to this entry


Tue, 31 Jan 2017



Giving serial devices meaningful names

    This is a hack I've been using for ages, but I thought it deserved a write up.

    I have USB serial devices. Lots of them. I use them for home automation things, as well as for talking to devices such as the console ports on switches and so forth. For the permanently installed serial devices one of the challenges is having them show up in predictable places so that the scripts which know how to drive each device are talking in the right place.

    For the trivial case, this is pretty easy with udev:

    $  cat /etc/udev/rules.d/60-local.rules 
    KERNEL=="ttyUSB*", \
        ATTRS{idVendor}=="0403", ATTRS{idProduct}=="6001", \
        ATTRS{serial}=="A8003Ye7", \
        SYMLINK+="radish"
    


    This says that for any USB serial device that is discovered (either inserted post boot, or at boot), if the USB vendor ID, product ID and serial number match the relevant values, udev should symlink the device to "/dev/radish".

    You find out the vendor and product ID from lsusb like this:

    $ lsusb
    Bus 003 Device 003: ID 0624:0201 Avocent Corp. 
    Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
    Bus 007 Device 002: ID 0665:5161 Cypress Semiconductor USB to Serial
    Bus 007 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
    Bus 006 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
    Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
    Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
    Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
    Bus 004 Device 002: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
    Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
    Bus 009 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
    Bus 008 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
    


    You can play with inserting and removing the device to determine which of these entries is the device you care about.
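
    If squinting at two lsusb listings gets old, the comparison is easy to script. This is a minimal sketch: parse_lsusb and newly_inserted are names I made up, and the regular expression only handles lines shaped like the ones above.

```python
import re
from typing import Dict

# Matches lines such as "Bus 004 Device 002: ID 0403:6001 Future ...".
LSUSB_LINE = re.compile(
    r"^Bus (\d+) Device (\d+): ID ([0-9a-f]{4}):([0-9a-f]{4})\s*(.*)$")


def parse_lsusb(output: str) -> Dict[str, str]:
    """Map 'bus:device' to 'vendor:product description' for each line."""
    devices = {}
    for line in output.splitlines():
        match = LSUSB_LINE.match(line.strip())
        if match:
            bus, dev, vendor, product, desc = match.groups()
            devices["%s:%s" % (bus, dev)] = (
                "%s:%s %s" % (vendor, product, desc.strip())).strip()
    return devices


def newly_inserted(before: str, after: str) -> Dict[str, str]:
    """Return devices present in `after` but not in `before`."""
    old = parse_lsusb(before)
    return {k: v for k, v in parse_lsusb(after).items() if k not in old}


before = "Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub\n"
after = before + (
    "Bus 004 Device 002: ID 0403:6001 Future Technology Devices "
    "International, Ltd FT232 Serial (UART) IC\n")
found = newly_inserted(before, after)
```

    In practice you would capture the output of lsusb before and after plugging the device in, and feed both strings to newly_inserted.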

    So that's great, until you have more than one device with the same USB serial vendor and product id. Then things are a bit more... difficult.

    It turns out that you can have udev execute a command on device insert to help you determine what symlink to create. So for example, I have this entry in the rules on one of my machines:

    KERNEL=="ttyUSB*", \
        ATTRS{idVendor}=="067b", ATTRS{idProduct}=="2303", \
        PROGRAM="/usr/bin/usbtest /dev/%k", \
        SYMLINK+="%c"
    


    This results in /usr/bin/usbtest being run with the path of the device file on its command line for every device detection (of a matching device). The stdout of that program is then used as the name of a symlink in /dev.

    So, that script attempts to talk to the device and determine what it is -- in my case either a currentcost or a solar panel inverter.
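
    I haven't included my script here, but the shape of such a program is roughly as follows. This is a hypothetical sketch and not the real usbtest: the idea that the currentcost announces itself with a <msg> element, and that anything else must be the inverter, are assumptions for illustration. A real version would also need to set the line speed and a read timeout before reading, which this one skips.

```python
import sys


def classify(sample: bytes) -> str:
    """Pick a symlink name from the first bytes a device sends."""
    # Assumption: the currentcost emits XML wrapped in <msg> elements.
    if b"<msg>" in sample:
        return "currentcost"
    # Assumption: anything else on this vendor/product pair is the inverter.
    return "inverter"


def main(argv) -> int:
    # udev passes the device path (for example /dev/ttyUSB0) as argv[1].
    with open(argv[1], "rb", buffering=0) as device:
        sample = device.read(256)
    # udev uses whatever the program prints on stdout as the value of %c.
    sys.stdout.write(classify(sample))
    return 0


name = classify(b"<msg><src>CC128</src></msg>")
fallback = classify(b"\x00\x01")
```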

    Tags for this post: linux udev serial usb usbserial
    Related posts: Ubuntu, Dapper Drake, and that difficult Dell e310; Video4Linux, ov511, and RGB24 palettes; ov511 hackery; Roomba serial cables; Via M10000, video, and a Belkin wireless USB thing; SMART and USB storage

posted at: 12:04 | path: /linux | permanent link to this entry


Mon, 30 Jan 2017



A pythonic example of recording metrics about ephemeral scripts with prometheus

    In my previous post we talked about how to record information from short lived scripts (I call them ephemeral scripts by the way) with prometheus. The example there was a script which checked the SMART status of each of the disks in a machine and reported that via pushgateway. I now want to work through a slightly more complicated example.

    I think you hit the limits of reporting simple values in shell scripts via curl requests fairly quickly. For example with the SMART monitoring script, SMART is capable of returning a whole heap of metrics about the performance of a disk, but we boiled that down to a single "health" value. This is largely because writing a parser for all the other values that smartctl returns would be inefficient and fragile in shell. So for this post, we're going to work through an example of how to report a variety of values from a python script. Those values could be the parsed output of smartctl, but to mix things up a bit, I'm going to use a different script I wrote recently.

    This new script uses the Weather Underground API to look up weather stations near my house, and then generate graphics of the weather forecast. These graphics are displayed on the various Cisco SIP phones I already had around the house. The forecasts look like this:



    The script to generate these weather forecasts is relatively simple python, and you can see the source code on github.

    My cunning plan here is to use prometheus' time series database and alert capabilities to drive home automation around my house. The first step for that is to start gathering some simple facts about the home environment so that we can do trending and decision making on them. The code to do this isn't all that complicated. First off, we need to add the python prometheus client to our python environment, which is hopefully a venv:

    pip install prometheus_client
    pip install six
    


    That second dependency isn't a strict requirement for prometheus, but the script I'm working on needs it (because it needs to work out what's a text value, and python 3 is bonkers).

    Next we import the prometheus client in our code and set up the collector registry. At the same time I record when the script was run:

    from prometheus_client import CollectorRegistry, Gauge, push_to_gateway
    
    registry = CollectorRegistry()
    Gauge('job_last_success_unixtime', 'Last time the weather job ran',
          registry=registry).set_to_current_time()
    


    And then we just add gauges for any values we want to add to the pushgateway:

    Gauge('_'.join(field), '', registry=registry).set(value)
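
    That one-liner assumes field is a sequence of name parts and value is a number. As a sketch of where such pairs might come from, here is one way to walk a nested dictionary and produce them. The flatten helper and the forecast data are invented for illustration; the real script may well do this differently.

```python
from numbers import Number
from typing import Dict, Iterator, Tuple

Field = Tuple[str, ...]


def flatten(data: Dict, prefix: Field = ()) -> Iterator[Tuple[Field, float]]:
    """Yield (field, value) pairs for every numeric leaf in a nested dict."""
    for key, value in data.items():
        field = prefix + (str(key),)
        if isinstance(value, dict):
            yield from flatten(value, field)
        elif isinstance(value, Number) and not isinstance(value, bool):
            yield field, float(value)
        # Other leaves (strings, None) are skipped: a gauge needs a number.


forecast = {"outside": {"temp_c": 21.5, "humidity": 40, "summary": "sunny"}}
metrics = {"_".join(field): value for field, value in flatten(forecast)}
```

    Each resulting name and value could then be handed to a Gauge as in the line above.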
    


    Finally, the values don't exist in the pushgateway until we actually push them there, which we do like this:

    push_to_gateway('localhost:9091', job='weather', registry=registry)
    


    You can see the entire patch I wrote to add prometheus support on github if you're interested in an example with more context.

    Now we can have pretty graphs of temperature and stuff!

    Tags for this post: prometheus monitoring python pushgateway
    Related posts: Recording performance information from short lived processes with prometheus; Basic prometheus setup; Terrible pong; More coding club; Friday ; Buying Time

posted at: 01:08 | path: /prometheus | permanent link to this entry


Fri, 27 Jan 2017



Recording performance information from short lived processes with prometheus

    Now that I'm recording basic statistics about the behavior of my machines, I now want to start tracking some statistics from various scripts I have lying around in cron jobs. In order to make myself sound smarter, I'm going to call these short lived scripts "ephemeral scripts" throughout this document. You're welcome.

    The promethean way of doing this is to have a relay process. Prometheus really wants to know where to find web servers to learn things from, and my ephemeral scripts are both not permanently around and also not running web servers. Luckily, prometheus has a thing called the pushgateway which is designed to handle this situation. I can run just one of these, and then have all my little scripts just tell it things to add to its metrics. Then prometheus regularly scrapes this one process and learns things about those scripts. It's like a game of Telephone, but for processes really.

    First off, let's get the pushgateway running. This is basically the same as the node_exporter from last time:

    $ wget https://github.com/prometheus/pushgateway/releases/download/v0.3.1/pushgateway-0.3.1.linux-386.tar.gz
    $ tar xvzf pushgateway-0.3.1.linux-386.tar.gz
    $ cd pushgateway-0.3.1.linux-386
    $ ./pushgateway
    


    Let's assume once again that we're all adults and did something nicer than that involving configuration management and init scripts.

    The pushgateway implements a relatively simple HTTP protocol to add values to the metrics that it reports. Note that the values won't change once set until you change them again; they're not garbage collected or aged out or anything fancy. Here's a trivial example of adding a value to the pushgateway:

    echo "some_metric 3.14" | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/some_job
    


    This is stolen straight from the pushgateway README of course. The above command will have the pushgateway start to report a metric called "some_metric" with the value "3.14", for a job called "some_job". In other words, we'll get this in the pushgateway metrics URL:

    # TYPE some_metric untyped
    some_metric{instance="",job="some_job"} 3.14
    


    You can see that this isn't perfect because the metric is untyped (what types exist? we haven't covered that yet!), and has these confusing instance and job labels. One tangent at a time, so let's explain instances and jobs first.

    On jobs and instances

    Prometheus is built for a universe a little bit unlike my home lab. Specifically, it expects there to be groups of processes doing a thing instead of just one. This is especially true because it doesn't really expect things like the pushgateway to be proxying your metrics for you because there is an assumption that every process will be running its own metrics server. This leads to some warts, which I'll explain in a second. Let's start by explaining jobs and instances.

    For a moment, assume that we're running the world's most popular wordpress site. The basic architecture for our site is web frontends which run wordpress, and database servers which store the content that wordpress is going to render. When we first started our site it was all easy, as they could both be on the same machine or cloud instance. As we grew, we were first forced to split apart the frontend and the database into separate instances, and then forced to scale those two independently -- perhaps we have reasonable database performance so we ended up with more web frontends than we did database servers.

    So, we go from something like this:



    To an architecture which looks a bit like this:



    Now, in prometheus (i.e. google) terms, there are three jobs here. We have web frontends, database masters (the top one which is getting all the writes), and database slaves (the bottom one which everyone is reading from). For one of the jobs, the frontends, there is more than one instance of the job. To put that into pictures:



    So, the topmost frontend job would be job="fe" and instance="0". Google also had a cool way to look up jobs and instances via DNS, but that's a story for another day.

    To harp on a point here, all of these processes would be running a web server exporting metrics in google land -- that means that prometheus would know that it's monitoring a frontend job because it would be listed in the configuration file as such. You can see this in the configuration file from the previous post. Here's the relevant snippet again:

      - job_name: 'node'
        static_configs:
          - targets: ['molokai:9100', 'dell:9100', 'eeebox:9100']
    


    The job "node" runs on three targets (instances), named "molokai:9100", "dell:9100", and "eeebox:9100".

    However, we live in the ghetto for these ephemeral scripts and want to use the pushgateway for more than one such script, so we have to tell lies via the pushgateway. So for my simple ephemeral script, we'll tell the pushgateway that the job is the script name and the instance can be an empty string. If we don't do that, then prometheus will think that the metric relates to the pushgateway process itself, instead of the ephemeral process.

    We tell the pushgateway what job and instance to use like this:

    echo "some_metric 3.14" | curl --data-binary @- http://localhost:9091/metrics/job/frontend/instance/0
    


    Now we'll get this at the metrics URL:

    # TYPE some_metric untyped
    some_metric{instance="",job="some_job"} 3.14
    some_metric{instance="0",job="frontend"} 3.14
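
    The job and instance are just path segments in the URL, so building that URL in a script is simple. A minimal sketch, where pushgateway_url is a name I made up; the quoting is only a precaution, and names containing slashes are best avoided altogether.

```python
from typing import Optional
from urllib.parse import quote


def pushgateway_url(base: str, job: str,
                    instance: Optional[str] = None) -> str:
    """Build the URL a script pushes to for a given job and instance."""
    url = "%s/metrics/job/%s" % (base.rstrip("/"), quote(job, safe=""))
    if instance is not None:
        # Leaving the instance off gives the instance="" seen above.
        url += "/instance/%s" % quote(instance, safe="")
    return url


with_instance = pushgateway_url("http://localhost:9091", "frontend", "0")
job_only = pushgateway_url("http://localhost:9091/", "some_job")
```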
    


    The first metric there is from our previous attempt (remember when I said that values are never cleared out?), and the second one is from our second attempt. To clear out values you can restart the pushgateway process, or send an HTTP DELETE request to the same URL you pushed to. For simple ephemeral scripts, I think it's ok to leave the instance empty, and just set a job name -- as long as that job name is globally unique.

    We also need to tell prometheus to believe our lies about the job and instance for things reported by the pushgateway. The scrape configuration for the pushgateway therefore ends up looking like this:

      - job_name: 'pushgateway'
        honor_labels: true
        static_configs:
          - targets: ['molokai:9091']
    


    Note the honor_labels there, that's the believing the lies bit.

    There is one thing to remember here before we can move on. Job names are being blindly trusted from our reporting. So, it's now up to us to keep job names unique. So if we export a metric on every machine, we might want to keep the job name specific to the machine. That said, it really depends on what you're trying to do -- so just pay attention when picking job and instance names.

    On metric types

    Prometheus supports a couple of different types for the metrics which are exported. For now we'll discuss two, and we'll cover the third later. The types are:

    • Gauge: a value which goes up and down over time, like the fuel gauge in your car. Non-motoring examples would include the amount of free disk space on a given partition, the amount of CPU in use, and so forth.
    • Counter: a value which always increases. This might be something like the number of bytes sent by a network card -- the value only resets when the network card is reset (probably by a reboot). These only-increasing types are valuable because it's easier to do maths on them in the monitoring system.
    • Histograms: a set of values broken into buckets. For example, the response time for a given web page would probably be reported as a histogram. We'll discuss histograms in more detail in a later post.
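
    To show why a counter is easier to do maths on, here is a simplified sketch of working out how much a counter grew across a series of samples, including across a reset. This is my own illustration of the idea and not how prometheus itself is implemented; the sample values are invented.

```python
from typing import Sequence


def increase(samples: Sequence[float]) -> float:
    """Total growth of a counter across samples, allowing for resets."""
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # The counter went backwards, so it must have been reset to
            # zero; everything it has counted since then is new growth.
            total += current
    return total


bytes_sent = [10, 15, 3, 8]   # reset between the second and third sample
grown = increase(bytes_sent)
```

    With a gauge you can't make that assumption, because a drop is a perfectly legitimate value.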


    I don't really want to dig too deeply into the value types right now, apart from explaining that our previous examples haven't specified a type for the metrics being provided, and that this is undesirable. For now we just need to decide if the value goes up and down (a gauge) or just up (a counter). You can read more about prometheus types at https://prometheus.io/docs/concepts/metric_types/ if you want to.

    A typed example

    So now we can go back and do the same thing as before, but we can do it with typing like adults would. Let's assume that the value of pi is a gauge, and goes up and down depending on the vagaries of space time. Let's also show that we can add a second metric at the same time because we're fancy like that. We'd therefore need to end up doing something like (again heavily based on the contents of the README):

    cat <<EOF | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/frontend/instance/0
    # TYPE some_metric gauge
    # HELP some_metric approximate value of pi in the current space time continuum
    some_metric 3.14
    # TYPE another_metric counter
    # HELP another_metric Just an example.
    another_metric 2398
    EOF
    


    And we'd end up with values like this in the pushgateway metrics URL:

    # HELP some_metric approximate value of pi in the current space time continuum
    # TYPE some_metric gauge
    some_metric{instance="0",job="frontend"} 3.14
    # HELP another_metric Just an example.
    # TYPE another_metric counter
    another_metric{instance="0",job="frontend"} 2398
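
    The same typed push can be done from python with nothing but the standard library. This is a minimal sketch rather than the official client: render and push are names I made up, and push is defined but not called here because it needs a pushgateway to talk to.

```python
import urllib.request
from typing import Iterable, Tuple

Metric = Tuple[str, str, str, float]   # name, type, help text, value


def render(metrics: Iterable[Metric]) -> str:
    """Render metrics in the text format the pushgateway accepts."""
    lines = []
    for name, kind, help_text, value in metrics:
        lines.append("# TYPE %s %s" % (name, kind))
        lines.append("# HELP %s %s" % (name, help_text))
        lines.append("%s %s" % (name, value))
    # The body needs to end with a newline.
    return "\n".join(lines) + "\n"


def push(url: str, body: str) -> int:
    """POST the rendered metrics and return the HTTP status code."""
    request = urllib.request.Request(
        url, data=body.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status


body = render([
    ("some_metric", "gauge", "Approximate value of pi.", 3.14),
    ("another_metric", "counter", "Just an example.", 2398),
])
```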
    


    A tangible example

    So that's a lot of talking. Let's deploy this in my home lab for something actually useful. The node_exporter does not report any SMART health details for disks, and that's probably a thing I'd want to alert on. So I wrote this simple script:

    #!/bin/bash
    
    hostname=`hostname | cut -f 1 -d "."`
    
    for disk in /dev/sd[a-z]
    do
      disk=`basename $disk`
    
      # Is this a USB thumb drive?
      if [ `/usr/sbin/smartctl -H /dev/$disk | grep -c "Unknown USB bridge"` -gt 0 ]
      then
        result=1
      else
        result=`/usr/sbin/smartctl -H /dev/$disk | grep -c "overall-health self-assessment test result: PASSED"`
      fi
    
      cat <<EOF | curl --data-binary @- http://localhost:9091/metrics/job/$hostname/instance/$disk
      # TYPE smart_health_passed gauge
      # HELP smart_health_passed whether or not a disk passed a "smartctl -H /dev/sdX"
      smart_health_passed $result
    EOF
    done
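
    The decision in the middle of that loop is the part most worth moving out of shell. Here is a minimal python sketch of just that check, mirroring the two grep patterns above; smart_health is a name I made up, and the strings in the usage are invented stand-ins for smartctl output, not captured from a real disk.

```python
def smart_health(smartctl_output: str) -> int:
    """Return 1 if a disk looks healthy, 0 otherwise."""
    # USB thumb drives can't be queried, so treat them as healthy.
    if "Unknown USB bridge" in smartctl_output:
        return 1
    passed = "overall-health self-assessment test result: PASSED"
    return 1 if passed in smartctl_output else 0


ok = smart_health(
    "SMART overall-health self-assessment test result: PASSED\n")
usb = smart_health("/dev/sdc: Unknown USB bridge\n")
bad = smart_health(
    "SMART overall-health self-assessment test result: FAILED!\n")
```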
    


    Now, that's not perfect and I am sure that I'll re-write this in python later, but it is actually quite useful already. It will report if a SMART health check failed, and now I could write an alerting rule which looks for disks with a health value of 0 and send myself an email to go to the hard disk shop. Once your pushgateways are being scraped by prometheus, you'll end up with something like this in the console:



    I'll explain how to turn this into alerting later.

    Tags for this post: prometheus monitoring ephemeral_script pushgateway
    Related posts: A pythonic example of recording metrics about ephemeral scripts with prometheus; Basic prometheus setup; Friday ; Buying Time; The System of the World; The Ghost Brigades (2)

posted at: 20:17 | path: /prometheus | permanent link to this entry


Thu, 26 Jan 2017



Basic prometheus setup

    I've been playing with prometheus for monitoring. It feels quite familiar to me because it's based on an internal google technology called borgmon, but I suspect that means it feels really weird to everyone else.

    The first thing to realize is that everything at google is a web server. Your short lived tool that copies some files around probably runs a web server. All of these web servers have built in URLs which report the progress and status of the task at hand. Prometheus is built to: scrape those web servers; aggregate the data; store the data into a time series database; and then perform dashboarding, trending and alerting on that data.

    The most basic example is to just export metrics for each machine on my home network. This is the easiest first step, because we don't need to build any software to do this. First off, let's install node_exporter on each machine. node_exporter is the tool which runs a web server to export metrics for each node. Everything in prometheus land is written in go, which is new to me. However, it does make running node exporter easy -- just grab the relevant binary from https://prometheus.io/download/, untar, and run. Let's do it in a command line script example thing:

    $ wget https://github.com/prometheus/node_exporter/releases/download/v0.14.0-rc.1/node_exporter-0.14.0-rc.1.linux-386.tar.gz
    $ tar xvzf node_exporter-0.14.0-rc.1.linux-386.tar.gz
    $ cd node_exporter-0.14.0-rc.1.linux-386
    $ ./node_exporter
    


    That's all it takes to run the node_exporter. This runs a web server on port 9100, which exposes metrics like this one:

    $ curl -s http://localhost:9100/metrics | grep filesystem_free | grep 'mountpoint="/data"'
    node_filesystem_free{device="/dev/mapper/raidvg-srvlv",fstype="xfs",mountpoint="/data"} 6.811044864e+11
    


    Here you can see that the system I'm running on is exporting a filesystem_free value for the filesystem mounted at /data. There's a lot more than that exported, and I'd encourage you to poke around at that URL a little before continuing on.
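
    If you want to poke at those values from a script instead of with grep, a line like the one above is easy enough to pull apart. This is a rough sketch and not a full parser for the exposition format: it assumes a single sample per line with no trailing timestamp, it does not unescape label values, and parse_sample is just a name I made up.

```python
import re

# metric name, optional {labels}, whitespace, then the value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)$')
# one key="value" pair inside the braces
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one metrics line into (name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError("not a metric sample: %r" % line)
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


if __name__ == "__main__":
    line = ('node_filesystem_free{device="/dev/mapper/raidvg-srvlv",'
            'fstype="xfs",mountpoint="/data"} 6.811044864e+11')
    name, labels, value = parse_sample(line)
    print(name, labels["mountpoint"], "%.1f GB free" % (value / 1e9))
```

    The part before the braces is the metric name, the bits inside the braces are labels, and the number at the end is the value (bytes free, in this case).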

    So that's lovely, but we really want to record that over time. So let's assume that you have one of those running on each of your machines, and that you have it set up to start on boot. I'll leave the details of that out of this post, but let's just say I used my existing puppet infrastructure.

    Now we need the central process which collects and records the values. That's the actual prometheus binary. Installation is again trivial:

    $ wget https://github.com/prometheus/prometheus/releases/download/v1.5.0/prometheus-1.5.0.linux-386.tar.gz
    $ tar xvzf prometheus-1.5.0.linux-386.tar.gz
    $ cd prometheus-1.5.0.linux-386
    


    Now we need to move some things around to install this nicely. I did the puppet equivalent of:

    • Moving the prometheus file to /usr/bin
    • Creating an /etc/prometheus directory and moving console_libraries and consoles into it
    • Creating a /etc/prometheus/prometheus.yml config file, more on the contents of this one in a second
    • And creating an empty data directory, in my case at /data/prometheus


    The config file needs to list all of your machines. I am sure this could be generated with puppet templating or something like that, but for now here's my simple hard coded one:

    # my global config
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).
    
      # Attach these labels to any time series or alerts when communicating with
      # external systems (federation, remote storage, Alertmanager).
      external_labels:
          monitor: 'stillhq'
    
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first.rules"
      # - "second.rules"
    
    # Scrape configurations: one job for Prometheus itself, and one
    # for the node_exporter running on each machine.
    scrape_configs:
      # The job name is added as a label `job=` to any timeseries scraped from this config.
      - job_name: 'prometheus'
    
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
    
        static_configs:
          - targets: ['molokai:9090']
    
      - job_name: 'node'
        static_configs:
          - targets: ['molokai:9100', 'dell:9100', 'eeebox:9100']
    


    Here you can see that I want to scrape each of the web servers which export metrics every 15 seconds, and I also want to evaluate rules (such as those which fire alerts) every 15 seconds too. This might not scale if you have bajillions of processes or machines to monitor. I also label all of my values as coming from my domain, so that if I ever aggregate these values with another prometheus from somewhere else the origin will be clear.

    The other interesting bit for now is the scrape configuration. This lists the metrics exporters to monitor. In this case it's prometheus itself (molokai:9090), and then each of my machines in the home lab (molokai, dell, and eeebox -- all on port 9100). Remember, port 9090 is the prometheus binary itself and port 9100 is that node_exporter binary we now have running on all of our machines.
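
    As for generating the list of machines instead of hard coding it, something as dumb as this would probably do. It is only a sketch: node_job is a hypothetical helper, it builds the YAML by string formatting to avoid needing a YAML library, and it assumes host names with no characters that need quoting.

```python
def node_job(hosts, port=9100, job_name="node"):
    """Return the scrape_configs entry for node_exporter on each host."""
    targets = ", ".join("'%s:%d'" % (host, port) for host in hosts)
    return (
        "  - job_name: '%s'\n"
        "    static_configs:\n"
        "      - targets: [%s]\n" % (job_name, targets)
    )


if __name__ == "__main__":
    # The three machines from the config above.
    print(node_job(["molokai", "dell", "eeebox"]), end="")
```

    The output is indented to sit underneath a scrape_configs: line, so it can be appended to the static part of the config file.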

    Now if we start prometheus, it will do its thing. There is some configuration which needs to be passed on the command line here (instead of in the configuration file), so my command line looks like this:

    /usr/bin/prometheus -config.file=/etc/prometheus/prometheus.yml \
        -web.console.libraries=/etc/prometheus/console_libraries \
        -web.console.templates=/etc/prometheus/consoles \
        -storage.local.path=/data/prometheus
    


    Prometheus also presents an interactive user interface on port 9090, which is handy. Here's an example of it graphing the load average on each of my machines (it was something which caused a nice jaggy line):



    You can see here that the user interface has a drop down for selecting values that are known, and that the key at the bottom tells you things about each time series in the graph. So for example, if we added {instance="eeebox:9100"} to the end of the value in the text box at the top, then we'd be filtering for values with that label set, and would as a result only show one value in the graph (the one for eeebox).
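
    The same label filter works outside the user interface too, because prometheus has an HTTP query API at /api/v1/query. Here is a small sketch which builds the URL for that kind of filtered query. The metric name node_load1 is my assumption for the load average (the graph doesn't say which metric it used), and instant_query_url is a made up helper.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, metric, **labels):
    """Build a /api/v1/query URL for a metric, filtered by exact labels."""
    selector = metric
    if labels:
        matchers = ",".join('%s="%s"' % (key, value)
                            for key, value in sorted(labels.items()))
        selector = "%s{%s}" % (metric, matchers)
    # urlencode takes care of escaping the braces and quotes.
    return "%s/api/v1/query?%s" % (base_url.rstrip("/"),
                                   urlencode({"query": selector}))


if __name__ == "__main__":
    print(instant_query_url("http://molokai:9090", "node_load1",
                            instance="eeebox:9100"))
```

    Fetching that URL from a running prometheus should return JSON containing only the time series for eeebox, the same as typing the filter into the text box.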

    If you're interested in very simple dashboarding of basic system metrics, that's actually all you need to do. In my next post about prometheus I'm going to show how to write your own binary which exports values to be graphed. In my case, the temperature outside my house.

    Tags for this post: prometheus monitoring node_exporter
    Related posts: Recording performance information from short lived processes with prometheus; A pythonic example of recording metrics about ephemeral scripts with prometheus; Friday ; Buying Time; The System of the World; The Ghost Brigades (2)

posted at: 21:23 | path: /prometheus | permanent link to this entry


Mon, 23 Jan 2017



Gods of Metal




    ISBN: 9780141982267
    LibraryThing
    In this follow-up to Command and Control, Schlosser explores the conscientious objectors and protestors who have sought to highlight not just the immorality of nuclear weapons, but the hilariously insecure state the US government stores them in. In all seriousness, we are talking grannies with heart conditions being able to break in.

    My only real objection to this book is that it is more of a pamphlet than a book, and feels a bit like things that didn't make it into the main book. That said, it is well worth the read.

    Tags for this post: book eric_schlosser nuclear weapons safety protest
    Related posts: Command and Control; Random fact for the day; Random linkage; Fast Food Nation; Ghost; Starfish Prime


posted at: 02:38 | path: /book/Eric_Schlosser | permanent link to this entry


Sat, 17 Dec 2016



A Walk in the Woods

posted at: 23:22 | path: /book/Bill_Bryson | permanent link to this entry


Sat, 10 Dec 2016



Leviathan Wakes

posted at: 21:16 | path: /book/James_SA_Corey | permanent link to this entry


Fri, 27 May 2016



Oryx and Crake




    ISBN: 9780385721677
    LibraryThing
    I bought this book ages ago, on the recommendation of a friend (I don't remember who), but I only just got around to reading it. It's a hard book to read in places -- it's not hopeful, or particularly fun, and it's confronting in places -- especially the plot that revolves around child exploitation. There's very little to like about the future society that Atwood posits here, but perhaps that's the point.

    Despite not being a happy fun story, the book made me think about things like genetic engineering in a way I didn't before and I think that's what Atwood was seeking to achieve. So I'd have to describe the book as a success.

    Tags for this post: book margaret_atwood apocalypse genetic_engineering
    Related posts: Cyteen: The Rebirth; Against the Tide; The Exterminator's Want Ad; The Chronicles of Pern: First Fall; Runner; Emerald Sea


posted at: 03:07 | path: /book/Margaret_Atwood | permanent link to this entry


Sun, 22 May 2016



Potato Point

posted at: 18:21 | path: /diary/pictures/20160523 | permanent link to this entry


Sat, 23 Apr 2016



High Output Management




    ISBN: 9780679762881
    LibraryThing
    A reading group of managers at work has been reading this book, except for the last chapter which we were left to read by ourselves. Overall, the book is interesting and very readable. It's a little dated, being all excited about the invention of email and using some unfortunate gender pronouns, but if you can get past those minor things there is a lot of wise advice here. I'm not sure I agree with 100% of it, but I do think the vast majority is of interest. A well written book that I'd recommend to new managers.

    Tags for this post: book andy_gove management intel non_fiction
    Related posts: Being Geek; Looking for web form state management; PDF/A; I Know You Got Soul; The Bad Popes; On Cars


posted at: 01:30 | path: /book/Andy_Gove | permanent link to this entry


Wed, 20 Apr 2016



Bad Pharma




    ISBN: 9780007350742
    LibraryThing
    Another excellent book by Ben Goldacre. In this book he argues that modern medicine is terribly corrupted by the commercial forces that act largely unchecked in the marketplace -- studies which don't make a new drug look good go missing; new drugs are compared only against placebo and not against the current best treatment; doctors are routinely bribed with travel, training and small perks. Overall I'm left feeling like things haven't improved much since this book was published, given that these behaviors still seem common.

    The book does offer concrete actions that we could take to fix things, but I don't see many of these happening any time soon, which is a worrying place to be. Overall, a disturbing but important read.

    Tags for this post: book ben_goldacre medicine science corruption non_fiction
    Related posts: Bad Science; Sixty five roses (Cystic Fibrosis); I Know You Got Soul; The Bad Popes; On Cars; MythBuntu 8.10 just made me sad


posted at: 16:53 | path: /book/Ben_Goldacre | permanent link to this entry


Mon, 11 Apr 2016



Exploring the Jagungal

    Peter Thomas kindly arranged for a variety of ACT Scout leaders to take a tour of the Jagungal portion of Kosciuszko National Park under the guidance of Robert Green. Robert is very experienced with this area, and has recently written a book. Five leaders from the Macarthur Scout Group decided to go along on this tour and take a look at our hiking options in the area.

    The first challenge is getting to the area. The campsite we used for the first day is only accessible to four wheel drive vehicles -- the slope down to the camp site from Nimmo Plain is quite rocky and has some loose sections. That said, the Landcruiser I was in had no trouble making the trip, and the group managed to get two car style four wheel drives into the area without problems as well. The route to Nimmo Plain from the south of Canberra is as follows:




    We explored two areas which are both a short drive from Nimmo Plain. We in fact didn't explore anything at Nimmo Plain itself, but as the intermediate point where the road forks it makes sense to show that bit of route first. From Nimmo Plain, if you turn left you end up where we camped for the first day, which is a lovely NPWS camp site with fire pits, a pit toilet, and trout in the river.

    The route to that camp site is like this:




    From this campsite we did a 14km loop walk, which took in a series of huts and ruins along relatively flat and easy terrain. There are certainly good walking options here for Scouts, especially those who don't particularly like hills. The route for the first day was like this:




    It's a fantastic area, very scenic without being difficult terrain...

                                               

    As you can see from the pictures, life around the camp fire that evening was pretty hard. One note on the weather though -- even at the start of April we're already starting to see very cool overnight weather in this area, with a definite frost on the tents and cars in the morning. I wouldn't want to be hiking in this area much later in the season than this without being prepared for serious cold weather.

       

    The next day we drove back to Nimmo Plain and turned right. You then proceed down a dirt road that is marked as private property, but has a public right of way through to the national park. At the border of the park you can leave the car again and go for another walk. The route to this second entrance to the park is like this:




                         

    This drive on the second morning involved a couple of river crossings, with some representative pictures below. Why does the red Landcruiser get to do the crossing three times? Well that's what happens when you forget to shut the gate...

                                                       

    Following that we did a short 5km return walk to Cesjack's Hut, which again wasn't scenic at all...




                                         

    I took some pictures on the drive home too of course...

                 

    Tags for this post: blog pictures 20160409-jagungal photo kosciuszko scouts bushwalk
    Related posts: Light to Light, Day Three; Light to Light, Day Two; Light to Light, Day One; Scout activity: orienteering at Mount Stranger; Potato Point

posted at: 00:17 | path: /diary/pictures/20160409-jagungal | permanent link to this entry


Tue, 15 Mar 2016



Downbelow Station

posted at: 22:40 | path: /book/C_J_Cherryh | permanent link to this entry