Content here is by Michael Still mikal@stillhq.com. All opinions are my own.


Thu, 21 Aug 2014



Juno nova mid-cycle meetup summary: conclusion

posted at: 23:47 | path: /openstack/juno | permanent link to this entry


Juno nova mid-cycle meetup summary: the next generation Nova API

    This is the final post in my series covering the highlights from the Juno Nova mid-cycle meetup. In this post I will cover our next generation API, which used to be called the v3 API but is largely now referred to as the v2.1 API. Getting to this point has been one of the more painful processes I think I've ever seen in Nova's development history, and I think we've learnt some important things about how large distributed projects operate along the way. My hope is that we remember these lessons next time we hit something as contentious as our API re-write has been.

    Now on to the API itself. It started out as an attempt to improve our current API to be more maintainable and less confusing to our users. We deliberately decided that we would not focus on adding features, but instead attempt to reduce as much technical debt as possible. This development effort went on for about a year before we realized we'd made a mistake. The mistake was assuming that our users would find it trivial to move to a new API, and that they'd do so even if there weren't compelling new features; that assumption turned out to be entirely incorrect.

    I want to make it clear that this wasn't a mistake on the part of the v3 API team. They implemented what the technical leadership of Nova at the time asked for, and were very surprised when we discovered our mistake. We've now spent over a release cycle trying to recover from that mistake as gracefully as possible, but the upside is that the API we will be delivering is significantly more future-proof than the current v2 API.

    At the Atlanta Juno summit, it was agreed that the v3 API would never ship in its current form, and that we would instead provide a v2.1 API. This API would be 99% compatible with the current v2 API; the incompatibilities are things like 'input validation', where passing a malformed parameter to the API now returns an error instead of being silently ignored. The other thing we are going to add in the v2.1 API is a system of 'micro-versions', which allows a client to specify what version of the API it understands, and the server to gracefully degrade to older versions if required.
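
    To make the input validation part concrete, here is a minimal sketch of the sort of thing I mean, using the jsonschema library. The schema shown is invented for this example and is not Nova's actual schema.

    import jsonschema

    # Invented example schema; Nova's real request schemas are more detailed.
    server_create = {
        'type': 'object',
        'properties': {
            'name': {'type': 'string', 'minLength': 1, 'maxLength': 255},
            'flavorRef': {'type': 'string'},
        },
        'required': ['name', 'flavorRef'],
        'additionalProperties': False,  # unknown parameters are now rejected
    }

    def validate_request(body):
        try:
            jsonschema.validate(body, server_create)
        except jsonschema.ValidationError as exc:
            # v2 would have silently ignored bad input; v2.1 reports it.
            raise ValueError('Malformed request: %s' % exc.message)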

    This micro-version system is important, because the next step is to start adding the v3 cleanups and fixes into the v2.1 API as a series of micro-versions. That way we can drag the majority of our users with us into a better future, without abandoning users of older API versions. I should note at this point that the mechanics for deciding the minimum micro-version a given Nova release will support are largely undefined at the moment. My instinct is that we will tie this to stable release versions in some way; if your client dates back to a release of Nova that we no longer support, then we might expect you to upgrade. However, that hasn't been debated yet, so don't take my thoughts on that as rigid truth.
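
    To give a feel for the micro-version idea, here is a rough sketch of how negotiation might work. The header name and version numbers are purely illustrative; the real mechanics were still being designed when I wrote this.

    # Illustrative sketch only; the header name and version values here are
    # assumptions, not the final Nova design.
    MIN_VERSION = (2, 1)   # oldest micro-version this server still speaks
    MAX_VERSION = (2, 4)   # newest micro-version this server implements

    def negotiate_version(headers):
        """Work out which micro-version to use for a request."""
        requested = headers.get('X-Compute-API-Version')
        if requested is None:
            # No header means "oldest behaviour", so existing clients keep
            # working unchanged.
            return MIN_VERSION
        version = tuple(int(part) for part in requested.split('.'))
        if not MIN_VERSION <= version <= MAX_VERSION:
            raise ValueError('API version %s is not supported' % requested)
        return version

    def show_server(server, version):
        body = {'id': server['uuid'], 'name': server['name']}
        if version >= (2, 3):
            # A field added in a later micro-version is only returned to
            # clients which asked for at least that version.
            body['locked'] = server['locked']
        return body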

    Frustratingly, the intent of the v2.1 API has been agreed and unchanged since the Atlanta summit, yet we're late in the Juno release and most of the work isn't done yet. This is because we got bogged down in the mechanics of how micro-versions will work, and how the translation for older API versions will work inside the Nova code later on. We finally unblocked this at the mid-cycle meetup, which means this work can finally progress again.

    The main concern that we needed to resolve at the mid-cycle was the belief that if the v2.1 API was implemented as a series of translations on top of the v3 code, then the translation layer would be quite thick and complicated. This raises issues of maintainability, as well as the amount of code we need to understand. The API team has now agreed to produce an API implementation that is just the v2.1 functionality, and will then layer things on top of that. This is actually invisible to users of the API, but it leaves us with an implementation where changes after v2.1 are additive, which should be easier to maintain.

    One of the other changes in the original v3 code is that we stopped proxying functionality for Neutron, Cinder and Glance. With the decision to implement a v2.1 API instead, we will need to rebuild that proxying implementation. To unblock v2.1, and based on advice from the HP and Rackspace public cloud teams, we have decided to delay implementing these proxies. So, the first version of the v2.1 API we ship will not have proxies, but later versions will add them in. The current v2 API implementation will not be removed until all the proxies have been added to v2.1. This is prompted by the belief that many advanced API users don't use the Nova API proxies, and therefore could move to v2.1 without them being implemented.

    Finally, I want to thank the Nova API team, especially Chris Yeoh and Kenichi Oomichi for their patience with us while we have worked through these complicated issues. It's much appreciated, and I find them a consistent pleasure to work with.

    That brings us to the end of my summary of the Nova Juno mid-cycle meetup. I'll write up a quick summary post that ties all of the posts together, but apart from that this series is now finished. Thanks for following along.

    Tags for this post: openstack juno nova mid-cycle summary api v3 v2.1
    Related posts: Juno nova mid-cycle meetup summary: nova-network to Neutron migration; Juno nova mid-cycle meetup summary: scheduler; Juno nova mid-cycle meetup summary: ironic; Juno nova mid-cycle meetup summary: conclusion; Juno nova mid-cycle meetup summary: DB2 support; Juno nova mid-cycle meetup summary: social issues

posted at: 16:52 | path: /openstack/juno | permanent link to this entry


Tue, 19 Aug 2014



Juno nova mid-cycle meetup summary: nova-network to Neutron migration

    This is my second-to-last post about the Juno Nova mid-cycle meetup; it covers the state of play for work on the nova-network to Neutron upgrade.

    First off, some background information. Neutron (formerly Quantum) was developed over a long period of time to replace nova-network, and was added in the OpenStack Folsom release. The development of new features for nova-network was frozen in the Nova code base, so that users would transition to Neutron. Unfortunately the transition period took longer than expected. We ended up having to unfreeze development of nova-network, in order to fix reliability problems that were affecting our CI gating and the reliability of deployments for existing nova-network users. Also, at least two OpenStack companies were carrying significant feature patches for nova-network, which we wanted to merge into the main code base.

    You can see the announcement at http://lists.openstack.org/pipermail/openstack-dev/2014-January/025824.html. The main enhancements post-freeze were a conversion to use our new objects infrastructure (and therefore conductor), as well as features that were being developed by Nebula. I can't find any contributions from the other OpenStack company in the code base at this time, so I assume they haven't been proposed.

    The nova-network to Neutron migration path has come to the attention of the OpenStack Technical Committee, who have asked for a more formal plan to address Neutron feature gaps and deprecate nova-network. That plan is tracked at https://wiki.openstack.org/wiki/Governance/TechnicalCommittee/Neutron_Gap_Coverage. As you can see, there are still some things to be merged which are targeted for juno-3. At the time of writing this includes grenade testing; Neutron being the devstack default; a replacement for nova-network multi-host; a migration plan; and some documentation. They are all making good progress, but until these action items are completed, Nova can't start the process of deprecating nova-network.

    The discussion at the Nova mid-cycle meetup was around the migration planning item in the plan. There is a Nova specification that outlines one possible plan for live upgrading instances (i.e., no instance downtime) at https://review.openstack.org/#/c/101921/, but this will probably now be replaced with a simpler migration path involving cold migrations. This is prompted by not being able to find a user that absolutely has to have a live upgrade. There was some confusion, because of a belief that the TC was requiring a live upgrade plan. But as Russell Bryant says in the meetup etherpad:

    "Note that the TC has made no such statement on migration expectations other than a migration path must exist, both projects must agree on the plan, and that plan must be submitted to the TC as a part of the project's graduation review (or project gap review in this case). I wouldn't expect the TC to make much of a fuss about the plan if both Nova and Neutron teams are in agreement."


    The current plan is to go forward with a cold upgrade path, unless a user comes forward with an absolute hard requirement for a live upgrade, and a plan to fund developers to work on it.

    At this point, it looks like we are on track to get all of the functionality we need from Neutron in the Juno release. If that happens, we will start the nova-network deprecation timer in Kilo, with my expectation being that nova-network would be removed in the "M" release. There is also an option to change the default networking implementation to Neutron before the deprecation of nova-network is complete, which will mean that new deployments are defaulting to the long term supported option.

    In the next (and probably final) post in this series, I'll talk about the API formerly known as Nova API v3.

    Tags for this post: openstack juno nova mid-cycle summary nova-network neutron migration
    Related posts: Juno nova mid-cycle meetup summary: scheduler; Juno nova mid-cycle meetup summary: ironic; Juno nova mid-cycle meetup summary: conclusion; Juno nova mid-cycle meetup summary: DB2 support; Juno nova mid-cycle meetup summary: social issues; Juno nova mid-cycle meetup summary: slots

posted at: 20:37 | path: /openstack/juno | permanent link to this entry


Juno nova mid-cycle meetup summary: slots

    If I had to guess what would be a controversial topic from the mid-cycle meetup, it would have to be this slots proposal. I was actually in a Technical Committee meeting when this proposal was first made, but I'm told there were plenty of people in the room keen to give this idea a try. Since the mid-cycle, Joe Gordon has written up a more formal proposal, which can be found at https://review.openstack.org/#/c/112733.

    If you look at the last few Nova releases, core reviewers have been drowning under code reviews, so we need to control the review workload. What is currently happening is that everyone throws up their thing into Gerrit, and then each core tries to identify the important things and review them. There is a list of prioritized blueprints in Launchpad, but it is not used much as a way of determining what to review. The result of this is that there are hundreds of reviews outstanding for Nova (500 when I wrote this post). Many of these will get a review, but it is hard for authors to get two cores to pay attention to a review long enough for it to be approved and merged.

    If we could rate limit the number of proposed reviews in Gerrit, then cores would be able to focus their attention on the smaller number of outstanding reviews, and land more code. Because each review would merge faster, we believe this rate limiting would help us land more code rather than less, as our workload would be better managed. You could argue that this will mean we just say 'no' more often, but that's not the intent, it's more about bringing focus to what we're reviewing, so that we can get patches through the process completely. There's nothing more frustrating to a code author than having one +2 on their code and then hitting some merge freeze deadline.

    The proposal is therefore to designate a number of blueprints that can be under review at any one time. The initial proposal was for ten, and the term 'slot' was coined to describe the available review capacity. If your blueprint was not allocated a slot, then it would either not be proposed in Gerrit yet, or if it was it would have a procedural -2 on it (much like code reviews associated with unapproved specifications do now).

    The number of slots is arbitrary at this point. Ten is our best guess of how much we can dilute cores' focus without losing efficiency. We would tweak the number as we gained experience if we went ahead with this proposal. Remember, too, that a slot isn't always a single code review. If the VMware refactor was in a slot, for example, we might find that there were ten code reviews associated with that single slot.

    How do you determine what occupies a review slot? The proposal is to groom the list of approved specifications more carefully. We would collaboratively produce a ranked list of blueprints in the order of their importance to Nova and OpenStack overall. As slots become available, the next highest ranked blueprint with code ready for review would be moved into one of the review slots. A blueprint would be considered 'ready for review' once the specification is merged, and the code is complete and ready for intensive code review.

    What happens if code is in a slot and something goes wrong? Imagine if a proposer goes on vacation and stops responding to review comments. If that happened we would bump the code out of the slot, but would put it back on the backlog in the location dictated by its priority. In other words, there is no penalty for being bumped; you just need to wait for a slot to become available again once you're back.

    We also talked about whether we were requiring specifications for changes which are too simple. If something is relatively uncontroversial and simple (a better tag for internationalization for example), but not a bug, it falls through the cracks of our process at the moment and ends up needing to have a specification written. There was talk of finding another way to track this work. I'm not sure I agree with this part, because a trivial specification is a relatively cheap thing to do. However, it's something I'm happy to talk about.

    We also know that Nova needs to spend more time paying down its accrued technical debt, which you can see in the huge amount of bugs we have outstanding at the moment. There is no shortage of people willing to write code for Nova, but there is a shortage of people fixing bugs and working on strategic things instead of new features. If we could reserve slots for technical debt, then it would help us to get people to work on those aspects, because they wouldn't spend time on a less interesting problem and then discover they can't even get their code reviewed. We even talked about having an alternating focus for Nova releases; we could have a release focused on paying down technical debt and stability, and then the next release focused on new features. The Linux kernel does something quite similar to this and it seems to work well for them.

    Using slots would allow us to land more valuable code faster. Of course, it also means that some patches will get dropped on the floor, but if the system is working properly, those features will be ones that aren't important to OpenStack. Considering that right now we're not landing many features at all, this would be an improvement.

    This proposal is obviously complicated, and everyone will have an opinion. We haven't really thought through all the mechanics fully yet, and it's certainly not a done deal at this point. The ranking process seems to be the most contentious point. We could encourage the community to help us rank things by priority, but it's not clear how that process would work. Regardless, I feel like we need to be more systematic about what code we're trying to land. It's embarrassing how little has landed in Juno for Nova, and we need to be working on that. I would like to continue discussing this as a community to make sure that we end up with something that works well and that everyone is happy with.

    This series is nearly done, but in the next post I'll cover the current status of the nova-network to Neutron upgrade path.

    Tags for this post: openstack juno nova mid-cycle summary review slots blueprint priority project management
    Related posts: Juno nova mid-cycle meetup summary: social issues; Juno nova mid-cycle meetup summary: nova-network to Neutron migration; Juno nova mid-cycle meetup summary: scheduler; Juno nova mid-cycle meetup summary: ironic; Juno nova mid-cycle meetup summary: conclusion; Juno nova mid-cycle meetup summary: DB2 support

posted at: 00:34 | path: /openstack/juno | permanent link to this entry


Sun, 17 Aug 2014



Juno nova mid-cycle meetup summary: scheduler

    This post is in a series covering the discussions at the Juno Nova mid-cycle meetup. This post will cover the current state of play of our scheduler refactoring efforts. The scheduler refactor has been running for a fair while now, dating back to at least the Hong Kong summit (so about 1.5 release cycles ago).

    The original intent of the scheduler sub-team's effort was to pull the scheduling code out of Nova so that it could be rapidly iterated on its own, with the eventual goal being to support a single scheduler across the various OpenStack services. For example, the scheduler that makes placement decisions about your instances could also be making decisions about the placement of your storage resources and could therefore ensure that they are co-located as much as possible.

    During this process we realized that a big bang replacement is actually much harder than we thought, and the plan has morphed into being a multi-phase effort. The first step is to make the interface for the scheduler more clearly defined inside the Nova code base. For example, in previous releases, it was the scheduler that launched instances: the API would ask the scheduler to find available hypervisor nodes, and then the scheduler would instruct those nodes to boot the instances. We need to refactor this so that the scheduler picks a set of nodes, but then the API is the one which actually does the instance launch. That way, when the scheduler does move out it's not trusted to perform actions that change hypervisor state, and the Nova code does that for it. This refactoring work is under way, along with work to isolate the SQL database accesses inside the scheduler.
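
    As a rough sketch of the shape of that split (the names below are simplified illustrations, not Nova's actual classes or RPC methods):

    # Simplified illustration of the interface split; not Nova's actual code.
    def select_destinations(hosts, request_spec):
        """New-style scheduler: answer 'where should this instance go?'
        and nothing else -- it never touches hypervisor state."""
        # Trivial placement policy for the example: pick the emptiest host.
        return min(hosts, key=lambda host: host['instances'])

    def build_instance(chosen_host, request_spec):
        """The API/conductor side performs the state-changing action, so
        the scheduler no longer needs to be trusted to do it."""
        chosen_host['instances'] += 1
        return {'host': chosen_host['name'], 'flavor': request_spec['flavor']}

    if __name__ == '__main__':
        hosts = [{'name': 'compute1', 'instances': 3},
                 {'name': 'compute2', 'instances': 1}]
        spec = {'flavor': 'm1.small'}
        print(build_instance(select_destinations(hosts, spec), spec))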

    I would like to set expectations that this work is what will land in Juno. It has little visible impact for users, but positions us to better solve these problems in Kilo.

    We discussed the need to ensure that any new scheduler is at least as fast and accurate as the current one. Jay Pipes has volunteered to work with the scheduler sub-team to build a testing framework to validate this work. Jay also has some concerns about the resource tracker work that is being done at the moment that he is going to discuss with the scheduler sub-team. Since the mid-cycle meetup there has been a thread on the openstack-dev mailing list about similar resource tracker concerns (here), which might be of interest to people interested in scheduler work.

    We also need to test our assumption at some point that other OpenStack services such as Neutron and Cinder would even be willing to share a scheduler service if a central one was implemented. We believe that Neutron is interested, but we shouldn't be surprising our fellow OpenStack projects by just appearing with a complete solution. There is a plan to propose a cross-project session at the Paris summit to cover this work.

    In the next post in this series we'll discuss possibly the most controversial part of the mid-cycle meetup: the proposal for "slots" for landing blueprints during Kilo.

    Tags for this post: openstack juno nova mid-cycle summary scheduler
    Related posts: Juno nova mid-cycle meetup summary: nova-network to Neutron migration; Juno nova mid-cycle meetup summary: ironic; Juno nova mid-cycle meetup summary: conclusion; Juno nova mid-cycle meetup summary: DB2 support; Juno nova mid-cycle meetup summary: social issues; Juno nova mid-cycle meetup summary: slots

posted at: 20:06 | path: /openstack/juno | permanent link to this entry


Juno nova mid-cycle meetup summary: bug management

    Welcome to the next exciting installment of the Nova Juno mid-cycle meetup summary. In the previous chapter, our hero battled a partially complete cells implementation, by using his +2 smile of good intentions. In this next exciting chapter, watch him battle our seemingly never ending pile of bugs! Sorry, now that I'm on to my sixth post in this series I feel like it's time to get more adventurous in the introductions.

    For at least the last cycle, and probably longer, Nova has been struggling with the number of bugs filed in Launchpad. I don't think the problem is that Nova has terrible code; it is instead that we have a lot of users filing bugs, and the team working on triaging and closing bugs is small. The complexity of the deployment options with Nova makes this problem worse, and that complexity increases as we allow new drivers for things like different storage engines to land in the code base.

    The increasing number of permutations possible with Nova configurations is a problem for our CI systems as well, as we don't cover all of these options and this sometimes leads us to discover that they don't work as expected in the field. CI is a tangent from the main intent of this post though, so I will reserve further discussion of our CI system until a later post.

    Tracy Jones and Joe Gordon have been doing good work in this cycle trying to get a grip on the state of the bugs filed against Nova. For example, a very large number of bugs (hundreds) were for problems we'd fixed, but where the bug bot had failed to close the bug when the fix merged. Many other bugs were waiting for feedback from users, but had been waiting for longer than six months. In both those cases the response was to close the bug, with the understanding that the user can always reopen it if they come back to talk to us again. Doing "quick hit" things like this has reduced our open bug count to about one thousand bugs. You can see a dashboard that Tracy has produced that shows the state of our bugs at http://54.201.139.117/nova-bugs.html. I believe that Joe has been working on moving this onto OpenStack-hosted infrastructure, but this hasn't happened yet.

    At the mid-cycle meetup, the goal of the conversation was to try and find other ways to get our bug queue further under control. Some of the suggestions were largely mechanical, like tightening up our definitions of the confirmed (we agree this is a bug) and triaged (and we know how to fix it) bug states. Others were things like auto-abandoning bugs which are marked incomplete for more than 60 days without a reply from the person who filed the bug, or unassigning bugs when the review that proposed a fix is abandoned in Gerrit.
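
    To give a feel for what that automation might look like, here is a rough sketch using launchpadlib. The attribute and method names here are from memory and would need checking against the real Launchpad API before anyone relies on this.

    # Rough sketch only; launchpadlib details are from memory and would need
    # verification before use.
    from datetime import datetime, timedelta

    from launchpadlib.launchpad import Launchpad

    lp = Launchpad.login_with('nova-bug-expiry', 'production')
    nova = lp.projects['nova']
    cutoff = datetime.utcnow() - timedelta(days=60)

    for task in nova.searchTasks(status='Incomplete'):
        bug = task.bug
        if bug.date_last_updated.replace(tzinfo=None) < cutoff:
            bug.newMessage(content='Expiring this bug: no response for more '
                                   'than 60 days. Please reopen if this is '
                                   'still an issue for you.')
            task.status = 'Expired'
            task.lp_save()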

    Unfortunately, we have more ideas for how to automate dealing with bugs than we have people writing that automation. If there's someone out there who wants to have a big impact on Nova, but isn't sure where to start, helping us out with this automation would be a super helpful contribution. Let Tracy or me know if you're interested.

    We also talked about having more targeted bug days. This was prompted by our last bug day being largely unsuccessful. Instead we're proposing that the next bug day have a really well defined theme, such as moving things from the "undecided" to the "confirmed" state, or similar. I believe the current plan is to run a bug day like this after J-3 when we're winding down from feature development and starting to focus on stabilization.

    Finally, I would encourage people fixing bugs in Nova to do a quick search for duplicate bugs when they are closing a bug. I wouldn't be at all surprised to discover that there are many bugs where you can close duplicates at the same time with minimal effort.

    In the next post I'll cover our discussions of the state of the current scheduler work in Nova.

    Tags for this post: openstack juno nova mid-cycle summary bugs
    Related posts: Juno nova mid-cycle meetup summary: nova-network to Neutron migration; Juno nova mid-cycle meetup summary: scheduler; Juno nova mid-cycle meetup summary: ironic; Juno nova mid-cycle meetup summary: conclusion; Michael's surprisingly unreliable predictions for the Havana Nova release; Juno nova mid-cycle meetup summary: DB2 support

posted at: 19:38 | path: /openstack/juno | permanent link to this entry


Thu, 14 Aug 2014



Juno nova mid-cycle meetup summary: cells

    This is the next post summarizing the Juno Nova mid-cycle meetup. This post covers the cells functionality used by some deployments to scale Nova.

    For those unfamiliar with cells, it's a way of combining smaller Nova installations into a thing which feels like a single large Nova install. So for example, Rackspace deploys Nova in cells of hundreds of machines, and these cells form a Nova availability zone which might contain thousands of machines. The cells in one of these deployments form a tree: users talk to the top level of the tree, which might only contain API services. That cell then routes requests to child cells which can actually perform the operation requested.
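
    To illustrate that tree structure, here is a toy model of the routing idea; this is purely illustrative and is not Nova's cells code.

    # Toy model of the cells tree; purely illustrative, not Nova's cells code.
    import random

    class Cell(object):
        def __init__(self, name, children=(), has_compute=False):
            self.name = name
            self.children = list(children)
            self.has_compute = has_compute

        def boot(self, instance):
            if self.has_compute:
                # A child cell owns hypervisors and performs the operation.
                return 'cell %s booted %s' % (self.name, instance)
            # The top level (API) cell only routes requests down the tree.
            return random.choice(self.children).boot(instance)

    api_cell = Cell('api', children=[Cell('general', has_compute=True),
                                     Cell('onmetal', has_compute=True)])
    print(api_cell.boot('instance-0001'))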

    There are a few reasons why Rackspace does this. Firstly, it keeps the MySQL databases smaller, which can improve the performance of database operations and backups. Additionally, cells can contain different types of hardware, which are then partitioned logically. For example, OnMetal (Rackspace's Ironic-based baremetal product) instances come from a cell which contains OnMetal machines and only publishes OnMetal flavors to the parent cell.

    Cells was originally written by Rackspace to meet its deployment needs, but is now used by other sites as well. However, I think it would be a stretch to say that cells is commonly used, and it is certainly not the deployment default. In fact, most deployments don't run any of the cells code, so you can't really call them even a "single cell install". One of the reasons cells isn't more widely deployed is that it doesn't implement the entire Nova API, which means some features are missing. As a simple example, you can't live-migrate an instance between two child cells.

    At the meetup, the first thing we discussed regarding cells was a general desire to see cells finished and become the default deployment method for Nova. Perhaps most people end up running a single cell, but in that case at least the cells code paths are well used. The first step to get there is improving the Tempest coverage for cells. There was a recent openstack-dev mailing list thread on this topic, which was discussed at the meetup. There was commitment from several Nova developers to work on this, and notably not all of them are from Rackspace.

    It's important that we improve the Tempest coverage for cells, because it positions us for the next step in the process, which is bringing feature parity to cells compared with a non-cells deployment. There is some level of frustration that the work on cells hasn't really progressed in Juno, and that it is currently incomplete. At the meetup, we made a commitment to bringing a well-researched plan to the Kilo summit for implementing feature parity for a single cell deployment compared with a current default deployment. We also made a commitment to make cells the default deployment model when this work is complete. If this doesn't happen in time for Kilo, then we will be forced to seriously consider removing cells from Nova. A half-done cells implementation has so far stopped other development teams from trying to solve the problems that cells addresses, so we either need to finish cells, or get out of the way so that someone else can have a go. I am confident that the cells team will take this feedback on board and come to the summit with a good plan. Once we have a plan we can ask the whole community to rally around and help finish this effort, which I think will benefit all of us.

    In the next blog post I will cover something we've been struggling with for the last few releases: how we get our bug count down to a reasonable level.

    Tags for this post: openstack juno nova mid-cycle summary cells
    Related posts: Juno nova mid-cycle meetup summary: nova-network to Neutron migration; Juno nova mid-cycle meetup summary: scheduler; Juno nova mid-cycle meetup summary: ironic; Juno nova mid-cycle meetup summary: conclusion; Juno nova mid-cycle meetup summary: DB2 support; Juno nova mid-cycle meetup summary: social issues

posted at: 21:20 | path: /openstack/juno | permanent link to this entry


Juno nova mid-cycle meetup summary: DB2 support

    This post is one part of a series discussing the OpenStack Nova Juno mid-cycle meetup. It's a bit shorter than most of the others, because the next thing on my list to talk about is DB2, and that's relatively contained.

    IBM is interested in adding DB2 support as a SQL database for Nova. Theoretically, this is a relatively simple thing to do because we use SQLAlchemy to abstract away the specifics of the SQL engine. However, in reality, the abstraction is leaky. The obvious example in this case is that DB2 has different rules for foreign keys than other SQL engines we've used. So, in order to be able to make this change, we need to tighten up our schema for the database.

    The change that was discussed is the requirement that the UUID column on the instances table be not null. This seems like a relatively obvious thing to require, given that UUID is the official way to identify an instance, and has been for a really long time. However, there are a few things which make this complicated: we need to understand the state of databases that might have been through a long chain of upgrades from previous Nova releases, and we need to ensure that the schema alterations don't cause significant performance problems for existing large deployments.
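
    For illustration, the kind of migration being discussed boils down to something roughly like this. The SQL is MySQL-flavoured and the index name is made up; this is a sketch of the shape of the change, not the actual patch.

    # Hypothetical sketch of the schema change; not the actual Nova migration.
    def upgrade(migrate_engine):
        # The expensive part on a large table: rewriting the column NOT NULL.
        migrate_engine.execute(
            "ALTER TABLE instances MODIFY uuid VARCHAR(36) NOT NULL")
        # Comparatively cheap: adding an index over the uuid column.
        migrate_engine.execute(
            "CREATE INDEX instances_uuid_idx ON instances (uuid)")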

    As an aside, people sometimes complain that Nova development is too slow these days, and they're probably right, because things like this slow us down. A relatively simple change to our database schema requires a whole bunch of performance testing and negotiation with operators to ensure that it's not going to be a problem for people. It's good that we do these things, but sometimes it's hard to explain to people why forward progress is slow in these situations.

    Matt Riedemann from IBM has been doing a good job of handling this change. He's written a tool that operators can run before the change lands in Juno that checks if they have instance rows with null UUIDs. Additionally, the upgrade process has been well planned, and is documented in the specification available on the fancy pants new specs website.
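
    Conceptually, that operator pre-flight check boils down to something like the following; the connection URL is a placeholder and the real tool is more careful than this.

    # Conceptual sketch of the null-UUID pre-flight check; the connection
    # URL is a placeholder and the real tool does more than this.
    from sqlalchemy import create_engine

    engine = create_engine('mysql://nova:password@dbhost/nova')  # placeholder
    count = engine.execute(
        'SELECT COUNT(*) FROM instances WHERE uuid IS NULL').scalar()
    print('instance rows with a NULL uuid: %d' % count)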

    We had a long discussion about this change at the meetup, and how it would impact on large deployments. Both Rackspace and HP were asked if they could run performance tests to see if the schema change would be a problem for them. Unfortunately HP's testing hardware was tied up with another project, so we only got numbers from Rackspace. For them, the schema change took 42 minutes for a large database. Almost all of that was altering the column to be non-nullable; creating the new index was only 29 seconds of runtime. However, the Rackspace database is large because they don't currently purge deleted rows; if they can get that done before running this schema upgrade, then the impact will be much smaller.

    So the recommendation here for operators is that it is best practice to purge deleted rows from your databases before an upgrade, especially when schema migrations need to occur at the same time. There are some other takeaways for operators as well: if we know that operators have a large deployment, then we can ask if an upgrade will be a problem. This is why being active on the openstack-operators mailing list is important. Additionally, if operators are willing to donate a dataset to Turbo-Hipster for DB CI testing, then we can use that in our automation to try and make sure these upgrades don't cause you pain in the future.

    In the next post in this series I'll talk about the future of cells, and the work that needs to be done there to make it a first class citizen.

    Tags for this post: openstack juno nova mid-cycle summary sql database sqlalchemy db2
    Related posts: Juno nova mid-cycle meetup summary: nova-network to Neutron migration; Juno nova mid-cycle meetup summary: scheduler; Juno nova mid-cycle meetup summary: ironic; Juno nova mid-cycle meetup summary: conclusion; Juno nova mid-cycle meetup summary: social issues; Juno nova mid-cycle meetup summary: slots

posted at: 19:20 | path: /openstack/juno | permanent link to this entry


Review priorities as we approach juno-3

    I just sent this email out to openstack-dev, but I am posting it here in case it makes it more discoverable to people drowning in email:

    To: openstack-dev
    Subject: [nova] Review priorities as we approach juno-3
    
    Hi.
    
    We're rapidly approaching j-3, so I want to remind people of the
    current reviews that are high priority. The definition of high
    priority I am using here is blueprints that are marked high priority
    in launchpad that have outstanding code for review -- I am sure there
    are other reviews that are important as well, but I want us to try to
    land more blueprints than we have so far. These are listed in the
    order they appear in launchpad.
    
    == Compute Manager uses Objects (Juno Work) ==
    
    https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/compute-manager-objects-juno,n,z
    
    This is ongoing work, but if you're after some quick code review
    points they're very easy to review and help push the project forward
    in an important manner.
    
    == Move Virt Drivers to use Objects (Juno Work) ==
    
    I couldn't actually find any code out for review for this one apart
    from https://review.openstack.org/#/c/94477/, is there more out there?
    
    == Add a virt driver for Ironic ==
    
    This one is in progress, but we need to keep going at it or we won't
    get it merged in time.
    
    * https://review.openstack.org/#/c/111223/ was approved, but a rebase
    ate it. Should be quick to re-approve.
    * https://review.openstack.org/#/c/111423/
    * https://review.openstack.org/#/c/111425/
    * ...there are more reviews in this series, but I'd be super happy to
    see even a few reviewed
    
    == Create Scheduler Python Library ==
    
    * https://review.openstack.org/#/c/82778/
    * https://review.openstack.org/#/c/104556/
    
    (There are a few abandoned patches in this series, I think those two
    are the active ones but please correct me if I am wrong).
    
    == VMware: spawn refactor ==
    
    * https://review.openstack.org/#/c/104145/
    * https://review.openstack.org/#/c/104147/ (Dan Smith's -2 on this one
    seems procedural to me)
    * https://review.openstack.org/#/c/105738/
    * ...another chain with many more patches to review
    
    Thanks,
    Michael
    


    The actual email thread is at http://lists.openstack.org/pipermail/openstack-dev/2014-August/043098.html.

    Tags for this post: openstack juno review nova ptl
    Related posts: Juno Nova PTL Candidacy; Thoughts from the PTL; Havana Nova PTL elections; Expectations of core reviewers; Juno nova mid-cycle meetup summary: social issues; Juno nova mid-cycle meetup summary: slots

posted at: 13:01 | path: /openstack/juno | permanent link to this entry


Juno nova mid-cycle meetup summary: ironic

    Welcome to the third in my set of posts covering discussion topics at the nova juno mid-cycle meetup. The series might never end, to be honest.

    This post will cover the progress of the ironic nova driver. This driver is interesting as an example of a large contribution to the nova code base for a few reasons -- it's an official OpenStack project instead of a vendor driver, which means we should already have well aligned goals. The driver has been written entirely using our development process, so it's already been reviewed to OpenStack standards, instead of being a large code dump from a separate development process. Finally, it's forced us to think through what merging a non-trivial code contribution should look like, and I think that formula will be useful for later similar efforts, the Docker driver for example.

    One of the sticking points with getting the ironic driver landed is exactly how upgrade for baremetal driver users will work. The nova team has been unwilling to just remove the baremetal driver, as we know that it has been deployed by at least a few OpenStack users -- the largest deployment I am aware of is over 1,000 machines. Now, this is unfortunate because the baremetal driver was always intended to be experimental. I think what we've learnt from this is that any driver which merges into the nova code base has to be supported for a reasonable period of time -- nova isn't the right place for experiments. Now that we have the stackforge driver model I don't think that's too terrible, because people can iterate quickly in stackforge, and when they have something stable and supportable they can merge it into nova. This gives us the best of both worlds, while providing a strong signal to deployers about what the nova team is willing to support for long periods of time.

    The solution we came up with for upgrades from baremetal to ironic is that the deployer will upgrade to juno, and then run a script which converts their baremetal nodes to ironic nodes. This script is "off line" in the sense that we do not expect new baremetal nodes to be launchable during this process, nor after it is completed. All further launches would be via the ironic driver.

    These nodes that are upgraded to ironic will exist in a degraded state. We are not requiring ironic to support their full set of functionality on these nodes, just the bare minimum that baremetal did, which is listing instances, rebooting them, and deleting them. Launch is excluded for the reasons described above.

    We have also asked the ironic team to help us provide a baremetal API extension which knows how to talk to ironic, but this was identified as a need fairly late in the cycle and I expect it to be a request for a feature freeze exception when the time comes.

    The current plan is to remove the baremetal driver in the Kilo release.

    Previously in this post I alluded to the review mechanism we're using for the ironic driver. What does that actually look like? Well, what we've done is ask the ironic team to propose the driver as a series of smallish (500 line) changes. These changes are broken up by functionality, for example the code to boot an instance might be in one of these changes. However, because of the complexity of splitting existing code up, we're not requiring a tempest pass on each step in the chain of reviews. We're instead only requiring this for the final member in the chain. This means that we're not compromising our CI requirements, while maximizing the readability of what would otherwise be a very large review. To stop the reviews from merging before we're comfortable with them, there's a marker review at the beginning of the chain which is currently -2'ed. When all the code is ready to go, I remove the -2 and approve that first review and they should all merge together.

    In the next post I'll cover the state of adding DB2 support to nova.

    Tags for this post: openstack juno nova mid-cycle summary ironic
    Related posts: Juno nova mid-cycle meetup summary: nova-network to Neutron migration; Juno nova mid-cycle meetup summary: scheduler; Juno nova mid-cycle meetup summary: conclusion; Juno nova mid-cycle meetup summary: DB2 support; Juno nova mid-cycle meetup summary: social issues; Juno nova mid-cycle meetup summary: slots

posted at: 01:49 | path: /openstack/juno | permanent link to this entry


Juno nova mid-cycle meetup summary: containers

    This is the second in my set of posts discussing the outcomes from the OpenStack nova juno mid-cycle meetup. I want to focus in this post on things related to container technologies.

    Nova has had container support for a while in the form of libvirt LXC. While it can be argued that this support isn't feature complete and needs more testing, it's certainly been around for a while. There is renewed interest in testing libvirt LXC in the gate, and a team at Rackspace appears to be working on this as I write this. We have already seen patches from this team as they fix issues they find on the way. There are no plans to remove libvirt LXC from nova at this time.

    The plan going forward for LXC tempest testing is to add it as an experimental job, so that people reviewing libvirt changes can request the CI system to test LXC by using "check experimental". This hasn't been implemented yet, but will be advertised when it is ready. Once we've seen good stable results from this experimental check we will talk about promoting it to be a full blown check job in our CI system.

    We have also had prototype support for Docker for some time, and by all reports Eric Windisch has been doing good work at getting this driver into a good place since it moved to stackforge. We haven't started talking about specifics for when this driver will return to the nova code base, but I think at this stage we're talking about Kilo at the earliest. The driver has CI now (although it's still working through stability issues to my understanding) and progresses well. I expect there to be a session at the Kilo summit in the nova track on the current state of this driver, and we'll decide whether to merge it back into nova then.

    There was also representation from the containers sub-team at the meetup, and they spent most of their time in a break out room coming up with a concrete proposal for what container support should look like going forward. The plan looks a bit like this:

    Nova will continue to support "lowest common denominator containers": by this I mean that things like the libvirt LXC and docker driver will be allowed to exist, and will expose the parts of containers that can be made to look like virtual machines. That is, a caller to the nova API should not need to know if they are interacting with a virtual machine or a container, it should be opaque to them as much as possible. There is some ongoing discussion about the minimum functionality we should expect from a hypervisor driver, so we can expect this minimum level of functionality to move over time.

    The containers sub-team will also write a separate service which exposes a more full featured container experience. This service will work by taking a nova instance UUID, and interacting with an agent within that instance to create containers and manage them. This is interesting because it is the first time that a compute project will have an agent running inside the instance's operating system, although other projects have had these for a while. There was also talk about the service being able to start an instance if the user didn't already have one, or being able to declare an existing instance to be "full" and then create a new one for the next incremental container. These are interesting design issues, and I'd like to see them explored more in a specification.

    This plan met with general approval within the room at the meetup, with the suggestion being that it move forward as a stackforge project as part of the compute program. I don't think much code has been implemented yet, but I hope to see something come of these plans soon. The first step here is to create some specifications for the containers service, which we will presumably create in the nova-specs repository for want of a better place.

    Thanks for reading my second post in this series. In the next post I will cover progress with the Ironic nova driver.

    Tags for this post: openstack juno nova mid-cycle summary containers docker lxc
    Related posts: Juno nova mid-cycle meetup summary: nova-network to Neutron migration; Juno nova mid-cycle meetup summary: scheduler; Juno nova mid-cycle meetup summary: ironic; Juno nova mid-cycle meetup summary: conclusion; Juno nova mid-cycle meetup summary: DB2 support; Juno nova mid-cycle meetup summary: social issues

posted at: 01:17 | path: /openstack/juno | permanent link to this entry


Wed, 13 Aug 2014



Juno nova mid-cycle meetup summary: social issues

    Summarizing three days of the Nova Juno mid-cycle meetup is a pretty hard thing to do - I'm going to give it a go, but just in case I miss things, there is an etherpad with notes from the meetup at https://etherpad.openstack.org/p/juno-nova-mid-cycle-meetup. I'm also going to do it in the form of a series of posts, so as to not hold up any content at all in the wait for perfection. This post covers the mechanics of each day at the meetup, reviewer burnout, and the Juno release.

    First off, some words about the mechanics of the meetup. The meetup was held in Beaverton, Oregon at an Intel campus. Many thanks to Intel for hosting the event -- it is much appreciated. We discussed possible locations and attendance for future mid-cycle meetups, and the consensus is that these events should "always" be in the US because that's where the vast majority of our developers are. We will consider other host countries when the mix of Nova developers changes. Additionally, we talked about the expectations of attendance at these events. The Icehouse mid-cycle was an experiment, but now that we've run two of these I think they're clearly useful events. I want to be clear that we expect nova-drivers members to attend these events if at all possible, and strongly prefer to have all nova-cores at the event.

    I understand that sometimes life gets in the way, but that's the general expectation. To assist with this, I am going to work on advertising these events much earlier than we have in the past to give time for people to get travel approval. If any core needs me to go to the Foundation and ask for travel assistance, please let me know.

    I think that co-locating the event with the Ironic and Containers teams helped us a lot this cycle too. We can't co-locate with every other team working on OpenStack, but I'd like to see us pick a couple of teams -- who we might be blocking -- each cycle and invite them to co-locate with us. It's easy at this point for Nova to become a blocker for other projects, and we need to be careful not to get in the way unless we absolutely need to.

    The process for each of the three days: we met at Intel at 9am, and started each day by trying to cherry-pick the most important topics from our grab bag of items at the top of the etherpad. I feel this worked really well for us.

    Reviewer burnout

    We started off talking about core reviewer burnout, and what we expect from core. We've previously been clear that we expect a minimum level of reviews from cores, but we are increasingly concerned about keeping cores "on the same page". The consensus is that, at least, cores should be expected to attend summits. There is a strong preference for cores making it to the mid-cycle if at all possible. It was agreed that I will approach the OpenStack Foundation and request funding for cores who are experiencing budget constraints if needed. I was asked to communicate these thoughts on the openstack-dev mailing list. This openstack-dev mailing list thread is me completing that action item.

    The conversation also covered whether it was reasonable to make trivial updates to a patch that was close to being acceptable. For example, consider a patch which is ready to merge apart from its commit message needing a trivial tweak. It was agreed that it is reasonable for the second core reviewer to fix the commit message, upload a new version of the patch, and then approve that for merge. It is a good idea to leave a note in the review history about this when these cases occur.

    We expect cores to use their judgement about what is a trivial change.

    I have an action item to remind cores that this is acceptable behavior. I'm going to hold off on sending that email for a little bit because there are a couple of big conversations happening about Nova on openstack-dev. I don't want to drown people in email all at once.

    Juno release

    We also took a look at the Juno release, with j-3 rapidly approaching. One outcome was to try to find a way to focus reviewers on landing code that is a project priority. At the moment we signal priority with the priority field in the launchpad blueprint, which can be seen in action for j-3 here. However, high priority code often slips away because we currently let reviewers review whatever seems important to them.

    There was talk about picking project sponsored "themes" for each release -- with the obvious examples being "stability" and "features". One problem here is that we haven't had a lot of luck convincing developers and reviewers to actually work on things we've specified as project goals for a release. The focus needs to move past specific features important to reviewers. Contributors and reviewers need to spend time fixing bugs and reviewing priority code. The harsh reality is that this hasn't been a glowing success.

    One solution we're going to try is using more of the Nova weekly meeting to discuss the status of important blueprints. The meeting discussion should then be turned into a reminder on openstack-dev of the current important blueprints in need of review. The side effect of rearranging the weekly meeting is that we'll have less time for the current sub-team updates, but people seem ok with that.

    A few people have also suggested various interpretations of a "review day". One interpretation is a rotation through nova-core of reviewers who spend a week of their time reviewing blueprint work. I think these ideas have merit. I have an action item to call for volunteers to sign up for blueprint-focused reviewing.

    Conclusion

    As I mentioned earlier, this is the first in a series of posts. In this post I've tried to cover social aspects of nova -- the mechanics of the Nova Juno mid-cycle meetup, and reviewer burnout -- and our current position in the Juno release cycle. There was also discussion of how to manage our workload in Kilo, but I'll leave that for another post. It's already been alluded to on the openstack-dev mailing list in this post and the subsequent proposal in gerrit. If you're dying to know more about what we talked about, don't forget the relatively comprehensive notes in our etherpad.

    Tags for this post: openstack juno nova mid-cycle summary core review social
    Related posts: Juno nova mid-cycle meetup summary: slots; Juno nova mid-cycle meetup summary: nova-network to Neutron migration; Juno nova mid-cycle meetup summary: scheduler; Juno nova mid-cycle meetup summary: ironic; Juno nova mid-cycle meetup summary: conclusion; Juno nova mid-cycle meetup summary: DB2 support

posted at: 23:57 | path: /openstack/juno | permanent link to this entry


Tue, 12 Aug 2014



Expectations of core reviewers

    One of the action items from the nova midcycle was that I was asked to make nova's expectations of core reviewers more clear. This blog post is an attempt at that.

    Nova expects a minimum level of sustained code reviews from cores. In the past this has been generally held to be in the order of two code reviews a day, which is a pretty low bar compared to the review workload of many cores. I feel that existing cores understand this requirement well, and I am mostly stating it here for completeness.

    Additionally, there are increasing levels of concern that cores need to be on the same page about the criteria we hold code to, as well as the overall direction of nova. While the weekly meetings help here, it was agreed that summit attendance is really important to cores. It's the way we decide where we're going for the next cycle, as well as a chance to make sure that people are all pulling in the same direction and trust each other.

    There is also a strong preference for midcycle meetup attendance, although I understand that can sometimes be hard to arrange. My stance is that I'd like cores to try to attend, but understand that sometimes people will miss one. In response to the increasing importance of midcycles over time, I commit to trying to get the dates for these events announced further in advance.

    Given that we consider these physical events so important, I'd like people to let me know if they have travel funding issues. I can then approach the Foundation about funding travel if that is required.

    Tags for this post: openstack juno ptl nova
    Related posts: Juno Nova PTL Candidacy; Review priorities as we approach juno-3; Thoughts from the PTL; Havana Nova PTL elections; Juno nova mid-cycle meetup summary: nova-network to Neutron migration; Juno nova mid-cycle meetup summary: scheduler

posted at: 17:16 | path: /openstack/juno | permanent link to this entry


Sun, 10 Aug 2014



devpi as a pip cache

posted at: 19:58 | path: /openstack | permanent link to this entry


Tue, 15 Apr 2014



Juno TC Candidacy

    Another email archived for historical reasons.

    I'd also like to announce my TC candidacy. I am currently a member of
    the TC, and I would like to continue to serve.
    
    I first started hacking on Nova during the Diablo release, with my
    first code contributions appearing in the Essex release. Since then
    I've hacked mostly on Nova and Oslo, although I have also contributed
    to many other projects as my travels have required. For example, I've
    tried hard to keep various projects in sync with their imports of
    parts of Oslo I maintain.
    
    I work full time on OpenStack at Rackspace, leading a team of
    developers who work solely on upstream open source OpenStack. I am a
    Nova and Oslo core reviewer and the Nova PTL.
    
    I have been serving on the TC for the last year, and in the Icehouse
    release started acting as the liaison for the board "defcore"
    committee along with Anne Gentle. "defcore" is the board effort to
    define what parts of OpenStack we require vendors to ship in order to
    be able to use the OpenStack trade mark, so it involves both the board
    and the TC. That liaison relationship is very new and only starting to
    be effective now, so I'd like to keep working on that if you're
    willing to allow it.
    


    Tags for this post: openstack juno tc election
    Related posts: Juno Nova PTL Candidacy; Havana Nova PTL elections

posted at: 00:01 | path: /openstack/juno | permanent link to this entry


Mon, 14 Apr 2014



Thoughts from the PTL

    I sent this through to the openstack-dev mailing list (you can see the thread here), but I want to put it here as well for people who don't actively follow the mailing list.

    First off, thanks for electing me as the Nova PTL for Juno. I find the
    outcome of the election both flattering and daunting. I'd like to
    thank Dan and John for running as PTL candidates as well -- I strongly
    believe that a solid democratic process is part of what makes
    OpenStack so successful, and that isn't possible without people being
    willing to stand up during the election cycle.
    
    I'm hoping to send out regular emails to this list with my thoughts
    about our current position in the release process. It's early in the
    cycle, so the ideas here aren't fully formed yet -- however I'd rather
    get feedback early and often, in case I'm off on the wrong path. What
    am I thinking about at the moment? The following things:
    
    * a mid cycle meetup. I think the Icehouse meetup was a great success,
    and I'd like to see us do this again in Juno. I'd also like to get the
    location and venue nailed down as early as possible, so that people
    who have complex travel approval processes have a chance to get travel
    sorted out. I think it's pretty much a foregone conclusion this meetup
    will be somewhere in the continental US. If you're interested in
    hosting a meetup in approximately August, please mail me privately so
    we can chat.
    
    * specs review. The new blueprint process is a work of genius, and I
    think it's already working better than what we've had in previous
    releases. However, there are a lot of blueprints there in review, and
    we need to focus on making sure these get looked at sooner rather than
    later. I'd especially like to encourage operators to take a look at
    blueprints relevant to their interests. Phil Day from HP has been
    doing a really good job at this, and I'd like to see more of it.
    
    * I promised to look at mentoring newcomers. The first step there is
    working out how to identify which newcomers to mentor, and who mentors
    them. There's not a lot of point in mentoring someone who writes a
    single drive-by patch, so working out who to invest in isn't as
    obvious as it might seem at first. Discussing this process for
    identifying mentoring targets is a good candidate for a summit
    session, so have a ponder. However, if you have ideas let's get
    talking about them now instead of waiting for the summit.
    
    * summit session proposals. The deadline for proposing summit sessions
    for Nova is April 20, which means we only have a little under a week
    to get that done. So, if you're sitting on a summit session proposal,
    now is the time to get it in.
    
    * business as usual. We also need to find the time for bug fix code
    review, blueprint implementation code review, bug triage and so forth.
    Personally, I'm going to focus on bug fix code review more than I have
    in the past. I'd like to see cores spend 50% of their code review time
    reviewing bug fixes, to make the Juno release as solid as possible.
    However, I don't intend to enforce that; it's just me asking real nice.
    
    Thanks for taking the time to read this email, and please do let me
    know if you think this sort of communication is useful.
    


    Tags for this post: openstack juno ptl nova
    Related posts: Juno Nova PTL Candidacy; Review priorities as we approach juno-3; Havana Nova PTL elections; Expectations of core reviewers; Juno nova mid-cycle meetup summary: nova-network to Neutron migration; Juno nova mid-cycle meetup summary: scheduler

posted at: 00:01 | path: /openstack/juno | permanent link to this entry


Sat, 29 Mar 2014



Juno Nova PTL Candidacy

    This is a repost of an email to the openstack-dev list, which is mostly here for historical reasons.

    Hi.
    
    I would like to run for the OpenStack Compute PTL position as well.
    
    I have been an active nova developer since late 2011, and have been a
    core reviewer for quite a while. I am currently serving on the
    Technical Committee, where I have recently been spending my time
    liaising with the board about how to define what software should be
    able to use the OpenStack trade mark. I've also served on the
    vulnerability management team, and as nova bug czar in the past.
    
    I have extensive experience running Open Source community groups,
    having served on the TC, been the Director for linux.conf.au 2013, as
    well as serving on the boards of various community groups over the
    years.
    
    In Icehouse I hired a team of nine software engineers who are all
    working 100% on OpenStack at Rackspace Australia, developed and
    deployed the turbo hipster third party CI system along with Joshua
    Hesketh, as well as writing nova code. I recognize that if I am
    successful I will need to rearrange my work responsibilities, and my
    management is supportive of that.
    
    The future
    --------------
    
    To be honest, I've thought for a while that the PTL role in OpenStack
    is poorly named. Specifically, it's the T that bothers me. Sure, we
    need strong technical direction for our programs, but putting it in
    the title raises technical direction above the other aspects of the
    job. Compute at the moment is in an interesting position -- we're
    actually pretty good on technical direction and we're doing
    interesting things. What we're not doing well on is the social aspects
    of the PTL role.
    
    When I first started hacking on nova I came from an operations
    background where I hadn't written open source code in quite a while. I
    feel like I'm reasonably smart, but nova was certainly the largest
    python project I'd ever seen. I submitted my first patch, and it was
    rejected -- as it should have been. However, Vishy then took the time
    to sit down with me and chat about what needed to change, and how to
    improve the patch. That's really why I'm still involved with
    OpenStack: Vishy took an interest and was always happy to chat. I'm
    told by others that they have had similar experiences.
    
    I think that's what compute is lacking at the moment. For the last few
    cycles we've focused on the technical, and now the social aspects are
    our biggest problem. I think this is a pendulum, and perhaps in a
    release or two we'll swing back to needing to re-emphasise the
    technical aspects, but for now we're doing poorly on social things.
    Some examples:
    
    - we're not keeping up with code reviews because we're reviewing the
    wrong things. We have a high volume of patches which are unlikely to
    ever land, but we just reject them. So far in the Icehouse cycle we've
    seen 2,334 patchsets proposed, of which we approved 1,233. Along the
    way, we needed to review 11,747 revisions. We don't spend enough time
    working with the proposers to improve the quality of their code so
    that it will land. Specifically, whilst review comments in gerrit are
    helpful, we need to identify up-and-coming contributors and help them
    build a relationship with a mentor outside gerrit. We can reduce the
    number of reviews we need to do by improving the quality of initial
    proposals.
    
    - we're not keeping up with bug triage, or worse, actually closing
    bugs. I think part of this is that people want to land their features,
    but part of it is also that closing bugs is super frustrating at the
    moment. It can take hours (or days) to replicate and then diagnose a
    bug. You propose a fix, and then it takes weeks to get reviewed. I'd
    like to see us tweak the code review process to prioritise bug fixes
    over new features for the Juno cycle. We should still land features,
    but we should obsessively track review latency for bug fixes. Compute
    fails if we're not producing reliable production grade code.
    
    - I'd like to see us focus more on consensus building. We're a team
    after all, and when we argue solely about the technical aspects of a
    problem we ignore the fact that we're teaching the people involved a
    behaviour that will continue on. Ultimately if we're not a welcoming
    project that people want to code on, we'll run out of developers. I
    personally want to be working on compute in five years, and I want the
    compute of the future to be a vibrant, friendly, supportive place. We
    get there by modelling the behaviour we want to see in the future.
    
    So, some specific actions I think we should take:
    
    - when we reject a review from a relatively new contributor, we should
    try and pair them up with a more experienced developer to get some
    coaching. That experienced dev should take point on code reviews for
    the new person so that they receive low-latency feedback as they
    learn. Once the experienced dev is ok with a review, nova-core can
    pile on to actually get the code approved. This will reduce the
    workload for nova-core (we're only reviewing things which are of a
    known good standard), while improving the experience for new
    contributors.
    
    - we should obsessively track review performance for bug fixes, and
    prioritise them where possible. Let's not ignore features, but let's
    agree that each core should spend at least 50% of their review time
    reviewing bug fixes.
    
    - we should work on consensus building, and tracking the progress of
    large blueprints. We should not wait until the end of the cycle to
    re-assess the v3 API and discover we have concerns. We should be
    talking about progress in the weekly meetings and making sure we're
    all on the same page. Let's reduce the level of surprise. This also
    flows into being clearer about the types of patches we don't want to
    see proposed -- for example, if we think that patches that only change
    whitespace are a bad idea, then let's document that somewhere so
    people know before they put a lot of effort in.
    
    Thanks for taking the time to read this email!
    


    Tags for this post: openstack juno ptl nova election
    Related posts: Havana Nova PTL elections; Review priorities as we approach juno-3; Thoughts from the PTL; Expectations of core reviewers; Juno nova mid-cycle meetup summary: nova-network to Neutron migration; Juno nova mid-cycle meetup summary: scheduler

posted at: 00:01 | path: /openstack/juno | permanent link to this entry


Sun, 03 Nov 2013



Comparing alembic with sqlalchemy migrate

    In the last few days there has been a discussion on the openstack-dev mailing list about converting nova to alembic. Nova currently uses sqlalchemy migrate for its schema migrations. I would consider myself a sceptic of this change, but I want to be a well educated sceptic so I thought I should take a look at an existing alembic user, in this case neutron. There is also at least one session on database changes at the Icehouse summit this coming week, and I wanted to feel prepared for those conversations.

    I should start off by saying that I'm not particularly opposed to alembic. We definitely have problems with migrate, but I am not sure that these problems are addressed by alembic in the way that we'd hope. I think we need to dig deeper into the issues we face with migrate to understand if alembic is a good choice.

    sqlalchemy migrate

    There are two problems with migrate that I see us suffering from at the moment. The first is that migrate is no longer maintained by upstream. I can see why this is bad, although there are other nova dependencies that the OpenStack team maintains internally. For example, the various oslo libraries and the oslo incubator. I understand that reducing the amount of code we maintain is good, but migrate is stable and relatively static. Any changes made will be fixes for security issues or feature changes that the OpenStack project wants. This relative stability means that we're unlikely to see gate breakages because of unexpected upstream changes. It also means that when we want to change how migrate works for our convenience, we don't need to spend time selling upstream on that change.

    The other problem I see is that it's really fiddly to land database migrations in nova at the moment. Migrations are a linear stream through time, implemented as a sequence of numbered files. So, if the current schema version is 183, then my new migration would be implemented by adding the following files to the git repository:

      184_implement_funky_feature.py
      184_sqlite_downgrade.sql
      184_sqlite_upgrade.sql
      


    In this example, the migration is called "implement_funky_feature", and needs custom sqlite upgrades and downgrades. The sqlite-specific files are optional.

    Now the big problem here is that if there is more than one patch competing for the next migration number (which is quite common), then only one patch can win. The others will need to manually rebase their change by renaming these files and then have to re-attempt the code review process. This is very annoying, especially because migration numbers are baked into our various migration tests.

    "Each" migration also has migration tests, which reside in nova/tests/db/test_migrations.py. I say each in quotes because we haven't been fantastic about actually adding tests for all our migrations, so that is imperfect at best. When you miss out on a migration number, you also need to update your migration tests to have the new version number in them.

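    As a rough illustration of why the numbering matters to the tests, the per-migration tests follow a pattern along these lines. This is a simplified sketch, not the real nova test code, and the base class machinery that walks the migrations is assumed:

      import sqlalchemy
      import unittest

      class TestFunkyFeatureMigration(unittest.TestCase):
          # In nova these methods hang off a base class which walks every
          # migration in order; unittest.TestCase stands in for it here.

          def _pre_upgrade_184(self, engine):
              # Seed any data that migration 184 needs before it runs.
              return {'instance_uuid': 'fake-uuid'}

          def _check_184(self, engine, data):
              # The migration number is baked into these method names, so a
              # renumbered migration means renaming the tests as well.
              meta = sqlalchemy.MetaData()
              instances = sqlalchemy.Table('instances', meta, autoload=True,
                                           autoload_with=engine)
              self.assertIn('funky_feature', instances.c)
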
    If we ignore alembic for a moment, I think we can address this issue within migrate relatively easily. The biggest problem at the moment is that migration numbers are derived from the file naming scheme. If instead they came from a configuration file, then when you needed to change the migration number for your patch it would be a one line change in a configuration file, instead of a selection of file renames and some changes to tests. Consider a configuration file which looks like this:

      mikal@e7240:~/src/openstack/nova/nova/db/sqlalchemy/migrate_repo/versions$ cat versions.json | head
      {
          "133": [
              "folsom.py"
          ], 
          "134": [
              "add_counters_to_bw_usage_cache.py"
          ], 
          "135": [
              "add_node_to_instances.py"
          ], 
      ...
      


    Here, the only place the version number appears is in this versions.json configuration file. For each version, you just list the files present for the migration. In each of the cases here it's just the Python migration, but it could just as easily include sqlite-specific migrations in the array of filenames.
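
    For example, a hypothetical entry for the migration above, including its sqlite-specific scripts, might look like this (an invented entry, not one from the real nova tree):

      "184": [
          "implement_funky_feature.py",
          "implement_funky_feature_sqlite_downgrade.sql",
          "implement_funky_feature_sqlite_upgrade.sql"
      ],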

    Then we just need a very simple change to migrate to prefer the config file if it is present:

      diff --git a/migrate/versioning/version.py b/migrate/versioning/version.py
      index d5a5be9..cee1e66 100644
      --- a/migrate/versioning/version.py
      +++ b/migrate/versioning/version.py
      @@ -61,22 +61,31 @@ class Collection(pathed.Pathed):
               """
               super(Collection, self).__init__(path)
       
      -        # Create temporary list of files, allowing skipped version numbers.
      -        files = os.listdir(path)
      -        if '1' in files:
      -            # deprecation
      -            raise Exception('It looks like you have a repository in the old '
      -                            'format (with directories for each version). '
      -                            'Please convert repository before proceeding.')
      -
      -        tempVersions = dict()
      -        for filename in files:
      -            match = self.FILENAME_WITH_VERSION.match(filename)
      -            if match:
      -                num = int(match.group(1))
      -                tempVersions.setdefault(num, []).append(filename)
      -            else:
      -                pass  # Must be a helper file or something, let's ignore it.
      +        # NOTE(mikal): If there is a versions.json file, use that instead of
      +        # filesystem numbering
      +        json_path = os.path.join(path, 'versions.json')
      +        if os.path.exists(json_path):
      +            with open(json_path) as f:
      +                tempVersions = json.loads(f.read())
      +
      +        else:
      +            # Create temporary list of files, allowing skipped version numbers.
      +            files = os.listdir(path)
      +            if '1' in files:
      +                # deprecation
      +                raise Exception('It looks like you have a repository in the '
      +                                'old format (with directories for each '
      +                                'version). Please convert repository before '
      +                                'proceeding.')
      +
      +            tempVersions = dict()
      +            for filename in files:
      +                match = self.FILENAME_WITH_VERSION.match(filename)
      +                if match:
      +                    num = int(match.group(1))
      +                    tempVersions.setdefault(num, []).append(filename)
      +                else:
      +                    pass  # Must be a helper file or something, let's ignore it.
       
               # Create the versions member where the keys
               # are VerNum's and the values are Version's.


    There are some tweaks required to test_migrations.py as well, but they are equally trivial. As an aside, I wonder what people think about moving the migration tests out of the test tree and into the versions directory so that they are beside the migrations. This would make it clearer which migrations lack tests, and would reduce the length of test_migrations.py, which is starting to get out of hand at 3,478 lines.

    There's one last thing I want to say about migrate migrations before I move onto discussing alembic. One of the features of migrate is that schema migrations are linear, which I think is a feature, not a limitation. In the Havana (and presumably Icehouse) releases there has been significant effort from Mirantis and Rackspace Australia to fix bugs in database migrations in nova. To be frank, we do a poor job of having reliable migrations, even in the relatively simple world of linear migrations. I strongly feel we'd do an even worse job if we had non-linear migrations, and I think we need to require that all migrations be sequential as a matter of policy. Perhaps one day when we're better at writing migrations we can vary that, but I don't think we're ready for it yet.

    Alembic

    An example of an existing user of alembic in openstack is neutron, so I took a look at their code to work out what migrations in nova using alembic might look like. Here's the workflow for adding a new migration:

    Start by reading neutron/db/migration/README. The process involves more tools than nova developers will be used to; it's not a simple case of just adding a manually written file to the migrations directory. You need access to the neutron-db-manage tool to write a migration, so set up neutron first.

    Just as an aside, the first time I tried to write this blog post I was on an aeroplane, with no network connectivity. It is frustrating that writing a new database migration requires network connectivity if you don't already have the neutron tools set up in your development environment. Even more annoyingly, you need a working neutron configuration in order to be able to add a new migration, which slowed me down a fair bit when I was trying this out. In the end it seems the most expedient way to do this is just to run up a devstack with neutron configured.

    Now we can add a new migration:

      $ neutron-db-manage --config-file /etc/neutron/neutron.conf \
      --config-file /etc/neutron/plugins/ml2/ml2_conf.ini \
      revision -m "funky new database migration" \
      --autogenerate
      No handlers could be found for logger "neutron.common.legacy"
      INFO  [alembic.migration] Context impl MySQLImpl.
      INFO  [alembic.migration] Will assume non-transactional DDL.
      INFO  [alembic.autogenerate] Detected removed table u'arista_provisioned_tenants'
      INFO  [alembic.autogenerate] Detected removed table u'ml2_vxlan_allocations'
      INFO  [alembic.autogenerate] Detected removed table u'cisco_ml2_nexusport_bindings'
      INFO  [alembic.autogenerate] Detected removed table u'ml2_vxlan_endpoints'
      INFO  [alembic.autogenerate] Detected removed table u'arista_provisioned_vms'
      INFO  [alembic.autogenerate] Detected removed table u'ml2_flat_allocations'
      INFO  [alembic.autogenerate] Detected removed table u'routes'
      INFO  [alembic.autogenerate] Detected removed table u'cisco_ml2_credentials'
      INFO  [alembic.autogenerate] Detected removed table u'ml2_gre_allocations'
      INFO  [alembic.autogenerate] Detected removed table u'ml2_vlan_allocations'
      INFO  [alembic.autogenerate] Detected removed table u'servicedefinitions'
      INFO  [alembic.autogenerate] Detected removed table u'servicetypes'
      INFO  [alembic.autogenerate] Detected removed table u'arista_provisioned_nets'
      INFO  [alembic.autogenerate] Detected removed table u'ml2_gre_endpoints'
        Generating /home/mikal/src/openstack/neutron/neutron/db/migration/alembic_migrations/
      versions/297033515e04_funky_new_database_m.py...done
      


    This command has allocated us a migration id, in this case 297033515e04. Interestingly, the template migration drops all of the tables for the ml2 driver, which seems a surprising choice of default.

    There are a bunch of interesting headers in the migration python file which you need to know about:

      """funky new database migration
      
      Revision ID: 297033515e04
      Revises: havana
      Create Date: 2013-11-04 17:12:31.692133
      
      """
      
      # revision identifiers, used by Alembic.
      revision = '297033515e04'
      down_revision = 'havana'
      
      # Change to ['*'] if this migration applies to all plugins
      
      migration_for_plugins = [
          'neutron.plugins.ml2.plugin.Ml2Plugin'
      ]
      


    The developer README then says that you can check that your migration is linear with this command:

      $ neutron-db-manage --config-file /etc/neutron/neutron.conf \
      --config-file /etc/neutron/plugins/ml2/ml2_conf.ini check_migration
      


    In my case it is fine because I'm awesome. However, it is also a little worrying that you need a tool to hold your hand to verify this, because it's too hard to read through the migrations and check it yourself.

    So how does alembic go at addressing the concerns we have with the nova database migrations? Well, alembic is maintained by an upstream other than the OpenStack developers, so it addresses the maintenance concern. I should also say that alembic is obviously already in use by other OpenStack projects, so I think it would be a big ask to move to something else.

    Alembic does allow linear migrations as well, but linearity is not enforced by the tool itself (in other words, non-linear migrations are supported by the tooling). That means developers need another layer of checking to maintain a linear migration stream, and I worry that will introduce another place where we can make errors and accidentally end up with non-linear migrations. In fact, in the example of multiple patches competing to be next in line, alembic is arguably worse, because the headers in the migration file need to be edited by hand to keep the migrations linear.
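
    To make the competing patches case concrete, here is a sketch with invented revision ids. Two patches both generate migrations that descend from the havana revision; whichever lands second has to have its down_revision edited by hand before the history is linear again:

      # Patch A, as generated by neutron-db-manage (landed first):
      revision = 'aaaa11112222'
      down_revision = 'havana'

      # Patch B, as generated -- it also descends from 'havana', so the
      # migration history now forks:
      revision = 'bbbb33334444'
      down_revision = 'havana'

      # Patch B, manually rebased to restore a linear history:
      revision = 'bbbb33334444'
      down_revision = 'aaaa11112222'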

    Conclusion

    I'm still not convinced alembic is a good choice for nova, but I look forward to a lively discussion at the design summit about this.

    Tags for this post: openstack icehouse migrate alembic db migrations
    Related posts: Exploring a single database migration; On Continuous Integration testing for Nova DB

posted at: 22:52 | path: /openstack/icehouse | permanent link to this entry


Sat, 02 Nov 2013



On Continuous Integration testing for Nova DB

    To quote Homer Simpson: "All my life I've had one dream, to achieve my many goals."

    One of my more recent goals is a desire to have real continuous integration testing for database migrations in Nova. You see, at the moment, database migrations can easily make upgrades painful for deployers, normally by taking a very long time to run. This is partially because we test on trivial datasets on our laptops, but it is also because it is hard to predict the scale of the various dimensions in the database -- for example, perhaps one deployment has lots of instances, whilst another might have a smaller number of instances but a very large number of IP addresses.

    The team I work with at Rackspace Australia has therefore been cooking up a scheme to try and fix this. For example, Josh Hesketh has been working on what we call Turbo Hipster, which he has blogged about. We've started off with a prototype to prove we can get meaningful testing results, which is running now.

    Since we finished the prototype we've been working on a real implementation, which is known as Turbo Hipster. I know it's an odd name, but we couldn't decide what to call it, so we just took a suggestion from the github project namer. It's just an added advantage that the OpenStack Infra team think the name is poking fun at them. Turbo Hipster reads the gerrit event stream, and then uses our own zuul to run tests and report results to gerrit. We need our own zuul because we want to be able to offer federated testing later, and it isn't fair to expect the Infra team to manage that for us. There's nothing special about the tests we're running; our zuul is capable of running other tests if people are interested in adding more, although we'd have to talk about whether it makes more sense for you to just run your own zuul.
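
    For the curious, the gerrit side of this is just the standard gerrit event stream. Here's a minimal sketch of following it in Python; this assumes you have ssh access to review.openstack.org, and it is not the actual Turbo Hipster code:

      import json
      import subprocess

      # Follow gerrit's event stream and pick out nova patchsets; Turbo
      # Hipster does the real version of this via zuul.
      proc = subprocess.Popen(
          ['ssh', '-p', '29418', 'review.openstack.org',
           'gerrit', 'stream-events'],
          stdout=subprocess.PIPE)

      for line in iter(proc.stdout.readline, b''):
          event = json.loads(line.decode('utf-8'))
          if (event.get('type') == 'patchset-created' and
                  event.get('change', {}).get('project') == 'openstack/nova'):
              print('Would test change %s' % event['change']['number'])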

    Generally I keep an eye on the reports and let developers know when there are problems with their patchset. I don't want to link to where the reports live just yet, because right now there are some problems which stop me from putting our prototype in a public place. Consider a migration that takes some form of confidential data out of the database and just logs it. Sure, we'd pick this up in code review, but by then we might have published test logs with confidential information. This is especially true because we want to be able to run tests against real production databases, both ones donated to run on our test infrastructure and ones where a federated worker is running somewhere else.

    We have therefore started work on a database anonymization tool, which we named Fuzzy Happiness (see earlier comment about us being bad at naming things). This tool takes markup in the sqlalchemy models file and uses that to decide what values to anonymize (and how). Fuzzy Happiness is what prompted me to write this blog post: Nova reviewers are about to see a patch with strange markup in it, and I wanted something to point at to explain what we're trying to do.
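
    I won't show the actual syntax before the patch lands, but to give a flavour of what I mean by markup, think of it as tagging columns in the models with anonymization hints. This is an invented illustration using SQLAlchemy's info dictionary, not the real Fuzzy Happiness format:

      from sqlalchemy import Column, Integer, String
      from sqlalchemy.ext.declarative import declarative_base

      Base = declarative_base()

      class Instance(Base):
          __tablename__ = 'instances'
          id = Column(Integer, primary_key=True)
          # SQLAlchemy lets you attach arbitrary metadata to a column via
          # 'info'; an anonymization tool could walk the models and rewrite
          # any column tagged like this before test results are published.
          hostname = Column(String(255), info={'anonymize': 'hostname'})
          user_data = Column(String(255), info={'anonymize': 'random_text'})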

    Once we have anonymization working there is one last piece we need, which is database scaling. Perhaps the entire size of your database gives away things you don't want leaked into gerrit. This tool is tentatively codenamed Elastic Duckface, and we'll tell you more about it just as soon as we've written it.

    I'd be very interested in comments on any of this work, so please do reach out if you have thoughts.

    Tags for this post: openstack turbo_hipster fuzzy_happiness db ci anonymization
    Related posts: Comparing alembic with sqlalchemy migrate; Nova database continuous integration

posted at: 13:10 | path: /openstack | permanent link to this entry


Fri, 02 Aug 2013



Exploring a single database migration

    Yesterday I was having some trouble with a database migration downgrade step, and Joshua Hesketh suggested I step through the migrations one at a time and see what they were doing to my sqlite test database. That's a great idea, but it wasn't immediately obvious to me how to do it. Now that I've figured out the steps required, I thought I'd document them here.

    First off we need a test environment. I'm hacking on nova at the moment, and tend to build throwaway test environments in the cloud because it's cheap and easy. So, I created a new Ubuntu 12.04 server instance in Rackspace's Sydney data center, and then configured it like this:

      $ sudo apt-get update
      $ sudo apt-get install -y git python-pip git-review libxml2-dev libxml2-utils \
          libxslt-dev libmysqlclient-dev pep8 postgresql-server-dev-9.1 python2.7-dev \
          python-coverage python-netaddr python-mysqldb python-git virtualenvwrapper \
          python-numpy sqlite3
      $ source /etc/bash_completion.d/virtualenvwrapper
      $ mkvirtualenv migrate_204
      $ toggleglobalsitepackages
      


    Simple! I should note here that we probably don't need the virtualenv because this machine is disposable, but it's still a good habit to be in. Now I need to fetch the code I am testing. In this case it's from my personal fork of nova, and the git location to fetch will obviously change for other people:

      $ git clone http://github.com/mikalstill/nova
      


    Now I can install the code under test. This will pull in a bunch of pip dependencies as well, so it takes a little while:

      $ cd nova
      $ python setup.py develop
      


    Next we have to configure nova because we want to install specific database schema versions.

      $ sudo mkdir /etc/nova
      $ sudo vim /etc/nova/nova.conf
      $ sudo chmod -R ugo+rx /etc/nova
      


    The contents of my nova.conf look like this:

      $ cat /etc/nova/nova.conf
      [DEFAULT]
      sql_connection = sqlite:////tmp/foo.sqlite
      


    Now I can step up to the version before the one I am testing:

      $ nova-manage db sync --version 203
      


    You do the same thing but with a different version number to step somewhere else. It's also pretty easy to get the schema for a table under sqlite. I just do this:

      $ sqlite3 /tmp/foo.sqlite
      SQLite version 3.7.9 2011-11-01 00:52:41
      Enter ".help" for instructions
      Enter SQL statements terminated with a ";"
      sqlite> .schema instances
      CREATE TABLE "instances" (
              created_at DATETIME,
              updated_at DATETIME,
      [...]
      


    So there you go.

    Disclaimer -- I wouldn't recommend upgrading to a specific version like this for real deployments, because the models in the code base won't match the tables. If you wanted to do that, you'd need to work out which git commit added the version after the one you've installed, and then check out the commit before that commit.

    Tags for this post: openstack tips rackspace nova database migrations sqlite
    Related posts: Merged in Havana: fixed ip listing for single hosts; Merged in Havana: configurable iptables drop actions in nova; Michael's surprisingly unreliable predictions for the Havana Nova release; Juno nova mid-cycle meetup summary: DB2 support; Havana Nova PTL elections; Upgrade problems with the new Fixed IP quota

posted at: 18:37 | path: /openstack/tips | permanent link to this entry