make better_mistakes

Finding intermittently failing specs with RSpec’s bisect

High-level takeaways: As of RSpec 3.4 we have a --bisect flag, which is used to find order-dependent spec failures. Using bash, we can run an inconsistent spec until it fails, then use its seed number in an RSpec command of the form rspec spec/some_spec.rb --seed 12345 --bisect to find the minimum number of specs needed to recreate the failure. This drastically reduces the amount of code that needs to be reviewed to find the cause of the failure, which is likely a side effect of a passing spec. The passing spec that causes later failures can be isolated using the documentation formatter.

I work on a large app with thousands of specs of varying quality.

Unfortunately, running our spec suite locally is impractical, and running all the specs in continuous integration can take more than half an hour. Our specs fail intermittently, and spec failures delay builds from passing while reducing developer trust in the specs.

I recently came across a file which had an intermittently failing spec in my local environment. At the time I didn’t have the attention or the tooling to address the issue. I noted the seed number, which RSpec uses to run the same specs in the same order, and moved on.

A while later a friend tweeted about git bisect. I had a little free time at the end of my work week, so I decided to try out this technique to find the minimal reproduction of the failing specs.

I ran the spec with the seed number I’d recorded a few weeks earlier to confirm the test was still failing in this order, and got the failure I expected. I then tried running the spec with the --bisect flag, but RSpec failed because it didn’t recognize the flag. Bisect was introduced in RSpec 3.3, and I was using an older version in this project.

I updated RSpec and ran the seeded spec again, and this time it passed. I’m not sure why, but I had a hunch that the spec failure still existed, and perhaps RSpec had changed its seeding algorithm between versions. So I decided to run the spec until it failed. If you don’t know the seed number to reproduce a failing spec, you can run the script below to find it. I ran this script and went to lunch.

true # start from a known exit code, so a stale $? of 1 can't skip the loop
until [ $? == 1 ] # $? is the last exit code, and 1 is RSpec's default failure exit code.
do
  bundle exec rspec spec/some_spec.rb
done

# (I use zsh in my terminal, and some of this syntax doesn't work in that shell.
# To fix this I dropped into a bash shell with `bash`, which I closed when I
# finished up with `exit`.)

# There are some obvious improvements I would make if this weren't throwaway
# code. First, I would put it in a script file, maybe titled
# `run-until-failure`. Second, it could use a command flag which indicates how
# many times a script should be run until it's considered a consistently
# passing spec. Third, I'd extract the spec command itself, and pass it into
# the script. Since inconsistent specs are such a drag on fast development, I
# am considering making these improvements and running our entire spec suite
# through this to find our flakiest specs.
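
The improvements listed in those comments could be sketched roughly as follows. This is an illustration, not the script I ran: the function name `run_until_failure`, the `-n` flag, and the default of 100 runs are all assumptions made up for the example. It takes the command to run as arguments and gives up after a fixed number of consecutive passes.

```shell
# run_until_failure (hypothetical name): rerun a command until it fails,
# or until it has passed a set number of times in a row.
# Usage: run_until_failure [-n MAX_RUNS] COMMAND [ARGS...]
run_until_failure() {
  local max_runs=100 run=1   # the default of 100 runs is an arbitrary assumption

  # Optional flag: how many passing runs count as "consistently passing".
  if [ "$1" = "-n" ]; then
    max_runs="$2"
    shift 2
  fi

  while [ "$run" -le "$max_runs" ]; do
    # "$@" is the command under test; any non-zero exit status stops the loop.
    if ! "$@"; then
      echo "Failed on run ${run} of ${max_runs}"
      return 1
    fi
    run=$((run + 1))
  done

  echo "Passed ${max_runs} consecutive runs"
}
```

For example, `run_until_failure -n 50 bundle exec rspec spec/some_spec.rb` would stop at the first failing run. When random ordering is enabled, RSpec prints the seed for each run, so the seed to reuse is the one shown just above the failure message.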

When I came back from lunch I had a new seed number, which I then used to bisect my spec with a command resembling bundle exec rspec spec/some_spec.rb --seed 12345 --bisect. In a few minutes RSpec returned a command that looked like rspec './spec/some_spec.rb[1:16:1:2:1,1:31:2:1,1:31:2:2,1:32:4:1,1:35:2]' --seed 12345. I ran this command, but added the documentation formatter by appending --format documentation to it. This printed out the descriptions from the describe and it blocks associated with the handful of specs this command runs.
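
Laid out as a block, the two commands from that session look roughly like the sketch below. The helper names `bisect_command` and `documentation_command`, the `SPEC_FILE` and `SEED` variables, and the shortened example-id list are placeholders of mine, not RSpec features. The functions only print the commands, so they can be checked before anything is run.

```shell
SPEC_FILE="spec/some_spec.rb"  # placeholder path to the flaky spec file
SEED=12345                     # placeholder; use the seed RSpec reported for the failing run

# Step 1: have RSpec search for the smallest set of examples that still fails.
bisect_command() {
  echo "bundle exec rspec ${SPEC_FILE} --seed ${SEED} --bisect"
}

# Step 2: rerun that minimal set with readable output.
# $1 is the example-id list that --bisect printed between the square brackets.
documentation_command() {
  echo "bundle exec rspec './${SPEC_FILE}[$1]' --seed ${SEED} --format documentation"
}

bisect_command
documentation_command "1:16:1:2:1,1:31:2:1"
```

The single quotes around the file path matter in the second command, since most shells would otherwise try to interpret the square brackets.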

In this case, I got one passing spec, and four failing specs. The passing spec was executed before the failures, so I suspected it was causing a side effect in its sibling specs. I used the strings printed out from the documentation to search the spec file for line numbers, and then I ran the specs in pairs, matching the passing spec with each of the failing specs. I usually had to run these a few times to get the passing spec to run before the failing spec, and recorded the seed number to recreate this failure in each case. Then I looked at the code and found there was one line I could comment out which would cause my failing specs to pass. Of course, the previously passing spec now failed, but now I could isolate or remove this side effect, and increase our spec suite’s stability.

Fred Brooks: Werewolf Hunter

Programming culture is a pop culture. The inevitable Hollywoodization of “No Silver Bullet” is apparent enough to me that I can see the movie trailer in my mind.

Our protagonist is hunkered in a dirty underground bunker, reviewing his weapons carefully. As a voiceover he curmudgeonly complains “Of all the monsters who fill the nightmares of our folklore—”

a heavy bass thud as we flash cut to him wandering a graveyard at midnight

“—none terrify more than werewolves—”

out of the corner of his eye he sees something and turns

“—because they transform unexpectedly from the familiar—”

he is attacked by a fast-moving wolfman, and fires a crossbow at it

“—into horrors.”

Cut back to the bunker as he opens a box of shells… “For these, we seek bullets of silver that can magically lay them to rest. I have bad news, there are no silver bullets.” Then a title card comes up declaring “Fred Brooks: Werewolf Hunter. Coming Summer 2016.”


I reread “No Silver Bullet” today, and while I agree with its thesis I don’t know if its conclusions hold up. The thesis of “NSB” is that software will never have order of magnitude improvements occur within the timeframes that these sorts of improvements happen with hardware because the essence of the work of software development is not conducive to these sorts of improvements.

Brooks writes, “I believe the hard part of building software to be the specification, design, and testing of this conceptual construct [that is the essence of software], not the labor of representing it and testing the fidelity of the representation. We still make syntax errors, to be sure; but they are fuzz compared to the conceptual errors in most systems.” [emphasis mine]

No doubt, the specification, design, and testing of the conceptual framework of the software is where most of the labor lies. Most of the problems of software development lie in figuring out what you actually need the software to do, often from a loose set of requirements from folks whose expertise lies outside the domain of software.

To improve how software is constructed Brooks makes four suggestions:

  • Buy software instead of building it.
  • Use rapid prototyping to refine requirements.
  • Grow your software instead of building it using incremental development.
  • Foster great designers by rewarding them with status.

This last point is the one I quibble with. I feel it could use its own essay in defense. Though the difference between an average designer and a great one is like that of Salieri and Mozart, Brooks does not spend much time explaining how to foster great designers. By way of example he says that Unix is great while MS-DOS is only average, which sounds about right. He claims that, “The differences between the great and the average approach an order of magnitude.”

He then goes on to outline:

How to grow great designers? Space does not permit a lengthy discussion, but some steps are obvious:

  • Systematically identify top designers as early as possible. The best are often not the most experienced.
  • Assign a career mentor to be responsible for the development of the prospect, and keep a careful career file.
  • Devise and maintain a career development plan for each prospect, including carefully selected apprenticeships with top designers, episodes of advanced formal education, and short courses, all interspersed with solo design and technical leadership assignments.
  • Provide opportunities for growing designers to interact with and stimulate each other.

To me this outline is akin to recognizing that there are no silver bullets, and thus resigning ourselves to giving the werewolves the corner office.

I do believe there are significant improvements that can be made which will make software’s design improve from average to great, but I don’t believe that improvement needs to happen primarily at the level of the individual, but instead at the level of the team. If this sort of cultural shift can be fostered then the quality of software the team produces will be great, and so will the quality of life for the developers who are working on it.

How to grow great teams? Space does not permit a lengthy discussion, but some steps are obvious:

  • Systematically identify developers who will collaborate well with others. The best are often not the most experienced, and the best collaborations are often across experience levels.
  • Mentorship is the responsibility of everyone on the team. Formal mentors may be assigned and career files kept, but instead of trying to formalize the entire process, grow a culture of respect and make it a priority that developers recognize that their work requires both that they continually learn and that they support others in their learning.
  • Devise and maintain a career development plan for every person in your organization. Build apprenticeship into the DNA of your team. Offer episodes of informal education and opportunities for all to teach each other, interspersed with collaborative design and technical leadership assignments.

When Have You Tested Enough?

@cwgem recently asked: ‘Curiously throwing this out there. For test driven development, what do you consider to be “enough testing”?’

To which I respond:

Yesterday I tweeted, “I only test the code I want to work. I trust my code the least, then my team’s, then libraries.”

Not sure I really answered your original question tho. In TDD the idea of ‘enough’ tests may be missing the point. I don’t claim to TDD all the time, or to do it perfectly when I do, but one of the things that has been an apparent focus of TDD to me is the focus on tests driving design.

More to the point, the purpose of TDD as I understand it is to express expectations as test code, which unlike documentation needs to stay in sync with the changing codebase. By driving out our object from tests we create simpler and more focused methods.

For example, last year I was working on calculator code for a client which had some wonky business logic. I drove out the code using tests, and came up with what I thought was an elegant design (a bunch of single-line methods) in a pretty quick cycle. However, I had misunderstood the requirements when I wrote the tests, and the client wasn’t happy.

I quickly found out how the requirements I had written down in tests differed from reality, changed the tests, then made each failing example go green. If I had designed this without tests behind it, I would have likely written some dense conditional logic, and when it failed to meet expectations it would have been faster to throw it away and try again than to refactor it. The design that tests encourage was better for dealing with change.

When writing tests in unfamiliar languages, I have written tests for things like setting variables. It was the simplest thing that I could think to test, and writing that test gave me the confidence and familiarity to soldier bravely onward. However, I wouldn’t keep a test of that sort in the codebase. There is nothing too small to write a test for at least once, so that you can prove to yourself that your expectations match the reality of the execution of code. There are, however, tests too trivial to commit to version control. The tests that I want in my test suite are ones which prove that the business logic of my application works, not that the language or libraries work.

Or more to the point, I only test the code I want to work. I want code which I trust the least to work the most, and I trust new code less than old code, changed code less than old code, my code less than my team’s code, and my team’s code less than library or framework code.

Can You Develop With an iPad and a VPS?

I’ve been playing around with connecting to a remote server using my iPad in the last few days. I’m curious if I can develop without a laptop. It’s an interesting constraint to work within. Other accounts of using an iPad and a remote server to develop, like “I swapped my laptop for an iPad+Linode,” sold me on the possibility of developing on the iPad ages ago. This weekend I thought I’d try it out.

My friend @zspencer recommended Digital Ocean* as a quick-to-set-up VPS, and Prompt and iSSH as iPad apps that give me an ssh terminal. Yesterday I set up a Droplet, Digital Ocean’s name for a remote server. Basically, I gave them a credit card number, clicked a button, and a minute later I had a remote Ubuntu box reserved for me, an IP address, and a root password.

Since then I’ve played around with working on that machine just through the iPad. I cheated once, last night, when I wanted to add my remote box’s SSH key to GitHub and I couldn’t figure out how to copy the key from the command line into my iPad’s clipboard.

My web development toolkit on a computer requires me to use a shell, a browser, and a text editor. From the iPad I’ve got the shell. I can use vi for text editing, but it’s not something I’ve got practice with, yet. The trickiest thing is getting a browser going while I’m in the development environ. One route would be to treat the remote server as a staging server. I could write code in vi, administer in the shell, then load the site up in a browser on the iPad. Unfortunately, I don’t want mid-commit work to be staged, I just want to be able to view it.

What I want is to be able to go from the Ubuntu terminal to a GUI environ and look at the site in a browser running on the server. iSSH has a VPN option, but I couldn’t figure out how to get it working this afternoon. In fact I remembered something about X, the Linux GUI, and thought I needed to start up xterm. Looking at it now I realize I need to start X, not xterm, which is a terminal program.

The experience reminds me a bit of the first time I worked with Linux. I ended up installing Red Hat on a Compaq box my family had stopped using. I got the terminal running, but I couldn’t ever figure out how to get a GUI running, despite skimming over a thousand-page how-to-use-Linux tome I’d ordered from Powell’s. I don’t think I touched Linux again for another six or eight years.

Anyhow, getting a GUI running in iSSH is clearly my next step, but I also need to figure out if this is the time to level up my vim-fu or figure out how to use Sublime Text in a remote version of Ubuntu. Neither is a particularly terrible option.

One other thing I need to figure out sooner rather than later is how to create a non-root user with the correct permissions to create files, install programs, and make things go. I’ve done a little remote server administration, but most of it has been following recipes written by other developers, and I can’t remember what needs a chmod.
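
For the non-root user, a minimal sketch of the usual Ubuntu approach is below. It is an illustration, not something I have run on this Droplet: the helper names `run` and `create_sudo_user` and the `DRY_RUN` switch are made up for the example, while `adduser` and `usermod -aG sudo` are the standard Ubuntu commands for creating an account and adding it to the group that is allowed to use `sudo`.

```shell
# DRY_RUN=1 (the default in this sketch) only prints each command.
# Set DRY_RUN=0 and run as root to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# run is a hypothetical wrapper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# create_sudo_user is a hypothetical helper name; $1 is the new account's name.
create_sudo_user() {
  run adduser "$1"            # creates the user and a home directory they own; prompts for a password
  run usermod -aG sudo "$1"   # on Ubuntu, members of the sudo group can run commands as root
}
```

Calling `create_sudo_user strand` in dry-run mode prints the two commands. Files created inside the new user’s home directory belong to that user, so day-to-day work there should not need a chmod, and installing programs goes through `sudo`.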

Have you used a remote system to develop? How did you level up? What resources were useful for you? (And how do I get my GUI running? Work with an account other than root?)


* Digital Ocean is just one of many VPS providers. Others include Linode, Amazon VPS, and Slicehost. I don’t have any strong opinions about any of these providers; they all give you access to a server you can administer remotely for about the same price as a cup or two of fancy coffee a month.

Blogging Is Hard

Hello Internet,

I started this blog about a year and a half ago, when I was just starting to pursue a career in development. I’ve done an alright job with it, but let’s be honest, I haven’t posted much lately and this thing is getting a little moss growing on the North side from a lack of forward movement.

I’m no stranger to sharing my writing on the Internet, after all, I had a livejournal back in the day. However, blogging hasn’t been holding my attention, because for a small blog like mine it’s not holding anyone else’s attention. When I write, I want to write for an audience, and even though people read this, it doesn’t feel like enough.

On Saturday, as my technical book club was wrapping up, I said something funny, and Zee indicated interest in subscribing to my newsletter. “Hrmm,” I thought, “maybe a newsletter is what I need to write to write again.”

And but so, I am starting a newsletter called Open Source and Feelings. I expect it to be more conversational and scattershot than my blog posts, and I’ll be putting out the first one near the end of May. Expect excerpts and responses to online articles and a bit of editorial on my part. It’s a letter from me to you, a place to start conversations.

Make Better Mistakes isn’t going away, but posts will continue to be infrequent, and I will post to the blog when I have one thing to say on a topic.

Cheers!

Your Pal from the Internet,

    Strand

postscript Sign up below! (I am excited.)

Open Source and Feelings

Open Source and Feelings is a fortnightly newsletter.

What’s it about? The day I wrote this I said: empathy, open source, community, Ruby, social systems, language, and relationships. Topics are liable to change quickly, your mileage may vary. I am not a doctor or lawyer.

The work is never done

As I left Ruby on Ales a few weeks ago I was sad, sad that I didn’t hug a few colleagues, sad that we only got a few words in with each other. I was sad that our time together was so short.

For me conferences are like family reunions. I get to touch base with people I haven’t seen in a while, and see how life is treating them. Many of my conference friends bring their partners along; many conference goers’ partners are doing the same work they are.

We are a close-knit tribe. When a day of talks is over at a conference, we go to dinner, we share drinks, we talk about what we are working on for work and what we are working on for fun. We stay up too late drinking in each other’s company.

So I get really upset when death and rape threats are made against someone telling Twitter they felt uncomfortable at a conference. I get angry when someone is doxed. And I get sad when someone loses their job because… ?

This isn’t what my corner of the tech world is, even on the bad days. The people I choose to work with make safe space for all. Maybe I’m naïve, but I believe our conferences and our companies should be safe places for everyone. For women, for people of color, for transfolk, for queers, sometimes they feel really unsafe. I’m glad that Codes of Conduct are becoming standard issue, and that conferences are focusing on increasing attendance for marginalized groups.

This work isn’t done. The work is never done. We can do better.

Hold ⌘Q to Quit

One of Chrome’s features that I really love is the “Warn Before Quitting” option under the Chrome menu, which keeps the program from quitting immediately: a message saying “Hold ⌘Q to quit.” displays for a second or two before the browser closes.

Unfortunately, that feature only exists in Chrome, and I have fat fingers everywhere. So last night I asked on Twitter, “Lazyweb: Any way to make Google Chrome’s ‘Hold ⌘Q to quit’ (Google blog post explaining the feature) the default across OS X?”

My friend @perisaccadic responded that KeyRemap4MacBook has that feature. I looked at the documentation last night, and KeyRemap4MacBook has a lot of key remapping features. Setting it to prevent quitting on accidental keypresses isn’t hard, but it isn’t obvious either. Here’s how you do it.

  1. Install KeyRemap4MacBook and restart your computer.
  2. Go to “System Preferences…” under the Apple menu, and select KeyRemap4MacBook.
  3. Under “Custom Shortcuts” select “Hold Command+Q to Quit Application.” There are a lot of menus under the “Change Key” tab, and I recommend using the search box at the top and just searching for “quit.”
  4. Select the Key Repeat tab and adjust the “[Holding Key to Key] Holding Threshold”, which has a default of 200ms. This option is the third from the bottom. I set it to 2000ms, as I want to have half a breath before I close my programs; your mileage may vary.

Then system-wide you’ll have to hold down ⌘Q, rather than bumping it by accident and losing whatever you were working on.

Triangles, Man

Yesterday I was working on building out some arrows to bring attention to content. The request was to have a box with some content with an arrow from the top that makes it look like a speech bubble.

I came up with this:

speech bubble arrow Based on CSS Arrows and Shapes Without Markup
// Colors
$white:             #ffffff
$greya:             #aaaaaa

.bubble
  background:       $white
  border:           1px solid $greya
  padding:          10px
  font-weight:      bold
  position:         relative

.bubble.arrow
  &:before, &:after
    content:        ' '
    height:         0
    width:          0
    position:       absolute
  &:before
    top:            -10px
    left:           87px
    border-left:    10px solid transparent
    border-right:   10px solid transparent
    border-bottom:  10px solid $white
  &:after
    z-index:        -1
    top:            -12px
    left:           85px
    border-left:    12px solid transparent
    border-right:   12px solid transparent
    border-bottom:  12px solid $greya

The effect has a diagonal border that follows the arrow as it juts out. This is done by faking a border: a triangle with a z-index of -1 is placed below the initial triangle. The :before pseudo-element generates a 10px white triangle, while the :after pseudo-element generates a background 12px grey triangle.

Not bad, but it’s pretty repetitive. Moreover, it’s hard to tell from the code that the video bubble’s arrow is on the top.

We can express this more clearly and succinctly. First we abstract out the arrow:

Rewrite step one - note that the values of the existing arrow are used as defaults while structure is set in place.
...
@mixin arrow($size: 10px, $shadow: 1px, $position: 85px)
  $shadow-size:     $size + ($shadow * 2)
  &:before, &:after
    content:        ' '
    height:         0
    width:          0
    position:       absolute
  &:before
    top:            -$size
    left:           $position + ($shadow * 2)
    border-left:    $size solid transparent
    border-right:   $size solid transparent
    border-bottom:  $size solid $white
  &:after
    z-index:        -1
    top:            -$shadow-size
    left:           $position
    border-left:    $shadow-size solid transparent
    border-right:   $shadow-size solid transparent
    border-bottom:  $shadow-size solid $greya
...
.bubble.arrow
  @include arrow

Nothing fancy is going on here, but taking a moment to get the structure in place and check that our math is clear makes the next step easier.

It becomes apparent at this point that we’re re-using a triangle pattern for both the arrow and its border. We can shake that out too:

Bring triangles out of the arrow
...
@mixin triangle($color, $size, $offset)
  top:              -$size
  left:             $offset
  border-left:      $size solid transparent
  border-right:     $size solid transparent
  border-bottom:    $size solid $color

@mixin arrow($size: 10px, $shadow: 1px, $offset: 85px)
  $shadow-size:     $size + ($shadow * 2)
  &:before, &:after
    content:        ' '
    height:         0
    width:          0
    position:       absolute
  &:before
    @include triangle($white, $size, $offset + ($shadow * 2))
  &:after
    z-index:        -1
    @include triangle($greya, $shadow-size, $offset)
...

It looks pretty good, and then we get another request… Can we make it so the arrow comes out of the left side in some contexts, too?

First we put in the structure to accommodate a fourth argument, $position:

Add a $position argument
...
@mixin triangle($color, $size, $offset, $position: top)
  @if $position == top
    top:            -$size
    left:           $offset
    border-left:    $size solid transparent
    border-right:   $size solid transparent
    border-bottom:  $size solid $color

@mixin arrow($size, $shadow, $offset, $position)
  &:before, &:after
    content:        ' '
    height:         0
    width:          0
    position:       absolute
  &:before
    @include triangle($white, $size, $offset + ($shadow * 2), $position)
  &:after
    z-index:        -1
    @include triangle($greya, $size + ($shadow * 2), $offset, $position)

.bubble.arrow
  @include arrow(10px, 1px, 85px, top)

Then we can add another conditional for the left arrow, and voilà, a modular arrow giving our dialog boxes a speech bubble playfulness.

The finished product
@mixin triangle($color, $size, $offset, $position)
  @if $position == top
    top:            -$size
    left:           $offset
    border-left:    $size solid transparent
    border-right:   $size solid transparent
    border-bottom:  $size solid $color
  @if $position == left
    left:           -$size
    top:            $offset
    border-top:     $size solid transparent
    border-bottom:  $size solid transparent
    border-right:   $size solid $color

@mixin arrow($size, $shadow, $offset, $position)
  &:before, &:after
    content:        ' '
    height:         0
    width:          0
    position:       absolute
  &:before
    @include triangle($white, $size, $offset + ($shadow * 2), $position)
  &:after
    z-index:        -1
    @include triangle($greya, $size + ($shadow * 2), $offset, $position)

.bubble
  background:       $white
  border:           1px solid $greya
  padding:          10px
  font-weight:      bold
  position:         relative

.bubble.top_arrow
  @include arrow(10px, 1px, 85px, top)

.bubble.left_arrow
  @include arrow(10px, 1px, 12px, left)

This technique can easily extend to the right and bottom sides of the box, and shows how the flexibility of Sass speeds up development, especially if you can break down your styles into small, reusable parts.

Apprenticeships as a Competitive Advantage

A business that chooses to train knowledge workers in whatever domain is in demand is likely to find that at the end of the day they have a competitive advantage. I think that you could start a small business today based solely on this principle and be a leader in your field within five years.

I recently read Why Good People Can’t Get Jobs, a short book by Peter Cappelli, and found his argument incredibly compelling.1 For a variety of risk-averse reasons, companies today make a practice of not filling job openings unless they have the perfect candidate. In the past, to fill a position which required special knowledge, some companies would train promising candidates, but the nature of the jobs marketplace in the last few decades has made them overly cautious about investing in training programs, as they fear their employees will walk away after getting training.

For instance, a business that chooses to hire non-programmers and teach them to program will have a competitive advantage in their field. For this reason, based solely on the fact that LivingSocial started an apprenticeship program, I would place money on LivingSocial doing better than Groupon in the next few years. LivingSocial recognized that without an influx of developers they wouldn’t be able to reach future business goals, and that it is cheaper to train devs than to (continually) recruit them.

I wonder if a web development consultancy cooperative could be formed based off of this insight. If you had a group of half experienced devs and half less experienced but dedicated trainees pairing together, you might find diamonds in the rough rather than paying market rates for diamonds. Of course, this sort of idea could be applied to any sort of business rooted in knowledge work, which is all of them.


  1. That’s an Amazon Affiliate link. The book was short, but this interview with the author is shorter and covers the key points.