Hackflow

Why Every Language Needs Its Underscore

(This is an adaptation of a talk I gave at PyCon and DevDay. Slides and video are available in Russian.)

Do you know what underscore is? In the most general sense, it’s a JavaScript library that makes life better for whoever writes JavaScript. I mostly write Python and I also wanted a better life, so I went ahead and wrote a similar library for Python. But the thing is, you need to understand the magic behind something to properly replicate it. So what’s the magic behind _?

The problem

To answer this question we should look at the problems this kind of library solves. To ground the discussion I’ll illustrate them with code. It will be in Python, but all the ideas are universal, so don’t be afraid.

A piece of entangled code

This messy piece of code was taken from a real project and slightly simplified:

images = []
for url in urls:
    for attempt in range(DOWNLOAD_TRIES):
        try:
            images.append(download_image(url))
            break
        except HttpError:
            if attempt + 1 == DOWNLOAD_TRIES:
                raise

There are several things entangled in here, but my point is that this could be written much shorter:

http_retry = retry(DOWNLOAD_TRIES, HttpError)
images = map(http_retry(download_image), urls)

If it seems hard at first, that’s okay. It involves some functions and control flow abstractions that are most probably new to you. Once you get used to it, you’ll find that the latter variant is not only shorter but also simpler.

Dirty dictionary

But let’s go on and clean some dirty dictionary:

d = {}
for k, v in request.items():
    try:
        d[k] = int(v)
    except (TypeError, ValueError):
        d[k] = None

Here we go through a dictionary and clean its values by coercing them to int, or to None if that is impossible. Cleaning input and ignoring malformed data is quite a frequent task, and yet it takes so much effort. This is the way I want to write that:

walk_values(silent(int), request)

And it’s entirely possible with funcy. But let’s move to the next one.
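To make the one-liner less mysterious, here is a minimal sketch of what the two helpers could look like. These are simplified stand-ins, not funcy's actual implementation: this silent catches any Exception, which is broader than the (TypeError, ValueError) pair in the loop above, and the request dictionary is made up for illustration.

```python
def silent(func):
    """Wrap func so that it returns None instead of raising."""
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Assumption: swallow every ordinary error, not just
            # TypeError and ValueError as the hand-written loop does.
            return None
    return wrapper


def walk_values(func, mapping):
    """Build a new dict with func applied to every value."""
    return {key: func(value) for key, value in mapping.items()}


# Hypothetical input standing in for the request parameters.
request = {"page": "2", "limit": "abc", "offset": None}
cleaned = walk_values(silent(int), request)
```

The iteration lives in walk_values and the error handling lives in silent, so each can be reused without the other.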

Pairwise iteration

This code checks if a sequence is ascending:

prev = None
for x in seq:
    if prev is not None and x <= prev:
        is_ascending = False
        break
    prev = x
else:
    is_ascending = True

Ah, iterating over a sequence and keeping track of the previous element. How many times have you done that? There should be a function to abstract it:

is_ascending = all(l < r for l, r in pairwise(seq))

And pairwise does exactly that. It lets us iterate over the adjacent pairs of a sequence. So we just need to check that all of them are ordered accordingly.
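For the curious, pairwise itself is tiny. The following is a sketch built on itertools.tee, not necessarily how funcy implements it (Python 3.10 and later also ship itertools.pairwise); the sample sequences are made up.

```python
from itertools import tee


def pairwise(seq):
    """Yield (seq[0], seq[1]), (seq[1], seq[2]), ... for any iterable."""
    left, right = tee(seq)
    # Advance the second iterator by one so the two are offset.
    next(right, None)
    return zip(left, right)


def is_ascending(seq):
    """True if every element is strictly smaller than the next one."""
    return all(l < r for l, r in pairwise(seq))


ascending = is_ascending([1, 2, 5, 9])
not_ascending = is_ascending([1, 3, 3, 4])
```

An empty or single-element sequence produces no pairs, so all() reports it as ascending, which matches the behavior of the loop above.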

All these examples have one property in common: the first (red) variants have more code. And more code:

  • takes longer to write,
  • takes longer to read,
  • takes longer to debug,
  • contains more bugs.

Obviously, underscore, funcy and friends help us write less code (at least in these three examples). But how do they do that?

Extracting abstractions

Let’s take another look at the first example. It does three things in a single blob of code:

images = []
for url in urls:
    for attempt in range(DOWNLOAD_TRIES):
        try:
            images.append(download_image(url))
            break
        except HttpError:
            if attempt + 1 == DOWNLOAD_TRIES:
                raise

I highlighted every aspect of this code with a separate color:

  • image download (green),
  • retries on failure (red),
  • iteration through urls and result collection (blue).

As you can see, the three colors are interleaved here. This hints that the corresponding aspects are entangled. And by “entangled” I mean they cannot be reused separately. Say we need retries on failure in some other place: we will probably end up copying the whole block and updating it somewhat. Not exactly the best practice for code reuse.

If, on the other hand, we manage to separate the retries, our code will look like this:

def retry(...):
    ...

http_retry = retry(DOWNLOAD_TRIES, HttpError)
images = []
for url in urls:
    images.append(http_retry(download_image)(url))

Now the retry code (red) is nicely grouped at the top. The download (green) and the iteration (blue) are still mixed, but now they represent a pattern so common that most modern languages have a builtin function to handle it:

def retry(...):
    ...

http_retry = retry(DOWNLOAD_TRIES, HttpError)
images = map(http_retry(download_image), urls)

This last variant has some lovely traits: each part of the task at hand (downloading images) appears only once, the whole iteration aspect is handled with a single map() call, and retries are abstracted out into the retry function.

Extracting common behavior into higher-order functions is the first trick underscore and funcy use to make your life better.
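The body of retry is left out above, so here is one way it could be written. This is a minimal sketch under my own assumptions, not funcy's implementation: it retries immediately with no delay, and HttpError, download_image, the URLs and the failure counter are all hypothetical stand-ins so that the snippet runs on its own. In Python 3 map() is lazy, so the sketch wraps it in list() to actually perform the downloads.

```python
import functools


def retry(tries, errors):
    """Return a decorator that calls the wrapped function up to `tries` times."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(tries):
                try:
                    return func(*args, **kwargs)
                except errors:
                    # Out of attempts: let the last error propagate.
                    if attempt + 1 == tries:
                        raise
        return wrapper
    return decorator


# Hypothetical stand-ins for the real project's pieces.
class HttpError(Exception):
    pass


DOWNLOAD_TRIES = 3
failures_left = {"http://example.com/b.png": 2}  # b.png fails twice first


def download_image(url):
    if failures_left.get(url, 0) > 0:
        failures_left[url] -= 1
        raise HttpError(url)
    return "image from " + url


urls = ["http://example.com/a.png", "http://example.com/b.png"]
http_retry = retry(DOWNLOAD_TRIES, HttpError)
images = list(map(http_retry(download_image), urls))
```

Because retry knows nothing about images or URLs, the same http_retry can wrap any other function that may raise HttpError.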

Hiding low level

It’s time to go back to the second example. I’ll throw away error handling to make the snippets more even:

# Using function
walk_values(int, request)

# Using dict comprehension
{k: int(v) for k, v in request.items()}

Now they are both one-liners, so how is the first one better? Let’s identify every single distinct component of each code variant:

# 3 components
walk_values(int, request)

# 8 or so components
{k: int(v) for k, v in request.items()}

The second one looks like a rainbow. But besides looking nice, this means that each time you write or read it you need to load all those components into your head, taking up all your cognitive resources. This is how the first line is better.

That could be highlighted in an even more obvious manner:

walk_values(int, request)

# red are low-level details
{k: int(v) for k, v in request.items()}

This way we can see that about half of the second line is low-level details. And low-level means you don’t need to know all those details to understand what’s going on.

Hiding low-level details is the second way such libraries make your life better.

Enriching our language

I’ll translate the last example into natural language:

prev = None
for x in seq:
    if prev is not None and x <= prev:
        is_ascending = False
        break
    prev = x
else:
    is_ascending = True
# Starting with an empty prev, iterate over seq, keeping prev up to date;
# on each cycle, if prev is present and the current element is less than
# or equal to it, then set is_ascending to False and break.
# If the loop wasn't broken, set is_ascending to True

is_ascending = all(l < r for l, r in pairwise(seq))
# set is_ascending to all left elements being smaller than right
# in adjacent pairs of seq

Obviously, more code emits more text. Higher level code generates an explanation utilizing higher level abstractions. This way we can use bigger building blocks not only in coding, but in problem solving.

And this is the third way _ makes your life better.

Wrap-up

Everything we went through is completely language-independent. So should there be an underscore for every language? Not quite, and more importantly, a straightforward port is not always a great idea: the common behaviors worth abstracting vary per language and especially per application. The right approach is to follow the core ideas. Or look around to see if someone has already done that.

Here are some leads for you to take:

Language      Libraries
JavaScript    Array, Function, Underscore, lodash
Python        itertools, functools, funcy, toolz, fn.py
Ruby          Enumerable, ActiveSupport
PHP           functional-php, Underscore.php
Clojure       clojure.core
Java          FunctionalJava
C#            LINQ
Objective-C   Underscore.m

P.S. You may also want to look at Scala if you are using the JVM and at F# if it’s .NET.

P.P.S. Please stop commenting on Hacker News, a controversial penalty is killing this post. Use reddit thread instead. Sadly, HN is not a place for discussions anymore.
