Pipes - Or How I Learned to Stop Worrying and Love The Application Script

Posted on January 10, 2015 in cli, development

Introduction

Piping is not a new concept. Piping allows developers and administrators to execute commands and pass their output along to other programs. Programs remain specific in this manner: each is usually only able to carry out one particular task, and to do that task well.

When it comes to writing scripts or daemons for a website, many developers, myself included, are accustomed to writing scripts in one language and letting that script execute. There are often times when the language of choice, or the language the application is written in, is not the right fit, which usually leads the developer to make less-than-desirable design decisions in order to allow the script to do its job.

![No but really, like 40 goddamn years](http://curtis.lassam.net/comics/cube_drone/103.gif) Source: http://cube-drone.com/2014_11_27-103_Pipes.html

Enter, Pipes

What exactly is a pipe? Wikipedia defines a pipe as

> [...] a set of processes chained by their standard streams, so that the output of each process (stdout) feeds directly as input (stdin) to the next one.

Simply put, pipes are a way to take data and "pass it on" to another process. An example of a pipe might look like:

history | grep "scp"

The above command lists the user's bash history and then pipes the output via the | to grep, which searches for the word "scp", thus showing only the recent commands related to scp. Neat, not to mention useful for quick finds.
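Pipes also chain: each stage's stdout becomes the next stage's stdin, so small filters compose into bigger ones. A quick runnable sketch (printf stands in for history here, since history only works in an interactive shell):

```shell
# printf simulates a few lines of shell history; grep filters the
# matching lines and wc -l counts them instead of listing them.
printf 'scp backup.tgz host:\nls -la\nscp notes.txt host:\n' | grep "scp" | wc -l
```

This prints 2, because two of the three simulated history lines mention scp.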

So how can we take advantage of this in our own application? Let's work with an example.

Processing Customers

Let's suppose you own an application that lets users sign up and then puts these customers in a waiting period until an admin approves them. Let's also assume that your application is database-heavy and needs to queue approved users so that they can be fully activated at a later point, such as midnight. For the sake of argument, and to prove you can do this in any language, let's also assume you wrote this application in PHP.

So how would one go about tackling this problem? Well, there are a million different ways. One way would be to write a script that connects to the database, pulls all active customers from the queue, and processes them as necessary. Of course, if we decide to do things that way, we're relying on a few factors:

  • We assume PHP is going to live through the whole process and do its job properly
  • We hold a lot of memory in PHP in order for it to do its job
  • There is no easy way to override what data PHP is going to fetch

What if we were able to abstract the database away into another script (or better yet, a process) and then pass that data off to PHP to handle its conditions? The good news is, we absolutely can. Let's look at the code below:

mysql -u MyUser -pMyPassword -h db.myserver.com mydatabase --skip-column-names -e \
"SELECT id FROM my_table t WHERE t.status = 'active' ORDER BY t.id DESC" | \
while read line; do \
php -f scripts/my_cli_script.php --id="$line"; \
done

First we connect to MySQL from the command line and issue a SELECT statement to fetch all entries in my_table with a status of "active". We then pipe (|) the stdout (what the command would echo to the console) to a bash while loop. The while loop reads the MySQL output line by line and passes each line to our PHP script until it runs out of lines to read.
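In practice the loop is worth hardening a little: `while IFS= read -r line` stops backslashes and leading/trailing whitespace from mangling a line, and quoting `"$line"` keeps each id a single argument. A runnable sketch, with printf standing in for the mysql output and echo printing the command rather than actually invoking PHP:

```shell
# printf simulates the one-id-per-line output of the mysql query;
# echo shows the command each iteration would run.
printf '42\n17\n9\n' | \
while IFS= read -r line; do
  echo "php -f scripts/my_cli_script.php --id=$line"
done
```

Each line of the upstream output becomes exactly one invocation of the downstream script.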

Cool! So now we can write a PHP CLI script that takes exactly one parameter and processes exactly one piece of data.
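The consuming script's contract is deliberately tiny: one id in, one unit of work out. A hypothetical shell equivalent of `scripts/my_cli_script.php` (the function name and output are illustrative, not the real script):

```shell
#!/bin/sh
# Hypothetical one-id processor, written as a function for illustration:
# it validates its single argument, then performs exactly one job.
# A PHP version would read its --id argument from $argv the same way.
process_one() {
  id="${1:?usage: process_one <customer-id>}"
  echo "queued customer $id for activation at midnight"
}

process_one 42
```

Because the script knows nothing about where its one id came from, the upstream producer can be swapped out freely.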

Benefits

The benefit of pipes is that they solve the problem of monolithic or processor/memory-intensive scripts. We should treat our scripts almost as "services": each script does one job, and does it so well that other jobs can pass information to it and take information from it in any form. If we changed the requirements of our example application and decided to let all users navigate the site, then we could easily modify the above script to grab any non-active legacy customers from MySQL and pass that data off to our CLI script to handle the activation. In theory, your code shouldn't have to change.
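To make that concrete, here is a sketch of the legacy-customer variant. Stub functions stand in for the mysql client and the PHP script so the pipeline shape is runnable anywhere, and the 'legacy' status value is assumed; the point is that only the SQL changes while the consumer loop stays identical:

```shell
#!/bin/sh
# Stubs so this sketch runs without a database or PHP installed; in the
# real pipeline these are the mysql client and the PHP interpreter.
mysql() { printf '7\n3\n'; }                          # pretend result: two legacy ids
activate() { echo "activating legacy customer $1"; }  # stand-in for the PHP script

# Only the SQL differs from the original example; the loop is untouched.
mysql -e "SELECT id FROM my_table t WHERE t.status = 'legacy'" | \
while IFS= read -r id; do
  activate "$id"
done
```

Swapping the producer's query is invisible to the consumer, which is exactly the decoupling pipes buy you.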

Conclusion

Pipes are an excellent way to handle and process large chunks of data without killing your scripts. They're easily modifiable, flexible, and allow developers to focus on specific functionality as opposed to clever workarounds due to language limitations.

If you have questions or comments about piping and scripting please leave a comment below.

Cheers!

Thomas Lackemann :)

About

Tom is the founder of Astral TableTop. He's a homebrewer, hiker, and has an about page. Follow @tlackemann on Twitter for more discussions like this.
