Linux Blog

Optimizing Shell Scripts

Filed under: Shell Script Sundays — at 6:30 pm on Sunday, January 23, 2011


I'll be honest: I'm no expert at optimizing shell scripts, and I'm hoping readers will chime in with their own tips and experiences. That said, I do have a few tricks up my sleeve from hands-on experience optimizing code in other languages.

Use time to get a baseline

Any performance work should start with a baseline; it's hard to tell which direction to go when you don't know where you started. Using the Linux time command, you can establish a baseline against which to track any performance gains.
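For example (a minimal sketch; the `seq` pipeline stands in for whatever script you are actually tuning):

```shell
# Run the workload a few times under time to smooth out cache effects;
# the first run is often slower than the rest.
for run in 1 2 3; do
    time sh -c 'seq 1 1000 > /dev/null'
done
```

time reports real (wall clock), user, and sys; real is usually the number to track between changes.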

Consider changing types of loops

I'm not sure how much difference this makes in bash, but it can make a huge difference in other languages. It is also important to make sure no operations are needlessly repeated inside the loop body, because they will be re-executed on every iteration.

time for i in `seq 1 10000`; do echo $i; done

real    0m0.410s
user    0m0.327s
sys     0m0.077s

time seq 1 10000 | while read i; do echo $i; done

real    0m0.626s
user    0m0.472s
sys     0m0.164s
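One loop form not benchmarked above is bash's C-style arithmetic loop, which keeps the counter inside the shell instead of spawning seq as a separate process; a quick sketch:

```shell
#!/bin/bash
# C-style loop: bash handles the counter itself, no `seq` subprocess.
for ((i = 1; i <= 10000; i++)); do
    echo "$i"
done > /dev/null
```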

Remove unneeded output

time for i in `seq 1 100000`; do echo $i; done

real    0m3.172s
user    0m2.480s
sys     0m0.502s

time for i in `seq 1 100000`; do true; done

real    0m1.105s
user    0m1.087s
sys     0m0.014s


If you have processes that may take a while, you can background them while you perform other operations. This may or may not help depending on the situation.
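A minimal sketch of the idea, with sleep standing in for a long-running job:

```shell
#!/bin/bash
# Kick off the slow job in the background and remember its PID.
sleep 2 &
slow_pid=$!

# Do other useful work while it runs...
echo "doing other work"

# ...then block only when the result is actually needed.
wait "$slow_pid"
echo "background job done"
```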

Change Shell

csh:
real    0m0.409s
user    0m0.340s
sys     0m0.065s

zsh:
real    0m0.408s
user    0m0.324s
sys     0m0.078s

ksh:
real    0m0.21s
user    0m0.05s
sys     0m0.01s

dash:
real    0m0.409s
user    0m0.328s
sys     0m0.078s

Remove unneeded Comments and lines

Most compiled languages strip comments so that they don't appear in the binary; since bash is an interpreted language, this is not the case. A huge number of comments in a script can cause some sluggishness as the interpreter reads each line.

Use sed to remove them: keep a separate copy or branch for development, and distribute the de-commented version.
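A sketch of the sed approach (the filenames are placeholders; this only strips whole-line comments, and deliberately skips line 1 so the #! line survives):

```shell
#!/bin/sh
# Delete whole-line comments (except line 1) and blank lines.
sed -e '2,${/^[[:space:]]*#/d}' -e '/^[[:space:]]*$/d' script.sh > script.min.sh
```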

Database Vs. Files

Using a database instead of flat files can give you a performance boost. Think about it: writing files takes processing time, and it takes more time to read them back. Processing and performing lookups on data in text files is slow; a database such as MySQL or SQLite can return the same results much faster.

Inserting into SQLite can take longer than writing to a file (at least the way I was doing it) which may or may not be a problem depending on what you’re trying to do. The resulting file is also larger than a plain text file, which I assume is due to indexes.
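One likely culprit for slow inserts is committing each row separately; batching everything into a single transaction usually helps a lot. A sketch (the table name and layout follow the examples below; bulk.sql is a made-up filename):

```shell
#!/bin/sh
# Build one big transaction instead of one commit per INSERT.
{
    echo 'BEGIN;'
    seq 1 1000 | awk "{printf \"INSERT INTO test VALUES (%d, 'row%d');\n\", \$1, \$1}"
    echo 'COMMIT;'
} > bulk.sql
# Then load it in one shot:  sqlite sqlite.db < bulk.sql
```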

Here is an example of reading everything back from SQLite (compare it against simply reading the equivalent text file):

sqlite sqlite.db "select * from test";

While this may not seem significant, try selecting lines 1,10,100,150,200 and 9000-10000 delimited by pipes.

Commands Used:

time awk 'NR==1;NR==10;NR==100;NR==150;NR==200;NR==9000,NR==10000;' bash.txt | sed 's/ /\|/'
time sqlite sqlite.db "SELECT * from test WHERE one IN (1,10,100,150,200) OR one BETWEEN 9000 AND 10000"

awk:
real    0m0.023s
user    0m0.020s
sys     0m0.004s

SQLite:
real    0m0.019s
user    0m0.020s
sys     0m0.000s

Not much of a difference here, but if you're working with real-world data and need to select certain rows it can make a huge difference. Take the example of searching for the string Hello within 10000 rows:

time grep Hello bash.txt
time sqlite sqlite.db "SELECT * from test where two LIKE '%Hello%'"

grep:
real    0m0.157s
user    0m0.024s
sys     0m0.028s

SQLite:
real    0m0.065s
user    0m0.000s
sys     0m0.048s

Use the right tool for the job

You can try to drive a screw into the wall with a hammer, but a screwdriver or drill will work much better. Using the correct application is key; knowing which tool is best for which purpose is the tricky part.

Take a look at cut vs sed vs awk:

time apt-cache search python | cut -d \  -f 1

real    0m0.766s
user    0m0.692s
sys     0m0.040s

time apt-cache search python | awk '{print $1}'

real    0m0.759s
user    0m0.680s
sys     0m0.052s

time apt-cache search python | sed 's/\(.*\?\)\ \-\ \(.*\)/\1/'

real    0m0.864s
user    0m0.804s
sys     0m0.012s

Perhaps this one isn't fair: the sed expression doesn't do it properly. If someone with uber sed-fu can do it a better way, let me know in the comments and I'll benchmark it. This is the closest to awk and cut that I came up with, so that's what's represented for now.
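For what it's worth, since apt-cache search output is just "name - description", a simpler expression that drops everything from the first space onward may be all that's needed (untested against the benchmark above; the package name here is made up):

```shell
# Keep only the first space-delimited field, like cut -d' ' -f1.
echo 'python-foo - a made-up package' | sed 's/ .*//'
# -> python-foo
```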

Use Better Syntax

wc -l file.txt

real    0m0.009s
user    0m0.000s
sys     0m0.008s

cat file.txt | wc -l

real    0m0.018s
user    0m0.001s
sys     0m0.025s
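Along the same lines, bash builtins such as parameter expansion can replace an external process entirely; a small sketch (the path is just an example):

```shell
#!/bin/bash
path="/var/log/syslog.1"

# External command: forks a process for every call.
base_ext=$(basename "$path")

# Builtin parameter expansion: handled inside bash, no fork.
base_builtin="${path##*/}"

echo "$base_ext $base_builtin"   # both give syslog.1
```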

The same goes for a lot of other programs including grep:

grep test file.txt

real    0m0.009s

cat file.txt | grep test

real    0m0.017s

Process in Parallel

If performance is a huge concern, consider performing operations in parallel. Projects like distcc achieve awesomely fast compile times by distributing load over a number of hosts, so you'd think the same technique would give shell scripts a considerable boost. In my testing it produces varying results, but it's worth experimenting to see if your scripts can benefit from parallel processing.
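A minimal sketch of fanning work out with & and wait (process_chunk is a made-up stand-in for real per-chunk work):

```shell
#!/bin/bash
# Pretend worker: counts the lines in its half of the range.
process_chunk() {
    seq "$1" "$2" | wc -l
}

# Run both halves concurrently...
process_chunk 1 5000 &
process_chunk 5001 10000 &

# ...and wait for every background job to finish.
wait
```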

Consider Changing Languages

This may be sacrilegious to die-hard scripters, but I've written before about when not to script it. That post was more along the lines of "why make something so complex when a shell script can do it?" When performance matters is the answer.

Simple test: echo Hello 10,000 times. The bash version is:

time for i in `seq 1 10000`; do echo Hello; done

C++:
real    0m0.039s
user    0m0.004s
sys     0m0.024s

Bash:
real    0m0.202s
user    0m0.144s
sys     0m0.040s

Python:
real    0m0.058s
user    0m0.032s
sys     0m0.024s

PHP:
real    0m0.057s
user    0m0.028s
sys     0m0.016s

Perl:
real    0m0.043s
user    0m0.000s
sys     0m0.028s

Java:
real    0m0.212s
user    0m0.124s
sys     0m0.052s