Linux Blog

Working with CSV files in Bash

Filed under: Shell Script Sundays — TheLinuxBlog.com at 1:25 pm on Sunday, October 19, 2008

CSV files are very common. Using them with Bash to aid in scripting can be very useful. Here are a some methods of using bash to work with CSV files. These are what I sometimes use when dealing with CSV files in bash, they may work for your CSV’s they may not. I accept no responsibility of the accuracy of these, I’m sure there are better ways of accomplishing the same tasks. If anybody finds other ways or knows better please comment and let me know. Don’t run these on production data, its always a good idea to make backups and test them before using in a production environment. Who knows how your precious data could become mangled due to a wrong number of quotes, terminations or bad regex. Jump for the post

How to count the number of elements within a CSV file.

 [owen@TheLinuxBlog.Com bin]$ echo $(cat test.csv | sed 's/, /_ /g' | tr ',' '\n' | wc -l)
9

Count the number of lines in a CSV file terminated with \n

 [owen@TheLinuxBlog.Com bin]$ wc -l test.csv | cut --delimiter\=" " -f 1
3

And another way:

[owen@TheLinuxBlog.Com bin]$ echo $(expr `wc -l "test.csv" | awk '{print $1}'`)
3

Print the headers of a CSV file (useful if the headers contain the field names)

[owen@TheLinuxBlog.Com bin]$ head -1 test.csv
ID,Question,Answer

Read a line of the CSV (where the number next to the p is the line number) This is useful for looping over a file and getting the line.

[owen@TheLinuxBlog.com bin]$ sed -n 2'p' "test.csv"
1,Who Am I?,I am Owen
[owen@TheLinuxBlog.com bin]$ sed -n 1'p' "test.csv"
ID,Question,Answer
[owen@TheLinuxBlog.com bin]$ sed -n 3'p' "test.csv"
2,What blog is this from?,TheLinuxBlog

So, to loop over the CSV and read a line you can do something like the following:

[owen@TheLinuxBlog.Com bin]$ for i in $(seq 1 `wc -l "test.csv" | awk '{print $1}'`); do sed -n $i'p' "test.csv"; done;
ID,Question,Answer
1,Who Am I?,I am Owen
2,What blog is this from?,TheLinuxBlog

Of course with the above you can offset the first line by changing seq 1 to seq 2 to start from line 2.

To split up each row (line) in the CSV at the comma, I use the following:

[owen@TheLinuxBlog.Com bin]$ for i in $(seq 1 `wc -l "test.csv" | awk '{print $1}'`); do echo $(sed -n $i'p' "test.csv") | tr ',' '\n'; done;

This puts a backspace in place of a comma. The above is known not to work for fields with comma’s in them. If the data relies on comma’s in the fields you have to use some more advanced sed expressions.

Here is the CSV file I was using for this post:

[owen@norpmes bin]$ cat test.csv
ID,Question,Answer
1,Who Am I?,I am Owen
2,What blog is this from?,TheLinuxBlog

Let me know if this was helpful or not, as a whole this is not suppose to be a one to end all CSV tutorial but more of the techniques that can be used to aid in processing them. If any one has any thoughts or suggestions, needs any help or finds other methods of completing the same or other tasks then please comment!