Linux Blog

Working with CSV files in Bash

Filed under: Shell Script Sundays — TheLinuxBlog.com at 1:25 pm on Sunday, October 19, 2008

CSV files are very common, and being able to handle them from Bash can be very useful when scripting. Here are some methods I sometimes use when dealing with CSV files in Bash. They may work for your CSVs, or they may not; I accept no responsibility for their accuracy, and I'm sure there are better ways of accomplishing the same tasks. If anybody finds other ways or knows better, please comment and let me know. Don't run these on production data: it's always a good idea to make backups and test them before using them in a production environment. Who knows how your precious data could become mangled by a wrong number of quotes, a bad line terminator or a faulty regex.

How to count the number of elements within a CSV file.

 [owen@TheLinuxBlog.Com bin]$ echo $(cat test.csv | sed 's/, /_ /g' | tr ',' '\n' | wc -l)
9

Count the number of lines in a CSV file terminated with \n

 [owen@TheLinuxBlog.Com bin]$ wc -l test.csv | cut --delimiter=" " -f 1
3

And another way:

[owen@TheLinuxBlog.Com bin]$ echo $(expr `wc -l "test.csv" | awk '{print $1}'`)
3

Print the headers of a CSV file (useful if the headers contain the field names)

[owen@TheLinuxBlog.Com bin]$ head -1 test.csv
ID,Question,Answer
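The header names can also be read into a Bash array, which is handy if you later want to refer to columns by name. A small sketch (not from the original post; it assumes a simple header with no quoted or embedded commas):

```shell
# Read the first line of the CSV and split it on commas into an array.
# Assumes the header contains no quoted or embedded commas.
printf 'ID,Question,Answer\n1,Who Am I?,I am Owen\n' > test.csv
IFS=',' read -r -a headers < test.csv
echo "${headers[1]}"   # the second field name: Question
```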

Read a line of the CSV (the number before the p is the line number). This is useful for looping over a file and getting each line.

[owen@TheLinuxBlog.com bin]$ sed -n 2'p' "test.csv"
1,Who Am I?,I am Owen
[owen@TheLinuxBlog.com bin]$ sed -n 1'p' "test.csv"
ID,Question,Answer
[owen@TheLinuxBlog.com bin]$ sed -n 3'p' "test.csv"
2,What blog is this from?,TheLinuxBlog

So, to loop over the CSV and read a line you can do something like the following:

[owen@TheLinuxBlog.Com bin]$ for i in $(seq 1 `wc -l "test.csv" | awk '{print $1}'`); do sed -n $i'p' "test.csv"; done;
ID,Question,Answer
1,Who Am I?,I am Owen
2,What blog is this from?,TheLinuxBlog

Of course, with the above you can skip the header row by changing seq 1 to seq 2 to start from line 2.
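Another way to write the same loop without calling sed once per line is a while read loop, with tail -n +2 skipping the header. This is just a sketch of an alternative, not from the original post:

```shell
# Loop over the CSV in a single pass instead of one sed call per line.
# tail -n +2 starts output at line 2, skipping the header row.
printf 'ID,Question,Answer\n1,Who Am I?,I am Owen\n' > test.csv
tail -n +2 test.csv | while IFS= read -r line; do
    echo "$line"
done
```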

To split up each row (line) in the CSV at the comma, I use the following:

[owen@TheLinuxBlog.Com bin]$ for i in $(seq 1 `wc -l "test.csv" | awk '{print $1}'`); do echo $(sed -n $i'p' "test.csv") | tr ',' '\n'; done;

This puts a newline in place of each comma. The above is known not to work for fields with commas in them; if the data relies on commas inside fields, you have to use some more advanced sed expressions.
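One such "more advanced" sed sketch: hide commas that sit between double quotes behind a placeholder, split on the remaining commas, then restore them. This assumes the data never contains the '@' character, and being a naive regex it will still misfire if two quoted fields sit directly next to each other:

```shell
# Protect commas inside double quotes with '@', split the rest on
# commas, then put the protected commas back. Naive: assumes no '@'
# in the data and no two adjacent quoted fields.
echo '1,"Who, me?",I am Owen' \
  | sed -e :a -e 's/\("[^"]*\),\([^"]*"\)/\1@\2/;ta' \
  | tr ',' '\n' \
  | sed 's/@/,/g'
```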

Here is the CSV file I was using for this post:

[owen@norpmes bin]$ cat test.csv
ID,Question,Answer
1,Who Am I?,I am Owen
2,What blog is this from?,TheLinuxBlog

Let me know if this was helpful or not. As a whole this is not supposed to be an end-all CSV tutorial, but more a set of techniques that can be used to aid in processing CSV files. If anyone has any thoughts or suggestions, needs any help, or finds other methods of completing the same or other tasks, then please comment!

Man Pages for commands in this post »

cut
tr
sed
awk
head

11 Comments »

Comment by Tom

November 3, 2008 @ 7:55 pm

Great post! Not enough bash tutorials exist out there! I’m tired of the usual, “Here’s how to write a bash script! Ready?

#!/bin/bash

echo "Hello World!"

And it’s just that easy!”

Nice work!

Comment by TheLinuxBlog.com

November 10, 2008 @ 11:12 am

I appreciate your input, @Tom. If there are any topics you or anyone else would like me to write about, just let me know!

Comment by jack

November 22, 2008 @ 12:44 am

Hi linuxblog, really good intro tutorial for sed on CSV. I’m stuck with a problem converting a text file to a CSV file and would appreciate it if you could help me. I have a file like below (3 records shown, but with more than 5k records):

Fri Aug 01 00:01:19 2008
NAS-IP-Address = 172.16.160.1
Class = "P1-SHT-AAA01;1217116976;70999"
Calling-Station-Id = "00-1F-FB-00-02-26"
Acct-Status-Type = Stop
Acct-Session-Id = "31908b485a370000"
Framed-IP-Address = 172.16.160.2
WiMAX-BS-ID = 000012A00433
User-Name = "joe"

Fri Aug 01 00:02:02 2008
NAS-IP-Address = 172.16.160.1
Class = "P1-SHT-AAA01;1217116976;71005"
Calling-Station-Id = "00-1F-FB-00-06-61"
Acct-Status-Type = Start
Acct-Session-Id = "31908b485d370000"
Framed-IP-Address = 172.16.160.2
WiMAX-BS-ID = 000012A00413
User-Name = "john"

Fri Aug 01 00:02:02 2008
NAS-IP-Address = 172.16.160.1
Class = "P1-SHT-AAA01;1217116976;71005"
Calling-Station-Id = "00-1F-FB-00-06-61"
Acct-Status-Type = Update
Acct-Session-Id = "31908b485d370000"
Framed-IP-Address = 172.16.160.2
WiMAX-BS-ID = 000012A00413
User-Name = "don"

and I need to make csv file to import to sql db like below:

2008-08-1;172.16.160.1;P1-SHT-AAA01;1217116976;71005;00-1F-FB-00-06-61;Start;31908b485d370000;172.16.160.2;000012A00413;joe

Is it possible to do this in sed? I’d really appreciate your help, thanks.

Comment by TheLinuxBlog.com

November 24, 2008 @ 2:08 pm

Hey Jack,

If those records are automatically put out into a file and follow the same format regardless of whether the data exists, it shouldn’t be too hard.

If there are two line breaks in between each record then you could process the file that way. As there are semicolons and quotes in the file you will have to escape the query.

If you are trying to import 5000 records into mysql you may be better off using a php or perl script to read the file, but it can be done with bash if you HAVE to use it.
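A rough sketch of that blank-line approach in awk (not from the original reply): with RS set to the empty string, awk reads one blank-line-separated record at a time. It assumes every record carries the same lines in the same order, and it leaves the date reformatting from jack's target line out:

```shell
# Blank-line-separated records: RS="" makes awk read one whole record
# at a time, FS="\n" makes each line a field. Strip the "Name = " part
# and the quotes, then join the values with semicolons.
cat > records.txt <<'EOF'
Fri Aug 01 00:02:02 2008
NAS-IP-Address = 172.16.160.1
Class = "P1-SHT-AAA01;1217116976;71005"
Calling-Station-Id = "00-1F-FB-00-06-61"
Acct-Status-Type = Start
Acct-Session-Id = "31908b485d370000"
Framed-IP-Address = 172.16.160.2
WiMAX-BS-ID = 000012A00413
User-Name = "john"
EOF
awk 'BEGIN { RS = ""; FS = "\n" } {
    out = $1                      # the timestamp line has no "Name ="
    for (i = 2; i <= NF; i++) {
        val = $i
        sub(/^[^=]*= */, "", val) # drop the "Name = " prefix
        gsub(/"/, "", val)        # drop the surrounding quotes
        out = out ";" val
    }
    print out
}' records.txt
```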

Comment by swatkat

February 12, 2009 @ 9:42 am

Hi, I would like to know how to split a large CSV file into chunks of 65,000 lines.

split works fine here, but the condition is it should check the 2nd column: if the 2nd column has the same value as that of the 65000 - 1th line (2nd column), then this line too should be appended to the 2nd file created.

thx.

Comment by TheLinuxBlog.com

February 12, 2009 @ 10:35 am

@SWATKAT I guess I don’t understand what it is you are exactly trying to do. split will split the file into 65,000-line chunks. Are you trying to add the 65,000th line to the 2nd file if the second field on the 65,000th line is the same as the 2nd field in the 2nd file? I am assuming you would want this at the beginning of the file, correct?

Let me know and I’ll try and help.

Comment by Nikolay

June 8, 2010 @ 3:05 am

Hello,
Thanks for the post – it is very useful indeed.
However, I am struggling with the following: I have to convert a CSV file into an ANSI file.
Could you please help me with that? I’d like to point out that I am a really inexperienced bash/awk script user, but eager to learn :)

Comment by Patrick

December 20, 2011 @ 2:12 pm

Hi, I’m afraid your scripts are not working for me, especially the one to count the number of elements in the CSV file. The error it comes up with is:

"cut: the delimiter must be a single character
Try 'cut --help' for more information."

Comment by TheLinuxBlog.com

December 20, 2011 @ 3:22 pm

Hey @Patrick, make sure the delimiter you give cut is only a single character. A literal space can be passed by escaping it:
-d\ 
(a backslash followed by a space), or by quoting it as -d " ". Also make sure the delimiter is only defined once.
Another issue is that quotes and apostrophes sometimes get translated into curly ones when copied from the web; try replacing any of those with the standard ones.

Hope this helps!
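To illustrate, here is the post's line-count command written with an unambiguous single-character delimiter (a sketch using GNU cut; -d is the short form of --delimiter):

```shell
# Count the lines, then cut the number off wc's "3 test.csv" output.
# -d ' ' passes a single space as the delimiter.
printf '1,a\n2,b\n3,c\n' > test.csv
wc -l test.csv | cut -d ' ' -f 1   # prints 3
```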

Comment by Bogwitch

February 23, 2012 @ 2:23 pm

Hi,

I have a peculiar requirement which I hope you can help with. I have a few CSV files that I wish to convert to TSV for importing into MySQL.
Each entry consists of a few text fields that are enclosed within quotation marks; the numeric entries are not. Within the text areas there are occasionally commas!

e.g.

"abcde","fghij","klm,no",1234,5678,"","pqrst"

There are an equal number of entries on each line.

I have tried a sed 's/","/[tab]/g' but the commas are left around the numbers, and if I tr ',' '\t' the commas in the enclosed text are converted too!

Any help, suggestions, etc gratefully received.

Bog

Comment by Bogwitch

March 1, 2012 @ 7:27 pm

My own solution:

I extracted each column using variations of ‘cut’ into separate files. I then used ‘paste’ to create a TAB delimited file from the single column files. It was a long way round and although I could have scripted it, I did it manually so as to check each step did not introduce any errors.
The net result is, I now have a table within my database containing slightly over 75 million records with 100% accuracy!
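For the record, the same CSV-to-TSV conversion can be done in one pass with a small character-by-character awk state machine (a sketch, not from the thread): it tracks whether it is inside quotes, drops the quotes, and turns only the unquoted commas into tabs, so embedded commas like "klm,no" survive.

```shell
# Walk each line one character at a time: toggle "inq" on quotes,
# emit a tab for commas outside quotes, and keep everything else.
echo '"abcde","fghij","klm,no",1234,5678,"","pqrst"' \
  | awk '{
      out = ""; field = ""; inq = 0
      for (i = 1; i <= length($0); i++) {
          c = substr($0, i, 1)
          if (c == "\"") inq = !inq
          else if (c == "," && !inq) { out = out field "\t"; field = "" }
          else field = field c
      }
      print out field
  }'
```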
