Linux Blog

RSS Feeds

Filed under: Shell Script Sundays — Kaleb at 11:43 am on Sunday, April 20, 2008

The other day I was playing around with AwesomeWM and wanted the newest article from digg.com/linux_unix displayed in the statusbar. I thought to myself:

“I roughly know how RSS works, so I should be able to do this.”

It turns out it was extremely easy to do.

First, how does RSS work? It’s easy: just an XML file that gets downloaded, containing a list of the articles on the site. That’s pretty simple, so I wrote a little script that does everything I need.

First I needed to download the list:

wget -c http://digg.com/rss/indexlinux_unix.xml

Done with that. To make things a little cleaner, I moved the file:

mv indexlinux_unix.xml ~/.news

This way it is in a file that I can easily access.
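
The two steps above can also be folded into one. What follows is only a sketch, not the script from this post: the names FEED_URL, NEWS_FILE and fetch_feed are made up for illustration. It uses wget's -O option to choose the output file, and it downloads to a temporary name first so that a failed download does not wipe out the previous copy of ~/.news.

```shell
#!/bin/sh
# Sketch: download the feed and store it as ~/.news in one function.
# FEED_URL, NEWS_FILE and fetch_feed are illustrative names (assumptions).
FEED_URL="http://digg.com/rss/indexlinux_unix.xml"
NEWS_FILE="$HOME/.news"

fetch_feed() {
    tmp="$NEWS_FILE.tmp"
    # -q: quiet, -O: write the download to the named file
    if wget -q -O "$tmp" "$FEED_URL"; then
        # Only replace the old copy once the download has succeeded
        mv "$tmp" "$NEWS_FILE"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Calling fetch_feed from a cron job or a loop would keep ~/.news fresh. Unlike -c, which tries to continue an existing partial file, -O writes the whole download to the given name every time.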

After that it was just some simple editing of the file using sed. If you don’t know much about sed, I suggest you read up on it. It is an extremely powerful tool for quick editing and scripting. The editing itself was actually quite simple:

cat ~/.news | grep "<title>" | sed -e 's/<[/]title>//' | sed -e 's/<title>//' | sed -e '2,2 !d'

Now, no worries, I will explain this. It’s actually quite simple.

I will assume you know what cat ~/.news does, but if you don’t: it writes the contents of the file to standard output, from start to end.

| grep "<title>" is a very important part of the command. As I looked at the XML file, I realized that I would get a simple list of all the articles if I grepped for the title tag. However, that’s not all.

The output was very messy, with <title> at the beginning of each line and </title> at the end. Nobody wants to look at that; what I wanted was the text in between. | sed -e 's/<[/]title>//' gets rid of the </title> in the line. I am almost certain that | sed -e 's/<\/title>//' would have done the same thing, but you can test that if you want. The slash needs this treatment because "/" is the delimiter of the s command, so a literal slash has to go inside brackets or be escaped with a backslash.
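
To test that claim, here is a small sketch that runs three spellings of the same substitution on one made-up line. The function names and the sample line are illustrative only. With GNU sed I would expect all three to print <title>Example article; other sed implementations may treat the bracket form differently, so take that one as an assumption.

```shell
#!/bin/sh
# Three ways to write a literal "/" in a sed substitution.
# Function names and the sample line are illustrative only.

# 1. Put the slash in a bracket expression, as the pipeline above does.
strip_bracket() { sed -e 's/<[/]title>//'; }

# 2. Escape the slash with a backslash.
strip_escaped() { sed -e 's/<\/title>//'; }

# 3. Use a different delimiter for s, so the slash is no longer special.
strip_altdelim() { sed -e 's|</title>||'; }

line='<title>Example article</title>'   # made-up sample line
printf '%s\n' "$line" | strip_bracket
printf '%s\n' "$line" | strip_escaped
printf '%s\n' "$line" | strip_altdelim
```

The third form is often the easiest to read when the pattern itself contains slashes.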

The next part, | sed -e 's/<title>//', should be self-explanatory: it just gets rid of the <title> in the line. So using the first three pipes you get a nice, pretty list of all the article titles.

This is not what we wanted, though. We wanted the newest article, so that’s why we use | sed -e '2,2 !d'. This command deletes everything except the second line of the list. “Hmm, but why the second line, Kaleb?” Well, because while creating this script I found that the first <title> line is the one that tells me where the information comes from, in this case http://digg.com/linux_unix, and I don’t want that. So I went with the second line, which is the first article. Easy, right?
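
To try the whole idea without touching the network, here is a sketch that runs the same steps against a made-up sample feed. The feed contents, the variable sample_feed and the function newest_title are all invented for illustration. It folds the grep and the two tag-stripping sed calls into a single sed, and it assumes, like the pipeline above, that every <title> sits on its own line.

```shell
#!/bin/sh
# Hypothetical sample feed; the titles and links are made up.
sample_feed=$(mktemp)
cat > "$sample_feed" <<'EOF'
<rss version="2.0">
<channel>
<title>Example feed name</title>
<item>
<title>First example article</title>
<link>http://example.com/articles/1</link>
</item>
<item>
<title>Second example article</title>
<link>http://example.com/articles/2</link>
</item>
</channel>
</rss>
EOF

# Same steps as the pipeline above:
#   -n         print nothing unless told to
#   /<title>/  only act on lines that contain <title> (the grep step)
#   s/...//    strip the opening and the closing tag
#   p          print what is left
# The second sed keeps only line 2, i.e. the first article title.
newest_title() {
    sed -n '/<title>/{s/<title>//;s/<\/title>//;p;}' "$1" | sed -n '2p'
}

newest_title "$sample_feed"   # prints: First example article
rm -f "$sample_feed"
```

A real feed may indent its lines or put several tags on one line, in which case the output would need more cleanup than this sketch does.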

Now, as I mentioned at the beginning of this article, I wanted a clickable link for the awesome statusbar. I will go over awesome piping later this week, but basically all you need to do is go through the XML file for your RSS feed, find out which tags surround the link for your article, use the above command to show that link instead of the title, and then have Firefox (or whatever browser you use) open that link. It was a very simple thing to do.
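
As a rough sketch of that last step: the function name newest_link is made up, and the code assumes the links live in <link> tags, one per line, with the first <link> belonging to the feed itself, the same way the first <title> did. Check your own feed before relying on that.

```shell
#!/bin/sh
# Sketch: print the link of the newest article instead of its title.
# newest_link is an illustrative name; the <link> layout is an assumption.
newest_link() {
    grep '<link>' "$1" |
        sed -e 's/<link>//' -e 's/<\/link>//' -e 's/^[[:space:]]*//' |
        sed -n '2p'   # line 1 is assumed to be the feed's own link
}
```

With the feed saved as ~/.news, running firefox "$(newest_link ~/.news)" would then open the newest article.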

Kaleb Porter

porterboy55@yahoo.com

http://kpstuff.servebeer.com (website currently down)
