Saturday 30 March 2013

[Python] URL Un-shorten-er

Hi all,

I was playing with BeautifulSoup, a Python library that is very good at parsing and extracting data from HTML documents, even poorly written ones. While experimenting, I thought, why not make something interesting? I looked around for inspiration, and while browsing my Twitter feed I came across many shortened URLs. Why not demystify them? So I wrote a small script to un-shorten them. It uses a website, URLXray, to resolve the URLs. It is a very naive script, I know, but it still helped me learn BeautifulSoup.
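
Many shorteners (bit.ly, for example) simply answer with an HTTP redirect, typically 301 or 302, pointing at the long URL. So the final address can also be found without a third-party site, just by following those redirects. Below is a minimal Python 3 sketch of that swapped-in approach using only the standard library. `unshorten` is a hypothetical helper name, and it assumes the shortener uses plain HTTP redirects rather than JavaScript or meta-refresh pages.

```python
from urllib.request import urlopen


def unshorten(short_url, timeout=10):
    """Return the URL that short_url finally redirects to.

    Assumes the shortener uses ordinary HTTP 3xx redirects; pages that
    redirect with JavaScript or <meta http-equiv="refresh"> are not followed.
    """
    # Accept bare links such as "bit.ly/abc" by assuming plain HTTP.
    if "://" not in short_url:
        short_url = "http://" + short_url
    # urlopen follows the usual redirect codes (301, 302, 303, 307) on its
    # own; response.url is the address of the last response. The body of
    # the final page is never read.
    with urlopen(short_url, timeout=timeout) as response:
        return response.url
```

Letting `urlopen` chase the redirects keeps the code short. The trade-off is that it connects to the final site too, and some shorteners may treat the default Python-urllib User-Agent differently from a browser.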

I'm new to Python, so the script may look a bit long.

Here's the code:


```python
#!/usr/bin/env python3
# Simple script to un-shorten shortened URLs using the website URLXray.com
# Author: Rahul Binjve (@RahulBinjve)
# Usage: ./urlDecode.py URL
import sys
from urllib.parse import quote
from urllib.request import urlopen

from bs4 import BeautifulSoup


def main():
    if len(sys.argv) < 2:
        print('\nUsage: urlDecode.py "URL you want to decode"')
        sys.exit(1)

    result = decode(sys.argv[1])
    if result is None:
        print("\nCould not find a decoded URL in the URLXray result page.")
        sys.exit(1)
    print("\nDecoded URL is ->", result)


# Decode function, all our work will be done here.
def decode(userArg):
    # Percent-encode the user's URL so characters like "&" and "?" reach
    # URLXray as part of the "url" parameter instead of breaking the query.
    url = "http://urlxray.com/display.php?url=" + quote(userArg, safe="")
    print("\nUser provided URL ->", userArg)

    with urlopen(url) as webPage:
        tastySoup = BeautifulSoup(webPage, "html.parser")

    decoded = None  # stays None if the page has no result link
    # The resolved URL is the link inside <div class="resultURL2">.
    for div in tastySoup.find_all("div", class_="resultURL2"):
        for a in div.find_all("a", href=True):
            decoded = a["href"]
    return decoded


# Standard Python boilerplate
if __name__ == "__main__":
    main()
```
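
To see the extraction step of `decode()` in isolation, without the network or BeautifulSoup, here is a sketch using the standard library's `html.parser`. It assumes, as the script does, that the result page puts the resolved link in an `<a href>` inside `<div class="resultURL2">`. `ResultLinkParser` and `extract_result_url` are hypothetical names, and any sample HTML you feed it is up to you.

```python
from html.parser import HTMLParser


class ResultLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside <div class="resultURL2">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0  # > 0 while inside a resultURL2 div
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # Match "resultURL2" as one of possibly several classes.
            classes = (attrs.get("class") or "").split()
            if self.div_depth or "resultURL2" in classes:
                self.div_depth += 1  # also count divs nested inside it
        elif tag == "a" and self.div_depth and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1


def extract_result_url(html):
    parser = ResultLinkParser()
    parser.feed(html)
    parser.close()
    # Like the loop in decode(), keep the last link found (None if none).
    return parser.links[-1] if parser.links else None
```

Splitting the `class` attribute mirrors how BeautifulSoup's `class_` filter matches a tag that carries several classes.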

Thanks for reading.
Cheers.

2 comments:

  1. Nice Job, Bro..:)
    Keep Experimenting..:)

  2. jon said...

    Hi Rahul,

    Congratulations - it seems like you had an eventful and achievement-filled year! Nice summation of all your activities..keep blogging!

    Best wishes for your new role and for the coming holidays season.
