Here is a small Python program a friend and I wrote to map and visualize the structure of websites. It is released under the BSD license. The program opens every link found at the start address and then recursively follows the links on each page it reaches. Crawling can take a long time with more than 1 or 2 recursions, and it can cause a lot of traffic for a website, so please contact the site’s admin before you run the program.
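To give a rough idea of what such a depth-limited crawl looks like, here is a minimal sketch. It is not the code from iac.py: it uses only the Python 3 standard library instead of urllib2 and BeautifulSoup, and it walks the site level by level instead of using recursive calls, so each page is fetched at most once. The names `LinkParser`, `crawl` and `fetch` are made up for this illustration, and `fetch` is assumed to be any function that returns a page's HTML as a string. It also assumes that 0 recursions means "only the links on the start page".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, recursions, fetch):
    """Return a list of (source, target) link pairs.

    `fetch` is a hypothetical callable: URL in, HTML string out.
    Assumption: recursions=0 maps only the links on the start page.
    """
    seen = {start_url}
    edges = []
    level = [start_url]
    for _ in range(recursions + 1):
        next_level = []
        for url in level:
            parser = LinkParser()
            parser.feed(fetch(url))
            for href in parser.links:
                target = urljoin(url, href)  # resolve relative links
                edges.append((url, target))
                if target not in seen:  # never fetch a page twice
                    seen.add(target)
                    next_level.append(target)
        level = next_level
    return edges


# A tiny fake site, so the sketch runs without any network traffic.
PAGES = {
    "http://x.test/": '<a href="/a">A</a><a href="/b">B</a>',
    "http://x.test/a": '<a href="/c">C</a>',
    "http://x.test/b": "",
    "http://x.test/c": '<a href="/">home</a>',
}


def fake_fetch(url):
    return PAGES.get(url, "")
```

With a real site, `fetch` would download the page over HTTP. The number of pages usually grows quickly with each extra level, which is why more than 1 or 2 recursions take so long.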
The program requires Python 2.7, urllib2 and BeautifulSoup.
$ python(.exe) iac.py [parameters]
The required parameters are:
-f [path to result file]
-r [number of recursions]
-n ["title"|"url"] (map site titles or URLs)
-u [url to map]
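For illustration, the four required parameters above could be parsed like this. This is only a sketch using `argparse`. How iac.py actually reads its parameters is not shown here, and the attribute names (`result_file`, `recursions`, `naming`, `url`) as well as the example URL are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the four required flags (attribute names are assumed)."""
    parser = argparse.ArgumentParser(prog="iac.py")
    parser.add_argument("-f", dest="result_file", required=True,
                        help="path to result file")
    parser.add_argument("-r", dest="recursions", type=int, required=True,
                        help="number of recursions")
    parser.add_argument("-n", dest="naming", choices=["title", "url"],
                        required=True, help="map site titles or URLs")
    parser.add_argument("-u", dest="url", required=True,
                        help="URL to map")
    return parser


# Parse an example command line instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(
    ["-f", "result.txt", "-r", "1", "-n", "title", "-u", "http://example.com"]
)
print(args.result_file, args.recursions, args.naming, args.url)
```

The matching call on the command line would be `python iac.py -f result.txt -r 1 -n title -u http://example.com`.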
Create a visualization from the result file with Graphviz:
$ dot -Tsvg result.txt -o sitemap.svg
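Since `dot` reads the result file directly, the file has to be in Graphviz's DOT language. The exact layout iac.py writes is not shown here, so the following is only a sketch of how a list of (source, target) links could be turned into a directed graph. The function name `to_dot` and the graph name `sitemap` are assumptions.

```python
def to_dot(edges, graph_name="sitemap"):
    """Render (source, target) pairs as a DOT digraph string."""

    def quote(label):
        # DOT node names with special characters must be double-quoted.
        return '"' + label.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = ["digraph %s {" % graph_name]
    for source, target in edges:
        lines.append("    %s -> %s;" % (quote(source), quote(target)))
    lines.append("}")
    return "\n".join(lines) + "\n"


example = to_dot([
    ("http://example.com/", "http://example.com/about"),
    ("http://example.com/", "http://example.com/blog"),
])
print(example)
```

Writing that string to result.txt gives a file that the `dot` command above can render.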