GOVCOMORG FOUNDATION Tools < Issue Crawler Allied Tool Set
The Issue Crawler allied tool set allows you to:
- Query issue network actors for substance. Use Get all the urls from an issuecrawler xml file to extract the URLs from the network results, copy and paste them into scrapeGoogle, and query the URLs for keywords.
- Collect starting points (seed URLs) for a crawl. Use linkRipper to gather URLs from a page. Insert URLs into the harvester of the Issue Crawler.
- Compare networks over time. Which sites are rising in importance, which declining? Use compare networks over time, and receive ranked actor lists over time. Tip: use 'by site'.
- Perform image analysis of an issue network. Which images are associated with the issues, according to the network actors? Use googleImages to query a set of sites appearing in an issue network for images. Query that set of sites to see which organizations display which images for a particular sub-issue area or particular language. Use imagesDeep to fetch the images from a single URL.
- Compare actor composition across networks. Use Get all the urls from an issuecrawler xml file to fetch the URLs from two or more networks. Use analyse to compare the composition of two networks.
- Perform Internet censorship research. Use proxies to surf sites from within other countries (or view connection stats only).
- Explore robot exclusion policies. Sites may block robots, and thus prevent search engines and other crawlers from indexing or scraping their sites for archiving or further analysis. Use robots.txt stripper to display a site's robot exclusion policy.
- Map an issue network geographically. Use IssueGeographer to show on a geographical map where organizations in an issue network are based.
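The comparison of actor composition described above comes down to set operations on two URL lists. The following is a minimal Python sketch of that idea, not the analyse tool's own code (which is PHP and not shown here). The function names `host_of` and `compare_networks`, the choice to compare by host rather than by full URL, and the example URLs are all assumptions made for illustration.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Reduce a URL to its lowercase host, dropping a leading 'www.'."""
    # urlsplit only recognizes the host when a scheme or '//' is present,
    # so bare entries such as 'example.org/page' get a '//' prefix first.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def compare_networks(urls_a, urls_b):
    """Return hosts shared by both lists and hosts unique to each list."""
    hosts_a = {host_of(u) for u in urls_a} - {""}
    hosts_b = {host_of(u) for u in urls_b} - {""}
    return {
        "common": sorted(hosts_a & hosts_b),
        "only_a": sorted(hosts_a - hosts_b),
        "only_b": sorted(hosts_b - hosts_a),
    }


if __name__ == "__main__":
    # Hypothetical URL lists standing in for two network results.
    network_a = ["http://www.example.org/a", "https://example.com/"]
    network_b = ["example.org/b", "http://example.net"]
    print(compare_networks(network_a, network_b))
```

Comparing by host mirrors the 'by site' tip given for comparing networks over time. Comparing full URLs instead would only require replacing `host_of` with a page-level normalization.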
Tools:
- Get all the urls from an issuecrawler xml file
- scrapeGoogle - Query a set of sites appearing in an issue network. Query that set of sites to see which organizations work in which sub-issue areas or use particular language.
- linkRipper - Capture all URLs from a page. You can choose between inlinks and/or outlinks.
- compare networks over time - Compare Issue Crawler networks over time. Displays ranked actor lists from a scheduled set of Issue Crawler results.
- googleImages - Query a set of sites appearing in an issue network for images. Query that set of sites to see which organizations display which images for a particular sub-issue area or particular language.
- imagesDeep - Get all the images from a deep link.
- analyse - Compare two lists of URLs for commonalities and differences.
- proxies - View a site through a proxy.
- robots.txt stripper - Display a site's robot exclusion policy. Enter a URL and see which parts of the site are blocked from indexing.
- googleNews - Query news via Google.
- scrape Technorati
- (Relative) Actor Resonance Per Issue in the Blogosphere. Building upon Technorati, the tool shows a ratio of all issue postings to an organization's association with the issue postings.
- Charts of the (Relative) Actor Resonance Per Issue in the Blogosphere. This tool charts the number of issue postings and the ratio of issue postings to an organization's association with the issue postings.
- Show on a geographical map where organizations in an issue network are based. The IssueGeographer takes issuecrawler.net results, scrapes a whois.net service, and plots the sites' registration addresses (lat/long of the city) on a geographical map.
- Del.icio.us Organization Tag Cloud Generator Per Issue. Building upon Del.icio.us, this tool shows, in a tag cloud, which URLs or tags are referred to per issue area.
- Surfer Issue Pathways. Building upon Alexa's related sites feature, this tool determines which sites are likely to be in the actual surfer paths of other sites related to the same issue.
- Timestamp. Provides the last modification date of a web page.
- rss discovery. Get the RSS feeds of a set of URLs.
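To illustrate what checking a robot exclusion policy involves, here is a minimal Python sketch that reports which paths a robots.txt file blocks. It uses the standard library's `urllib.robotparser` and is not the robots.txt stripper's own code. The helper name `blocked_paths`, the sample policy, and the sample paths are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt: str, paths, user_agent: str = "*"):
    """Return the paths that the given robots.txt text disallows."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


if __name__ == "__main__":
    # Hypothetical policy and paths, for illustration only.
    sample_policy = "User-agent: *\nDisallow: /private/\nDisallow: /tmp/\n"
    sample_paths = ["/", "/private/report.html", "/tmp/x", "/about"]
    for path in blocked_paths(sample_policy, sample_paths):
        print("blocked:", path)
```

To check a live site instead, `RobotFileParser` also offers `set_url()` and `read()`, which fetch the file over the network before `can_fetch()` is called.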
Scripts are written in PHP by Erik Borra and Koen Martens (Sonologic.nl).