GOVCOMORG FOUNDATION Tools < Issue Crawler Allied Tool Set

The Issue Crawler allied tool set allows you to:

  1. Query issue network actors for substance. Use 'Get all the urls from an issuecrawler xml file' to extract the URLs from the network results, copy and paste them into scrapeGoogle, and query the URLs for keywords.
  2. Collect starting points (seed URLs) for a crawl. Use linkRipper to gather URLs from a page. Insert URLs into the harvester of the Issue Crawler.
  3. Compare networks over time. Which sites are rising in importance, and which are declining? Use 'compare networks over time' to receive ranked actor lists over time. Tip: use 'by site'.
  4. Perform image analysis of an issue network. Which images are associated with the issues, according to the network actors? Use googleImages to query a set of sites appearing in an issue network for images. Query that set of sites to see which organizations display which images for a particular sub-issue area or particular language. Use imagesDeep to fetch the images from a single URL.
  5. Compare actor composition across networks. Use 'Get all the urls from an issuecrawler xml file' to fetch the URLs from two or more networks, then use analyse to compare the composition of two networks.
  6. Perform Internet censorship research. Use proxies to surf sites from within other countries (or view connection stats only).
  7. Explore robot exclusion policies. Sites may block robots, and thus prevent search engines and other crawlers from indexing or scraping their sites for archiving or further analysis.
  8. Map an issue network. Use IssueGeographer to show on a geographical map where the organizations in an issue network are based.
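
As an illustration of item 7, the following is a minimal sketch of how a robot exclusion policy can be checked. It is not the tool set's own code (the scripts listed here are written in PHP); it uses Python's standard urllib.robotparser module. The robots.txt content, the crawler names (ExampleArchiver, OtherBot) and the example.org URLs are hypothetical and inlined so that the sketch runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
# A real check would fetch <site>/robots.txt instead.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleArchiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines rather than a single string.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("OtherBot", "https://example.org/index.html"),
        ("OtherBot", "https://example.org/private/report.html"),
        ("ExampleArchiver", "https://example.org/index.html"),
    ]
    for agent, url in checks:
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent:16} {url:45} {verdict}")
```

In this hypothetical policy every crawler is kept out of /private/, while the crawler named ExampleArchiver is excluded from the whole site. Running the same check over the sites in an issue network would show which actors exclude which robots.
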
Tools:

Scripts are written in PHP by Erik Borra and Koen Martens (Sonologic.nl).

 
