Methods

  1. Hyperlink Analysis
    • Perform an issue crawl to demarcate an issue network, a social network, an establishment network, or an event network. The result is an Issue Crawler network map with detailed network data, which can be analysed with various tools. More information on how to do an issue crawl can be found in Issue Crawler's scenarios of use. To collect starting points (seed URLs) for a crawl, you can use linkRipper to gather URLs from a page. Insert these URLs into the harvester of the Issue Crawler.
    • You can schedule the issue crawl and compare the networks over time. Which sites are rising in importance, and which are declining? Use the Compare networks over time tool to receive ranked actor lists over time. Tip: use 'by site'.
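
To illustrate what comparing ranked actor lists over time involves, here is a minimal sketch in Python. It is not the tool's own code: it assumes each scheduled crawl has already been reduced to a list of sites ordered from highest to lowest rank, and the function name `rank_changes` and the sample sites are hypothetical.

```python
def rank_changes(earlier, later):
    """Compare two ranked site lists (best first).

    Returns {site: (old_rank, new_rank, change)}. A positive change means
    the site rose in rank; None marks a site missing from one of the lists.
    """
    old = {site: i + 1 for i, site in enumerate(earlier)}
    new = {site: i + 1 for i, site in enumerate(later)}
    changes = {}
    for site in old.keys() | new.keys():
        o, n = old.get(site), new.get(site)
        delta = o - n if o is not None and n is not None else None
        changes[site] = (o, n, delta)
    return changes


# Hypothetical ranked lists from two scheduled crawls.
crawl_1 = ["a.org", "b.org", "c.org"]
crawl_2 = ["b.org", "a.org", "d.org"]
changes = rank_changes(crawl_1, crawl_2)

for site, (o, n, delta) in sorted(changes.items()):
    print(site, o, n, delta)
```

Sites that appear in only one crawl are kept rather than dropped, since entering or leaving the network is itself a finding.
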
  2. Issue Networks
    • To query the issue network actors for substance, get all the URLs from an Issue Crawler XML file, then copy and paste the URLs from the network results into the Google scraper. Query that set of sites to see which organizations work in which sub-issue areas or use particular language (by querying the URLs for keywords). This can all be done in one step by pasting the network id here. You can also show the frequency of hosts per issue in a Tag Cloud.
    • Compare the network rank with Google's rank through the Actor Profiler. This script gets the top 10 network nodes (by indegree) and queries them in Google for a specific issue. The Google PageRank and description are visualized in an SVG, along with the actor's inlinks and outlinks from the network.
    • See if surfers actually follow the links from one site to the other with our Surfer Issue Pathways Tool. Building upon Alexa's related sites feature, this tool determines which sites are likely to be in the actual surfer paths of other sites related to the same issue.
    • Perform image analysis of an issue network. Which images are associated with the issues, according to the network actors? Use GoogleImages to query a set of sites appearing in an issue network for images. Query that set of sites to see which organizations display which images for a particular sub-issue area or particular language. Use imagesDeep to fetch the images from a single URL.
    • Show on a geographical map where organizations in an issue network are based. The Issue Geographer takes issuecrawler.net results, scrapes a whois.net service, and plots the sites' registration addresses (lat/long of the city) on a geographical map. See this movie for an explanation of how to use the Issue Crawler and the Issue Geographer.
    • Perform in-depth social network analysis with UCINET. Use the UCINET datafile from the Issue Crawler crawl details page.
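
The "top 10 network nodes (by indegree)" mentioned above can be sketched as follows. This is an illustrative sketch only: it assumes the network is available as a plain list of (source URL, target URL) pairs, and it makes no claim about the Issue Crawler's actual file formats or ranking code. Collapsing pages to hosts and counting each linking host once are assumptions made here, loosely echoing the 'by site' tip.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_by_indegree(links, n=10):
    """Rank hosts by the number of distinct other hosts linking to them.

    links: iterable of (source_url, target_url) pairs.
    """
    seen = set()
    for src, dst in links:
        s = urlsplit(src).netloc.lower()
        t = urlsplit(dst).netloc.lower()
        # Ignore malformed URLs and links within the same host.
        if s and t and s != t:
            seen.add((s, t))
    indegree = Counter(target for _, target in seen)
    return indegree.most_common(n)


# Hypothetical link data.
links = [
    ("http://a.org/x", "http://c.org/"),
    ("http://a.org/y", "http://c.org/page"),
    ("http://b.org/", "http://c.org/"),
    ("http://c.org/", "http://a.org/"),
    ("http://a.org/", "http://a.org/about"),
]
top = top_by_indegree(links)
print(top)
```
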
  3. Blog Analysis
    • Find high, medium, and low authoritative starting points for an issue with the Technorati scraper and Charts of the (Relative) Actor Resonance Per Issue. Use these starting points as input for co-link analysis with the Issue Crawler.
      E.g. Bruns' article on using the Issue Crawler and Technorati in Methodologies for Mapping the Political Blogosphere (2007).
    • (Relative) Actor Resonance Per Issue in the Blogosphere. Building upon Technorati, the tool shows a ratio of all issue postings to an organization's association with the issue postings.
    • Charts of the (Relative) Actor Resonance Per Issue in the Blogosphere. This tool charts the number of issue postings and the ratio of issue postings to an organization's association with the issue postings.
  4. Google News (Image) Analysis (you'll need a special account for this)
    • Query a particular country or language for its news
    • Compare the discourse across countries
    • Compare news images across countries
    • Compare images by media ownership
  5. Censorship Research
    • Redistributed Content Discovery
      • Scrape Google (international) for an issue that is suspected to be censored. Get unique phrases from the Google descriptions, and query the individual phrases in Google again. Perform a geoip and whois lookup of the sites to see who authors the sites and where the sites are hosted. In addition, you can check whether the sites are known to be blocked, or submit them to be checked, by the OpenNet Initiative.
        E.g. A chapter by Richard Rogers in the forthcoming book by Jussi Parikka, Tony Sampson (eds.), 'The Spam Book: On Viruses, Spam, and Other Anomalies from the Dark Side of Digital Culture'
      • Scrape Google for a particular issue. Split the results into lists of blocked and non-blocked sites by providing a list of known blocked sites.
    • Use proxies to surf sites from within other countries (or view connection stats only).
    • URL discovery through hyperlink sampling with the Issue Crawler
      E.g. "A Censor's Network: Iranian Social, Political and Religious Sites. A Hyperlink Analysis Method for Censored Website Discovery" (govcom.org, December 2006) pdf
    • See the section on Search Engine Behavior for more censorship research in relation to search engines.
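
The step of splitting search results into blocked and non-blocked sites can be sketched in Python as below. This is a sketch under stated assumptions, not the tool's implementation: it assumes results and the known-blocked list are plain lists of URLs or host names, and that matching is done on the host with a leading "www." ignored. The helper names `host_of` and `split_blocked` and the sample hosts are hypothetical.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lower-cased host of a URL or bare host name, without 'www.'."""
    # urlsplit only recognises a host when the string contains '//'.
    host = urlsplit(url if "//" in url else "//" + url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def split_blocked(result_urls, blocked_sites):
    """Partition result URLs by whether their host is on the blocked list."""
    blocked_hosts = {host_of(u) for u in blocked_sites}
    blocked, unblocked = [], []
    for url in result_urls:
        (blocked if host_of(url) in blocked_hosts else unblocked).append(url)
    return blocked, unblocked


# Hypothetical search results and known-blocked list.
results = ["http://www.example.org/a", "http://news.example.com/b", "example.net/c"]
known_blocked = ["example.org", "http://example.net"]
blocked, unblocked = split_blocked(results, known_blocked)
print(blocked)
print(unblocked)
```

Matching on the exact host means a subdomain of a blocked site is not treated as blocked; whether that is appropriate depends on how the blocked list was compiled.
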
  6. Search Engine Behavior
    • Information retrieval is normally not considered dramatic. On the Web, however, information sources are in constant competition with each other to be returned in the top ten for any given query. The competition is particularly fierce for products and services. The quest to reach the top often prompts companies to enlist the black arts services of search engine optimizers. Use the Issue Dramaturg to see the rise and fall of a site's Google rank for an issue.
    • Use the Page Rank script to get a site's current Google Rank for an issue.
    • Query Google for its results on a particular query. See the section on Geographical Analysis to find out where the machines and the owners of the webpages are based.
  7. Geographical Analysis / (De-/Re-)Territorialization of the web
    • The Issue Geographer shows on a geographical map where organizations in an issue network are based by querying whois databases and looking up the country in which the IP-address is based.
    • Geo-ip. This script looks up the country in which the machine, identified by an IP-address, is based.
    • Whois. This script looks up the country in which the domain owner (registrant) is based.
  8. Exclusion Policies
    • Explore robot exclusion policies. Sites may block robots, and thus prevent search engines and other crawlers from indexing or scraping their sites for archiving or further analysis. Enter a URL and see which parts of the site are blocked from indexing.
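
As an illustration of how a robot exclusion policy is read, the sketch below uses Python's standard `urllib.robotparser` module to test URLs against a robots.txt file. The robots.txt content and URLs are made-up examples, and the helper name `blocked_urls` is hypothetical; a real check would first fetch the site's own /robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /search
"""


def blocked_urls(robots_txt, urls, agent="*"):
    """Return the URLs that the given robots.txt disallows for the agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]


urls = [
    "http://example.org/",
    "http://example.org/private/report.html",
    "http://example.org/search?q=x",
]
disallowed = blocked_urls(ROBOTS_TXT, urls)
print(disallowed)
```

Note that robots.txt is a request to crawlers rather than an access control, so a "blocked" path here means only that compliant robots will skip it.
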
  9. Folksonomies
    • Del.icio.us Organization Tag Cloud Generator Per Issue. Building upon Del.icio.us, this tool shows, in a tag cloud, which URLs or tags are referred to per issue area.
    • Del.icio.us Organization Tag Cloud Generator Per Site. Building upon Del.icio.us, this tool shows, in a tag cloud, which tags are referred to per site. The number of users who bookmarked the site is also displayed.
    • Del.icio.us tag and save history for a URL. Building upon Del.icio.us, this tool discovers how a URL was tagged and at what time which tags were used.

Tools

  1. Scrapers and crawlers (used for sampling)
  2. Analysis
    • Pagerank - Google's pagerank for sites for an issue / query.
    • Issue Crawler's Actor Profiler - Get Google's pagerank for the top 10 actors in a network
    • Issue Dramaturg - Scheduled pagerank for sites for issue
    • Issue Geographer - Show on a geographical map where organizations in an issue network are based
    • Surfer Issue Pathways - Building upon Alexa's related sites feature, this tool determines which sites are likely to be in the actual surfer paths of other sites related to the same issue.
    • Compare networks over time - Compare Issue Crawler networks over time. Displays ranked actor lists from a scheduled set of Issue Crawler results.
    • Frequency of hosts per issue, as a tagcloud
    • Get unique phrases from Google descriptions
    • Split Google results into blocked and unblocked sites
    • Actor Resonance - Building upon Technorati, this script gets the resonance of an actor for an issue in the Blogosphere
    • Del.icio.us Organization Tag Cloud Generator Per Issue - Building upon Del.icio.us, this tool shows, in a tag cloud, which URLs or tags are referred to per issue area.
    • Del.icio.us tag and save history for a URL - Find which tags belong to a URL
    • Compare lists - Compare two lists of URLs for commonalities and differences.
    • Timestamp - Provides the last modification date of a webpage.
    • Wikipedia network analysis - This tool finds the hyperlink network around a Wikipedia topic.
    • Co-link analysis with the Issue Crawler
    • Collect all links / pages on a site by snowballing or spidering
    • Issuediscovery - This tool discovers the most relevant words and phrases in a text and can be automatically applied to all pages in an Issue Crawler network
    • Tag Cloud - Input a text and see the frequency of words in a tag cloud
    • News Images (requires special login)
    • Lexical Analysis (part-of-speech tagging, stemming of words, stopword removal, etc.)
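
The word-frequency counting behind a tag cloud, combined with stopword removal, can be sketched as follows. This is a simplified illustration rather than the Tag Cloud tool's code: the tiny stopword set, the tokenizing regular expression, and the sample sentence are all assumptions, and no stemming or part-of-speech tagging is attempted.

```python
import re
from collections import Counter

# A deliberately tiny, illustrative stopword list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def word_frequencies(text, stopwords=STOPWORDS):
    """Count lower-cased words in text, skipping stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)


freqs = word_frequencies(
    "The climate debate: climate policy and the politics of climate."
)
# In a tag cloud, each word's display size would scale with its count.
for word, count in freqs.most_common():
    print(word, count)
```
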
  3. Visualizations
    • (line/bar/pie) graphs and charts
    • Network diagrams
    • Dorling
    • Scatter plots
    • Screenshot generator
    • Chart the ratio of blog posts (Technorati)
    • Del.icio.us tag network
    • Tag Cloud - Input a text and see the frequency of words in a tag cloud
    • Issue Geographer - Show on a geographical map where organizations in an issue network are based
    • Orakel Machine
  4. External Tools