Visualizing Large Collections of URLs Using the Hilbert Curve

Abstract

Search engines like Google provide an aggregation mechanism for the web and constitute the main access point to the Internet for a large part of the population. For this reason, biases in search results and the personalization schemes applied to them may have huge societal implications that require scientific inquiry and monitoring. This work is dedicated to visualizing the data that such inquiry produces and to understanding how that data changes and develops over time. We argue that this data is akin to text corpora but possesses some distinct characteristics that require novel visualization methods. The key differences between URLs and other textual data are their lack of internal cohesion, their relatively short lengths, and, most importantly, their semi-structured nature, which is attributable to their standardized constituents (protocol, top-level domain, country domain, etc.). We present a technique to …

Publication
In International Cross-Domain Conference for Machine Learning and Knowledge Extraction
