For the past year or so, I’ve been interested in putting together a small team of like-minded folks to help bring to fruition a data visualization project that could benefit less-prepared college students, teachers in the humanities, and researchers alike. Often, underprepared or at-risk educational populations struggle to connect literary study with the so-called “real world,” leading to a saddening lack of interest in the possibilities of the English language, much less literary study. I’d like to collaborate with someone to develop a web application drawing on WordNet, and particularly on the range of semantic similarity extensions built around it, to visually mark up and weight by color the semantic patterns emerging from small uploaded portions of text. This kind of application can not only help students attend more fully to the structures of representation in literature and the larger world around them (by means of a tool emphatically of the “real world”) but also enable scholars to unearth unexpected connections in larger bodies of text. Like literary texts to many students, the existing semantic similarity tools available through the open source community can seem inaccessible, even foreign, to a lay audience; this project seeks to lay open the language that so many fear, while enabling the critical thinking involved in literary analysis. Ultimately, we hope to extend this application with a collaborative and growing database of user-generated annotations, and perhaps, with time, to fold in a historically conscious dictionary as well. We are seeking an NEH Digital Humanities Start-Up Grant to pursue this project fully, and I’d like the opportunity to throw our idea into the ring at THATCamp to explore its problems as well as its possibilities, perhaps even gathering more collaborators along the way.
Here’s a hand-colored version of something like what I’m thinking; I used WordNet::Similarity to generate the numbers indicating degree of relatedness, and then broke those numbers into a visual weighting system. Implementation hurdles do come out pretty clearly when you see how the numbers are generated, so I’m hoping someone out there will have better insights into the how of it all.
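To make the "numbers into visual weighting" step a little more concrete, here is a rough sketch in Python of one way it might go. Everything in it is an assumption for illustration: the scores are made-up placeholders rather than real WordNet::Similarity output, the function names (`normalize`, `color_for`, `mark_up`) are hypothetical, and the thresholds and colors are arbitrary. Because the different similarity measures report values on different scales, the sketch first rescales whatever numbers it is given to a 0-to-1 range before sorting them into color buckets.

```python
from html import escape

# Hypothetical palette: (lower bound on the normalized score, CSS color),
# listed from strongest to weakest. Both the cut-offs and the colors are
# placeholders, not values taken from the hand-colored example.
BUCKETS = [
    (0.75, "#b30000"),
    (0.50, "#e34a33"),
    (0.25, "#fdbb84"),
]


def normalize(scores):
    """Rescale raw relatedness scores to 0..1 by dividing by the largest one."""
    top = max(scores.values(), default=0)
    if top <= 0:
        return {word: 0.0 for word in scores}
    return {word: score / top for word, score in scores.items()}


def color_for(score):
    """Return the color for a normalized score, or None if it is too weak to mark."""
    for lower_bound, color in BUCKETS:
        if score >= lower_bound:
            return color
    return None


def mark_up(words, scores):
    """Wrap each sufficiently related word in a colored HTML span."""
    normalized = normalize(scores)
    parts = []
    for word in words:
        color = color_for(normalized.get(word.lower(), 0.0))
        text = escape(word)  # keep stray < or & in the uploaded text harmless
        if color is None:
            parts.append(text)
        else:
            parts.append(f'<span style="background:{color}">{text}</span>')
    return " ".join(parts)


if __name__ == "__main__":
    # Placeholder scores standing in for relatedness to some focus word.
    sample_scores = {"sea": 0.8, "ship": 0.4}
    print(mark_up("The sea and the ship".split(), sample_scores))
```

Dividing by the largest score is only one possible choice; it makes the strongest word in any passage come out darkest, which may or may not be what a reader expects when comparing two different passages side by side.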
To a related, larger point: I always have the sneaking suspicion that this has been done before. Jodi Schneider mentioned LiveInk, a program that reformats text according to its semantic units so that readers can more effectively grasp and retain content. This strikes me as similar, as well, to the kinds of issues raised by Douglas Knox: using scale and format to retrieve “structured information.” Do the much-better-informed Campers out there know of an already-existing project like this? I wish the checklist of visual thinking tools that George Brett proposes were already here!