Experiments in Text Mining Tools
My understanding of text mining, based on our
readings, is that it offers a “zoomed out,” or “macroscopic” (Weingart), perspective on
the collection or text in question. This perspective can reveal patterns that
are not apparent unless one views a data set as a whole rather than as a series
of individual instances. Technology lets us see these trends, which can help us
better understand a text or tell its story more effectively.
I used a couple of different texts and search terms for my
experiments with text mining tools. For Voyant, I used the text of Persuasion
by Jane Austen. I chose this text because I’m currently teaching it
to a class of undergraduate English literature majors, and I was curious to see what
patterns text mining would reveal and whether they would add to my students’
understanding of the novel, and to my own. Ease of access was also a factor: the
entire text is in the public domain.
My other experiments used words and terms related to feminist
publishing, since that is (presently) the topic I plan to write my dissertation
on.
Voyant –
Producing a word cloud through Voyant showed that
the most frequently used words in Persuasion are the names
of the main characters. This did not surprise me, but it did reinforce the
idea that this is a novel about relationships: its primary focus is not action
or drama, but people and their connections. The phrase counter supports this
reading, since many of the novel’s most frequently used phrases are prepositional
phrases. Again, this isn’t surprising, but it points to an interesting feature
of writing (one I doubt is unique to Jane Austen): how often we write and speak
of things or people in relation to other things or people (e.g., “on the subject,”
“at hand”).
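Both of the Voyant results above come down to counting. A word cloud sizes each word by how often it occurs once very common function words are filtered out, while a phrase counter tallies repeated sequences of words, function words included, which is why prepositional phrases can surface there. The Python below is a minimal sketch of that idea, assuming a simple regular-expression tokenizer and a tiny, hypothetical stopword list; it is not Voyant’s actual implementation or stopword list, and the helper names are my own.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch);
# real tools such as Voyant use much longer lists.
STOPWORDS = {
    "the", "a", "an", "and", "of", "to", "in", "on", "at", "with",
    "was", "is", "it", "that", "for", "had", "be", "as", "her", "his",
    "she", "he",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of ASCII letters and apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(text: str, n: int = 10) -> list[tuple[str, int]]:
    """Most frequent words once stopwords are removed (what a word cloud sizes by)."""
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    return counts.most_common(n)


def top_phrases(text: str, size: int = 2, n: int = 10) -> list[tuple[str, int]]:
    """Most frequent word sequences of the given length, with stopwords kept,
    so prepositional phrases such as "at hand" can show up."""
    tokens = tokenize(text)
    # Slide a window of `size` consecutive tokens across the text.
    grams = zip(*(tokens[i:] for i in range(size)))
    counts = Counter(" ".join(g) for g in grams)
    return counts.most_common(n)


# Invented sample sentence, not a quotation from Persuasion.
sample = "Anne walked with Wentworth, and Anne spoke to Elizabeth about Wentworth."
print(top_terms(sample, 2))  # [('anne', 2), ('wentworth', 2)]
```

Even on this toy sentence, the character names rise to the top once function words are set aside, which mirrors what the word cloud showed for the full novel.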
I intend to show the results to my students to see
what meaning they draw from the word cloud and the other statistics Voyant provides.
Google Ngram Viewer –
Because I’m interested in the history of publishing, I
searched the words “writing” and “publishing” together in the Ngram Viewer. I
was surprised at how much more frequently “writing” appeared than “publishing.”
I experimented with time frames, starting with 1800–1920 and eventually widening
to 1700–2000. I expected that advancing technology might make
“publishing” a more common term over time, but in fact, over the wider
span, “writing” grew even more frequent relative to “publishing.”
Adding the term “books” turned up some
interesting results: between approximately 1770 and 1980, “books” appears more
frequently (to varying degrees) than “writing.” Around 1980, however, the two
terms essentially swap places, and “writing” becomes more prominent.
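The Ngram Viewer compares terms by relative frequency. As I understand it, each curve shows a term’s share of all the words in the corpus for a given year, so comparisons are proportional rather than raw counts. The Python below is a minimal sketch of that calculation, plus a hypothetical helper for finding the year one term overtakes another, like the swap between “books” and “writing” described above. The function names and the toy numbers are invented for illustration. They are not real Ngram data, and the actual viewer works over the Google Books corpus and can smooth its curves.

```python
from typing import Dict, Optional


def relative_frequency(counts: Dict[int, int], totals: Dict[int, int]) -> Dict[int, float]:
    """Each year's count for a term divided by all words counted that year."""
    return {
        year: counts.get(year, 0) / total
        for year, total in totals.items()
        if total > 0
    }


def first_crossover(a: Dict[int, float], b: Dict[int, float]) -> Optional[int]:
    """First year in which term a overtakes term b, having been at or below it
    the year before; None if a never overtakes b."""
    years = sorted(set(a) & set(b))
    for prev, cur in zip(years, years[1:]):
        if a[prev] <= b[prev] and a[cur] > b[cur]:
            return cur
    return None


# Invented toy counts, NOT real Ngram data.
totals = {1900: 1000, 1901: 1000, 1902: 1000, 1903: 1000}
books = {1900: 5, 1901: 5, 1902: 4, 1903: 4}
writing = {1900: 3, 1901: 4, 1902: 5, 1903: 6}

print(first_crossover(relative_frequency(writing, totals),
                      relative_frequency(books, totals)))  # 1902
```

Dividing by each year’s total matters because far more text was printed in later centuries. Raw counts of almost any word would rise over time, so only proportions show one term genuinely gaining ground on another.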
This kind of anomaly makes me think the
tool can be useful (a literary scholar’s macroscope, if you will) for revealing
unexpected patterns that I can use as starting points for research: for
instance, what changes make one term more popular than
another? Drilling down into the specifics may or may not turn up anything
interesting, but such is the nature of research (dead ends are part of the
process), so the Ngram Viewer seems like a great starting point for further
investigation.
JSTOR Data for Research (DfR) –
I found this tool more useful as a research aid than
as a text mining tool, largely (I’m guessing) because of my area of research: I
want to understand what scholars say about women and publishing,
not necessarily the sheer number of times those terms appear together
in JSTOR’s primary sources. That said, even with my search limited to a 21-year span,
narrowed to my discipline, and restricted to book chapters and articles, it
still returned more than 30,000 results, which suggests there is a great deal of
scholarly discussion of this topic.
Scrolling through the results gave me a sense of which
books and journals would be good starting points for my research.
I also tried mining only the 19th-century
pamphlets for these terms. The results may not be useful to me, since I don’t
necessarily need primary historical records for my research, but the findings
were interesting nonetheless: they included many reports on court proceedings
and pamphlets on women’s suffrage, for instance.
Weingart, Scott. “The Joys of Big Data for
Historians.” The Historian's Macroscope: Big Digital History, 8 Dec.
2014, http://www.themacroscope.org/?page_id=595.