Friday, May 29 • 9:00am - 10:15am
Browse History: Digital Archives & Cultural Heritage

A panel of three papers:

Everything on Paper Will Be Used Against Me: Quantifying Kissinger
Micki Kaufman

The practice of Digital History continues to grow and expand as ever more fascinating sources of data become available - both from the present and from the past. As digital archives grow in size and scope, historians are increasingly combining the traditional methods and sources of their field with computer science, literary theory, psychology, social science and visual design in exciting new ways. Interpretations based on text analysis techniques like word frequency analysis, topic modeling, sentiment analysis and corpus linguistics, combined with novel forms of knowledge production like ‘distant reading,’ can be made comprehensible using new, interdisciplinary tools and techniques.

With each step the state of the art in research continues to advance, as these new approaches provide not merely new answers to existing questions but ultimately allow entirely new questions to be asked. Besides confronting the difficulties inherent in any interdisciplinary approach, digital history also remains complicated by issues that do not always have clear parallels in the (non-digital) historical tradition: data sustainability and transparency, openness and open access, collaboration, and interdisciplinarity.

My work applies text analysis methodology and interactive visualization technology to the Digital National Security Archive (DNSA)’s Kissinger Collection. As detailed on the project's web site (http://www.quantifyingkissinger.com), it involves a host of quantitative text analysis methods, such as word frequency/correlation, topic modeling and sentiment analysis, as well as a variety of data visualization designs and methods.
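
As a rough illustration of the simplest of these methods, word frequency analysis, the following Python sketch counts terms across a handful of documents. The sample sentences, the stop-word list, and the function names are invented placeholders; they are not drawn from the DNSA collection or from the project's own code.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "with", "about"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(documents):
    """Count how often each non-stop-word appears across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in STOP_WORDS)
    return counts


# Invented sample text standing in for memcon or telcon transcripts.
sample_docs = [
    "The Secretary discussed the ceasefire with the Ambassador.",
    "The Ambassador asked the Secretary about the ceasefire terms.",
]

freqs = word_frequencies(sample_docs)
for term, count in freqs.most_common(3):
    print(term, count)
```

On a real corpus the same counts could be computed per document or per time period, which is what makes comparisons across a large archive possible.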

The archive, comprising approximately 17,600 meeting memoranda (‘memcons’) and teleconference transcripts (‘telcons’) detailing the former US National Security Advisor and Secretary of State’s correspondence during the period 1969-1977, has served as a basis for combining political/international relations history with the fields of linguistics and visual design.

For example, a combination of the computational approach employed herein with an emphasis on ‘emotional history’ has illuminated new avenues of inquiry that combine these approaches with more subjective questions around interpersonal dynamics and individual psychology, facilitating fascinating questions about the period. In addition, crucial to the production of knowledge in this digital paradigm is an understanding of the various types, impacts and opportunities posed by ‘error,’ itself a constant companion to any research work. While not unique to Digital Humanities, the interdisciplinary and ‘non-traditional’ nature of the interaction between history, design, psychology and linguistics means that error can present in novel ways, and investigating it has led to novel insights.

This project's application of computational techniques to the study of twentieth-century diplomatic history has already generated useful finding aids for researchers, provided essential testing grounds for new historical methodologies, and prompted new interpretations and questions about the Nixon/Kissinger era. More than detailing existing historical facts about former Secretary of State Henry Kissinger, the project has begun to surface deeper understandings and questions about how this new kind of ‘distant’ knowledge is formed, and the ramifications for historical analysis.

Email Data Analysis as an Alternate Lens into Historical Events
Craig Evans

Email data provide a rich account of interpersonal discourse on various scales, ranging from pairwise conversations to small group discussions to information dissemination and sharing by large numbers of people. From a humanities point of view, email data enable the retrospective analysis of events and organizations for which little alternative first-hand material might be available.

People have used email data to study questions around remote collaborative work, e.g. problem solving and cooperation in work teams, among other purposes. Also, in social science, scholars have applied socio-technical analysis methods to email data in order to gain a better understanding of the interplay of social structure and language use in real-world organizations. One particular challenge with email data, which also applies to other types of authorship data and social media data, is the mapping of email addresses to individuals. This matters because a) many people use more than one email address and b) email addresses might refer to larger collectives rather than actual individuals, e.g. in the case of mailing list accounts.

Using the Enron email data, which comprise about 400K emails over a range of more than three years, we show how data provenance techniques, which here mean pre-processing steps that allow for using actual individuals as the smallest social unit of analysis, have a large impact on the insights we gain from analyzing these data. We do this by showing the differences in substantive knowledge gained about the social dynamics in this organization that are due to various data consolidation techniques rather than to actual social dynamics. We will also provide an approximation of the “true” picture of these dynamics as reflected in the underlying data, based on associating email addresses with actual people as much as possible. Because doing so is tedious, we provide a cost-benefit analysis of this process. This work also contributes to a better understanding of the robustness of social network metrics.
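
To make the consolidation step concrete, the following Python sketch shows how mapping several addresses onto one person changes a small sender-recipient network. It is a toy under stated assumptions: the alias table, the addresses, the names, and the helper functions are all hypothetical and do not reflect the Enron data or the authors' actual pipeline.

```python
from collections import Counter

# Hypothetical alias table mapping email addresses to individuals.
ALIASES = {
    "j.doe@example.com": "Jane Doe",
    "jane.doe@example.com": "Jane Doe",
    "r.roe@example.com": "Richard Roe",
}

# Hypothetical messages as (sender, recipient) pairs.
MESSAGES = [
    ("j.doe@example.com", "r.roe@example.com"),
    ("jane.doe@example.com", "r.roe@example.com"),
    ("r.roe@example.com", "jane.doe@example.com"),
]


def resolve(address, aliases):
    """Return the person for an address, or the address itself if unknown."""
    key = address.lower()
    return aliases.get(key, key)


def edge_counts(messages, aliases=None):
    """Count directed sender -> recipient edges, optionally after consolidation."""
    aliases = aliases or {}
    return Counter(
        (resolve(sender, aliases), resolve(recipient, aliases))
        for sender, recipient in messages
    )


def node_count(edges):
    """Number of distinct nodes appearing in any edge."""
    return len({node for edge in edges for node in edge})


raw = edge_counts(MESSAGES)               # addresses as nodes
merged = edge_counts(MESSAGES, ALIASES)   # individuals as nodes

print(node_count(raw), node_count(merged))
```

Without consolidation the toy network has three nodes with one message on each edge; with consolidation it has two nodes and a heavier edge between them, so any metric computed on nodes or edge weights shifts for reasons unrelated to the underlying social dynamics.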

However, we cannot stop at studying the characteristics and patterns emerging from these networks, because prior research has shown that without considering the substance of information, we are limited in our ability to understand the effects of language use in networks and vice versa. We show how exploiting information from email headers to build time-stamped explicit social networks can be combined with analyzing the content of text bodies through natural language processing techniques to better understand the flow of knowledge and information in a social system. More specifically, we compare the entropy of the social system to the entropy of language use. Also, we correlate different stages of the crisis with indicators of conflict from language use.
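
The abstract does not define how either entropy is measured. One possible reading, offered only as an assumption, is Shannon entropy, H = -Σ p_i log2 p_i, taken over the distribution of messages across sender-recipient pairs on one side and over the distribution of word tokens on the other. The Python sketch below uses invented counts and a hypothetical `shannon_entropy` helper.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy, in bits, of the distribution implied by the counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values() if c > 0
    )


# Invented edge counts: messages per (sender, recipient) pair.
edges = Counter({("a", "b"): 2, ("b", "a"): 1, ("a", "c"): 1})

# Invented token counts from message bodies.
words = Counter("meeting budget audit review".split())

network_entropy = shannon_entropy(edges)
language_entropy = shannon_entropy(words)

print(network_entropy, language_entropy)
```

Computed per time window, two such series could be compared over the course of an organization's history; whether this matches the measure the authors used is not stated in the abstract.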

Digital Cultural Hegemony: Project Funding Trends and Impact on Digital Access to Cultural Heritage
John D. Martin III and Carolyn Runyon

External funding has become a fundamental aspect of many organizations, and cultural heritage institutions and programs are not immune. The ways in which funding is distributed represent for institutions and funding agencies implicit, and sometimes explicit, judgements of value, utility, or importance. Just as no research is done in a vacuum, cultural heritage institutions and programs do not exist for their own sake or the sake of their stakeholders. In cases where external funding is crucial to the continued existence of such institutions and programs, efforts at convincing funders of the importance of support are not a trivial undertaking. In the end, decisions that make or break programs are made by people who have little or no personal interest or stake. Over the past several decades, we have turned our attention to developing and maintaining avenues for digital access to cultural heritage. In this project, we examine the extent to which public funding affects digital access to cultural heritage. Using publicly available data provided by the National Endowment for the Humanities on grants awarded from 1980 to 2014, we identify and analyze trends in funding for digital and computational projects in the humanities, with particular attention paid to heritage and archival projects. We consider levels of funding for projects which represent the country’s diverse ethnic, linguistic, cultural and racial heritage within the context of the overall funding scheme of the NEH. Trends in funding data are compared and discussed in the context of United States census data. This research is intended to raise questions about what types of digital cultural heritage projects national tax-endowed humanities funding agencies are and should be supporting. Our analysis is designed to consider particularly whether and how funding is extended to projects that exist at the margins of the mainstream.

