The Library of Congress, its digital strategy, and crowdsourcing

Screenshot of the homepage of the Library of Congress’s Crowd program

In late October, I asked the Preservation Directorate of the Library of Congress (LOC) how it decides what to digitize and whether it has a process similar to that of NARA (the National Archives and Records Administration, called the National Archives in the rest of this article), with its own digitization priorities, including work with external partners. After thanking me for my interest in the LOC’s preservation work, Jon Sweitzer-Lamme of the Preservation Directorate responded:

The Library’s digital strategy is available here: https://www.loc.gov/digital-strategy. Our prioritization is driven by demand, such as demand for our presidential papers collections like the newly released Theodore Roosevelt Papers (https://www.loc.gov/item/prn-18-132/), and preservation needs, especially if an item can’t be served to researchers anymore due to its condition. We have excellent in-house digitization capabilities and also utilize external contractors and partners to digitize our content.

That generally answers my question. Unfortunately, LOC's reply did not arrive in time for a class assignment in which I asked similar reference questions of other institutions (AskUsNow!, the Maryland State Archives, and the UMD Archives). I’ll likely post that assignment on Academia.edu later this month.

Most exciting of all, though, is not the digital strategy but LOC’s new Crowd program. It resembles the National Archives' Citizen Archivist initiative, in which I have participated a bit in the past. There are currently only five campaigns in which volunteers can transcribe, review, or tag material, but the program is still in its beta stage, so it will almost certainly expand in the future. It could become linked open data at its finest: not only connecting people with content, but drawing them further into the process and making the use of records more collaborative for all, going beyond past efforts. In the coming days, I will test the site and report back on this blog. LOC has even tied the program to the anniversary of the Gettysburg Address.
The Crowd program's homepage also shows that the site is made possible through a partnership involving Amazon’s SES (Simple Email Service), a worrying infiltration of public institutions by the corporate world. Even so, the Crowd program runs on open-source software, which is a positive.

This new program fulfills LOC's digital strategy (certainly different from the one in 2000). The strategy states that the Library's mission is to “engage, inspire, and inform the Congress and the American people with a universal and enduring source of knowledge and creativity,” with initiatives such as this one aiming to ensure that “all Americans are connected to the Library of Congress.” The digital strategy is also tied to LOC's strategic plan, which has four major goals: expanding access, enhancing services, optimizing resources, and measuring results. It also notes the role of digital technology in fulfilling the institution's mission through “throwing open the treasure chest, connecting, and investing in our future.” The strategy is forward-thinking as well, stating that:

The Library’s content, programs, and expertise are national treasures…We will make that content available and accessible to more people, work carefully to respect the expectations of the Congress and the rights of creators, and support the use of our content in software-enabled research, art, exploration, and learning. The Library will continue to build a universal and enduring source of knowledge and creativity…We will expedite the availability of newly acquired or created content to the web and on-site access systems…We will explore creative solutions to reduce the barriers to material while respecting the rights of creators, the desires of our donors, and our other legal and ethical responsibilities…We will continue to enable computational use of our content and metadata…The Library offers an incredible wealth of content, programs, and services to Congress and the American people. We strive to connect with more users by making those services and content accessible for all…Many of the Library’s digital users come directly to our websites to discover content. To expose even more people to the Library’s content and services, we will bring digital content to users by making more of our material available in other websites and apps that they are already using…We will continue to participate in professional organizations and cooperatives that expand our perspectives and enable us to share our experiences.
Additionally, developing partners in industry can allow us to connect the Library with new areas of expertise and resources…We will cultivate an innovation culture by empowering our staff, who have expertise in a wide range of subject areas, including the work of Congress, United States copyright law, American and foreign law, and our collections…Our plans for the future must entail preserving and protecting our collections and content…While we plan for our future, we are also paying close attention to innovations and trends that will present future challenges and opportunities. Newer tools, such as augmented and virtual reality, computer vision, natural language processing, and machine learning, are already transforming how we live and work.

Screenshot of the opening section of LOC’s digital strategy

A quick online search turned up few other articles on this subject [1]. All of the ones I found are relatively positive, although some are more critical than others. In its article, Roll Call described the digital strategy as “digital forward.” It noted that the approach is strongly advocated by Librarian of Congress Carla Hayden (who heads LOC and formerly headed the Pratt Library in Baltimore) and by Kate Zwaard, the Director of Digital Strategy. Most interesting in the article was not that Accenture, a huge contractor, won a contract “to build the long-planned new data center” for LOC. Nor was it that the plan includes “employing user-centered design to invite digital and physical visitors to explore more offerings.” Instead, it was that the organization has been stuck in the past and is trying to shed that past, weighed down by everything from “a computing system built in the 1970s to static processes for staff.” A 21st-century computing system is important for LOC, which holds over 167 million items on “approximately 838 miles of bookshelves,” making it the “largest library in the world.”

FedScoop also wrote about the digital strategy, noting that “The Library of Congress…is interested in exploring what artificial intelligence and similar technologies can do for its mission.” It added that this digital focus is not “out of the blue,” as LOC launched labs.loc.gov, “a home for digital experiments…last year…[and] it…recently began experimenting with geographic information systems mapping as a way to explore collections online.” Both are positive developments, to say the least.

Finally, there is Cory Doctorow of Boing Boing, a site that often publishes short articles with little content beyond the document(s) they quote. Regardless, Doctorow describes how the digital strategy supports “data-driven research with giant bulk-downloadable corpuses of materials and metadata…crowdsourc[ing] the acquisition of new materials…[and] preserv[ing] digital assets with the same assiduousness that the Library has shown with its physical collection for centuries,” among other aspects. He notes, interestingly, that LOC has an “outsized role” in the current digital era because it houses the Copyright Office, which is “patient zero in the epidemic of terrible internet law that reaches into every corner of our lives.” This clashes with the fact that Carla Hayden, the Librarian of Congress, “is the most freedom-friendly, internet-friendly, access-friendly leader in the Library’s history, replacing unfit leaders who were brought down in grotesque corruption scandals.” Even so, in Doctorow's view, her leadership has fallen short, because “the Copyright Office is still a creature of Big Content, and it has direct oversight over your ability to modify, repair, sell, and use all of your digital property.” Still, he argues that

…this digital strategy is a very bright light, but it shines in a dark and menacing cave. I love the Library — I love its work, its collections, its diligent and thoughtful staff, its magnificent building. But for all that, the Library has become a locus of terrible policy that runs directly counter to its mission. The contradiction between the Library’s mission and its real role in policy has never been more clear than it is in this wonderful document. [2]

That brings me to the end of this article. What are your thoughts on this new digital strategy of LOC and its new Crowd program?


Notes

[1] Through a further search, I also found snippets of the report on infodocket, the ALA's dh+lib blog, and the Digital Journal.

[2] James Tanner of Genealogy’s Star makes a similar point. He says that LOC is “certainly not the leader in the number and value of their online offerings,” since “the recent history of the Library of Congress is far from promising.” He points to the closure of the Local History and Genealogy Reading Room in 2013. He also points to the “inherent contradiction in the current efforts of the Library of Congress due to the fact that they are also the agency responsible for the controversial access policies inherent in the United States Copyright Law because the Copyright Office is an integral part of the Library.” As a result, Tanner argues, due to “Congressional action, use and access to many valuable research materials have been overwhelmingly restricted.” He adds that “policies and budgetary constraints at both the Library of Congress and the National Archives have severely limited the number and availability of digitized records from both institutions. It would be a huge change if this present plan includes real changes in the number and availability to access items in both institutions collections.” Still, he is somewhat optimistic, saying that “it will be interesting to see what will happen, although I do not expect any significant changes during what is left of my lifetime.” He also suggests that the Internet Archive “may become the largest library in the world considering its growth during the past few months and years assuming they catch up with the National Library of Australia.”