CALIcon19 has ended
  • 1:00 PM – 7:00 PM - Sponsor setup at the law school
  • 3:00 PM – 7:00 PM - Conference check-in at the law school
    • Pizza and beverages will be served
  • 6:00 PM – 6:30 PM - Optional presenters' meeting, room 136

All CALIcon19 sessions will be live streamed. Just click on the "Watch Live!" button in each session description.
If you’re watching remotely, you can use the CALIcon Slack community to ask questions. To join the community, visit http://slack2019.calicon.org/ and enter your email address. Once you’ve joined, you will find a channel for each room; join those channels to ask questions during the session.
Friday, June 7 • 1:00pm - 2:00pm
Automating processing and intake in the institutional repository with Python

The Charles B. Sears Law Library at the University at Buffalo School of Law recently completed a seven-month project to load the entire backfile of the school's six law journals onto its Digital Commons repository. The vast majority went fairly quickly, but some of the early volumes required a large amount of additional processing.

For its first 22 volumes, the Buffalo Law Review covered current legal developments through case notes, including 14 years of in-depth coverage of the previous year's New York Court of Appeals term. These case notes provide a contemporary review of the development of New York law through the 1950s and 1960s. Unfortunately, these case notes were trapped in large files that contained every case note for a single issue. Additionally, there was no indexing to help users find individual case notes. For the library to make these notes available individually, 100 PDF files would have to be split into almost 1,600 articles, and metadata created for each.

In the past, this processing would have been completed by multiple librarians and student workers. Right now, however, the libraries are facing severe staffing shortages and budget shortfalls. So, instead, through the power of Python, one faculty scholarship librarian was able to split and upload all 1,600 articles within six weeks. Using Python and a few free libraries, the library built a small suite of tools that were used to scan each large file, pull metadata from its embedded text, split the PDFs, and output everything into Digital Commons upload format.
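
The scan, extract, split, and export steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the presenter's actual code: the heading pattern, the sample page text, the output file names, and the CSV column names are all invented for the example, and the sketch assumes every case note starts at the top of a page. The splitting step uses pypdf as one possible free library; the abstract does not say which libraries the project used.

```python
import csv
import io
import re
import string
from dataclasses import dataclass

# Hypothetical heading pattern: a case-note title set in capitals on its own
# line. Real journal pages would need a pattern tuned to their layout.
HEADING = re.compile(r"^(?P<title>[A-Z][A-Z ,'&-]{9,})$", re.MULTILINE)


@dataclass
class Note:
    title: str
    first_page: int  # zero-based, inclusive
    last_page: int   # zero-based, inclusive


def find_notes(page_texts):
    """Treat each page containing a heading as the start of a new case note."""
    starts = []
    for index, text in enumerate(page_texts):
        match = HEADING.search(text)
        if match:
            starts.append((index, match.group("title").strip()))
    notes = []
    for position, (first, title) in enumerate(starts):
        # A note runs to the page before the next heading, or to the last page.
        if position + 1 < len(starts):
            last = starts[position + 1][0] - 1
        else:
            last = len(page_texts) - 1
        notes.append(Note(string.capwords(title), first, last))
    return notes


def metadata_csv(notes, source_stem):
    """Return CSV text with one row per note (column names are placeholders)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "first_page", "last_page", "filename"])
    for number, note in enumerate(notes, start=1):
        writer.writerow([note.title, note.first_page + 1, note.last_page + 1,
                         f"{source_stem}_{number:03d}.pdf"])
    return buffer.getvalue()


def split_pdf(source_path, notes, source_stem):
    """Write one PDF per note. Needs the third-party pypdf package."""
    from pypdf import PdfReader, PdfWriter  # lazy import: the rest runs without it

    reader = PdfReader(source_path)
    for number, note in enumerate(notes, start=1):
        writer = PdfWriter()
        for index in range(note.first_page, note.last_page + 1):
            writer.add_page(reader.pages[index])
        with open(f"{source_stem}_{number:03d}.pdf", "wb") as handle:
            writer.write(handle)


# Invented sample text standing in for the embedded text of three PDF pages.
pages = [
    "CONTRACTS - STATUTE OF FRAUDS\nThe court held that ...",
    "... discussion continues on a second page.",
    "TORTS - DUTY OF CARE\nIn this case ...",
]
notes = find_notes(pages)
print(metadata_csv(notes, "issue"))
```

With a real file, the page texts could come from `[page.extract_text() for page in PdfReader(path).pages]`, and the resulting rows would then be mapped onto whatever columns the repository's batch upload expects.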

In this session, you will learn about useful Python libraries for this type of project, the workflows used, and the problems encountered, along with their solutions where any were found. You will also learn about the code structure used and how you can adapt it for your own repository projects. This session will be useful to any IR manager, whether using Digital Commons or another platform, who has or might have resources needing similar processing. The session does not assume previous Python programming experience; the presenter had none before starting the project. Some coding knowledge will be helpful to anyone embarking on a similar project, but it is not necessary.

John Beatty

Faculty Scholarship Outreach Librarian, Charles B. Sears Law Library, University at Buffalo

Friday June 7, 2019 1:00pm - 2:00pm
288 University of South Carolina School of Law, Columbia SC
