MARC / PERL: Koha Integrated Library System
MARC::Doc::Tutorial - A documentation-only module for new users of MARC::Record
What is MARC?
The MAchine Readable Cataloging format was designed by the Library of Congress in the late 1960s in order to allow libraries to convert their card catalogs into a digital format. The advantages of having computerized card catalogs were soon realized, and now MARC is being used by all sorts of libraries around the world to provide computerized access to their collections. MARC data in transmission format is optimized for processing by computers, so it's not very readable for the normal human. For more about the MARC format, visit the Library of Congress at http://www.loc.gov/marc/
What is this Tutorial?
The document you are reading is a beginner's guide to using Perl to process MARC data, written in the 'cookbook' style. Inside, you will find recipes on how to read, write, update and convert MARC data using the MARC::Record CPAN package. As with any cookbook, you should feel free to dip in at any section and use the recipe you find interesting. If you are new to Perl, you may want to read from the beginning. The document you are reading is distributed with the MARC::Record package; however, in case you are reading it somewhere else, you can find the latest version at CPAN: http://www.cpan.org/modules/by-module/MARC/. You'll notice that some sections aren't filled in yet, because this document is a work in progress. If you have ideas for new sections, please make a suggestion to perl4lib: http://www.rice.edu/perl4lib/.
History of MARC on CPAN
In 1999, a group of developers began working on MARC.pm to provide a Perl module for working with MARC data. MARC.pm was quite successful since it grew to include many new options that were requested by the Perl/library community. However, in adding these features the module swiftly outgrew its own clothes, and maintenance and addition of new features became extremely difficult. In addition, as libraries began using MARC.pm to process large MARC data files (more than 1000 records) they noticed that memory consumption would skyrocket. Memory consumption became an issue for large batches of records because MARC.pm's object model was based on the 'batch' rather than the record, so each record in the file would often be read into memory. There were ways of getting around this, but they were not obvious. Some effort was made to reconcile the two approaches (batch and record), but with limited success.
In mid 2001, Andy Lester released MARC::Record and MARC::Field, which provided a much simpler and more maintainable package for processing MARC data with Perl. As its name suggests, MARC::Record treats an individual MARC record as the primary Perl object, rather than having the object represent a given set of records. Instead of forking the two projects, the developers agreed to encourage use of the MARC::Record framework, and to work on enhancing MARC::Record rather than extending MARC.pm further. Soon afterwards, MARC::Batch was added, which allows you to read in a large data file without having to worry about memory consumption.
In December 2004, the MARC::Lint module, an extension to check the validity of MARC records, was removed from the MARC::Record distribution to become a separately distributed package. This tutorial contains examples for using MARC::Lint.
Overview of MARC Classes
The MARC::Record package is made up of several separate packages. This can be somewhat confusing to people new to Perl or to object-oriented programming. However, this framework allows easy extension, and is built to support new input/output formats as the need arises. For a good introduction to using the object-oriented features of Perl, see the perlboot documentation that came with your version of Perl. Here are the packages that get installed with MARC::Record:
MARC::Batch - A convenience class for accessing MARC data contained in an external file.
MARC::Field - An object for representing the indicators and subfields of a single MARC field.
MARC::Record - This primary class represents a MARC record, being a container for multiple MARC::Field objects.
MARC::File - A superclass for representing files of MARC data.
MARC::File::MicroLIF - A subclass of MARC::File for working with data encoded in the MicroLIF format.
MARC::File::USMARC - A subclass of MARC::File for working with data encoded in the USMARC format.
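To see how the packages listed above fit together, here is a minimal sketch rather than a definitive recipe. It builds one record in memory from MARC::Field objects, writes it to a temporary file in USMARC transmission format, and reads it back one record at a time with MARC::Batch. The author and title values are made up for illustration, and the sketch assumes MARC::Record is installed from CPAN.
```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use MARC::Record;
use MARC::Field;
use MARC::Batch;

# A MARC::Record is a container for MARC::Field objects.
# The field contents below are illustrative sample data.
my $record = MARC::Record->new();
$record->append_fields(
    # tag, indicator 1, indicator 2, then subfield code => value pairs
    MARC::Field->new('100', '1', ' ', a => 'Wall, Larry.'),
    MARC::Field->new('245', '1', '0',
        a => 'Programming Perl /',
        c => 'Larry Wall.'),
);

# Serialize the record in USMARC transmission format to a temporary file.
my ($fh, $filename) = tempfile(SUFFIX => '.mrc', UNLINK => 1);
binmode $fh;
print {$fh} $record->as_usmarc();
close $fh or die "Cannot close $filename: $!";

# MARC::Batch hands back one MARC::Record at a time, so only the
# current record needs to be held in memory.
my $batch = MARC::Batch->new('USMARC', $filename);
my @titles;
while (my $r = $batch->next()) {
    my $field = $r->field('245') or next;   # skip records with no title field
    push @titles, $field->subfield('a');
}

print "$_\n" for @titles;
```
Passing 'USMARC' as the first argument tells MARC::Batch to use MARC::File::USMARC to parse the file; a MicroLIF file would be opened the same way with 'MicroLIF' instead.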
It's already been mentioned but it's worth mentioning again: MARC::Doc::Tutorial is a work in progress, and you are encouraged to submit any suggestions for additional recipes via the perl4lib mailing list at http://www.rice.edu/perl4lib. Also, the development group is always looking for additional developers with good ideas; if you are interested you can sign up at SourceForge: http://sourceforge.net/projects/marcpm/.