Support Scripts

Various scripts are included with Evergreen in the /openils/bin/ directory (and in the source code in Open-ILS/src/support-scripts and Open-ILS/src/extras). Some of them are used during the installation process, such as eg_db_config, while others are usually run as cron jobs for routine maintenance, such as fine_generator.pl and hold_targeter.pl. Others are useful for less frequent needs, such as the scripts for importing/exporting MARC records. You may explore these scripts and adapt them for your local needs. You are also welcome to share your improvements or ask any questions on the Evergreen IRC channel or email lists.

Here is a summary of the most commonly used scripts. The script name links to more thorough documentation, if available.

  • action_trigger_aggregator.pl — Groups together event output for already processed events. Useful for creating files that contain data from a group of events, such as a CSV file with all of the overdue data for one day.

  • action_trigger_runner.pl — Creates events for specified hooks and runs pending events. A sample cron invocation appears after this list.

  • authority_authority_linker.pl — Links reference headings in authority records to main entry headings in other authority records. Should be run at least once a day (only for changed records).

  • authority_control_fields.pl — Links bibliographic records to the best matching authority record. Should be run at least once a day (only for changed records). You can accomplish this by running authority_control_fields.pl --days-back=1

  • autogen.sh — Generates web files used by the OPAC, especially files related to organization unit hierarchy, fieldmapper IDL, locales selection, facet definitions, compressed JS files and related cache key

  • clark-kent.pl — Used to start and stop the reporter (which runs scheduled reports)

  • eg_db_config — Creates database and schema, updates config files, sets Evergreen administrator username and password

  • fine_generator.pl — Generates overdue fines for circulations; typically run as a cron job

  • hold_targeter.pl — Selects (targets) copies that can be used to fill holds; typically run as a cron job

  • marc2are.pl — Converts authority records from MARC format to Evergreen objects suitable for importing via pg_loader.pl (or parallel_pg_loader.pl)

  • make_concerto_from_evergreen_db.pl — This experimental script is responsible for generating the enhanced concerto dataset from a live Evergreen database.

  • marc2bre.pl — Converts bibliographic records from MARC format to Evergreen objects suitable for importing via pg_loader.pl (or parallel_pg_loader.pl)

  • marc2sre.pl — Converts serial records from MARC format to Evergreen objects suitable for importing via pg_loader.pl (or parallel_pg_loader.pl)

  • marc_export — Exports authority, bibliographic, and serial holdings records into any of these formats: USMARC, UNIMARC, XML, BRE, ARE

  • osrf_control — Used to start, stop and send signals to OpenSRF services

  • parallel_pg_loader.pl — Uses the output of marc2bre.pl (or similar tools) to generate the SQL for importing records into Evergreen in a parallel fashion
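
A typical way to run action_trigger_runner.pl is from cron, so that pending events are processed regularly. The following is a minimal sketch: the paths are the standard install locations, but the schedule and environment setup are assumptions to adapt to your site.

# opensrf user's crontab: create and run pending action trigger events every 30 minutes
*/30 * * * * . ~/.bashrc && /openils/bin/action_trigger_runner.pl \
    --osrf-config /openils/conf/opensrf_core.xml --process-hooks --run-pending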

authority_control_fields: Connecting Bibliographic and Authority records

This script matches headings in bibliographic records to the appropriate authority records. When it finds a match, it will add a subfield 0 to the matching bibliographic field.

Here is how the matching works:

Bibliographic field   Authority field it matches   Subfields that it examines
100                   100                          a,b,c,d,f,g,j,k,l,n,p,q,t,u
110                   110                          a,b,c,d,f,g,k,l,n,p,t,u
111                   111                          a,c,d,e,f,g,j,k,l,n,p,q,t,u
130                   130                          a,d,f,g,h,k,l,m,n,o,p,r,s,t
600                   100                          a,b,c,d,f,g,h,j,k,l,m,n,o,p,q,r,s,t,v,x,y,z
610                   110                          a,b,c,d,f,g,h,k,l,m,n,o,p,r,s,t,v,w,x,y,z
611                   111                          a,c,d,e,f,g,h,j,k,l,n,p,q,s,t,v,x,y,z
630                   130                          a,d,f,g,h,k,l,m,n,o,p,r,s,t,v,x,y,z
648                   148                          a,v,x,y,z
650                   150                          a,b,v,x,y,z
651                   151                          a,v,x,y,z
655                   155                          a,v,x,y,z
700                   100                          a,b,c,d,f,g,j,k,l,n,p,q,t,u
710                   110                          a,b,c,d,f,g,k,l,n,p,t,u
711                   111                          a,c,d,e,f,g,j,k,l,n,p,q,t,u
730                   130                          a,d,f,g,h,j,k,m,n,o,p,r,s,t
751                   151                          a,v,x,y,z
800                   100                          a,b,c,d,e,f,g,j,k,l,n,p,q,t,u,4
830                   130                          a,d,f,g,h,k,l,m,n,o,p,r,s,t
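
In practice, authority_control_fields.pl is run from cron against recently changed records, as noted in the summary above. The --days-back value comes from that summary; the --all flag for a full initial pass is an assumption to verify against the script's help output.

# nightly: add $0 subfields to bibliographic records changed during the last day
/openils/bin/authority_control_fields.pl --days-back=1

# one-time initial pass over every bibliographic record (can take a long time)
/openils/bin/authority_control_fields.pl --all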

make_concerto_from_evergreen_db.pl: Generating Evergreen enhanced datasets

This script makes it possible to continue to improve and maintain the Evergreen enhanced dataset. It requires access to a PostgreSQL database and automates the process of making the enhanced dataset match the current branch of Evergreen. You need to provide login credentials for the database as well as a path to an Evergreen repository that is checked out to the intended branch.

This script has known bugs and should be considered experimental. Its output should be carefully reviewed before committing changes to Evergreen or opening a pull request to update the dataset.

Generate new dataset from existing DB

This command will produce new output sql from an already-existing database. It requires that you’ve also pre-created a PG database representing the "seed" database. The seed database is an Evergreen database created without data but from the branch of Evergreen that matches the dataset’s branch.

./make_concerto_from_evergreen_db.pl \
--db-host localhost \
--db-user evergreen \
--db-pass evergreen \
--db-port 5432 \
--db-name eg_enhanced \
--output-folder output \
--seed-db-name seed_from_1326 \
--evergreen-repo /home/opensrf/repos/Evergreen

If you don’t have a seed database, you can omit it, and the software will make one based upon the version found in the file <output_folder>/config.upgrade_log.sql

./make_concerto_from_evergreen_db.pl \
--db-host localhost \
--db-user evergreen \
--db-pass evergreen \
--db-port 5432 \
--db-name eg_enhanced \
--output-folder output \
--evergreen-repo /home/opensrf/repos/Evergreen

Or, you can have this software make a seed DB and nothing else. The Evergreen version it uses will be read from <output_folder>/config.upgrade_log.sql

./make_concerto_from_evergreen_db.pl \
--db-host localhost \
--db-user evergreen \
--db-pass evergreen \
--db-port 5432 \
--output-folder output \
--evergreen-repo /home/opensrf/repos/Evergreen \
--create-seed-db

Or, you can have this software make a seed DB based on your specified version of Evergreen

./make_concerto_from_evergreen_db.pl \
--db-host localhost \
--db-user evergreen \
--db-pass evergreen \
--db-port 5432 \
--output-folder output \
--evergreen-repo /home/opensrf/repos/Evergreen \
--create-seed-db \
--seed-from-egdbid 1350

Upgrade a previously-created dataset

Use this when cutting a new release of Evergreen and you want the enhanced dataset to match. It will use the current git branch found at the provided path to the Evergreen repo.

./make_concerto_from_evergreen_db.pl \
--db-host localhost \
--db-user evergreen \
--db-pass evergreen \
--db-port 5432 \
--output-folder output \
--evergreen-repo /home/opensrf/repos/Evergreen \
--perform-upgrade

Test the existing dataset

Create a new database and restore the dataset. The software will first create a database that matches the version of Evergreen in the dataset output folder, then restore the dataset into the newly created database.

./make_concerto_from_evergreen_db.pl \
--db-host localhost \
--db-user evergreen \
--db-pass evergreen \
--db-port 5432 \
--output-folder output \
--evergreen-repo /home/opensrf/repos/Evergreen \
--test-restore

marc_export: Exporting Bibliographic Records into MARC files

The following procedure explains how to export Evergreen bibliographic records into MARC files using the marc_export support script. All steps should be performed by the opensrf user from your Evergreen server.

Processing time for exporting records depends on several factors, most notably the number of records you are exporting. If you are exporting a large number of records, it is recommended that you divide the export ID file (records.txt) into multiple files, each containing a manageable number of records.
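
If you do need to split a large ID file, the standard split utility is one way to do it once records.txt has been created in step 1 below (the chunk size and output file names are illustrative):

# split the ID list into files of 10,000 IDs each, named records_chunk_aa, records_chunk_ab, ...
split -l 10000 /home/opensrf/records.txt /home/opensrf/records_chunk_
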
  1. Create a text file list of the Bibliographic record IDs you would like to export from Evergreen. One way to do this is using SQL:

    SELECT DISTINCT bre.id FROM biblio.record_entry AS bre
        JOIN asset.call_number AS acn ON acn.record = bre.id and not acn.deleted
        WHERE bre.deleted='false' and owning_lib=101 \g /home/opensrf/records.txt;

    This query creates a file called records.txt containing a column of distinct bibliographic record IDs for records with call numbers owned by the organizational unit with ID 101.

  2. Navigate to the support-scripts folder

    cd /home/opensrf/Evergreen-ILS*/Open-ILS/src/support-scripts/
  3. Run marc_export, using the ID file you created in step 1 to define which records to export. The following example exports the records into MARCXML format.

    cat /home/opensrf/records.txt | ./marc_export --store -i -c /openils/conf/opensrf_core.xml \
        -x /openils/conf/fm_IDL.xml -f XML --timeout 5 > exported_files.xml

marc_export does not output progress as it executes.

Options

The marc_export support script includes several options. You can find a complete list by running ./marc_export -h. A few key options are also listed below:

--descendants and --library

The marc_export script has two related options, --descendants and --library. Both options take an organizational unit's shortname as their argument.

The --library option will export records with holdings at the specified organizational unit only. By default, this only includes physical holdings, not electronic ones (also known as located URIs).

The --descendants option works much like the --library option except that it is aware of the org. tree and will export records with holdings at the specified organizational unit and all of its descendants. This is handy if you want to export the records for all of the branches of a system: you can do that by specifying this option with the system’s shortname, instead of specifying a --library option for each branch.

Both the --library and --descendants options can be repeated. All of the specified org. units and their descendants will be included in the output. You can also combine --library and --descendants options when necessary.
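
For example, to export every record with physical holdings anywhere in a hypothetical system whose shortname is SYS1, a sketch along these lines should work (the connection flags mirror the earlier export example; no ID list on standard input is needed when selecting by org. unit):

./marc_export -c /openils/conf/opensrf_core.xml -x /openils/conf/fm_IDL.xml \
    -f XML --descendants SYS1 > sys1_records.xml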

--pipe

If you want to use the --library and --descendants options with a list of bib ids from standard input, you can make use of the --pipe option.

If you have a master list of bib IDs and only want to export the bibs that have holdings at certain owning libraries, this option will help you reach that goal.

It will not work to combine --all or --since with --pipe.
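
For example, starting from the master ID list created earlier, the following sketch exports only the bibs from that list that have holdings at a hypothetical library with the shortname BR1:

cat /home/opensrf/records.txt | ./marc_export -c /openils/conf/opensrf_core.xml \
    -x /openils/conf/fm_IDL.xml -f XML --library BR1 --pipe > br1_subset.xml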

--items

The --items option will add an 852 field for every relevant item to the MARC record. This 852 field includes the following information:

Subfield            Contents
$b (occurrence 1)   Call number owning library shortname
$b (occurrence 2)   Item circulating library shortname
$c                  Shelving location
$g                  Circulation modifier
$j                  Call number
$k                  Call number prefix
$m                  Call number suffix
$p                  Barcode
$s                  Status
$t                  Copy number
$x                  Miscellaneous item information
$y                  Price
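
For example, to include an 852 field for each item in the export from step 3 above, add --items to that command:

cat /home/opensrf/records.txt | ./marc_export --store -i -c /openils/conf/opensrf_core.xml \
    -x /openils/conf/fm_IDL.xml -f XML --items > exported_with_items.xml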

--since

You can use the --since option to export records modified after a certain date and time.
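
A sketch, assuming --since can be used without an ID list on standard input and that the date is given in YYYY-MM-DD form; check ./marc_export -h for the exact formats accepted:

./marc_export -c /openils/conf/opensrf_core.xml -x /openils/conf/fm_IDL.xml \
    -f XML --since 2023-01-01 > changed_records.xml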

--store

By default, marc_export will use the reporter storage service, which should work in most cases. But if you have a separate reporter database and you know you want to talk directly to your main production database, then you can set the --store option to cstore or storage.

--uris

The --uris option (short form: -u) allows you to export records with located URIs (i.e. electronic resources). When used by itself, it will export only records that have located URIs. When used in conjunction with --items, it will add records with located URIs but no items/copies to the output. If combined with a --library or --descendants option, this option will limit its output to those records with URIs at the designated libraries. The best way to use this option is in combination with the --items and one of the --library or --descendants options to export all of a library’s holdings both physical and electronic.
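
Following that advice, a sketch for exporting all of a hypothetical system SYS1’s holdings, both physical and electronic:

./marc_export -c /openils/conf/opensrf_core.xml -x /openils/conf/fm_IDL.xml \
    -f XML --items --uris --descendants SYS1 > sys1_all_holdings.xml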

--check-leader

Ensures that all leaders are exactly 24 characters long, adding or removing characters as needed, to maximize compatibility with other systems.

Parallel Ingest with pingest.pl

A program named pingest.pl allows fast bibliographic record ingest. It performs ingest in parallel so that multiple batches can be done simultaneously. It operates by splitting the records to be ingested into batches and running all of the ingest methods on each batch. You may pass in options to control how many batches are run at the same time, how many records there are per batch, and which ingest operations to skip.

The browse ingest is presently done in a single process over all of the input records as it cannot run in parallel with itself. It does, however, run in parallel with the other ingests.
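
For example, a full reingest tuned to use fewer simultaneous workers and smaller batches, using only the connection and tuning options documented below (the values are illustrative):

$ /openils/bin/pingest.pl --db evergreen --user evergreen --password evergreen \
    --max-child 4 --batch-size 5000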

Command Line Options

pingest.pl accepts the following command line options:

--host

The server where PostgreSQL runs (either host name or IP address). The default is read from the PGHOST environment variable or "localhost."

--port

The port that PostgreSQL listens to on host. The default is read from the PGPORT environment variable or 5432.

--db

The database to connect to on the host. The default is read from the PGDATABASE environment variable or "evergreen."

--user

The username for database connections. The default is read from the PGUSER environment variable or "evergreen."

--password

The password for database connections. The default is read from the PGPASSWORD environment variable or "evergreen."

--batch-size

Number of records to process per batch. The default is 10,000.

--max-child

Max number of worker processes (i.e. the number of batches to process simultaneously). The default is 8.

--skip-browse
--skip-attrs
--skip-search
--skip-facets
--skip-display

Skip the selected reingest component.

--attr

This option allows the user to specify which record attributes to reingest. It can be used one or more times to specify one or more attributes to ingest. It can be omitted to reingest all record attributes. This option is ignored if the --skip-attrs option is used.

The --attr option is most useful after doing something specific that requires only a partial ingest of records. For instance, if you add a new language to the config.coded_value_map table, you will want to reingest the item_lang attribute on all of your records. The following command line will run that ingest, and only that one:

$ /openils/bin/pingest.pl --skip-browse --skip-search --skip-facets \
    --skip-display --attr=item_lang

--rebuild-rmsr

This option will rebuild the reporter.materialized_simple_record (rmsr) table after the ingests are complete.

This option might prove useful if you want to rebuild the table as part of a larger reingest. If all you wish to do is to rebuild the rmsr table, then it would be just as simple to connect to the database server and run the following SQL:

SELECT reporter.refresh_materialized_simple_record();

Importing Authority Records from Command Line

The major advantages of the command line approach are its speed and its convenience for system administrators who can perform bulk loads of authority records in a controlled environment. For alternate instructions, see the cataloging manual.

  1. Run marc2are.pl against the authority records, specifying the user name, password, and MARC type (USMARC or XML). Use STDOUT redirection to either pipe the output directly into the next command or direct it into an output file for inspection. For example, to process a file with authority records in MARCXML format named auth_small.xml using the default user name and password, and directing the output into a file named auth.are:

    cd Open-ILS/src/extras/import/
    perl marc2are.pl --user admin --pass open-ils --marctype XML auth_small.xml > auth.are
    The MARC type will default to USMARC if the --marctype option is not specified.
  2. Run parallel_pg_loader.pl to generate the SQL necessary for importing the authority records into your system. This script will create files in your current directory with filenames like pg_loader-output.are.sql and pg_loader-output.sql (which runs the previous SQL file). To continue with the previous example by processing our new auth.are file:

    cd Open-ILS/src/extras/import/
    perl parallel_pg_loader.pl --auto are --order are auth.are
    To save time for very large batches of records, you could simply pipe the output of marc2are.pl directly into parallel_pg_loader.pl.
  3. Load the authority records from the SQL file that you generated in the last step into your Evergreen database using the psql tool. Assuming the default user name, host name, and database name for an Evergreen instance, that command looks like:

    psql -U evergreen -h localhost -d evergreen -f pg_loader-output.sql

Juvenile-to-adult batch script

The juv_to_adult.srfsh batch script is responsible for changing a patron from juvenile to adult. It should be set up as a cron job.

This script changes patrons to adult when they reach the age value set in the library setting named "Juvenile Age Threshold" (global.juvenile_age_threshold). When no library setting value is present at a given patron’s home library, the value passed in to the script will be used as a default.

MARC Stream Importer

The MARC Stream Importer can import authority records or bibliographic records. A single running instance of the script can import either type of record, based on the record leader.

This support script has its own configuration file, marc_stream_importer.conf, which includes settings related to logs, ports, users, and access control.

marc_stream_importer.pl is typically located in the /openils/bin directory, and marc_stream_importer.conf is typically located in /openils/conf.

The importer is even more flexible than the staff client import interface, including the following options:

  • --bib-auto-overlay-exact and --auth-auto-overlay-exact: overlay/merge on exact 901c matches

  • --bib-auto-overlay-1match and --auth-auto-overlay-1match: overlay/merge when exactly one match is found

  • --bib-auto-overlay-best-match and --auth-auto-overlay-best-match: overlay/merge on best match

  • --bib-import-no-match and --auth-import-no-match: import when no match is found

One advantage to using this tool instead of the staff client Import interface is that the MARC Stream Importer can load a group of files at once.
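
As a hedged sketch of a one-off run against a file of records: --user, --password, and --spoolfile are assumptions here (confirm them against the script's help output and marc_stream_importer.conf), while --bib-auto-overlay-1match is one of the options listed above:

# import a batch of bibs, overlaying when exactly one match is found
/openils/bin/marc_stream_importer.pl --user admin --password demo123 \
    --spoolfile /tmp/incoming_bibs.mrc --bib-auto-overlay-1match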