Loading a VSAM data set: initial load with REPRO

Initial load is the moment an empty VSAM cluster becomes a populated production file—or the moment a conversion project proves it can. Unlike a tiny test file where you type six records by hand, real loads move millions of rows from sequential extracts, legacy dumps, or another VSAM cluster into a freshly defined structure. The mechanical tool is almost always IDCAMS REPRO, but the engineering work is everything around REPRO: proving the cluster attributes, sorting keyed input, choosing REPLACE semantics, reconciling counts, and making sure the catalog story matches what applications will open Monday morning. This page focuses on that surrounding discipline so beginners understand why “just run REPRO” is never the whole job.

What “initial load” means operationally

Initial load establishes the first complete population of records (and for a KSDS, the index structure that mirrors those keys). It usually happens once per major release, after a full redefine, or during platform migration. Steady-state processing afterward might insert or update records through COBOL programs, CICS transactions, or smaller batch extracts. Because initial load often runs with elevated authority and tight downtime windows, change tickets should list prerequisites explicitly: sort product level, temporary dataset space, restart checkpoints, and back-out steps if REPRO abends halfway.

KSDS: the sort is not optional decoration

A KSDS keeps records ordered by the primary key. When REPRO loads an empty KSDS it writes sequentially and expects the input stream in ascending key order so the index can be built consistently as records arrive. If your extract is keyed on customer number but sorted by load timestamp, you must re-sort by customer number before REPRO. The primary key of a KSDS must be unique; UNIQUEKEY and NONUNIQUEKEY are attributes of an alternate index, not of the base cluster, so duplicate primary keys in the extract have to be resolved before the load rather than carried through the sort. Skipping the sort because “the data looked sorted” is a common source of weekend pages.

```jcl
//* Illustrative pattern: sort the extract by key, then load with REPRO
//S001     EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DISP=SHR,DSN=extract.seq
//* Temporary data set: pass it to the next step instead of cataloging
//SORTOUT  DD DISP=(,PASS),DSN=&&SORTED,
//            SPACE=(CYL,(50,10)),UNIT=SYSDA
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
//R001     EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//IN1      DD DISP=(OLD,DELETE),DSN=&&SORTED
//OUTVS    DD DISP=OLD,DSN=PROD.CUST.MASTER
//SYSIN    DD *
  REPRO INFILE(IN1) OUTFILE(OUTVS)
/*
```
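Sorting puts keys in order but does not remove duplicates, and a KSDS primary key must be unique. One way to catch duplicates before REPRO sees them is an ICETOOL step placed between S001 and R001. This is a sketch only: the step name DUPCHK and the ddnames SORTED and DUPS are made up for illustration, and the key position (1,10,CH) is assumed to match the SORT FIELDS above.

```jcl
//* Hypothetical duplicate-key check; runs after S001, before R001
//DUPCHK   EXEC PGM=ICETOOL
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//* Keep the temporary data set alive for the REPRO step that follows
//SORTED   DD DISP=(OLD,PASS),DSN=&&SORTED
//DUPS     DD DSN=&&DUPS,DISP=(,PASS),
//            SPACE=(CYL,(5,5)),UNIT=SYSDA
//TOOLIN   DD *
* Copy every record whose 10-byte key occurs more than once to DUPS
  SELECT FROM(SORTED) TO(DUPS) ON(1,10,CH) ALLDUPS
* Set a nonzero return code (12 by default) if DUPS is not empty
  COUNT FROM(DUPS) NOTEMPTY
/*
```

If the step ends with a nonzero return code, the job's condition checking can bypass R001, and the DUPS data set shows which keys need a business decision before the load is retried.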

ESDS and RRDS loading differences

ESDS

Entry-sequenced loads preserve arrival order. You still care about record length and block size compatibility, but you do not sort by a VSAM primary key because there is none. Logical delete flags or reorganization strategies may still apply depending on application design.

RRDS

Relative record datasets load by slot. Input ordering interacts with SKIP/COUNT and with how your application maps business keys to RRNs. Initial load jobs should document which RRNs are intentionally empty so later online programs do not treat them as errors.
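The difference between the two types shows up mostly on DEFINE, not on REPRO. The sketch below assumes SMS-managed storage (so no VOLUMES parameter), and every data set name, record size, and space value in it is hypothetical. It also assumes the usual behavior that sequential input loaded into an empty RRDS fills slots consecutively from relative record number 1; check the REPRO documentation for your release before relying on that.

```jcl
//DEFLOAD  EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//ESDSIN   DD DISP=SHR,DSN=HLQ.AUDIT.EXTRACT
//RRDSIN   DD DISP=SHR,DSN=HLQ.SLOT.EXTRACT
//SYSIN    DD *
  /* ESDS: NONINDEXED, no KEYS; records land in input order */
  DEFINE CLUSTER (NAME(HLQ.AUDIT.ESDS) -
         NONINDEXED -
         RECORDSIZE(200 200) -
         CYLINDERS(10 5))
  REPRO INFILE(ESDSIN) OUTDATASET(HLQ.AUDIT.ESDS)
  /* RRDS: NUMBERED, fixed-length records, one per slot */
  DEFINE CLUSTER (NAME(HLQ.SLOT.RRDS) -
         NUMBERED -
         RECORDSIZE(80 80) -
         CYLINDERS(10 5))
  REPRO INFILE(RRDSIN) OUTDATASET(HLQ.SLOT.RRDS)
/*
```

Neither load has a sort step on a VSAM key, which is the point of the contrast with a KSDS.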

Phase checklist

Initial load phases (conceptual, not a vendor proc)

Phase | What to verify
Prove the empty cluster | LISTCAT ALL; confirm RECORDSIZE, KEYS, volumes, and free space align with expected input LRECL and key layout.
Prepare sequential input | Validate record length, key field alignment, and character set. For KSDS, sort by primary key ascending and resolve any duplicate keys before the load.
Run REPRO | Use INFILE/OUTFILE or INDATASET/OUTDATASET consistently with site standards; add REPLACE only when policy allows overlay of duplicate keys.
Post-load verification | LISTCAT again, run record-count reconciliation, execute a read-only sample program or utility, and archive SYSPRINT.
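The catalog and load phases can be chained in one IDCAMS step so the load runs only when the cluster is found. This is a minimal sketch, not a site procedure: the step name LOAD and the input data set HLQ.CUST.SORTED are hypothetical, and the input is assumed to be already sorted and validated.

```jcl
//LOAD     EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//* Hypothetical name for the sorted, validated extract
//IN1      DD DISP=SHR,DSN=HLQ.CUST.SORTED
//SYSIN    DD *
  /* Prove the cluster: attributes and statistics before the load */
  LISTCAT ENTRIES(PROD.CUST.MASTER) ALL
  /* Load only if the catalog lookup succeeded */
  IF LASTCC = 0 -
    THEN DO
      REPRO INFILE(IN1) OUTDATASET(PROD.CUST.MASTER)
    END
  /* Capture post-load statistics only after a clean load */
  IF MAXCC = 0 -
    THEN DO
      LISTCAT ENTRIES(PROD.CUST.MASTER) ALL
    END
/*
```

Keeping both LISTCAT outputs in the same SYSPRINT gives a before-and-after pair to archive with the change ticket.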

REPLACE, REUSE, and policy language

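REPLACE tells REPRO it may overlay records in the target when keys (KSDS) or relative record numbers (RRDS) collide. Without it, REPRO rejects each colliding record as a duplicate and, by default, ends the copy after a small number of such errors. That makes REPLACE useful when rerunning a load job after a failure, but dangerous if two different business feeds accidentally share keys. REUSE on REPRO opens the target as a reusable data set: if the cluster was defined with the REUSE attribute, its existing contents are logically discarded and the load starts from empty. Confirm the exact behavior against the IBM documentation for your release. Security and audit teams may require DELETE+DEFINE instead of overlay semantics for certain datasets, so follow governance, not personal taste.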
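The three rerun policies look like this side by side. Treat it as a sketch: only one of the steps would appear in a real job, the step names and the input data set HLQ.CUST.SORTED are hypothetical, and the DEFINE attributes are placeholders that assume SMS-managed storage and a 10-byte key at offset 0.

```jcl
//* Option A: overlay records whose keys already exist in the target
//RPLC     EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//IN1      DD DISP=SHR,DSN=HLQ.CUST.SORTED
//SYSIN    DD *
  REPRO INFILE(IN1) OUTDATASET(PROD.CUST.MASTER) REPLACE
/*
//* Option B: reset and reload; assumes the cluster was defined REUSE
//REUS     EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//IN1      DD DISP=SHR,DSN=HLQ.CUST.SORTED
//SYSIN    DD *
  REPRO INFILE(IN1) OUTDATASET(PROD.CUST.MASTER) REUSE
/*
//* Option C: delete, redefine, and load into a brand-new cluster
//REDF     EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//IN1      DD DISP=SHR,DSN=HLQ.CUST.SORTED
//SYSIN    DD *
  DELETE PROD.CUST.MASTER CLUSTER
  /* Tolerate "entry not found" on the very first run */
  IF LASTCC = 8 THEN SET MAXCC = 0
  DEFINE CLUSTER (NAME(PROD.CUST.MASTER) -
         INDEXED -
         KEYS(10 0) -
         RECORDSIZE(200 200) -
         CYLINDERS(50 10))
  IF LASTCC = 0 -
    THEN DO
      REPRO INFILE(IN1) OUTDATASET(PROD.CUST.MASTER)
    END
/*
```

Option C uses OUTDATASET rather than an OUTFILE DD so the cluster is allocated only after it has been redefined in the same step.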

Verification beyond “condition code zero”

  • Record counts: Match input, REPRO messages, and application SQL or control totals when available.
  • Extreme keys: Spot-check min and max keys with a read utility or program.
  • Extents: LISTCAT to see whether secondary allocation triggered more than expected, signaling undersized primary.
  • Smoke test OPEN: A tiny COBOL or utility read proves path, RACF, and AMP alignment for the consuming application ID.
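Two of the checks above can be approximated with standard utilities. In this sketch an ICETOOL step counts records in the input and in the loaded cluster so the two figures can be compared in TOOLMSG, and an IDCAMS PRINT shows the first few records, which for a KSDS are the lowest keys. The step names, ddnames, and the input data set HLQ.CUST.SORTED are hypothetical.

```jcl
//COUNTS   EXEC PGM=ICETOOL
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//SEQIN    DD DISP=SHR,DSN=HLQ.CUST.SORTED
//VSOUT    DD DISP=SHR,DSN=PROD.CUST.MASTER
//TOOLIN   DD *
* Record count of the sequential input, reported in TOOLMSG
  COUNT FROM(SEQIN)
* Record count of the loaded cluster, for comparison
  COUNT FROM(VSOUT)
/*
//SMOKE    EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  /* Print the first five records in character format */
  PRINT INDATASET(PROD.CUST.MASTER) CHARACTER COUNT(5)
/*
```

Running the SMOKE step under the consuming application's user ID, rather than the load ID, is what makes it a meaningful access test.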

Practice exercises

  1. Write a sort card sketch for a KSDS whose key is bytes 15–24 character ascending.
  2. List three reasons a load might succeed technically yet fail business validation.
  3. Pair-read the main REPRO page and note two parameters you would add for a partial reload scenario.
  4. Design a rollback paragraph for a change ticket if REPRO fails at 60 percent complete.

Explain like I'm five

Loading a KSDS is like filling a sticker book where every sticker has a number and the book only works if you put numbers in order. If you jump from 5 to 9 then back to 7, the book’s tabs get confused and the pages tear. Sorting first is your grown-up checking the numbers on the kitchen table before you paste. REPRO is the pasting step; it does not babysit the numbering for you.

Test your knowledge

1. You load a KSDS from a sequential file. The sort step failed but the operator restarted only the REPRO step. What is the highest risk?

  • No risk; REPRO always sorts internally
  • Keys may be out of order, breaking index assumptions or causing failures
  • CICS ignores KSDS order
  • SMS deletes the cluster

2. Which IDCAMS command typically precedes the first REPRO into a brand-new cluster?

  • LISTCAT only
  • DEFINE CLUSTER to create the empty structure
  • IEFBR14 without SYSIN
  • ICEGENER only

3. Why capture SYSPRINT from the load job?

  • Because printers need color ink
  • It is the audit trail for volumes, record counts, warnings, and messages that LISTCAT alone may not echo
  • It replaces RACF
  • It is optional for regulated data
Read time: 13 min
Author: MainframeMaster
Reviewed by: MainframeMaster team
Verified: IBM IDCAMS REPRO load ordering guidance
Sources: IBM z/OS DFSMS Access Method Services; VSAM Demystified (SG24-6105)
Applies to: z/OS VSAM initial population with REPRO