Must input be sorted before REPRO loads a KSDS?

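Yes. When REPRO loads an empty KSDS from a sequential file, it builds the index while writing, so records must arrive in ascending primary key order with no duplicate keys. REPRO rejects out-of-sequence and duplicate records and ends the copy once its error limit is reached, which can leave a partially loaded cluster. Use DFSORT or an equivalent sort step unless the upstream file is already guaranteed ordered.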

Can I load into a cluster that already has records?

You may use REPLACE or delete and redefine strategies depending on business rules. REPLACE overwrites records with matching keys; it is not a free-form merge of two arbitrary unsorted sources. Always test in a sandbox and capture LISTCAT before and after.

What is the difference between initial load and daily incremental batch?

Initial load establishes the baseline population, often from a sequential extract or conversion. Daily batch usually updates records through application programs or smaller REPRO extracts. Initial loads stress sort, space, and catalog planning; daily work stresses locking and transaction design.

Do I need a separate step to empty the cluster before reload?

If policy requires a clean slate, teams either DELETE and DEFINE the cluster or, when it was defined with REUSE, run REPRO with the REUSE parameter so the target is reset to empty when it is opened. Sometimes a controlled REPRO REPLACE is enough. The correct answer is dictated by retention, audit, and downtime windows, so document the standard for your application.

MainframeMaster

Loading a VSAM data set: initial load with REPRO

Initial load is the moment an empty VSAM cluster becomes a populated production file—or the moment a conversion project proves it can. Unlike a tiny test file where you type six records by hand, real loads move millions of rows from sequential extracts, legacy dumps, or another VSAM cluster into a freshly defined structure. The mechanical tool is almost always IDCAMS REPRO, but the engineering work is everything around REPRO: proving the cluster attributes, sorting keyed input, choosing REPLACE semantics, reconciling counts, and making sure the catalog story matches what applications will open Monday morning. This page focuses on that surrounding discipline so beginners understand why “just run REPRO” is never the whole job.

What “initial load” means operationally

Initial load establishes the first complete population of records (and for a KSDS, the index structure that mirrors those keys). It usually happens once per major release, after a full redefine, or during platform migration. Steady-state processing afterward might insert or update records through COBOL programs, CICS transactions, or smaller batch extracts. Because initial load often runs with elevated authority and tight downtime windows, change tickets should list prerequisites explicitly: sort product level, temporary dataset space, restart checkpoints, and back-out steps if REPRO abends halfway.

KSDS: the sort is not optional decoration

A KSDS keeps records ordered by the primary key. When REPRO loads an empty KSDS, it writes sequentially and expects the input stream in ascending key order so the index can be built consistently as records arrive. If your extract is keyed on customer number but sorted by load timestamp, you must re-sort by customer number before REPRO. The primary key of a KSDS must be unique (UNIQUEKEY and NONUNIQUEKEY are attributes of alternate indexes, not of the base cluster), so resolve duplicate keys in the extract before the load rather than relying on sort order to break ties. Skipping the sort because “the data looked sorted” is a common source of weekend pages.

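//* Sort the extract on the 10-byte key in positions 1-10
//S001   EXEC PGM=SORT           (illustrative pattern)
//SYSOUT DD SYSOUT=*
//SORTIN DD DISP=SHR,DSN=EXTRACT.SEQ
//* Temporary data set: PASS it to the next step, do not catalog it
//SORTOUT DD DISP=(,PASS),DSN=&&SORTED,
//           SPACE=(CYL,(50,10)),UNIT=SYSDA
//SYSIN  DD *
  SORT FIELDS=(1,10,CH,A)
/*
//* Load the sorted records into the empty KSDS
//R001   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//IN1    DD DISP=(OLD,DELETE),DSN=&&SORTED
//OUTVS  DD DISP=OLD,DSN=PROD.CUST.MASTER
//SYSIN  DD *
  REPRO INFILE(IN1) OUTFILE(OUTVS)
/*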
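
The load above assumes the target cluster already exists and that its key definition matches the sort card. As a minimal sketch of that pairing, the step below defines a KSDS whose key is 10 bytes at offset 0, which corresponds to SORT FIELDS positions 1 to 10 (KEYS offsets count from zero, sort positions count from one). The record size, space, FREESPACE, and SHAREOPTIONS values and the step name D001 are hypothetical placeholders, and VOLUMES is omitted on the assumption that the data set is SMS-managed.

```jcl
//D001   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN  DD *
  /* Hypothetical attributes: adjust to your record layout and site */
  DEFINE CLUSTER (NAME(PROD.CUST.MASTER) -
         INDEXED -
         KEYS(10 0) -
         RECORDSIZE(200 200) -
         CYLINDERS(50 10) -
         FREESPACE(10 10) -
         SHAREOPTIONS(2 3)) -
       DATA (NAME(PROD.CUST.MASTER.DATA)) -
       INDEX (NAME(PROD.CUST.MASTER.INDEX))
/*
```

If the key started in position 15 of the record instead, the DEFINE would use an offset of 14, which is an easy off-by-one to miss when the sort and the DEFINE are written by different people.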

ESDS and RRDS loading differences

ESDS

Entry-sequenced loads preserve arrival order. You still care about record length and block size compatibility, but you do not sort by a VSAM primary key because there is none. Logical delete flags or reorganization strategies may still apply depending on application design.

RRDS

Relative record datasets load by slot. Input ordering interacts with SKIP/COUNT and with how your application maps business keys to RRNs. Initial load jobs should document which RRNs are intentionally empty so later online programs do not treat them as errors.
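
For comparison with the KSDS case, here is a sketch of how the two other organizations might be defined before a load. The cluster names PROD.CUST.AUDIT and PROD.CUST.SLOTS, the step name, and all size values are hypothetical. Neither definition has a KEYS parameter, which is why no key sort is involved.

```jcl
//D002   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN  DD *
  /* ESDS: NONINDEXED, records kept in arrival order */
  DEFINE CLUSTER (NAME(PROD.CUST.AUDIT) -
         NONINDEXED -
         RECORDSIZE(200 200) -
         CYLINDERS(20 5))
  /* RRDS: NUMBERED, fixed-length slots (average = maximum size) */
  DEFINE CLUSTER (NAME(PROD.CUST.SLOTS) -
         NUMBERED -
         RECORDSIZE(200 200) -
         CYLINDERS(20 5))
/*
```

When REPRO loads an RRDS from a sequential source, it generally fills slots consecutively, so a design that needs intentionally empty RRNs usually calls for a program-driven load. Confirm the exact behavior in the IDCAMS reference for your release.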

Phase checklist

Initial load phases (conceptual, not a vendor proc)
Phase	What to verify
Prove the empty cluster	LISTCAT ALL; confirm RECORDSIZE, KEYS, volumes, and free space align with expected input LRECL and key layout.
Prepare sequential input	Validate record length, key field alignment, and character set. For KSDS, sort by primary key ascending and resolve duplicate keys, because the primary key must be unique.
Run REPRO	Use INFILE/OUTFILE or INDATASET/OUTDATASET consistently with site standards; add REPLACE only when policy allows overlay of duplicate keys.
Post-load verification	LISTCAT again, run record-count reconciliation, execute a read-only sample program or utility, and archive SYSPRINT.
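
The first and last phases in the table both rely on LISTCAT. A minimal sketch, reusing the cluster name from the earlier example (the step name L001 is arbitrary), is shown below. Run it once before the load and once after, and keep both SYSPRINT outputs with the change record.

```jcl
//L001   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN  DD *
  LISTCAT ENTRIES(PROD.CUST.MASTER) ALL
/*
```

Before the load, compare KEYLEN, RKP, AVGLRECL, and MAXLRECL against the input layout. After the load, REC-TOTAL gives a count to reconcile against the extract, and EXTENTS shows whether secondary allocation was used.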

REPLACE, REUSE, and policy language

REPLACE tells REPRO it may overlay records in the target when keys (KSDS) or relative record numbers (RRDS) collide. That is powerful during reruns of a load job after a failure, but dangerous if two different business feeds accidentally share keys. REUSE is different: on DEFINE CLUSTER it marks the cluster as reusable, and on REPRO it opens such a cluster as if it were empty, so existing records are discarded rather than overlaid. Check the IDCAMS reference for your release before relying on either. Security and audit teams may require DELETE+DEFINE instead of overlay semantics for certain datasets, so follow governance, not personal taste.
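
As a sketch of how the two options appear in a rerun step, the job fragment below uses REPLACE and shows the REUSE alternative as a comment so that only one REPRO executes. The input data set name PROD.CUST.EXTRACT.SORTED and the step name R002 are hypothetical, and the REUSE variant assumes the cluster was defined with the REUSE attribute.

```jcl
//R002   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//IN1    DD DISP=SHR,DSN=PROD.CUST.EXTRACT.SORTED
//OUTVS  DD DISP=OLD,DSN=PROD.CUST.MASTER
//SYSIN  DD *
  /* Variant A: overlay records whose keys already exist */
  REPRO INFILE(IN1) OUTFILE(OUTVS) REPLACE
  /* Variant B: reset the reusable cluster to empty, then load   */
  /* REPRO INFILE(IN1) OUTFILE(OUTVS) REUSE                      */
/*
```

Variant A keeps any target records that the input does not mention, while Variant B leaves only what the input supplies. Which one is acceptable is a policy decision, not a technical one.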

Verification beyond “condition code zero”

Record counts: Match input, REPRO messages, and application SQL or control totals when available.
Extreme keys: Spot-check min and max keys with a read utility or program.
Extents: LISTCAT to see whether secondary allocation triggered more than expected, signaling undersized primary.
Smoke test OPEN: A tiny COBOL or utility read proves path, RACF, and AMP alignment for the consuming application ID.
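
Some of these checks can be run with IDCAMS itself. The sketch below, with an arbitrary step name V001, runs EXAMINE against the KSDS and prints the first five records so the lowest keys can be inspected. It is one possible utility-based check, not a replacement for the application smoke test.

```jcl
//V001   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN  DD *
  /* Structural check of the index and data components */
  EXAMINE NAME(PROD.CUST.MASTER) INDEXTEST DATATEST
  /* Print the first five records to inspect the lowest keys */
  PRINT INDATASET(PROD.CUST.MASTER) CHARACTER COUNT(5)
/*
```

To spot-check the high end of the key range, PRINT also accepts FROMKEY with a starting key value. The value to use depends on your key format.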

Practice exercises

Write a sort card sketch for a KSDS whose key is bytes 15–24 character ascending.
List three reasons a load might succeed technically yet fail business validation.
Pair-read the main REPRO page and note two parameters you would add for a partial reload scenario.
Design a rollback paragraph for a change ticket if REPRO fails at 60 percent complete.

Explain like I'm five

Loading a KSDS is like filling a sticker book where every sticker has a number and the book only works if you put numbers in order. If you jump from 5 to 9 then back to 7, the book’s tabs get confused and the pages tear. Sorting first is your grown-up checking the numbers on the kitchen table before you paste. REPRO is the pasting step; it does not babysit the numbering for you.

Test your knowledge

1. You load a KSDS from a sequential file. The sort step failed but the operator restarted only the REPRO step. What is the highest risk?

No risk; REPRO always sorts internally
Keys may be out of order, breaking index assumptions or causing failures
CICS ignores KSDS order
SMS deletes the cluster

2. Which IDCAMS command typically precedes the first REPRO into a brand-new cluster?

LISTCAT only
DEFINE CLUSTER to create the empty structure
IEFBR14 without SYSIN
ICEGENER only

3. Why capture SYSPRINT from the load job?

Because printers need color ink
It is the audit trail for volumes, record counts, warnings, and messages that LISTCAT alone may not echo
It replaces RACF
It is optional for regulated data