How do you sample every nth record in DFSORT?

Use OUTFIL with SAMPLE=n. For example, OUTFIL FNAMES=SORTOUT,SAMPLE=10 writes every 10th record; whether that means the 1st, 11th, 21st, … or the 10th, 20th, 30th, … is product-dependent. Selection is based on the Relative Record Number (RRN), the position of the record in the stream of records passed to OUTFIL. You can add STARTREC= and ENDREC= to limit the range before sampling.

What is STARTREC and ENDREC in DFSORT OUTFIL?

STARTREC=n means start selecting from the nth record (by position in the input, RRN). ENDREC=m means stop after the mth record. So only records with RRN between n and m (inclusive) are considered. You can use them together to process a slice of the file (e.g. STARTREC=100,ENDREC=200) and optionally add SAMPLE= to take every kth record in that slice.

What is the difference between SAMPLE=n and SAMPLE=(n,m)?

SAMPLE=n selects every nth record, so a single number controls the interval. SAMPLE=(n,m) defines a take/skip pattern: in the form described on this page, the first m records of each group of n are copied and the remaining n − m are skipped, so you get blocks of m records with gaps between them. Exact (n,m) behavior is product-dependent; see your DFSORT manual. Use SAMPLE=n for simple “every nth” selection and SAMPLE=(n,m) for “copy a few, skip a few” patterns.

Can you combine STARTREC, ENDREC, and SAMPLE?

Yes. For example OUTFIL FNAMES=OUT,STARTREC=10,ENDREC=50,SAMPLE=5 first restricts to records 10–50 (by RRN), then from that range selects every 5th record (e.g. 10, 15, 20, 25, 30, 35, 40, 45, 50). So you get a positional range and then sample within it.

What is Relative Record Number (RRN) in DFSORT?

RRN is the position of the record in the stream of records that OUTFIL receives: the first record is 1, the second is 2, and so on. With SORT FIELDS=COPY that is the input order; after a sort it is the sorted order. OUTFIL uses RRN for STARTREC=, ENDREC=, and SAMPLE= so you can select or sample by position rather than by the data in the record. RRN is implicit (based on order) and is not a field in the record.

DFSORT Sampling Records - OUTFIL SAMPLE STARTREC ENDREC RRN

Sampling Records in OUTFIL

Sampling means selecting a subset of records by their position in the file rather than by their content. For example, you might want every 10th record for an audit, or records 100 through 200 for a test extract. In DFSORT OUTFIL you do this using the Relative Record Number (RRN)—the position of each record in the input (1 for the first, 2 for the second, and so on). The main parameters are STARTREC= and ENDREC= to define a range of record positions, and SAMPLE= to select every nth record (or SAMPLE=(n,m) for a take/skip pattern). This page explains RRN, STARTREC, ENDREC, SAMPLE=n, and SAMPLE=(n,m), and when to use sampling instead of INCLUDE/OMIT.

Relative Record Number (RRN)

The Relative Record Number is the ordinal position of the record in the stream of records that reaches OUTFIL. The first record has RRN 1, the second has RRN 2, and so on. RRN is not stored in the record; it is implied by the order in which records arrive. With SORT FIELDS=COPY that order is the input order. After a SORT, the “first” record is the one that sorted first (RRN 1) and the “last” is the one that sorted last (RRN n). All OUTFIL sampling and range selection is based on this implicit RRN.

STARTREC= and ENDREC=: Selecting a Range

STARTREC=n means “start including records from the nth record onward.” ENDREC=m means “stop after the mth record.” So together they restrict the output to records whose RRN is between n and m inclusive. If you specify only STARTREC=10, records 1–9 are skipped and 10 through the end are written. If you specify only ENDREC=100, records 1–100 are written and the rest are skipped. If you specify both STARTREC=10 and ENDREC=50, only records 10–50 are considered for the output. This is useful for extracting a slice of the file (e.g. for testing) or for limiting the scope before applying SAMPLE=.
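
The one-sided forms described above can be sketched as follows. This is an illustration only: the DD names TAIL and HEAD are hypothetical and would need matching DD statements in the job. Lines starting with an asterisk in column 1 are comment statements.

Example: open-ended ranges (sketch)

  SORT FIELDS=COPY
* TAIL: skip records 1-9, write record 10 through the last record
  OUTFIL FNAMES=TAIL,STARTREC=10
* HEAD: write records 1-100, skip the rest
  OUTFIL FNAMES=HEAD,ENDREC=100

Each OUTFIL applies its own range to the same stream of records, so the two outputs overlap wherever their ranges do (here, records 10–100).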

Sampling and range parameters
Parameter	Meaning	Effect
STARTREC=n	First record (by RRN) to include	Skip records before position n
ENDREC=m	Last record (by RRN) to include	Stop after position m
SAMPLE=n	Every nth record	Reduce output to 1/n of the (range) records
SAMPLE=(n,m)	Take m records out of every n (product-dependent)	Blocks of m with gaps of n − m (see manual)

Example: records 5 through 10 only

  SORT FIELDS=COPY
  OUTFIL FNAMES=SORTOUT,STARTREC=5,ENDREC=10

Only the 5th, 6th, 7th, 8th, 9th, and 10th records (by position) are written to SORTOUT. Records 1–4 and 11 onward are not written.
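
A complete job step makes the link between FNAMES= and the JCL visible. The following is a minimal sketch, not a production job: the job card, the dataset names HLQ.INPUT.DATA and HLQ.OUTPUT.SLICE, and the SPACE values are placeholders, and your site may require additional DD parameters.

Example: records 5 through 10 in a full job step (sketch)

//RANGEJOB JOB (ACCT),'OUTFIL RANGE',CLASS=A,MSGCLASS=X
//* Copy records 5-10 of SORTIN to SORTOUT (dataset names are hypothetical)
//STEP01   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=HLQ.INPUT.DATA,DISP=SHR
//SORTOUT  DD DSN=HLQ.OUTPUT.SLICE,
//            DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(1,1),RLSE)
//SYSIN    DD *
  SORT FIELDS=COPY
  OUTFIL FNAMES=SORTOUT,STARTREC=5,ENDREC=10
/*

FNAMES=SORTOUT names the DD statement that receives the selected records, so any additional OUTFIL needs its own DD name in the JCL.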

SAMPLE=n: Every nth Record

SAMPLE=n selects every nth record. The exact rule can vary by product: some implementations include the first record in scope and then every nth after it (so SAMPLE=3 gives RRN 1, 4, 7, 10, …), while others include the records at positions n, 2n, 3n (so SAMPLE=3 gives RRN 3, 6, 9, …). The MainframeTechHelp example shows SAMPLE=3 producing records 1, 4, 7, 10, that is, the first record and then every third; the STARTREC examples on this page follow the same rule, with the sample starting at the STARTREC record. Check your DFSORT manual for the exact behavior. Either way, the output has roughly 1/n of the records in the range being considered.

Example: every 3rd record

  SORT FIELDS=COPY
  OUTFIL FNAMES=SORTOUT,SAMPLE=3

Only every 3rd record (by the product’s rule) is written. If the input has 1000 records, the output has on the order of 333 records. Use this for statistical sampling or to reduce volume for testing.

Combining Range and Sampling

You can use STARTREC= and ENDREC= to limit the range, then SAMPLE= to take every nth record within that range. For example STARTREC=10,ENDREC=50,SAMPLE=5 considers only records 10–50 (41 records), then selects every 5th from that set (e.g. 10, 15, 20, 25, 30, 35, 40, 45, 50—9 records). So you get a window of the file and a sample within that window.

Example: range then every 5th

  SORT FIELDS=COPY
  OUTFIL FNAMES=SORTOUT,STARTREC=10,ENDREC=50,SAMPLE=5

Records 10–50 are in scope; from those, every 5th is written. Exact list depends on whether SAMPLE=5 starts at 10 or at 15; typically you get a subset of {10, 15, 20, 25, 30, 35, 40, 45, 50}.

SAMPLE=(n,m): Take/Skip Pattern

Some products support SAMPLE=(n,m) to define a pattern of “copy m records, then skip some, then repeat.” The exact meaning of n and m is product-dependent. In one example, SAMPLE=(3,2) with STARTREC=2 produces records 2, 3, 5, 6, 8, 9: that is, from record 2 onward, copy 2 records (2 and 3), skip 1 (4), copy 2 (5 and 6), skip 1 (7), copy 2 (8 and 9). So (3,2) can mean “in groups of 3, take 2”—i.e. take 2, skip 1. Use SAMPLE=(n,m) when you want blocks of consecutive records with gaps between blocks. Check your manual for the exact (n,m) semantics.

Example: SAMPLE=(3,2) from record 2

  SORT FIELDS=(1,5,ZD,A)
  OUTFIL FNAMES=OUTPUT3,STARTREC=2,SAMPLE=(3,2)

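Starting at RRN 2, the (3,2) pattern copies 2 records and then skips 1. The output therefore gets RRNs 2, 3, 5, 6, 8, 9 (continuing in the same pattern for longer inputs), while records 4 and 7 are skipped. Because this example sorts on the 5-byte ZD key first, the RRNs refer to positions in the sorted order. This is useful when you want “pairs” of records with one record skipped between each pair.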
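
A wider interval shows the same pattern with a larger gap. The sketch below assumes the “take m out of each group of n” reading described in this section and that the pattern starts at record 1 when STARTREC= is omitted; confirm both against your manual before relying on the exact record list.

Example: first 3 of every 10, up to record 25 (sketch)

  SORT FIELDS=COPY
* Assumed reading: groups of 10 starting at RRN 1, copy the first 3 of each
  OUTFIL FNAMES=SORTOUT,SAMPLE=(10,3),ENDREC=25

Under those assumptions the output would hold RRNs 1, 2, 3, 11, 12, 13, 21, 22, 23: three groups are started before ENDREC=25 cuts off the range.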

Multiple OUTFILs with Different Sampling

You can have several OUTFIL statements with different STARTREC/ENDREC/SAMPLE settings so that different subsets go to different files. For example: one OUTFIL with SAMPLE=3 for a 1-in-3 sample, another with STARTREC=4,SAMPLE=4,ENDREC=10 for records 4 and 8 only. Each OUTFIL is independent; the same input is read once and each OUTFIL applies its own selection. So you can produce multiple sample extracts in one pass.
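
The two extracts mentioned above can be written in one pass roughly as follows. The DD names OUT1 and OUT2 are hypothetical and need matching DD statements in the JCL.

Example: two samples from one pass (sketch)

  SORT FIELDS=COPY
* OUT1: 1-in-3 sample of the whole file
  OUTFIL FNAMES=OUT1,SAMPLE=3
* OUT2: every 4th record from RRN 4 through RRN 10 (records 4 and 8)
  OUTFIL FNAMES=OUT2,STARTREC=4,SAMPLE=4,ENDREC=10

Because each OUTFIL counts RRNs independently, a record can appear in both outputs, in one of them, or in neither.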

Sampling vs INCLUDE/OMIT

Sampling (STARTREC, ENDREC, SAMPLE) selects by position: “which record number” or “every nth record.” INCLUDE and OMIT select by content: “records where this field equals this value” or “records where this condition is true.” Use sampling when the criterion is positional (e.g. audit 1% by taking every 100th record, or test on records 1–500). Use INCLUDE/OMIT when the criterion is data-driven (e.g. department = 5, or amount > 1000). You can combine both: for example INCLUDE to filter by department, then SAMPLE= to take every 10th of those.
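
Combining the two approaches might look like the sketch below. The field position, length, and value in the INCLUDE condition (a 2-byte character department code in columns 1–2 equal to '05') are hypothetical and only illustrate the shape of the statements.

Example: filter by content, then sample by position (sketch)

* Keep only department '05' (hypothetical field at position 1, length 2)
  INCLUDE COND=(1,2,CH,EQ,C'05')
  SORT FIELDS=COPY
* Every 10th of the records that passed the INCLUDE
  OUTFIL FNAMES=SORTOUT,SAMPLE=10

Here the INCLUDE statement removes records before they reach OUTFIL, so the RRNs that SAMPLE=10 counts are positions among the surviving records, not positions in the original file. This sketch uses the INCLUDE statement rather than the OUTFIL INCLUDE= parameter; how OUTFIL orders its own INCLUDE= relative to SAMPLE= is worth confirming in your manual.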

Explain It Like I'm Five

Imagine a long line of people. RRN is their place in line: first person is 1, second is 2, and so on. STARTREC=5 means “start from the 5th person,” and ENDREC=10 means “stop after the 10th.” So you only look at people 5 through 10. SAMPLE=2 means “pick every 2nd one”—so from those six people you might pick the 5th, 7th, and 9th. So we are not choosing by name or age; we are choosing by where they stand in line. That’s sampling by position. SAMPLE=(3,2) is like “take 2 people, skip 1, take 2, skip 1”—so you get little groups of 2 with a gap between.

Exercises

Write an OUTFIL that writes only records 20 through 30 (by RRN). Use STARTREC and ENDREC.
Write an OUTFIL that writes every 10th record from the full file. Use SAMPLE=10.
Combine STARTREC=1, ENDREC=100 and SAMPLE=5. How many records do you expect in the output (approximately)?
When would you use SAMPLE= instead of INCLUDE? Give an example of each.

Quiz

Test Your Knowledge

1. What is RRN in DFSORT OUTFIL?

A field in the record
Relative Record Number—the position of the record in the input (1 for first, 2 for second, etc.)
Random record number
Only for VB files

2. What does SAMPLE=5 do?

Writes 5 records only
Selects every 5th record (e.g. RRN 1, 6, 11, ... or product-defined pattern like 5, 10, 15, ...)
Samples 5 percent
Skips the first 5

3. How do you limit sampling to a range of records (e.g. records 10 through 50)?

Use INCLUDE only
Use STARTREC=10 and ENDREC=50 to define the range; you can combine with SAMPLE= to take every nth within that range
SAMPLE has no range
Use OMIT with position

4. What does SAMPLE=(3,2) mean?

Every 3rd and every 2nd record
A pattern: copy 2 records, then skip until the next cycle (e.g. copy 2, skip 1, copy 2, skip 1), so you get blocks of 2 with a gap—exact behavior is product-dependent
Only 2 or 3 records
Records 2 and 3 only

5. When would you use sampling instead of INCLUDE/OMIT?

Only when INCLUDE is not available
When you want to select by position (every nth record or a record range) rather than by field value; INCLUDE/OMIT select by content
Never
Only for reports

Related Concepts

OUTFIL statement

SPLIT datasets

INCLUDE/OMIT
