How do you write unique records to one file and duplicate records to another in DFSORT?

Use ICETOOL with two SELECT operators. SELECT FROM(input) TO(unique_file) ON(position,length,format) NODUPS sends records with unique keys to the first file. SELECT FROM(input) TO(duplicate_file) ON(position,length,format) ALLDUPS sends all records that have at least one duplicate key to the second file. The input should be sorted by the key specified in ON().

What is XSUM in sort?

XSUM is a Syncsort feature used with SUM FIELDS=NONE. It keeps the first occurrence of each key in SORTOUT and writes all eliminated duplicate records to the dataset specified by the SORTXSUM DD. DFSORT does not support XSUM; use ICETOOL SELECT with NODUPS and ALLDUPS to achieve the same result.

What is the difference between NODUPS and ALLDUPS in ICETOOL?

NODUPS selects only records whose key appears exactly once in the input—the "unique" records. ALLDUPS selects every record whose key appears more than once—all occurrences of duplicate keys. So NODUPS gives you one file with no duplicate keys; ALLDUPS gives you one file with only the duplicate records (every copy of keys that had duplicates).

Do I need to sort before using ICETOOL SELECT NODUPS or ALLDUPS?

Yes. SELECT with NODUPS/ALLDUPS determines duplicates by comparing adjacent records. The input must be sorted by the key you specify in the ON() clause. Otherwise, the same key might appear in different places and the tool may not correctly identify which records are unique and which are duplicates. Sort by the duplicate key first, then run ICETOOL SELECT.

Can I use SUM FIELDS=NONE to remove duplicates and still get the duplicates in another file in DFSORT?

SUM FIELDS=NONE alone keeps only the first of each key in SORTOUT and discards the rest; it does not write duplicates to a second file. To get duplicates into a separate file in DFSORT, use ICETOOL SELECT with ALLDUPS to send duplicate records to one DD and NODUPS (or a separate SELECT) to send unique records to another DD. XSUM (which does this in one step) is not available in DFSORT.

Removing Duplicates Into Separate Files

Sometimes you need to split your data into two outputs: one file that contains only unique records (one per key), and another that contains only the duplicate records (every extra copy of a key that appeared more than once). For example, you might send uniques to a master file and duplicates to an exception file for review. In Syncsort you can do this in one step with SUM FIELDS=NONE and XSUM, which writes the eliminated duplicates to a SORTXSUM dataset. DFSORT does not support XSUM. In DFSORT you use ICETOOL with the SELECT operator: one SELECT with NODUPS sends unique records to one file, and another SELECT with ALLDUPS sends all duplicate occurrences to another file. This page explains the goal, the Syncsort XSUM approach, and the DFSORT ICETOOL SELECT approach with NODUPS and ALLDUPS.

OUTFIL Advanced / ICETOOL

Progress0 of 0 lessons

What “Unique” and “Duplicate” Files Mean

Assume you have a key (e.g. employee ID in positions 1–10). A record is unique if no other record has the same key. A record is a duplicate if at least one other record has the same key. When we say “unique file” we mean a file that contains only records whose key appears exactly once in the input—so one record per key and no key repeated. When we say “duplicate file” we mean a file that contains every record whose key appears more than once—so all the “extra” copies. The first occurrence of a key can be sent to the unique file (or you might send only keys that never duplicate; that depends on the operator). NODUPS and ALLDUPS in ICETOOL implement these two sets: NODUPS = keys that appear only once; ALLDUPS = every record that is part of a duplicate group.

Syncsort XSUM (Not in DFSORT)

In Syncsort, SUM FIELDS=NONE removes duplicates and keeps the first occurrence of each key in SORTOUT. Adding XSUM writes all the records that were eliminated (the duplicates) to the dataset pointed to by the SORTXSUM DD. So in one step you get: SORTOUT = one record per key; SORTXSUM = all the duplicate records. DFSORT does not support XSUM. If you code XSUM in DFSORT you get message ICE172A (XSUM IS NOT SUPPORTED - USE ICETOOL SELECT IF APPROPRIATE). So on DFSORT you use ICETOOL SELECT to achieve the same result.

Syncsort example (for comparison)

text

1
2
  SORT FIELDS=(1,10,CH,A)
  SUM FIELDS=NONE,XSUM

SORTOUT receives the first of each key; SORTXSUM receives all eliminated duplicates. In DFSORT you omit XSUM and use ICETOOL SELECT instead.

ICETOOL SELECT: NODUPS and ALLDUPS

ICETOOL SELECT copies records from an input to an output based on criteria. The ON() clause defines the key (position, length, format). NODUPS means “select only records whose key appears exactly once in the input.” ALLDUPS means “select every record whose key appears more than once.” So one SELECT with NODUPS fills the “unique” file, and another SELECT with ALLDUPS fills the “duplicates” file. Both read the same input; the input must be sorted by the key so that duplicate detection works correctly.

SELECT options for duplicate handling
Option	Meaning	Output
NODUPS	Key appears exactly once	Only those records (unique keys)
ALLDUPS	Key appears more than once	Every occurrence of such keys (all duplicates)
FIRST	First of each key group	One record per key (like SUM FIELDS=NONE)

Input Must Be Sorted by the Key

NODUPS and ALLDUPS determine “same key” by comparing adjacent records (or by building a view of key counts, depending on implementation). For correct results, the input must be sorted by the same key you specify in ON(). If the input is not sorted, two records with the same key might be far apart and the tool might treat them as unique. So before running ICETOOL SELECT, run a SORT (or use a SORT control statement inside the same ICETOOL run if supported) so that all records with the same key are adjacent.

Two SELECT Operators for Two Files

To produce both a unique file and a duplicate file, use two SELECT operators in the same ICETOOL run. The first SELECT writes to the unique file with NODUPS; the second writes to the duplicate file with ALLDUPS. Each SELECT specifies FROM(input_dd) TO(output_dd) ON(position,length,format) and the appropriate option. The USING() clause can point to a control statement file if you need SORT or INCLUDE/OMIT; for a simple SELECT, you may only need ON() and the option.

Example: unique and duplicate files with ICETOOL

text

1
2
  SELECT FROM(IN) TO(UNIQUE) ON(1,10,CH) NODUPS
  SELECT FROM(IN) TO(DUPS)   ON(1,10,CH) ALLDUPS

IN is the input DD (sorted by positions 1–10). UNIQUE receives only records whose key (1–10) appears exactly once. DUPS receives every record whose key appears more than once. So you get the same logical split as XSUM: one file with “first occurrence only” style uniques, one file with all the duplicates. Exact ICETOOL syntax (e.g. whether ON takes position,length,format or a different form) is product-dependent; check your ICETOOL manual.

Using a SORT Control Statement with ICETOOL

If your input is not yet sorted, you can have ICETOOL run a SORT first using a control statement file. You define a CTL dd (e.g. CTL1) with SORT FIELDS=(1,10,CH,A) and then reference it in SELECT with USING(CTL1). The order of operations is typically: SORT (if USING is present), then SELECT. So the data is sorted by the key before NODUPS/ALLDUPS is applied. This way you can do “sort and split into unique/duplicate” in one job step.

Example: sort then SELECT (conceptual)

text

1
2
3
4
5
  //CTL1 DD *
  SORT FIELDS=(1,10,CH,A)
  /*
  SELECT FROM(IN) TO(UNIQUE) ON(1,10,CH) USING(CTL1) NODUPS
  SELECT FROM(IN) TO(DUPS)   ON(1,10,CH) USING(CTL1) ALLDUPS

The USING(CTL1) supplies the SORT fields so that IN is processed in sorted order by the key before the SELECT logic runs. Syntax for combining SORT and SELECT in one ICETOOL run varies by product; see your documentation.

FIRST: One Record per Key (No Separate Duplicate File)

If you only need one record per key (like SUM FIELDS=NONE) and do not need a separate file of duplicates, you can use SELECT with FIRST: it keeps the first occurrence of each key and discards the rest. That gives you a single “deduplicated” file. Use FIRST when you do not need to keep or inspect the duplicate records.

Comparing the Two Outputs

Unique file (NODUPS): Every key in this file appears exactly once in the input. Record count ≤ input count; if there were no duplicates, it equals the input count.
Duplicate file (ALLDUPS): Every key in this file appears at least twice in the input. So every record in this file has at least one “twin.” Record count = total input records minus the number of unique keys (if we count first occurrence as unique) or product-specific; check how your product counts.

So the unique file has no repeated keys; the duplicate file has only keys that were repeated. Together they partition the input (each input record goes to one or the other, depending on whether its key was unique or duplicated).

Explain It Like I'm Five

Imagine a pile of name tags. Some names appear only once (unique), and some names appear two or more times (duplicates). You want two boxes: one box for “names that appear only once,” and one box for “every copy of a name that had duplicates.” NODUPS is the rule for the first box: put in only the name tags for names that have no twin. ALLDUPS is the rule for the second box: put in every name tag that has at least one twin. So you end up with one box of “lonely” names and one box of “duplicate” names. In the computer we do the same thing: one file gets the unique records, the other gets all the duplicate records. XSUM in Syncsort does this in one step; in DFSORT we use ICETOOL SELECT with NODUPS and ALLDUPS.

Exercises

Your input is sorted by employee ID (1–8, CH). Write the two ICETOOL SELECT statements to send unique IDs to DD UNIQUE and all duplicate occurrences to DD DUPS.
Why must the input be sorted by the key before using NODUPS or ALLDUPS?
If you get ICE172A when using SUM with XSUM, what does it mean and what should you use instead?
What is the difference between FIRST and NODUPS in SELECT? When would you use each?

Quiz

Test Your Knowledge

1. What does NODUPS do in ICETOOL SELECT?

Selects all duplicate records
Selects only records that have no duplicate—each key appears exactly once in the output
Removes the first record
Sorts by key only

2. What does ALLDUPS do in ICETOOL SELECT?

Selects only the first of each duplicate group
Selects every record that has at least one other record with the same key—all occurrences of duplicate keys go to the output
Removes all duplicates
Selects only the last duplicate

3. Does DFSORT support XSUM?

Yes, same as Syncsort
No—XSUM is a Syncsort feature; DFSORT gives ICE172A and you use ICETOOL SELECT with NODUPS/ALLDUPS instead
Only in ICETOOL
Yes, but only with SUM FIELDS=NONE

4. For ICETOOL SELECT with ON(1,10,CH), what must be true about the input?

Input can be unsorted
Input should be sorted by the same key (positions 1–10) so that duplicate keys are adjacent; otherwise duplicate detection may be wrong
Only one ON is allowed
ON is optional

5. How do you get both a "unique only" file and a "duplicates only" file in one DFSORT/ICETOOL run?

Use XSUM
Use two SELECT operators: one FROM(IN) TO(UNIQUE) ON(key) NODUPS and one FROM(IN) TO(DUPS) ON(key) ALLDUPS, with IN sorted by key
Use one SELECT with two TO()
Use OUTFIL SPLIT only

Removing Duplicates Into Separate Files

What “Unique” and “Duplicate” Files Mean

Syncsort XSUM (Not in DFSORT)

Syncsort example (for comparison)

ICETOOL SELECT: NODUPS and ALLDUPS

Input Must Be Sorted by the Key

Two SELECT Operators for Two Files

Example: unique and duplicate files with ICETOOL

Using a SORT Control Statement with ICETOOL

Example: sort then SELECT (conceptual)

FIRST: One Record per Key (No Separate Duplicate File)

Comparing the Two Outputs

Explain It Like I'm Five

Exercises

Quiz

Test Your Knowledge

Related Concepts

SUM statement

Removing duplicates (SUM)

SPLIT datasets

Related Pages