MainframeMaster

COBOL DUPLICATES Clause

Master duplicate record handling in COBOL SORT and MERGE operations for reliable data processing and maintaining data integrity in mainframe applications.

Overview

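The DUPLICATES phrase (WITH DUPLICATES IN ORDER) of the COBOL SORT statement controls how records with identical sort key values are ordered in the output. When it is specified, records with equal keys are returned in the same relative order in which they were supplied to the sort, which gives predictable, repeatable results. The MERGE statement has no DUPLICATES phrase; standard COBOL instead fixes the order of equal-key records in a merge by the order in which the input files are listed.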

Understanding duplicate handling is essential for data processing applications where the order of records with identical keys matters for business logic, audit trails, or regulatory compliance. The DUPLICATES clause provides control over whether the relative order of duplicate records is preserved during sorting operations.

This functionality is particularly important in mainframe environments where large volumes of data are processed and the stability of sort operations can affect downstream processing, reporting accuracy, and business decision-making processes.

Basic DUPLICATES Syntax

WITH DUPLICATES Clause

The WITH DUPLICATES clause ensures stable sorting behavior:

cobol
       SORT SORT-FILE
           ON ASCENDING KEY CUSTOMER-ID
           WITH DUPLICATES IN ORDER
           USING INPUT-FILE
           GIVING OUTPUT-FILE.

This maintains the original relative order of records with identical CUSTOMER-ID values.
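A minimal, self-contained sketch of this behavior follows. It is not taken from the original examples: the program name, file name, and data items (DUPDEMO, SORT-WORK, SR-CUST-ID, SR-ARRIVAL) are hypothetical. To avoid needing input files, the INPUT PROCEDURE releases four hard-coded records, and SR-ARRIVAL records arrival order without being a sort key.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DUPDEMO.
      * Hypothetical demo; all names are illustrative assumptions.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT SORT-WORK ASSIGN TO "SORTWK".
       DATA DIVISION.
       FILE SECTION.
       SD  SORT-WORK.
       01  SORT-REC.
           05  SR-CUST-ID          PIC X(5).
           05  SR-ARRIVAL          PIC 9(2).
       WORKING-STORAGE SECTION.
       01  WS-EOF-SORT             PIC X(1) VALUE "N".
           88  NO-MORE-SORTED-DATA          VALUE "Y".
       PROCEDURE DIVISION.
       MAIN-PARA.
           SORT SORT-WORK
               ON ASCENDING KEY SR-CUST-ID
               WITH DUPLICATES IN ORDER
               INPUT PROCEDURE IS LOAD-RECORDS
               OUTPUT PROCEDURE IS SHOW-RECORDS
           STOP RUN.
      * Release four records. SR-ARRIVAL notes arrival order
      * and is deliberately not a sort key.
       LOAD-RECORDS.
           MOVE "C0002" TO SR-CUST-ID
           MOVE 1 TO SR-ARRIVAL
           RELEASE SORT-REC
           MOVE "C0001" TO SR-CUST-ID
           MOVE 2 TO SR-ARRIVAL
           RELEASE SORT-REC
           MOVE "C0002" TO SR-CUST-ID
           MOVE 3 TO SR-ARRIVAL
           RELEASE SORT-REC
           MOVE "C0001" TO SR-CUST-ID
           MOVE 4 TO SR-ARRIVAL
           RELEASE SORT-REC.
      * Display each sorted record as it is returned.
       SHOW-RECORDS.
           PERFORM UNTIL NO-MORE-SORTED-DATA
               RETURN SORT-WORK
                   AT END
                       SET NO-MORE-SORTED-DATA TO TRUE
                   NOT AT END
                       DISPLAY SR-CUST-ID " " SR-ARRIVAL
               END-RETURN
           END-PERFORM.
```

With the DUPLICATES phrase honored, the two C0001 records should appear with arrival numbers 02 then 04, followed by the two C0002 records with 01 then 03.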

Standard Sort Without DUPLICATES

Without the DUPLICATES clause, order of duplicates is not guaranteed:

cobol
       SORT SORT-FILE
           ON ASCENDING KEY CUSTOMER-ID
           USING INPUT-FILE
           GIVING OUTPUT-FILE.

Without the phrase, the order in which records with identical keys are returned is undefined, so the sort is free to rearrange them and the result can differ between runs or sort products.
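One alternative that does not depend on the DUPLICATES phrase is to make arrival order an explicit low-order key. The sketch below is an illustration under assumptions, not part of the original examples: the names SEQKEY, SR-SEQ-NO, WS-SEQ-NO, and RELEASE-ONE are hypothetical. The INPUT PROCEDURE stamps each record with a running sequence number, and the SORT uses that number as its last key, so no two records ever compare equal.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SEQKEY.
      * Hypothetical sketch; all names are illustrative assumptions.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT SORT-WORK ASSIGN TO "SORTWK".
       DATA DIVISION.
       FILE SECTION.
       SD  SORT-WORK.
       01  SORT-REC.
           05  SR-CUST-ID          PIC X(5).
           05  SR-SEQ-NO           PIC 9(8).
           05  SR-AMOUNT           PIC 9(5).
       WORKING-STORAGE SECTION.
       01  WS-SEQ-NO               PIC 9(8) VALUE ZERO.
       01  WS-NEXT-ID              PIC X(5).
       01  WS-NEXT-AMOUNT          PIC 9(5).
       01  WS-EOF-SORT             PIC X(1) VALUE "N".
           88  NO-MORE-SORTED-DATA          VALUE "Y".
       PROCEDURE DIVISION.
       MAIN-PARA.
      * SR-SEQ-NO is the tie-breaker, so DUPLICATES is not needed.
           SORT SORT-WORK
               ON ASCENDING KEY SR-CUST-ID
               ON ASCENDING KEY SR-SEQ-NO
               INPUT PROCEDURE IS LOAD-RECORDS
               OUTPUT PROCEDURE IS SHOW-RECORDS
           STOP RUN.
       LOAD-RECORDS.
           MOVE "C0002" TO WS-NEXT-ID
           MOVE 500 TO WS-NEXT-AMOUNT
           PERFORM RELEASE-ONE
           MOVE "C0001" TO WS-NEXT-ID
           MOVE 250 TO WS-NEXT-AMOUNT
           PERFORM RELEASE-ONE
           MOVE "C0002" TO WS-NEXT-ID
           MOVE 125 TO WS-NEXT-AMOUNT
           PERFORM RELEASE-ONE.
       SHOW-RECORDS.
           PERFORM UNTIL NO-MORE-SORTED-DATA
               RETURN SORT-WORK
                   AT END
                       SET NO-MORE-SORTED-DATA TO TRUE
                   NOT AT END
                       DISPLAY SR-CUST-ID " " SR-SEQ-NO
                               " " SR-AMOUNT
               END-RETURN
           END-PERFORM.
      * Stamp the record with the next sequence number and
      * hand it to the sort.
       RELEASE-ONE.
           ADD 1 TO WS-SEQ-NO
           MOVE WS-NEXT-ID TO SR-CUST-ID
           MOVE WS-SEQ-NO TO SR-SEQ-NO
           MOVE WS-NEXT-AMOUNT TO SR-AMOUNT
           RELEASE SORT-REC.
```

The trade-off is a longer record and one more key comparison, in exchange for an ordering that is fully determined by the keys themselves.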

Complete Sort Examples

Customer Transaction Processing

Processing customer transactions while preserving chronological order:

cobol
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT TRANSACTION-FILE ASSIGN TO "TRANS.DAT".
           SELECT SORTED-TRANS-FILE ASSIGN TO "SORTED.DAT".
           SELECT SORT-WORK-FILE ASSIGN TO "SORTWORK".

       DATA DIVISION.
       FILE SECTION.
       FD  TRANSACTION-FILE.
       01  TRANSACTION-RECORD.
           05  TRANS-CUST-ID       PIC X(5).
           05  TRANS-DATE          PIC 9(8).
           05  TRANS-TIME          PIC 9(6).
           05  TRANS-AMOUNT        PIC 9(7)V99.
           05  TRANS-TYPE          PIC X(1).

       SD  SORT-WORK-FILE.
       01  SORT-RECORD.
           05  SORT-CUST-ID        PIC X(5).
           05  SORT-DATE           PIC 9(8).
           05  SORT-TIME           PIC 9(6).
           05  SORT-AMOUNT         PIC 9(7)V99.
           05  SORT-TYPE           PIC X(1).

       FD  SORTED-TRANS-FILE.
      * 29 bytes = 5 + 8 + 6 + 9 + 1, the length of SORT-RECORD.
       01  SORTED-RECORD           PIC X(29).

       PROCEDURE DIVISION.
       MAIN-SORT.
           SORT SORT-WORK-FILE
               ON ASCENDING KEY SORT-CUST-ID
               ON ASCENDING KEY SORT-DATE
               WITH DUPLICATES IN ORDER
               USING TRANSACTION-FILE
               GIVING SORTED-TRANS-FILE.

SORT-TIME is not a sort key here, so records for the same customer and date keep their input order. That order is chronological only if TRANSACTION-FILE was written chronologically.

Employee Payroll Processing

Sorting payroll records while maintaining entry sequence:

cobol
       SORT PAYROLL-SORT-FILE
           ON ASCENDING KEY EMP-DEPARTMENT
           ON ASCENDING KEY EMP-GRADE
           ON ASCENDING KEY EMP-ID
           WITH DUPLICATES IN ORDER
           INPUT PROCEDURE IS PAYROLL-INPUT-PROC
           OUTPUT PROCEDURE IS PAYROLL-OUTPUT-PROC.

       PAYROLL-INPUT-PROC.
           OPEN INPUT PAYROLL-INPUT-FILE
           PERFORM UNTIL END-OF-PAYROLL-INPUT
               READ PAYROLL-INPUT-FILE
                   AT END
                       MOVE "Y" TO EOF-FLAG
                   NOT AT END
                       PERFORM VALIDATE-PAYROLL-RECORD
                       IF VALID-RECORD
                           MOVE PAYROLL-RECORD TO SORT-RECORD
                           RELEASE SORT-RECORD
                       END-IF
               END-READ
           END-PERFORM
           CLOSE PAYROLL-INPUT-FILE.

This ensures that records with identical department, grade, and employee ID keep the sequence in which they were released to the sort.

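MERGE Operations and Duplicate Keys

Multi-File Customer Data Merge

Merging customer data from multiple sources:

cobol
      * MERGE has no DUPLICATES phrase. Equal-key records are
      * written in the order the files appear in USING.
       MERGE CUSTOMER-MERGE-FILE
           ON ASCENDING KEY CUST-ID
           ON ASCENDING KEY CUST-REGION
           USING EAST-CUSTOMERS
                 WEST-CUSTOMERS
                 CENTRAL-CUSTOMERS
           GIVING MERGED-CUSTOMERS.

The standard MERGE statement does not accept WITH DUPLICATES IN ORDER. When the same key values appear in more than one regional file, the records are written in USING order: all matching records from EAST-CUSTOMERS first, then WEST-CUSTOMERS, then CENTRAL-CUSTOMERS. Each input file must already be in sequence on the merge keys.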

Sales Data Consolidation

Merging sales data while maintaining temporal relationships:

cobol
      * MERGE accepts neither INPUT PROCEDURE nor a DUPLICATES
      * phrase, so the pre-sorted inputs are named in USING.
      * The three input file names are assumed for illustration.
       MERGE SALES-MERGE-FILE
           ON ASCENDING KEY SALE-DATE
           ON ASCENDING KEY SALE-REGION
           ON ASCENDING KEY SALE-REP-ID
           USING MORNING-SALES
                 AFTERNOON-SALES
                 EVENING-SALES
           OUTPUT PROCEDURE IS CONSOLIDATE-SALES.

       CONSOLIDATE-SALES.
           PERFORM UNTIL NO-MORE-SALES-DATA
               RETURN SALES-MERGE-FILE
                   AT END
                       MOVE "Y" TO EOF-MERGE
                   NOT AT END
                       PERFORM CALCULATE-COMMISSION
                       PERFORM UPDATE-SALES-TOTALS
                       WRITE CONSOLIDATED-RECORD
               END-RETURN
           END-PERFORM.

Because equal-key records are returned in USING order, morning sales precede afternoon and evening sales within each date/region/rep combination, provided each input file is already in sequence on the merge keys.

Duplicate Detection and Removal

Identifying Duplicate Records

Using OUTPUT PROCEDURE to detect and handle duplicates:

cobol
       WORKING-STORAGE SECTION.
       01  WS-PREVIOUS-RECORD.
           05  WS-PREV-CUST-ID     PIC X(5).
           05  WS-PREV-CUST-NAME   PIC X(30).
       01  WS-DUPLICATE-COUNT      PIC 9(5) VALUE ZERO.
       01  WS-FIRST-RECORD-FLAG    PIC X(1) VALUE "Y".

       PROCEDURE DIVISION.
       SORT-WITH-DUPLICATE-CHECK.
           SORT CUSTOMER-SORT-FILE
               ON ASCENDING KEY CUST-ID
               WITH DUPLICATES IN ORDER
               INPUT PROCEDURE IS READ-CUSTOMERS
               OUTPUT PROCEDURE IS CHECK-DUPLICATES
      * Stop here so control does not fall into CHECK-DUPLICATES.
           STOP RUN.

       CHECK-DUPLICATES.
           PERFORM UNTIL NO-MORE-SORTED-DATA
               RETURN CUSTOMER-SORT-FILE
                   AT END
                       MOVE "Y" TO EOF-SORT
                   NOT AT END
                       IF WS-FIRST-RECORD-FLAG = "Y"
                           MOVE "N" TO WS-FIRST-RECORD-FLAG
                           PERFORM SAVE-CURRENT-RECORD
                       ELSE
                           PERFORM COMPARE-WITH-PREVIOUS
                       END-IF
                       WRITE OUTPUT-RECORD FROM SORT-RECORD
               END-RETURN
           END-PERFORM.

       COMPARE-WITH-PREVIOUS.
           IF CUST-ID = WS-PREV-CUST-ID
               ADD 1 TO WS-DUPLICATE-COUNT
               DISPLAY "Duplicate found: " CUST-ID " - " CUST-NAME
           END-IF
           PERFORM SAVE-CURRENT-RECORD.

       SAVE-CURRENT-RECORD.
           MOVE CUST-ID TO WS-PREV-CUST-ID
           MOVE CUST-NAME TO WS-PREV-CUST-NAME.

This approach identifies duplicates while preserving all records in the output.

Removing Duplicate Records

Creating a unique record set by eliminating duplicates:

cobol
       REMOVE-DUPLICATES.
           PERFORM UNTIL NO-MORE-SORTED-DATA
               RETURN PRODUCT-SORT-FILE
                   AT END
                       MOVE "Y" TO EOF-SORT
                   NOT AT END
                       IF WS-FIRST-RECORD-FLAG = "Y"
                           MOVE "N" TO WS-FIRST-RECORD-FLAG
                           PERFORM WRITE-UNIQUE-RECORD
                           PERFORM SAVE-CURRENT-KEY
                       ELSE
                           IF PRODUCT-CODE NOT = WS-PREV-PRODUCT-CODE
                               PERFORM WRITE-UNIQUE-RECORD
                               PERFORM SAVE-CURRENT-KEY
                           ELSE
                               ADD 1 TO WS-DUPLICATES-REMOVED
                               DISPLAY "Removing duplicate: "
                                       PRODUCT-CODE
                           END-IF
                       END-IF
               END-RETURN
           END-PERFORM.

       WRITE-UNIQUE-RECORD.
           WRITE UNIQUE-PRODUCT-RECORD FROM SORT-RECORD
           ADD 1 TO WS-UNIQUE-COUNT.

       SAVE-CURRENT-KEY.
           MOVE PRODUCT-CODE TO WS-PREV-PRODUCT-CODE.

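This creates a file containing only the first occurrence of each unique product code. Which occurrence counts as "first" is deterministic only if the SORT that feeds this procedure specifies WITH DUPLICATES IN ORDER.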

Performance Considerations

Memory Usage Optimization

Optimizing sort performance with large datasets:

cobol
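      * SORT-CORE-SIZE is an IBM Enterprise COBOL special
      * register. It is set in the PROCEDURE DIVISION before
      * the SORT, not declared in SPECIAL-NAMES.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT SORT-WORK-FILE ASSIGN TO "SORTWORK".

       PROCEDURE DIVISION.
           MOVE 64000 TO SORT-CORE-SIZE

On IBM Enterprise COBOL, SORT-CORE-SIZE tells the sort utility how many bytes of main storage it may use. Other compilers configure sort memory differently, typically through runtime or sort-utility options, so check the documentation for the target platform. Adequate sort memory helps maintain performance when duplicate order must be preserved.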

Efficient Duplicate Processing

Minimizing overhead in duplicate handling:

cobol
       WORKING-STORAGE SECTION.
       01  WS-SORT-STATISTICS.
           05  WS-RECORDS-READ     PIC 9(8) VALUE ZERO.
           05  WS-RECORDS-WRITTEN  PIC 9(8) VALUE ZERO.
           05  WS-DUPLICATE-SETS   PIC 9(6) VALUE ZERO.

       EFFICIENT-DUPLICATE-PROC.
           PERFORM UNTIL EOF-SORT = "Y"
               RETURN SORT-FILE
                   AT END
                       MOVE "Y" TO EOF-SORT
                   NOT AT END
                       ADD 1 TO WS-RECORDS-READ
                       IF CURRENT-KEY = PREVIOUS-KEY
      * Process duplicate efficiently
                           PERFORM HANDLE-DUPLICATE-INLINE
                       ELSE
      * New key group
                           PERFORM START-NEW-KEY-GROUP
                       END-IF
               END-RETURN
           END-PERFORM.

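Because sorted output places equal keys next to each other, a single comparison per record against the previous key is enough to detect duplicates; no lookup table or second pass is needed. For this to work, START-NEW-KEY-GROUP must save the new key in PREVIOUS-KEY.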

Hands-on Exercise

Exercise: Customer Order Processing

Create a program that sorts customer orders while preserving the chronological order of orders placed on the same day.

Requirements:

  • Sort by customer ID and order date
  • Preserve order sequence for same-day orders
  • Count and report duplicate order scenarios
  • Generate summary statistics
Solution:
cobol
       SORT ORDER-SORT-FILE
           ON ASCENDING KEY SORT-CUST-ID
           ON ASCENDING KEY SORT-ORDER-DATE
           WITH DUPLICATES IN ORDER
           INPUT PROCEDURE IS READ-ORDERS
           OUTPUT PROCEDURE IS PROCESS-SORTED-ORDERS.

       READ-ORDERS.
           OPEN INPUT ORDER-INPUT-FILE
           PERFORM UNTIL EOF-INPUT = "Y"
               READ ORDER-INPUT-FILE
                   AT END
                       MOVE "Y" TO EOF-INPUT
                   NOT AT END
                       MOVE ORDER-RECORD TO SORT-RECORD
                       RELEASE SORT-RECORD
               END-READ
           END-PERFORM
           CLOSE ORDER-INPUT-FILE.

       PROCESS-SORTED-ORDERS.
           OPEN OUTPUT SORTED-ORDER-FILE
           MOVE "N" TO WS-FIRST-RECORD
           PERFORM UNTIL EOF-SORT = "Y"
               RETURN ORDER-SORT-FILE
                   AT END
                       MOVE "Y" TO EOF-SORT
                   NOT AT END
                       PERFORM CHECK-SAME-DAY-ORDERS
                       WRITE SORTED-ORDER-RECORD FROM SORT-RECORD
               END-RETURN
           END-PERFORM
           CLOSE SORTED-ORDER-FILE.
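The solution above leaves CHECK-SAME-DAY-ORDERS and the summary statistics to the reader. The following is one possible self-contained sketch of how they might look, not the definitive answer. It makes several assumptions: the record layout, the counters (WS-TOTAL-ORDERS, WS-SAME-DAY-ORDERS), and the LOAD-SAMPLE-ORDERS paragraph are hypothetical, sample orders are hard-coded instead of read from ORDER-INPUT-FILE, and sorted records are displayed instead of written to a file.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ORDSORT.
      * Hypothetical sketch; layout and counters are assumptions.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT ORDER-SORT-FILE ASSIGN TO "SORTWK".
       DATA DIVISION.
       FILE SECTION.
       SD  ORDER-SORT-FILE.
       01  SORT-RECORD.
           05  SORT-CUST-ID        PIC X(5).
           05  SORT-ORDER-DATE     PIC 9(8).
           05  SORT-ORDER-NO       PIC 9(4).
       WORKING-STORAGE SECTION.
       01  WS-PREV-CUST-ID         PIC X(5) VALUE SPACES.
       01  WS-PREV-ORDER-DATE      PIC 9(8) VALUE ZERO.
       01  WS-FIRST-RECORD         PIC X(1) VALUE "Y".
       01  EOF-SORT                PIC X(1) VALUE "N".
       01  WS-TOTAL-ORDERS         PIC 9(5) VALUE ZERO.
       01  WS-SAME-DAY-ORDERS      PIC 9(5) VALUE ZERO.
       PROCEDURE DIVISION.
       MAIN-PARA.
           SORT ORDER-SORT-FILE
               ON ASCENDING KEY SORT-CUST-ID
               ON ASCENDING KEY SORT-ORDER-DATE
               WITH DUPLICATES IN ORDER
               INPUT PROCEDURE IS LOAD-SAMPLE-ORDERS
               OUTPUT PROCEDURE IS PROCESS-SORTED-ORDERS
           DISPLAY "Orders processed: " WS-TOTAL-ORDERS
           DISPLAY "Same-day repeats: " WS-SAME-DAY-ORDERS
           STOP RUN.
      * Stand-in for reading ORDER-INPUT-FILE.
       LOAD-SAMPLE-ORDERS.
           MOVE "C0002" TO SORT-CUST-ID
           MOVE 20240105 TO SORT-ORDER-DATE
           MOVE 1001 TO SORT-ORDER-NO
           RELEASE SORT-RECORD
           MOVE "C0001" TO SORT-CUST-ID
           MOVE 20240105 TO SORT-ORDER-DATE
           MOVE 1002 TO SORT-ORDER-NO
           RELEASE SORT-RECORD
           MOVE "C0002" TO SORT-CUST-ID
           MOVE 20240105 TO SORT-ORDER-DATE
           MOVE 1003 TO SORT-ORDER-NO
           RELEASE SORT-RECORD
           MOVE "C0001" TO SORT-CUST-ID
           MOVE 20240106 TO SORT-ORDER-DATE
           MOVE 1004 TO SORT-ORDER-NO
           RELEASE SORT-RECORD.
       PROCESS-SORTED-ORDERS.
           PERFORM UNTIL EOF-SORT = "Y"
               RETURN ORDER-SORT-FILE
                   AT END
                       MOVE "Y" TO EOF-SORT
                   NOT AT END
                       PERFORM CHECK-SAME-DAY-ORDERS
                       DISPLAY SORT-CUST-ID " " SORT-ORDER-DATE
                               " " SORT-ORDER-NO
               END-RETURN
           END-PERFORM.
      * Count the record, then count it again as a same-day
      * repeat if customer and date match the previous record.
       CHECK-SAME-DAY-ORDERS.
           ADD 1 TO WS-TOTAL-ORDERS
           IF WS-FIRST-RECORD = "Y"
               MOVE "N" TO WS-FIRST-RECORD
           ELSE
               IF SORT-CUST-ID = WS-PREV-CUST-ID
                  AND SORT-ORDER-DATE = WS-PREV-ORDER-DATE
                   ADD 1 TO WS-SAME-DAY-ORDERS
               END-IF
           END-IF
           MOVE SORT-CUST-ID TO WS-PREV-CUST-ID
           MOVE SORT-ORDER-DATE TO WS-PREV-ORDER-DATE.
```

In this sample, customer C0002 places two orders on the same date, so order 1001 should be returned before order 1003 and the same-day counter should end at one.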

Quiz

Test Your Knowledge

1. What does WITH DUPLICATES IN ORDER guarantee?

2. Can the DUPLICATES clause be used with MERGE operations?

3. How are records considered duplicates in multi-key sorts?

Answers

1. Preserves relative order of records with identical keys - The DUPLICATES clause maintains stable sorting behavior.

2. No - In standard COBOL the DUPLICATES phrase belongs to the SORT statement only. MERGE returns records with equal keys in the order the input files are listed in USING.

3. When ALL specified keys have identical values - Records are duplicates only when all sort keys match exactly.

Frequently Asked Questions