MainframeMaster

Debugging Techniques

When a DFSORT step fails or produces wrong results, a systematic approach saves time. This page covers how to use SYSOUT messages (including ICE000I), message severity, the DEBUG statement, and checks on control statements and data to find and fix the cause. The goal is to identify whether the failure is due to wrong control statements, bad data, resource limits, or environment (DDs, DCB) and then correct it.

A Systematic Debugging Approach

Rather than guessing, follow a consistent order. First, confirm what the step actually ran (ICE000I). Second, find the message that describes the failure: the first message whose ID ends in A. Third, verify your control statements against the data and layout. Fourth, use DEBUG or other diagnostics if you need more detail. Fifth, check the environment: DDs, dataset attributes, and input data quality.

Suggested debugging order

Step | Action | Purpose
1 | Check ICE000I | Confirm control statements read from SYSIN; compare to what you intended.
2 | Find first A message | Identifies the actual failure (syntax, field, capacity, data exception).
3 | Verify positions and formats | Match SORT/SUM/INREC/OUTREC to record layout; avoid S0C7/S0C4.
4 | Use DEBUG when needed | Get extra diagnostic output; remove after problem is fixed.
5 | Check DDs and data | Ensure SORTIN/SORTOUT/SORTWKnn exist; validate input data and LRECL/RECFM.

Using ICE000I to Verify Control Statements

ICE000I is the first message DFSORT prints. It identifies the product and release and heads the listing of the control statements that DFSORT read from SYSIN. If the wrong PROC is invoked or the wrong dataset is concatenated to SYSIN, the echoed statements will not match what you intended. Column placement matters as well: statement text belongs in columns 2 through 71, a line that ends in a comma is continued on the next line, and a nonblank character in column 72 also signals continuation, so text in the wrong column can make the parser read something different from what you meant. The first debugging step is therefore to open the step SYSOUT, find ICE000I, and compare the echoed lines to the SYSIN you meant to use. Mismatches here explain many "wrong result" or syntax-error cases.
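As a point of comparison for the ICE000I echo, here is a minimal sketch of a sort step with in-stream SYSIN. The step name and data set names are placeholders, not taken from any real job, and the sketch assumes your site lets DFSORT allocate work space dynamically.

```jcl
//* Hypothetical step; data set names are placeholders.
//SORTSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=YOUR.INPUT.DATA,DISP=SHR
//SORTOUT  DD DSN=YOUR.OUTPUT.DATA,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,5),RLSE)
//SYSIN    DD *
* Comment statement: asterisk in column 1.
* Statements start in column 2 or later; the trailing comma
* continues SORT onto the next line.
  SORT FIELDS=(1,10,CH,A,
               21,8,PD,D)
/*
```

The lines listed under ICE000I in SYSOUT should match the SYSIN lines above. If they do not, the step read a different SYSIN than the one you edited.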

Finding the Real Failure Message

DFSORT message IDs end in a one-letter type code: I for informational messages and A for critical error messages that require action. ICE000I is type I, so it is not an error. When the step fails, look for the first message ending in A in that step's output. That message usually states the cause: invalid keyword, field out of range, text in wrong column, sort capacity exceeded, or data exception. Read the message text and its explanation in the IBM DFSORT messages manual or your site's guide. Do not stop at ICE000I; the failure is reported later.

Checking Positions, Lengths, and Formats

Many abends and wrong results come from specifying the wrong position, length, or format for a field. DFSORT uses 1-based column positions, so a field that occupies columns 21–28 is specified with start 21 and length 8. The format must match the data: CH for character, PD for packed decimal, ZD for zoned decimal, BI for binary, and so on. If you specify PD or ZD for a field whose bytes are not valid packed or zoned digits, arithmetic on that field (for example in SUM) can end in S0C7. The reverse mistake, CH on a packed or zoned field, usually does not abend but compares the bytes as characters and gives the wrong order. If the length or start is off, you read the wrong bytes and get wrong sort order or wrong values in SUM or INREC/OUTREC. Keep a record layout handy and verify every SORT FIELDS=, SUM FIELDS=, INREC, OUTREC, and OUTFIL BUILD= against it.

Common format mistakes

  • Using PD (packed decimal) for a field that is character or zoned. Use CH or ZD as appropriate.
  • Using CH for a numeric field that is packed or zoned. Use PD or ZD so the sort or arithmetic is correct.
  • Wrong length: e.g. length 6 when the field is 8 bytes. That can cause wrong keys or overlapping fields.
  • Wrong start position: off-by-one or aligned to the wrong column. Double-check with the layout.
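To make the position, length, and format checks concrete, the sketch below writes a layout as comment statements and codes the matching fields. The layout is hypothetical and chosen only for illustration.

```jcl
//SYSIN    DD *
* Hypothetical layout (1-based columns):
*   1-10   customer id   character       -> 1,10,CH
*   21-28  amount        packed decimal  -> 21,8,PD
*   29-34  quantity      zoned decimal   -> 29,6,ZD
  SORT FIELDS=(1,10,CH,A)
  SUM FIELDS=(21,8,PD,29,6,ZD)
/*
```

Each start,length,format triplet can be checked against one layout line. If the amount were coded as 21,6,PD, the SUM would read only part of the field and could produce wrong totals or a data exception.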

Using the DEBUG Statement

When the normal messages are not enough, add the DEBUG control statement to SYSIN. DEBUG tells DFSORT to provide extra diagnostic information. For example, DEBUG ABEND makes DFSORT abend when it detects an error, producing a dump if a dump DD such as SYSUDUMP is present, instead of ending with return code 16. The exact options depend on your DFSORT product and version; see the DFSORT Messages and Diagnosis or Application Programming guide. Run the job and look at SYSOUT (or the DD used for messages if you use OPTION MSGDDN=) and at any dump output for the extra information. Use DEBUG only while diagnosing; remove it once the problem is fixed to avoid extra output and overhead in production.
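A minimal sketch of DEBUG in SYSIN follows. It assumes the ABEND option, which asks DFSORT to abend on an error rather than end with return code 16. The SYSUDUMP DD is included so that a dump is written, and the sort key is a placeholder.

```jcl
//* Dump destination, used only while diagnosing.
//SYSUDUMP DD SYSOUT=*
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
* Diagnostic aid: remove once the problem is fixed.
  DEBUG ABEND
/*
```

Check your release's documentation for the options it supports before relying on any particular one.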

Diagnosing Common Abends

  • S0C4 (protection exception) can occur with empty input, wrong LRECL or RECFM, or internal limits. Check that SORTIN has the correct DCB and is not corrupt; ensure the file is not empty if the step logic assumes records.
  • S0C7 (data exception) usually means a format or position error in SORT FIELDS, SUM FIELDS, INREC, or OUTREC; verify all specs and the actual data.
  • S322 means the step exceeded its CPU time limit; increase TIME or optimize the step (more sortwork, filtering, or tuning).
  • S013 indicates a data management problem at open (DD missing or DCB mismatch); check all required DDs and dataset attributes.
  • ICE046A is a message rather than an abend code. It means sort capacity was exceeded: add or enlarge sortwork datasets, or use DYNALLOC with a realistic FILSZ estimate so that DFSORT allocates enough work space.

For each abend, use the message or code with your site's abend documentation and the IBM guides to confirm the cause and remedy.
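For the sort-capacity case, the two usual remedies can be sketched as below. The unit name SYSDA, the space values, the number of work data sets, and the record estimate are all placeholders to replace with values that suit your site and data.

```jcl
//* Option 1: explicit work data sets (sizes are placeholders).
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(200,100))
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(200,100))
//* Option 2: dynamic allocation with an estimated record count.
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
  OPTION DYNALLOC=(SYSDA,8),FILSZ=E5000000
/*
```

In practice you would normally choose one approach rather than both; they are shown together only for comparison.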

Checking the Environment

Ensure all required DD statements are present: SYSOUT for messages, SORTIN, SORTOUT, SYSIN, and, for a sort, work space through SORTWKnn DDs or DYNALLOC. If you use JOINKEYS, the JOINKEYS file DDs must be correct. If you use OUTFIL with multiple outputs, each output DD must be defined. Verify that datasets exist, are not in use by another job, and have the correct DISP. Check LRECL and RECFM: they must match what your control statements assume (e.g. fixed 80-byte records). If the input is variable-length, RECFM should reflect that, and your positions must allow for the 4-byte RDW at the start of each record, which puts the first data byte at position 5. A quick way to rule out data issues is to run with OPTION COPY and perhaps STOPAFT=10 to see if a small subset of records flows through without error.
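The quick copy test mentioned above might look like this sketch. The step name and input data set name are placeholders, and SORTOUT is sent to spool only so the ten records can be viewed; binary or packed fields will not be readable there.

```jcl
//COPYTEST EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=YOUR.INPUT.DATA,DISP=SHR
//SORTOUT  DD SYSOUT=*
//SYSIN    DD *
* Copy only the first 10 input records; no sorting is done.
  OPTION COPY,STOPAFT=10
/*
```

If this step runs cleanly, the DDs and dataset attributes are probably sound, and attention can shift to the field specifications in the real control statements.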

Where to Look: SYSOUT and Message Flow

DFSORT writes its messages to the DD named SYSOUT by default. That DD is typically coded as //SYSOUT DD SYSOUT=*, so the messages appear in the job's spooled output. If you use OPTION MSGDDN=ddname, messages go to that DD instead. When you debug, open the output that contains the step's messages. Messages usually appear in order: first ICE000I with the control statement echo, then any parsing or setup messages, then processing messages (e.g. record counts), and finally completion or error messages. If the step abends, the last messages before the abend dump are especially important. Learning to scan SYSOUT quickly for ICE000I and then for the first message ending in A speeds up diagnosis; the SYSOUT analysis tutorial goes into more detail on how to read and interpret the full output.

Documenting Your Record Layout

A major source of errors is a mismatch between the physical record and what you coded in control statements. Keep a simple record layout (on paper or in a doc): for each field you use in SORT FIELDS, SUM FIELDS, INREC, OUTREC, or OUTFIL, note the starting column (1-based), length, and format (CH, PD, ZD, BI, etc.). When you get a data exception or wrong results, go through this list and verify each value. If your layout comes from a copybook, a compiler listing, or another record description, the field locations may be shown as 0-based offsets; DFSORT uses 1-based positions, so add 1 to each offset. A single wrong digit in a position or length can cause subtle or dramatic failures.
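The offset-to-position conversion can be kept next to the statements it supports, as in this sketch. The offsets are hypothetical values of the kind a 0-based listing might show.

```jcl
//SYSIN    DD *
* Hypothetical fields, as a 0-based listing might show them:
*   offset  0, length 10, character -> DFSORT position  1
*   offset 20, length  8, packed    -> DFSORT position 21
* Rule: DFSORT position = offset + 1
  SORT FIELDS=(1,10,CH,A,21,8,PD,D)
/*
```

Because the comment statements are echoed with the rest of SYSIN, the layout assumptions appear in SYSOUT alongside the statements they justify.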

Explain It Like I'm Five

When your sort "game" doesn't work, first look at what instructions the sort program says it got (that's ICE000I). If those are wrong, fix the instructions. Then find the first message that says "something went wrong" (the first one whose name ends in A); that tells you what broke. Then check that every number you gave (where a field starts, how long it is, and whether it's a number or letters) matches the real data. If you still don't know, turn on DEBUG so the program gives you extra clues. Once you fix the problem, turn DEBUG off so you don't get too much paper.

Exercises

  1. Your step fails with ICE105A "INVALID KEYWORD." Where do you look first, and what does ICE000I tell you?
  2. You get S0C7. List the control statement types (e.g. SORT FIELDS, SUM FIELDS, INREC) you must verify against the record layout, and why each can cause S0C7.
  3. Add DEBUG to a simple sort step, run it, and locate the extra diagnostic lines in SYSOUT. What information do they add beyond ICE000I?
  4. A step completes with return code 0 but the record count in SORTOUT is lower than expected. What debugging steps would you take (without changing the logic yet) to understand why?

Quiz

Test Your Knowledge

1. What is the first thing you should check in SYSOUT when a DFSORT step fails?

  • The SORTOUT record count
  • ICE000I to confirm which control statements were read, then the first A-level or error message to find the actual failure cause
  • Only the return code
  • The SORTIN DD name

2. Your step abends with S0C7. What debugging approach is most useful?

  • Add more SORTWKnn
  • Verify every position, length, and format in SORT FIELDS, SUM FIELDS, INREC, OUTREC against the record layout; ensure PD/ZD/CH match the data. Check for bad input bytes in numeric fields
  • Use OPTION COPY only
  • Ignore and rerun

3. When should you use the DEBUG statement?

  • In every production run
  • When diagnosing why a step is failing or producing wrong results; remove it once the problem is fixed to avoid extra output and overhead
  • Only for MERGE
  • Only when ICE000I is missing

4. What does the last letter of a DFSORT message ID (I or A) tell you?

  • Only the message number
  • I = informational; A = critical error that requires action (the step usually fails). The first A message in the step usually indicates why the step failed
  • The DFSORT version
  • The record count

5. How can you confirm that the correct SYSIN was used by the step?

  • Check the JCL only
  • Look at the control statements echoed in ICE000I in SYSOUT; they show exactly what DFSORT read. Compare to the PROC or in-stream data you intended
  • Use DEBUG only
  • Check SORTOUT content only