MainframeMaster

Legacy System Data Transformation

When you migrate from an older or legacy system to a new one, the data often does not match the new system's expected layout and formats. Record lengths and field positions differ; dates may be 2-digit year or in a different format; codes and status values may use different values. DFSORT can transform legacy data into the format required by the new system: reordering and rebuilding records with INREC and OUTREC, replacing codes with FINDREP or with a lookup table (JOINKEYS), converting dates with built-in date conversion and century windowing, normalizing case and padding, and converting fixed-length to variable-length output when the target expects VB. This page explains how to use DFSORT for legacy system data transformation: record layout changes, code mapping, date conversion, and format normalization so that the output is ready for the new system.

Real World Use Cases
Progress0 of 0 lessons

Why Transform Legacy Data?

Legacy systems were built with different record layouts, date conventions, and code sets. The new system expects specific column positions, 4-digit years, and standard codes. If you simply copy the old file to the new, loads may fail or data may be misinterpreted. Transformation converts the old layout and values into the new: same information, different format. DFSORT can do much of this in one or a few steps without writing a custom program.

Transformation techniques with DFSORT
TechniqueDFSORT toolPurpose
Record layoutINREC/OUTREC BUILDReorder fields, change positions, add constants for target layout
Code replacementFINDREP or JOINKEYS lookupMap legacy codes to new system codes
Date conversionINREC/OUTREC date conversionYYMMDD to YYYYMMDD, format normalization
Case/paddingTranslation, UPPERCASE, edit masksNormalize text and numeric format
FB to VBOUTFIL FTOVProduce variable-length output for target system

Record Layout: INREC and OUTREC

The target system expects fields at certain positions and lengths. Use BUILD= in INREC or OUTREC to define the new record. You specify each segment as position, length, format (e.g. 1,20,CH for the first 20 bytes; 21,8,PD for a packed field). You can take fields from any position in the input and place them in the output at new positions. Add constants (e.g. C\' \' for blanks, 1,8,\'DEFAULT\') and use edit masks for numeric fields. INREC runs before the sort and INCLUDE/OMIT, so use INREC when the new layout or converted values must be used for sorting or filtering. Use OUTREC when only the final output needs the new layout.

Code Replacement: FINDREP and Lookup Tables

Legacy systems often used different code values than the new system. For a simple one-to-one replacement (e.g. old "X" to new "ACTIVE"), use FINDREP in INREC or OUTREC: FINDREP=(IN=C\'X\',OUT=C\'ACTIVE\'). You can list multiple IN/OUT pairs. For a many-to-one or table-driven mapping, use a lookup file. The lookup file has two columns: legacy code and new code. Use JOINKEYS: F1 is your data (key = legacy code field), F2 is the lookup (key = legacy code). REFORMAT to output the new code from F2 and the rest of the record from F1. Every record gets the translated code from the lookup. If a legacy code has no match, JOINKEYS UNPAIRED handling determines whether the record is dropped or passed through with a default.

Date Conversion

Legacy data often has 2-digit years (YYMMDD or MMDDYY) or other formats. The new system usually expects YYYYMMDD or a standard format. Use DFSORT date conversion in INREC or OUTREC. You specify the input format (e.g. position and format of the legacy date) and the output format (e.g. YYYYMMDD at a new position). Use century windowing so that 2-digit years are interpreted correctly (e.g. 50–99 as 1950–1999, 00–49 as 2000–2049). The date conversion and century windowing tutorials describe the exact control statement syntax. After conversion, all date fields are in one format and sort/compare correctly.

Normalizing Case and Padding

Legacy data may have mixed case or inconsistent padding (e.g. "ABC" vs "abc", or "7" vs "007"). The new system may expect uppercase and leading zeros. Use translation (TR=) or UPPERCASE in INREC/OUTREC to normalize case. Use edit masks or BUILD with fixed-length output to normalize numeric padding. The data normalization tutorial covers these in detail. Normalize in INREC if the sort key or INCLUDE/OMIT uses these fields; otherwise OUTREC is enough.

Fixed to Variable (FTOV)

If the legacy system produced fixed-length records (RECFM=FB) and the target expects variable-length (RECFM=VB), use OUTFIL with FTOV. Build the data portion with BUILD= so that you only include the needed bytes (e.g. trim trailing spaces). DFSORT writes each record with an RDW followed by the data; the output DD must be allocated with RECFM=VB. So the transformation step can both change layout and convert FB to VB. See the fixed-to-variable conversion tutorial for syntax.

Example: layout and code transformation

text
1
2
3
4
5
6
7
8
9
INREC BUILD=(1,10, /* key - move to 1-10 */ 11,20, /* name - move to 11-30 */ 31,2, /* legacy status */ C' ', /* filler for new code at 33-34 */ 35,8, /* legacy date YYMMDD */ ...) OUTREC FINDREP=(IN=(33,2,C'01'),OUT=(33,2,C'A1'), IN=(33,2,C'02'),OUT=(33,2,C'A2')), BUILD=(1,34,35,8,DATE1=(Y2Y,CE1),...) /* expand date */

This is illustrative. INREC builds a record with key, name, legacy status, filler, and legacy date. OUTREC applies FINDREP to replace legacy status codes 01/02 with new codes A1/A2, then BUILD includes a date conversion (DATE1= with century) so the 8-byte legacy date becomes a 4-digit year format. Exact syntax (DATE1=, CE1, etc.) is product-dependent; see your manual.

Multi-Step Transformation

When one step is too complex, split the work. Step 1: normalize and convert dates, write to &&WORK. Step 2: join with lookup for code translation, write to &&WORK2. Step 3: final BUILD and FTOV if needed, write to target. Each step is easier to test and maintain. Use temporary datasets between steps as in batch processing automation.

Best Practices

  • Document the legacy layout and the target layout (field list, positions, formats) so the BUILD specification is clear and auditable.
  • Use a lookup table for code mapping when there are many codes or when the mapping may change; keep the table in a dataset and join in DFSORT.
  • Validate date conversion with known dates (e.g. 991231 → 19991231) and document the century window.
  • Test with a small extract first; compare record counts and spot-check key fields before full migration.
  • Consider writing a reject file (OUTFIL with OMIT for records that fail lookup or conversion) so you can fix and reprocess.

Explain It Like I'm Five

Imagine the old system wrote labels in a different language and different order (name first, then number). The new system wants the number first and the name in capital letters. DFSORT is like a helper that rewrites each label: it moves the number to the front, writes the name in capitals, and replaces old words with new words using a dictionary (lookup table). At the end, every label looks the way the new system expects.

Exercises

  1. Legacy record: positions 1–10 key, 11–30 name, 31–32 status code. Target wants: 1–10 key, 11–40 name (padded), 41–42 new status. Status "01" → "A1", "02" → "A2". Write INREC BUILD and OUTREC FINDREP (conceptually).
  2. Why use JOINKEYS instead of FINDREP for code translation when there are 100 different legacy codes?
  3. Legacy date at 41,6 is YYMMDD. Target wants YYYYMMDD at 41,8. What DFSORT feature do you use, and in INREC or OUTREC?
  4. When would you use INREC for transformation instead of OUTREC?

Quiz

Test Your Knowledge

1. What is the main use of INREC and OUTREC in legacy data transformation?

  • Only sorting
  • To change record layout and content: reorder fields, convert dates and numeric formats, insert constants, translate codes, and build records that match the target system layout
  • Only INCLUDE
  • Only for VB

2. When would you use FINDREP in a legacy transformation?

  • Only for dates
  • When the legacy system used specific codes or strings that must be replaced with the new system codes or standard values (e.g. old status "A" to new "ACTIVE"); FINDREP does literal replacement
  • Only in SUM
  • Only for dedup

3. How can you map legacy codes to new codes when the mapping is not a simple find-replace?

  • Only FINDREP
  • Use a lookup table: one file has legacy code and new code; use JOINKEYS to join the legacy data with the lookup on the legacy code and REFORMAT to output the new code (and the rest of the record) so each record gets the translated value
  • Only INCLUDE
  • DFSORT cannot do lookups

4. Why normalize dates when moving data from a legacy system?

  • Only for display
  • Legacy systems often used 2-digit years (YYMMDD) or other formats; the new system may require 4-digit years (YYYYMMDD) or a specific format. Converting in INREC/OUTREC ensures correct sort order and avoids century errors
  • Dates cannot be converted
  • Only in OUTFIL

5. What is the role of fixed-to-variable (FTOV) in legacy transformation?

  • Only for sorting
  • Legacy systems often produced fixed-length (FB) files; the new system may expect variable-length (VB). Use OUTFIL with FTOV (and BUILD to trim or shorten records) to produce VB output from FB input
  • Only for INREC
  • FTOV is only for reports