VSAM Index Component

For Key Sequenced Data Sets (KSDS), VSAM keeps a separate structure, the index component, that maps key values to the locations of records. The index component has two parts, the sequence set and the index set, which together form a B-tree style hierarchy that makes random access by key efficient. Only KSDS has an index component; Entry Sequenced (ESDS), Relative Record (RRDS), and Linear (LDS) data sets have only a data component. This page explains what the index component is, how the sequence set and index set work, and how VSAM uses them to find records by key.

What Is the Index Component?

The index component is the part of a KSDS cluster that holds the index records. It has its own name in the catalog (often the cluster name with a suffix such as .INDEX), its own space on DASD, and its own control intervals. The job of the index component is to answer: "Given a key value, which control interval in the data component contains the record with that key?" Without the index, VSAM would have to scan the data component sequentially to find a key. With the index, VSAM can traverse a small number of index levels (typically two or three) to get a pointer to the right data CI, then read that CI and return the record. So the index component is what makes random access by key fast.

Application programs do not open the index component directly; they open the cluster. When your program does a READ with a key (or a START followed by READ NEXT), the access method uses the index component internally to find the correct data CI. The index component is created when you define the cluster with DEFINE CLUSTER INDEXED; IDCAMS allocates space for both the data and the index. VSAM also maintains the index automatically as records are added and removed. For example, when a control interval split moves some records to a free CI, VSAM updates the sequence set so that the index always points to the correct data CIs.
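You can confirm that the index component is a separate catalog entry with its own name and space by listing the cluster with IDCAMS LISTCAT. This is a minimal sketch that assumes the cluster name USERID.KEYED.FILE from the DEFINE example on this page. The step name LISTKSDS is a hypothetical placeholder.

```jcl
//LISTKSDS EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  LISTCAT ENTRIES(USERID.KEYED.FILE) ALLOCATION
/*
```

When a KSDS cluster name is listed, the output should show the cluster entry followed by its DATA and INDEX component entries. Each component has its own name and its own space allocation. The exact report layout can vary between z/OS releases.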

Sequence Set and Index Set

The index component is organized in two kinds of levels: the sequence set and the index set. The sequence set is the lowest level. It has one entry for each control interval in the data component, and each sequence set record covers one control area (CA). Each entry holds the highest key value in its data CI and a pointer to that CI. So if you have 1000 data CIs, you have 1000 sequence set entries. To find a record by key, VSAM locates the first sequence set entry whose high key is greater than or equal to the search key. That entry points to the only data CI that can contain the record. Reading that one data CI then either returns the record or shows that no record with that key exists.

A large sequence set would be slow to search entry by entry, so VSAM builds one or more levels above it: the index set. Each index set record contains separator keys and pointers to records at the next level down, which is either another index set level or the sequence set. To find a key, VSAM starts at the highest-level index record and compares the search key with the separators. It then follows the matching pointer down and repeats until it reaches the sequence set, which points to the data CI. The index set narrows the search to a single sequence set record, and the sequence set supplies the final pointer to the data. The number of index set levels depends on how many sequence set records (and therefore data control areas) exist. A small KSDS whose data fits in one control area may have only a sequence set (one level). A large one may have two or three index set levels above the sequence set.
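You can check how many levels an existing index has with IDCAMS LISTCAT. This is a sketch that assumes the index component name USERID.KEYED.FILE.INDEX from the DEFINE example on this page. The step name LISTIDX is hypothetical.

```jcl
//LISTIDX  EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  LISTCAT ENTRIES(USERID.KEYED.FILE.INDEX) ALL
/*
```

Look for the LEVELS field in the statistics for the index entry. It reports the number of index levels, and that count includes the sequence set. A value of 1 means the index is only a sequence set, and each higher value adds one index set level above it.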

Index component structure (high level)
| Level | Role |
|---|---|
| Index set (top) | One or more levels; each entry has a separator key and a pointer to the next level down or to the sequence set |
| Sequence set (bottom) | One entry per data CI: the high key in that CI and a pointer to it |
| Data component | The actual records, in key order within CIs; the sequence set points here |

B-tree Style Structure

The index component is often described as a B-tree, or more precisely a B+-tree, style structure. It is a balanced tree in which the leaves (the sequence set) point to the actual data and the internal nodes (the index set) direct the search. Keys in the index act as separators: you compare your search key with them to decide which branch to take. This keeps the number of I/Os needed to find a record low, at one read per index level plus one data CI read. With two or three index levels, a random read by key therefore costs at most three or four reads in total. It costs fewer when the upper index levels are already in buffers. That count grows only logarithmically with the size of the file, not in proportion to it, and that is the main benefit of the index component.
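A little arithmetic shows how the level count grows. This is a worked sketch with hypothetical numbers. Assume the data component has N = 1000 CIs, each control area holds c = 10 CIs, and each index set record can hold f = 200 entries. These values are illustrative only; real values depend on CI and CA size, key length, and key compression. The model also assumes every index set record is full, and it counts reads as if no index records were already buffered.

```latex
\begin{aligned}
S &= \left\lceil \frac{N}{c} \right\rceil = \left\lceil \frac{1000}{10} \right\rceil = 100
  && \text{sequence set records (one per CA)} \\
L_{\text{is}} &= \left\lceil \log_{f} S \right\rceil = \left\lceil \log_{200} 100 \right\rceil = \lceil 0.87 \rceil = 1
  && \text{index set levels} \\
\text{reads} &= L_{\text{is}} + 1 + 1 = 1 + 1 + 1 = 3
  && \text{index set + sequence set + data CI}
\end{aligned}
```

Now grow the file a thousandfold to N = 10^6 CIs (S = 10^5). Under the same assumptions, L_is = ceil(log_200 10^5) = ceil(2.17) = 3, so a keyed read needs 5 reads rather than 3. When the data fits in one CA (S = 1), the formula gives L_is = 0, which leaves only the sequence set.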

How Key-Based Access Works

When your program issues a READ with a key (or a START followed by READ NEXT), the access method works as follows. It starts at the highest-level index set record, which is often already in a buffer. It compares the key with the separators in that record and follows the pointer to the next level, repeating this until it reaches the sequence set. The sequence set entry for the key identifies the data CI that should contain that key. The access method then checks whether that CI is already in a buffer and reads it from the data component if it is not. Once the CI is in memory, it searches within the CI for the record with the matching key. For an insert, it searches instead for the position where the new record belongs. The index component is used only to reach the right CI; the record itself is always in the data component.
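IDCAMS PRINT lets you see keyed positioning at work without writing a program. With FROMKEY, VSAM uses the index to position at the first record whose key is greater than or equal to the given value, then reads forward in key sequence until it passes TOKEY. This is a sketch that assumes the cluster and the 8-byte key (KEYS(8 0)) from the DEFINE example on this page. The key values AAAA0001 and AAAA0099 and the step name PRTKEYS are hypothetical.

```jcl
//PRTKEYS  EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  PRINT INDATASET(USERID.KEYED.FILE) -
        FROMKEY(AAAA0001) -
        TOKEY(AAAA0099) -
        CHARACTER
/*
```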

Index Component and DEFINE CLUSTER

When you define a KSDS with DEFINE CLUSTER, you specify INDEXED. INDEXED is the default organization for DEFINE CLUSTER, but coding it makes the intent explicit. It tells IDCAMS to create both a data component and an index component. You can name the index component with the INDEX parameter, for example INDEX(NAME(USERID.FILE.KSDS.INDEX)). IDCAMS normally calculates the space for the index, so you do not usually specify a separate size for it. The index starts small, often as a single sequence set record, and grows as the data component gains CIs and CAs. When a full data CI is split, VSAM updates the sequence set. When an index record runs out of room, VSAM splits it and, if the top level fills, adds another index set level. The access method handles all of this, so you never define the index structure yourself.

```jcl
//* Define a KSDS: INDEXED creates both a DATA and an INDEX component
//DEFKSDS  EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DEFINE CLUSTER ( -
         NAME(USERID.KEYED.FILE) -
         INDEXED -
         RECORDSIZE(80 120) -
         KEYS(8 0) -
         FREESPACE(15 10) -
         CYLINDERS(20 10)) -
    DATA (NAME(USERID.KEYED.FILE.DATA)) -
    INDEX (NAME(USERID.KEYED.FILE.INDEX))
/*
```

INDEXED makes this a KSDS, so IDCAMS creates both DATA and INDEX components, and the index component is named USERID.KEYED.FILE.INDEX. Application programs and JCL normally refer only to the cluster name, USERID.KEYED.FILE. The component names appear mainly in catalog listings such as LISTCAT output. The access method uses the index whenever you do key-based reads or writes.
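For example, the DD statement for a program that processes this KSDS names the cluster rather than either component. This is a minimal sketch: the program name KSDSPGM, the step name RUNPGM, and the DD name KSDSFILE are hypothetical placeholders for your own program and the file name it opens.

```jcl
//RUNPGM   EXEC PGM=KSDSPGM
//* The DD names the cluster; VSAM finds the DATA and INDEX
//* components through the catalog when the program opens it.
//KSDSFILE DD DSN=USERID.KEYED.FILE,DISP=SHR
//SYSOUT   DD SYSOUT=*
```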

Why ESDS, RRDS, and LDS Have No Index

ESDS has no key. Records are stored in entry order and you access them by relative byte address (RBA), so there is nothing to index because the position itself serves as the "key". RRDS records are identified by relative record number (RRN), and you access them by slot number, so again no key-based index is needed. LDS is a byte stream with no record structure. So only KSDS, where records are ordered by a key and found by that key, has an index component. If you need key-like access to an ESDS, you can build an alternate index (AIX) over it. An AIX is a separate structure that points into the base cluster. It is itself a KSDS with its own data and index components, but it is not the same thing as the primary index component of a KSDS.
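The alternate-index route can be sketched as follows. The example assumes a hypothetical ESDS named USERID.ENTRY.FILE, without extended addressability, whose records carry a 10-byte field at offset 4 to serve as the alternate key. All names and sizes here are illustrative. DEFINE ALTERNATEINDEX creates the AIX, DEFINE PATH gives programs a name to open, and BLDINDEX loads the AIX from the base cluster.

```jcl
//DEFAIX   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DEFINE ALTERNATEINDEX ( -
         NAME(USERID.ENTRY.FILE.AIX) -
         RELATE(USERID.ENTRY.FILE) -
         KEYS(10 4) -
         UNIQUEKEY -
         RECORDSIZE(19 19) -
         CYLINDERS(5 1))
  DEFINE PATH ( -
         NAME(USERID.ENTRY.FILE.PATH) -
         PATHENTRY(USERID.ENTRY.FILE.AIX))
  BLDINDEX INDATASET(USERID.ENTRY.FILE) -
           OUTDATASET(USERID.ENTRY.FILE.AIX)
/*
```

RECORDSIZE(19 19) assumes the layout of a unique-key AIX record over an ESDS: a 5-byte header, the 10-byte alternate key, and one 4-byte RBA pointer. For a large base cluster, BLDINDEX may also need sort work data sets. A program that wants ESDS records in alternate-key order would open the path, USERID.ENTRY.FILE.PATH.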

Key Takeaways

  • Only KSDS has an index component; it maps key values to data control intervals.
  • The sequence set is the lowest level: one entry per data CI (or key range), with high key and pointer to the CI.
  • The index set is one or more levels above the sequence set; it directs the search to the right part of the sequence set.
  • Key-based access: traverse index set then sequence set to get data CI pointer, then read that CI from the data component.
  • Programs do not open the index component by name; they open the cluster, and VSAM uses the index internally.

Explain Like I'm Five

Imagine a big filing cabinet (the data) and a small index card box (the index). Each card in the box says "Keys A–M are in drawer 1, N–Z in drawer 2," and so on. When you want a record, you look in the card box first to see which drawer to open, then you open that one drawer. The index component is that card box: it doesn't hold the papers, it just tells you which part of the cabinet to look in. The sequence set is the final list of drawers; the index set is the extra cards that help you find the right part of that list quickly.

Test Your Knowledge

1. Which VSAM type has an index component?

  • ESDS
  • RRDS
  • LDS
  • KSDS

2. What does the sequence set contain?

  • The actual data records
  • One entry per data CI with high key and pointer
  • Catalog information
  • JCL

3. How does VSAM find a record by key in a KSDS?

  • Scan all records
  • Use the index set and sequence set to find the data CI
  • Use RBA only
  • Use RRN only
Author: MainframeMaster
Reviewed by the MainframeMaster team. Verified: IBM z/OS 2.5 documentation. Sources: IBM DFSMS Access Method Services, z/OS VSAM documentation. Applies to: z/OS 2.5.