What is the index set in VSAM?

The index set is the upper level(s) of the KSDS index component. It sits above the sequence set. Each index set level contains separator keys and pointers to the next level down (another index set level or the sequence set). VSAM uses the index set to quickly narrow down which part of the sequence set (and thus which data CI) to search when doing a random read by key.

How many levels does the VSAM index set have?

The index set can have one or more levels (often one to three), depending on how many data CIs and sequence set entries exist. A very small KSDS might have only a sequence set, with no index set level above it. A large KSDS has multiple index set levels so that searching does not require scanning the entire sequence set.

Is the index set used for sequential access?

Sequential access is driven mainly by the sequence set, which is in key order and points to data CIs. The index set is used primarily for random access: given a key, traverse the index set to find the right part of the sequence set, then use the sequence set to get the data CI. So the index set speeds up random key lookup; sequential read walks the sequence set in order.

MainframeMaster

VSAM Index Set

Q: What does the index set point to?

The index set always points to the next level down. Each index set level points either to another, lower index set level or, at the lowest index set level, to the sequence set. The sequence set in turn points to the data CIs. So the index set never points directly to data; it points to the sequence set or to more index set.

The index set is the upper level (or levels) of the index component in a Key-Sequenced Data Set (KSDS). It sits above the sequence set. The sequence set has one entry per data control interval and points directly to the data component. The index set does not point to data; it contains separator keys and pointers that direct the search to the correct part of the sequence set (or to a lower index set level). So when you do a random read by key, VSAM starts at the top of the index set, compares the search key to the separators, follows the right pointer down, and repeats until it reaches the sequence set. The sequence set then gives the pointer to the data CI. This multi-level structure keeps the number of index reads small even when there are many data CIs. This page explains what the index set is, how it is organized (B-tree style), how many levels it can have, and how VSAM uses it for random access.

What Is the Index Set?

The index set is the part of the KSDS index that sits above the sequence set. The index component as a whole has two kinds of levels: the sequence set (bottom) and the index set (top). The index set can have one or more levels. Each index set record (stored in an index CI) contains separator keys and pointers. A separator key is a value that divides the key space: for example, “keys up to and including this value are reached through this pointer; higher keys belong to a later entry.” Each pointer points to an index CI at the next level down: either another index set level or the sequence set. The index set is therefore a tree: you start at the root (the top index set record), compare your search key to the separators, follow one pointer down, and repeat until you reach the sequence set. The sequence set then gives the pointer to the data CI that might contain your key. The index set’s job is to reduce the search: instead of scanning every sequence set entry, you do a few index set lookups and then one sequence set lookup.

Why the Index Set Exists

If the sequence set had only a few entries (e.g. 10 or 20 data CIs), you could scan the sequence set to find the right CI without an index set. But when the sequence set has hundreds or thousands of entries, scanning would be slow. The index set groups the sequence set entries (or groups of them) and stores a separator and a pointer for each group. So one index set record might say: “keys A–M are in this branch, keys N–Z in that branch.” With one read you eliminate half (or more) of the sequence set. Another level of index set can subdivide further. So with two or three index reads you can narrow down to the one sequence set entry (and thus the one data CI) that might contain your key. That is why the index set exists: to make random key lookup efficient when there are many data CIs.
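The saving described above can be illustrated with a toy count. The sketch below is not how VSAM searches internally; it only assumes a flat list of sequence set high keys and a hypothetical single index set record built over fixed-size groups of them, and counts how many entries are examined in each case. The function names and the group size are made up for illustration.

```python
def entries_examined_flat(high_keys, key):
    """Entries looked at when scanning the high keys one by one."""
    for examined, high_key in enumerate(high_keys, start=1):
        if key <= high_key:
            return examined
    return len(high_keys)


def entries_examined_grouped(high_keys, key, group_size):
    """Entries looked at with one (hypothetical) index record over groups."""
    groups = [high_keys[i:i + group_size]
              for i in range(0, len(high_keys), group_size)]
    # One separator per group: the highest key that the group covers.
    separators = [group[-1] for group in groups]
    top = entries_examined_flat(separators, key)
    chosen = groups[top - 1]
    return top + entries_examined_flat(chosen, key)


# 1000 sequence set entries with high keys 10, 20, ..., 10000.
HIGH_KEYS = list(range(10, 10010, 10))

print(entries_examined_flat(HIGH_KEYS, 7505))         # 751
print(entries_examined_grouped(HIGH_KEYS, 7505, 50))  # 17
```

With one extra level, the search for key 7505 touches 17 entries instead of 751 in this model; further levels shrink the count again.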

Index set and sequence set (high level)
Level	Content
Top index set	One index record (or few). Contains separator keys and pointers to the next level (lower index set or sequence set).
Lower index set (if any)	More index records. Each has separators and pointers to the next level down. Repeats until the sequence set is reached.
Sequence set	One entry per data CI; high key and pointer to data CI. The index set does not point to data; it points to the sequence set.

How Many Index Set Levels?

The number of index set levels depends on how many sequence set entries (and thus data CIs) you have and how many entries fit in each index CI. A very small KSDS might have only the sequence set—no index set, or a single “root” that is just the sequence set. As the cluster grows, VSAM adds index set levels. Typically there are one to three index set levels. The top level (root) has one index record (one index CI) in the smallest case; that record points to the sequence set. When the sequence set grows too large to be reached in one step, another level is added: the root then points to several index CIs at the next level, each of which points to a range of sequence set entries. So the structure is like a B-tree: the root splits the key space, each child splits a subset, and the leaves (sequence set) point to the data. The exact number of levels is managed by VSAM; you do not specify it. You only define the cluster; VSAM builds and maintains the index set as the data grows.
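As a rough worked example of how the level count follows from the sizes involved, the sketch below estimates the number of index set levels. It assumes a simplified model in which every sequence set record holds a fixed number of entries and every index set record holds a fixed number of entries; the function name and parameters are hypothetical, and real VSAM capacities depend on CI size, key length, and key compression.

```python
def index_levels(data_cis, entries_per_seq_record, entries_per_index_record):
    """Return (sequence set records, index set levels) for a simplified model."""
    # Ceiling division without floating point.
    records = -(-data_cis // entries_per_seq_record)
    seq_records = records
    levels = 0
    # Keep adding a level until a single record (the root) covers everything.
    while records > 1:
        records = -(-records // entries_per_index_record)
        levels += 1
    return seq_records, levels


print(index_levels(10, 20, 20))        # (1, 0)
print(index_levels(1000, 50, 50))      # (20, 1)
print(index_levels(100000, 50, 50))    # (2000, 2)
```

In this model a cluster with 10 data CIs needs no index set at all, while 100,000 data CIs need two index set levels above the sequence set.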

What Is in an Index Set Record?

Each index set record (in an index CI) contains multiple entries. Each entry typically has a separator key (or high key) and a pointer. The separator indicates the boundary: “keys in this range are in the branch pointed to by this pointer.” So when VSAM is searching for key K, it looks at the separators in the current index record, finds the entry whose range includes K (e.g. K is less than or equal to the separator, or between this separator and the next), and follows that entry’s pointer to the next level. The pointer is usually an RBA or similar—the location of the index CI (or sequence set CI) at the next level. So the index set is just “separators + pointers”; no data records are stored there. All the actual key and record data are in the data component; the index set is only for navigation.
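The lookup inside a single index record can be sketched as follows. This is a minimal in-memory model, assuming each entry is a (high key, pointer) pair kept in key order; the record contents and the name choose_pointer are invented for illustration, and real index entries are stored in a compressed format.

```python
from bisect import bisect_left

# Hypothetical index set record: (high_key, pointer) pairs in key order.
INDEX_RECORD = [("F", "ci-1"), ("M", "ci-2"), ("T", "ci-3"), ("Z", "ci-4")]


def choose_pointer(record, search_key):
    """Return the pointer of the first entry whose high key is >= search_key."""
    high_keys = [high_key for high_key, _ in record]
    pos = bisect_left(high_keys, search_key)
    if pos == len(record):
        return None  # search key is higher than every entry in this record
    return record[pos][1]


print(choose_pointer(INDEX_RECORD, "G"))   # ci-2
print(choose_pointer(INDEX_RECORD, "F"))   # ci-1
print(choose_pointer(INDEX_RECORD, "ZZ"))  # None
```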

B-tree Style Structure

The index component (index set + sequence set) is often described as a B-tree or, more precisely, a B+-tree. The internal nodes (the index set) hold separators and pointers; the “leaves” (the sequence set) hold the final pointers to the data. The tree is balanced so that every path from the root to a leaf has roughly the same length. The number of I/Os to find a key therefore grows with the height of the tree: usually two, three, or four reads for the index, plus one read for the data CI. That is why random access stays fast even with millions of records: you do not scan the file; you follow a short path down the index. The index set is what makes that path short by grouping the sequence set into a hierarchy.

Random Access: Walking the Index Set

When your program does a READ with a key, VSAM does the following. It starts with the root of the index (the top index set record). Often this root is cached in memory. It compares the search key to the separators in the root and chooses the pointer that corresponds to the range containing the key. It follows that pointer to the next level (another index set CI or the sequence set). If it is an index set CI, it repeats: compare key to separators, follow pointer. When it finally reaches the sequence set, it finds the sequence set entry whose key range includes the search key. That entry’s pointer is the RBA of the data CI. VSAM then reads that data CI (if not already in a buffer), searches within the CI for the record with the exact key, and returns the record (or “not found”). So the index set is used only in this “top-down” traversal; the sequence set is the final step that points to the data.
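The traversal just described can be sketched end to end with a toy index. Everything here is an assumption made for illustration: the dictionaries stand in for index CIs and data CIs, the names (root, ix-2, seq-3, data-5) are invented labels rather than RBAs, and buffering is ignored.

```python
from bisect import bisect_left

# Hypothetical two-level index set; entries are (high_key, pointer) pairs.
INDEX_SET = {
    "root": [(400, "ix-1"), (800, "ix-2")],
    "ix-1": [(200, "seq-1"), (400, "seq-2")],
    "ix-2": [(600, "seq-3"), (800, "seq-4")],
}
# Sequence set records point to data CIs.
SEQUENCE_SET = {
    "seq-1": [(100, "data-1"), (200, "data-2")],
    "seq-2": [(300, "data-3"), (400, "data-4")],
    "seq-3": [(500, "data-5"), (600, "data-6")],
    "seq-4": [(700, "data-7"), (800, "data-8")],
}
# Only a few data CIs are populated in this toy example.
DATA_CIS = {
    "data-1": {50: "record 50", 100: "record 100"},
    "data-5": {450: "record 450", 500: "record 500"},
}


def pick(record, key):
    """Pointer of the first entry whose high key is >= key, else None."""
    pos = bisect_left([high_key for high_key, _ in record], key)
    return record[pos][1] if pos < len(record) else None


def read_by_key(key):
    """Return (record or None, path of CIs visited from the root down)."""
    path = ["root"]
    pointer = pick(INDEX_SET["root"], key)
    while pointer in INDEX_SET:            # still inside the index set
        path.append(pointer)
        pointer = pick(INDEX_SET[pointer], key)
    if pointer is None:                    # key is above the highest key
        return None, path
    path.append(pointer)                   # reached a sequence set record
    data_ci = pick(SEQUENCE_SET[pointer], key)
    if data_ci is None:
        return None, path
    path.append(data_ci)
    return DATA_CIS.get(data_ci, {}).get(key), path


print(read_by_key(450))  # ('record 450', ['root', 'ix-2', 'seq-3', 'data-5'])
print(read_by_key(460))  # (None, ['root', 'ix-2', 'seq-3', 'data-5'])
```

The second call follows the same path but ends in "not found", because the data CI that could hold key 460 does not contain it.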

Sequential Access and the Index Set

For sequential access (e.g. READ NEXT in key order), VSAM typically uses the sequence set, which is already in key order. The application might START at a key and then READ NEXT; VSAM finds the position in the sequence set (using the index set if needed for the START) and then walks the sequence set forward, reading each data CI in turn. So the index set is used for the initial positioning (finding the starting sequence set entry); after that, sequential read follows the sequence set order and may not need to go back to the index set. So the index set is most important for random access; for sequential access the sequence set does most of the work.
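A minimal sketch of this pattern, positioning once and then walking forward, is shown below. It assumes the sequence set is available as an ordered list of records and uses a plain search over their entries for the positioning step, where VSAM would use the index set; the data and the name browse_from are hypothetical.

```python
from bisect import bisect_left

# Hypothetical sequence set: records in key order, each a list of
# (high_key, data CI) entries.
SEQUENCE_SET = [
    [(100, "data-1"), (200, "data-2")],
    [(300, "data-3"), (400, "data-4")],
]
# Each data CI holds (key, record) pairs in key order.
DATA_CIS = {
    "data-1": [(50, "a"), (100, "b")],
    "data-2": [(150, "c")],
    "data-3": [(250, "d"), (300, "e")],
    "data-4": [(350, "f")],
}


def browse_from(start_key):
    """Yield (key, record) in key order, starting at the first key >= start_key."""
    entries = [entry for record in SEQUENCE_SET for entry in record]
    # Positioning step (the START): find the first entry that can hold start_key.
    pos = bisect_left([high_key for high_key, _ in entries], start_key)
    # Sequential step (READ NEXT): walk the sequence set entries forward.
    for _, data_ci in entries[pos:]:
        for key, record in DATA_CIS[data_ci]:
            if key >= start_key:
                yield key, record


print(list(browse_from(150)))
# [(150, 'c'), (250, 'd'), (300, 'e'), (350, 'f')]
```

After the initial positioning, the loop never consults a higher index level again, which mirrors why sequential reads depend mostly on the sequence set.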

When the Index Set Grows

As you add records to the KSDS, CI splits and CA splits bring more data control intervals into use, and the sequence set gains new entries. When the sequence set or an index set level gets too large, VSAM may split index CIs or add a new index set level. This is all done automatically. You do not define the number of index set levels. The index CI size can be specified when the cluster is defined with IDCAMS (CONTROLINTERVALSIZE for the index component), but it is usually omitted so that VSAM chooses a suitable value. VSAM keeps the tree balanced so that search remains efficient. So from an application perspective you just use the cluster; the index set (and sequence set) are maintained by the access method.
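The general idea of a full index record splitting, and a new top level appearing when the root itself splits, can be sketched as follows. This is a generic B-tree-style illustration and not VSAM's actual split algorithm; the capacity, the function names, and the use of list positions as pointers are all assumptions.

```python
def insert_entry(record, entry, capacity):
    """Insert a (high_key, pointer) entry; split the record if it overflows.

    Returns a list holding one record (no split) or two records (split).
    """
    updated = sorted(record + [entry])
    if len(updated) <= capacity:
        return [updated]
    mid = len(updated) // 2
    return [updated[:mid], updated[mid:]]


def new_root(children):
    """Build a parent record with one (high_key, child position) entry per child."""
    return [(child[-1][0], position) for position, child in enumerate(children)]


# A root that is already full (capacity 3) receives a fourth entry.
root = [(100, "seq-1"), (200, "seq-2"), (300, "seq-3")]
parts = insert_entry(root, (250, "seq-4"), capacity=3)

print(parts)
# [[(100, 'seq-1'), (200, 'seq-2')], [(250, 'seq-4'), (300, 'seq-3')]]
print(new_root(parts))
# [(200, 0), (300, 1)]
```

The old root becomes two records one level down, and the new root holds one separator for each of them, so the tree gains a level without any change visible to the application.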

Index Set vs Sequence Set: Summary

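The sequence set has one entry per data CI and points directly to data CIs. The index set has one or more levels above the sequence set and points only to the sequence set (or to lower index set levels). The index set reduces the search space so that finding the right sequence set entry takes only a few index reads. Together they form the index component that makes key-based access efficient. Among the VSAM data set types, this keyed index structure belongs to the KSDS (an alternate index is itself built as a KSDS); ESDS, fixed-length RRDS, and LDS do not have an index component. A variable-length RRDS also has an index component internally, although it is accessed by relative record number rather than by key.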

Key Takeaways

The index set is the upper level(s) of the KSDS index. It contains separator keys and pointers to the next level (index set or sequence set), not to the data.
VSAM uses the index set to narrow the search: a few index reads lead to the right sequence set entry, which then points to the data CI.
The number of index set levels (often one to three) depends on the size of the cluster. VSAM builds and maintains the levels automatically.
The index set is used mainly for random access; sequential access follows the sequence set in key order after initial positioning.

Explain Like I'm Five

Think of the index set as the “table of contents” at the top of a big book. You don’t read every page to find a topic; you look at the table of contents (index set), which says “chapter 1 is here, chapter 2 is there,” and you jump to the right chapter. The “chapters” are like the sequence set: each one points to the actual pages (data CIs). So the index set is the quick way to get to the right part of the list; the sequence set is the list that points to the real data.

Test Your Knowledge

1. What does the index set point to?

Data records directly
The sequence set or lower index set levels
The catalog
JCL

2. What is the main purpose of the index set?

Store data records
Speed up random key lookup by narrowing the search
Backup
Catalog

3. Can a KSDS have no index set level?

No, always has index set
Yes, a small KSDS may have only the sequence set
Only for ESDS
Only for alternate index

