What are the main types of datasets on mainframes?

The main dataset types are Sequential (PS), Partitioned Data Set (PDS), Partitioned Data Set Extended (PDSE), VSAM (Virtual Storage Access Method), and Generation Data Group (GDG). Each type has different characteristics and is suited for different purposes. Sequential datasets are simple files, PDS/PDSE contain multiple members, VSAM provides advanced access methods, and GDG organizes related datasets by generation.

What is a sequential dataset?

A sequential dataset (PS) is the simplest dataset type—it's like a single continuous file where records are stored one after another in order. You read it from beginning to end, and you write to it sequentially. Sequential datasets are commonly used for data files, reports, log files, and simple text files. They have no internal structure—just a continuous stream of records.

What is the difference between PDS and PDSE?

PDS (Partitioned Data Set) is the traditional format that contains multiple members (like files within a file) in one dataset. PDSE (Partitioned Data Set Extended) is an enhanced version with better performance, automatic space management, support for more members, and improved recovery. Both can contain multiple members, but PDSE is generally preferred for new datasets because it offers better efficiency and features, while PDS is still widely used for compatibility.

What is VSAM and when should I use it?

VSAM (Virtual Storage Access Method) is an advanced dataset organization that provides indexed, direct, or sequential access to data. VSAM datasets can be KSDS (Key Sequenced), ESDS (Entry Sequenced), RRDS (Relative Record), or LDS (Linear). Use VSAM when you need efficient random access by key, need indexing capabilities, require direct access to specific records, or need advanced data management features beyond simple sequential access.

What is a GDG and why would I use one?

GDG (Generation Data Group) is a collection of related datasets that share a common base name and are organized by generation numbers (like .G0001V00, .G0002V00). GDGs are useful for maintaining historical versions such as daily backups, report generations, data snapshots, or any situation where you need to keep multiple versions of related data. You can reference generations relatively (like +1 for next, 0 for current, -1 for previous), making it easy to work with versions.

How do I know which dataset type to use?

Choose Sequential (PS) for simple files, single-purpose data, or when you only need sequential access. Choose PDS or PDSE when you need to store multiple related files (members) together, like source code libraries. Choose VSAM when you need indexed or direct access, efficient random access by key, or advanced data management. Choose GDG when you need to maintain multiple versions of related datasets over time.

Can I convert between dataset types?

No, you cannot directly convert a dataset from one type to another. Dataset organization (DSORG) is set when the dataset is created and cannot be changed. To change a dataset type, you must create a new dataset of the desired type and copy the data from the old dataset to the new one. This requires reading from the old dataset and writing to the new dataset using appropriate utilities or programs.

What is DSORG?

DSORG (Data Set Organization) is an attribute that specifies how data is organized in a dataset. It determines the dataset type (PS for sequential, PO for both PDS and PDSE, VS for VSAM) and what access methods are available. DSORG is set when the dataset is created and cannot be changed afterward. It's one of the fundamental attributes that defines a dataset's characteristics and capabilities.

MainframeMaster

Dataset Types Explained

Understanding dataset types is fundamental to working with mainframes. Datasets are the mainframe equivalent of files, but they come in different types, each designed for specific purposes and access patterns. This beginner-friendly guide explains the main dataset types you'll encounter: Sequential (PS), Partitioned Data Set (PDS), Partitioned Data Set Extended (PDSE), VSAM (Virtual Storage Access Method), and Generation Data Group (GDG).

Each dataset type has unique characteristics that make it suitable for different scenarios. Understanding what each type is, how it works, and when to use it helps you make informed decisions about data organization and choose the right dataset type for your needs. This tutorial provides clear explanations and practical guidance to help you understand and work with different dataset types.

What Are Datasets?

Before diving into specific types, it's important to understand what datasets are:

Mainframe Files: Datasets are the mainframe equivalent of files in other systems. They store data, programs, source code, and other information.
Organized Storage: Unlike simple files, datasets have organization attributes that determine how data is structured and accessed. This organization affects what you can do with the dataset.
Named Storage: Datasets have names (like USERID.SOURCE.COBOL) that identify them. A name consists of qualifiers of up to 8 characters each, separated by periods, with a total length of up to 44 characters.
Attributes: Datasets have attributes (like record format, record length, organization) that define their characteristics. These attributes are set when the dataset is created.
Access Methods: Different dataset types support different access methods—some allow sequential access only, others support random or indexed access.
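
The naming rules can be checked mechanically. The following is a minimal Python sketch, assuming the standard z/OS rules: a name is at most 44 characters, each period-separated qualifier is 1 to 8 characters, and a qualifier starts with a letter or one of the national characters #, @, $. The function name is_valid_dataset_name is hypothetical, and the sketch assumes names are already in uppercase.

```python
import re

# One qualifier: 1-8 characters. The first is a letter or national character
# (#, @, $); the rest may also be digits or hyphens.
QUALIFIER = re.compile(r"[A-Z#@$][A-Z0-9#@$-]{0,7}")

def is_valid_dataset_name(name: str) -> bool:
    """Return True if name follows the basic dataset naming rules (hypothetical helper)."""
    if not name or len(name) > 44:
        return False
    # Every period-separated qualifier must match the pattern in full.
    return all(QUALIFIER.fullmatch(q) for q in name.split("."))

print(is_valid_dataset_name("USERID.SOURCE.COBOL"))      # True
print(is_valid_dataset_name("USERID.TOOLONGQUALIFIER"))  # False
```

Installations often add their own conventions on top of these rules, such as requiring the first qualifier to be a user ID or project code.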

Sequential Datasets (PS)

Sequential datasets are the simplest dataset type—think of them as a single continuous file.

What Are Sequential Datasets?

Sequential datasets (DSORG=PS) are simple files where records are stored one after another in order:

Simple Structure: Records are stored sequentially from first to last, with no internal organization or structure. It's like a long scroll or a single continuous stream of data.
Sequential Access: You read sequential datasets from beginning to end. To read a record in the middle, you must read all records before it. You write records sequentially, adding them to the end.
No Members: Sequential datasets contain just the data—there are no separate "members" or internal files. It's one continuous file.
Common Uses: Sequential datasets are commonly used for data files, reports, log files, output files, and simple text files where you process data from start to finish.

Characteristics of Sequential Datasets

Sequential datasets have these characteristics:

DSORG=PS: The dataset organization attribute is PS (Physical Sequential), indicating sequential organization.
Simple Access: Access is straightforward—read from start to finish, write sequentially to the end. No complex access methods or indexing.
Efficient for Sequential Processing: Very efficient when you need to process all records in order, such as reading an entire file or writing output sequentially.
Not Suitable for Random Access: Not efficient for accessing specific records in the middle without reading everything before them. If you need random access, consider VSAM.
Common Record Formats: Can use various record formats depending on your needs: F (fixed), V (variable), FB (fixed blocked), VB (variable blocked), or U (undefined). Record format determines how records are stored and delimited.
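
To make sequential access concrete, here is a small Python sketch that treats a byte string as a dataset of fixed-length records. It is an illustration only: the function names are hypothetical, the record length of 8 is arbitrary, and ASCII is used for readability where a real mainframe dataset would normally hold EBCDIC data.

```python
def read_fixed_records(data: bytes, lrecl: int):
    """Yield fixed-length records of lrecl bytes, one after another."""
    if lrecl <= 0 or len(data) % lrecl != 0:
        raise ValueError("data length must be a multiple of lrecl")
    for offset in range(0, len(data), lrecl):
        yield data[offset:offset + lrecl]

def read_nth_record(data: bytes, lrecl: int, n: int) -> bytes:
    """Return record n (1-based) by reading every record before it."""
    for count, record in enumerate(read_fixed_records(data, lrecl), start=1):
        if count == n:
            return record
    raise IndexError(f"dataset has fewer than {n} records")

# Three 8-byte records, padded with blanks as fixed-length records usually are.
dataset = b"".join(text.ljust(8).encode("ascii") for text in ["ALPHA", "BRAVO", "CHARLIE"])
print(read_nth_record(dataset, 8, 3))  # b'CHARLIE '
```

Reaching the third record means passing over the first two, which is why sequential datasets suit start-to-finish processing rather than lookups.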

When to Use Sequential Datasets

Use sequential datasets when:

Simple Data Storage: You need simple storage for data files, reports, or text files without complex organization requirements.
Sequential Processing: You process data sequentially from start to finish, such as reading entire files or generating reports.
Output Files: You're creating output files that are written sequentially, such as program output, reports, or log files.
Data Transfer: You're transferring data between systems or applications where sequential format is appropriate.
Simple Requirements: Your requirements are simple and don't need the complexity of partitioned or VSAM datasets.

Example Uses of Sequential Datasets

Common examples include:

Data Files: Input data files for programs, such as transaction data, customer records, or business data.
Report Files: Output reports generated by programs, such as sales reports, inventory reports, or analysis reports.
Log Files: System or application log files that record events, errors, or activities sequentially.
Configuration Files: Simple configuration or parameter files that are read sequentially.
Backup Files: Backup copies of data stored in sequential format for simple restoration.

Partitioned Data Sets (PDS)

Partitioned Data Sets (PDS) are like filing cabinets—they contain multiple "members" (files) organized within one dataset.

What Are PDS Datasets?

PDS datasets (DSORG=PO) contain multiple members, each acting like a separate file:

Multiple Members: A PDS contains multiple members, each with its own name. Think of it like a folder containing multiple files, where each file is a "member".
Member Names: Each member has a name (up to 8 characters) that identifies it within the PDS. You access members by specifying both the dataset name and member name.
Common Uses: PDS datasets are commonly used for source code libraries (COBOL programs, JCL, etc.), where each program is a member. They're also used for organizing related files together.
Directory Structure: PDS datasets have a directory that lists all members and their locations. The directory helps locate members quickly.
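Space Management: A PDS does not reclaim space on its own. When a member is deleted or replaced, the space it occupied stays unusable until the dataset is compressed (for example, with the IEBCOPY utility). When space runs out, you must compress the PDS or allocate a larger one.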
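
A member is normally referenced by writing its name in parentheses after the dataset name, as in USERID.SOURCE.COBOL(MEMBER1). The Python sketch below splits such a reference into its two parts. The function name split_member_reference is hypothetical, and the pattern assumes uppercase member names of 1 to 8 characters that start with a letter or national character.

```python
import re

# DATASET.NAME(MEMBER): the member name is 1-8 characters in parentheses.
MEMBER_REF = re.compile(r"(?P<dsn>[^()]+)\((?P<member>[A-Z#@$][A-Z0-9#@$]{0,7})\)")

def split_member_reference(ref: str) -> tuple[str, str]:
    """Split 'DATASET.NAME(MEMBER)' into (dataset name, member name)."""
    match = MEMBER_REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a valid member reference: {ref!r}")
    return match["dsn"], match["member"]

print(split_member_reference("USERID.SOURCE.COBOL(MEMBER1)"))
# ('USERID.SOURCE.COBOL', 'MEMBER1')
```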

Characteristics of PDS Datasets

PDS datasets have these characteristics:

DSORG=PO: The dataset organization attribute is PO (Partitioned Organization), indicating partitioned structure.
Member Access: You access individual members by specifying both dataset name and member name, such as USERID.SOURCE.COBOL(MEMBER1) where MEMBER1 is the member name.
Directory: Contains a directory that lists all members. The directory helps locate members but takes up space in the dataset.
No Automatic Space Reuse: New and updated members are always written after the last member in the dataset. When you delete or replace a member, its old space is not reusable until the PDS is compressed, so a frequently updated PDS accumulates dead space.
Member Limits: Has practical limits on the number of members (typically thousands, depending on directory size and configuration).
Widely Used: Still very widely used, especially for source code libraries, due to compatibility and familiarity.
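
The space behavior of a PDS can be pictured with a toy model. This Python sketch is a simplification, not a description of the real on-disk format: the class ToyPDS and its arbitrary space units are hypothetical, and the compress method stands in for what a utility such as IEBCOPY does when it compresses a PDS.

```python
class ToyPDS:
    """Toy model: members are appended to the data area; a directory tracks live members."""

    def __init__(self, capacity: int):
        self.capacity = capacity               # total data space, in arbitrary units
        self.used = 0                          # space consumed so far, live or dead
        self.directory: dict[str, int] = {}    # member name -> size of its current copy

    def save(self, member: str, size: int) -> None:
        """Write a member at the end of the data area; any old copy becomes dead space."""
        if self.used + size > self.capacity:
            raise RuntimeError("out of space: compress the PDS or allocate a larger one")
        self.used += size
        self.directory[member] = size

    def delete(self, member: str) -> None:
        """Remove the directory entry; the member's space stays consumed."""
        del self.directory[member]

    def compress(self) -> None:
        """Reclaim dead space by keeping only the live members."""
        self.used = sum(self.directory.values())

pds = ToyPDS(capacity=100)
pds.save("PROG1", 40)
pds.save("PROG1", 40)      # replacing a member writes a new copy
pds.delete("PROG1")
print(pds.used)            # 80: nothing was reclaimed
pds.compress()
print(pds.used)            # 0
```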

When to Use PDS Datasets

Use PDS datasets when:

Source Code Libraries: You need to store multiple related source code files together, such as COBOL programs, JCL procedures, or other program source files.
Organizing Related Files: You want to organize multiple related files together in one dataset for easier management.
Compatibility Requirements: You need compatibility with systems or tools that specifically require PDS format.
Traditional Workflows: Your organization uses traditional PDS-based workflows and you want to maintain consistency.
Simple Member Management: You need simple member management without the advanced features of PDSE.

Example Uses of PDS Datasets

Common examples include:

COBOL Source Libraries: Storing multiple COBOL programs, where each program is a member (e.g., USERID.SOURCE.COBOL with members PROG1, PROG2, etc.).
JCL Procedure Libraries: Storing JCL procedures (PROCs) where each procedure is a member.
Copybook Libraries: Storing COBOL copybooks or other reusable code components.
Test Data Libraries: Organizing test data files as members in a PDS.
Documentation Libraries: Storing multiple documentation files as members.

Partitioned Data Set Extended (PDSE)

PDSE is an enhanced version of PDS with better performance and features.

What Are PDSE Datasets?

PDSE datasets (DSORG=PO, allocated with DSNTYPE=LIBRARY) are improved partitioned datasets:

Enhanced PDS: PDSE provides all PDS capabilities but with significant improvements in performance, space management, and features.
Automatic Space Management: PDSE automatically manages space, eliminating the fragmentation issues that can occur with PDS. When you delete a member, its space is immediately available for reuse.
Better Performance: PDSE generally provides better performance than PDS, especially for operations involving many members or frequent member additions and deletions.
More Members: PDSE can support significantly more members than PDS (hundreds of thousands rather than thousands), making it suitable for large libraries.
Improved Recovery: PDSE provides better recovery capabilities and data integrity features compared to PDS.

Characteristics of PDSE Datasets

PDSE datasets have these characteristics:

DSORG=PO with DSNTYPE=LIBRARY: A PDSE uses the same PO organization as a PDS. The system identifies it as a PDSE by its data set name type, DSNTYPE=LIBRARY, which is specified when the dataset is allocated.
Automatic Reorganization: Space is managed automatically—no need for manual reorganization. Deleted member space is immediately reusable.
No Directory Limitations: Directory can grow dynamically, supporting many more members than traditional PDS.
Better Concurrency: Improved support for multiple users accessing different members simultaneously.
Member Name Length: Member names are limited to 8 characters, the same as in a PDS.
Recommended for New Datasets: Generally recommended for new partitioned datasets due to superior features and performance.
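
The immediate reuse of deleted member space can be sketched with a free-space pool. This Python model is only an illustration under simplifying assumptions: the class ToyPDSE and its arbitrary space units are hypothetical, and a real PDSE manages space in pages with considerably more bookkeeping.

```python
class ToyPDSE:
    """Toy model: space freed by a delete or replace returns to a free pool at once."""

    def __init__(self, capacity: int):
        self.free = capacity                  # reusable space, in arbitrary units
        self.members: dict[str, int] = {}     # member name -> size

    def save(self, member: str, size: int) -> None:
        """Write a member, reusing the space of any copy it replaces."""
        reclaimed = self.members.get(member, 0)
        if size > self.free + reclaimed:
            raise RuntimeError("out of space")
        self.free += reclaimed - size
        self.members[member] = size

    def delete(self, member: str) -> None:
        """Remove a member and return its space to the free pool."""
        self.free += self.members.pop(member)

pdse = ToyPDSE(capacity=100)
pdse.save("PROG1", 40)
pdse.save("PROG1", 40)     # the replaced copy's space is reused
print(pdse.free)           # 60
pdse.delete("PROG1")
print(pdse.free)           # 100
```

No compress step appears in the model because freed space is available again as soon as the member is deleted or replaced.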

When to Use PDSE Datasets

Use PDSE datasets when:

New Partitioned Datasets: You're creating new partitioned datasets and want the best performance and features available.
Large Libraries: You need to store many members (thousands or more) where PDS limitations might be an issue.
Frequent Changes: You frequently add and delete members, where PDSE's automatic space management provides significant advantages.
Performance Critical: Performance is important and you want the best performance available for partitioned datasets.
Modern Applications: You're building modern applications or libraries where PDSE's enhanced features are beneficial.

PDSE vs. PDS Comparison

Key differences between PDSE and PDS:

Space Management: PDSE automatically manages space; PDS may require manual reorganization. PDSE eliminates fragmentation issues common with PDS.
Performance: PDSE generally provides better performance, especially for operations involving many members or frequent changes.
Member Capacity: PDSE supports many more members (hundreds of thousands) than a PDS, whose capacity is limited by the directory space allocated at creation.
Recovery: PDSE provides better recovery and data integrity features.
Compatibility: PDS is more widely compatible with older systems and tools; PDSE may not be supported in all environments.
Recommendation: Use PDSE for new datasets when possible; PDS is still widely used for compatibility.

VSAM (Virtual Storage Access Method)

VSAM is an advanced dataset organization that provides sophisticated access methods and data management.

What Is VSAM?

VSAM (Virtual Storage Access Method) provides advanced dataset organization with multiple access methods:

Multiple Organizations: VSAM supports different organization types: KSDS (Key Sequenced Data Set), ESDS (Entry Sequenced Data Set), RRDS (Relative Record Data Set), and LDS (Linear Data Set). Each type provides different access patterns.
Indexed Access: VSAM can provide indexed access, allowing you to access records directly by key without reading through all previous records. This is much more efficient than sequential access for random record retrieval.
Direct Access: VSAM supports direct access to specific records by key or relative record number, enabling efficient random access.
Advanced Features: VSAM provides features like alternate indexes, record-level sharing, and sophisticated data management capabilities.
Performance: VSAM is designed for high-performance data access, making it suitable for applications that need efficient random access.

VSAM Dataset Types

VSAM supports several organization types:

KSDS (Key Sequenced Data Set): Records are organized by key and can be accessed directly by key or sequentially in key order. KSDS provides both random and sequential access with excellent performance. This is the most commonly used VSAM type.
ESDS (Entry Sequenced Data Set): Records are stored in the order they were written (entry sequence). You can access them sequentially or by relative byte address, but not by key. ESDS is useful when you need to preserve write order.
RRDS (Relative Record Data Set): Records are accessed by relative record number (1, 2, 3, etc.). Each record has a slot number, and you can access records directly by number. RRDS is useful when you have a known record numbering scheme.
LDS (Linear Data Set): A byte-addressable dataset without record structure. LDS is used for special purposes like database data sets or when you need byte-level access.
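
The three record-oriented types differ mainly in how a record is addressed. The Python sketch below models each one in a few lines. All class names are hypothetical, and the models ignore real VSAM structures such as control intervals and index components: a sorted list with binary search stands in for the KSDS index, a byte offset stands in for the ESDS relative byte address, and a list position stands in for the RRDS slot number.

```python
import bisect

class ToyKSDS:
    """Records kept in key order; lookup by key uses binary search."""
    def __init__(self):
        self.keys, self.records = [], []

    def write(self, key, record):
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            raise KeyError(f"duplicate key: {key}")
        self.keys.insert(pos, key)
        self.records.insert(pos, record)

    def read(self, key):
        pos = bisect.bisect_left(self.keys, key)
        if pos == len(self.keys) or self.keys[pos] != key:
            raise KeyError(key)
        return self.records[pos]

class ToyESDS:
    """Records appended in arrival order; each is addressed by its byte offset."""
    def __init__(self):
        self.data = bytearray()
        self.lengths = {}                      # byte offset -> record length

    def write(self, record: bytes) -> int:
        rba = len(self.data)                   # new records always go at the end
        self.data += record
        self.lengths[rba] = len(record)
        return rba

    def read(self, rba: int) -> bytes:
        return bytes(self.data[rba:rba + self.lengths[rba]])

class ToyRRDS:
    """A fixed number of numbered slots; slot numbers start at 1."""
    def __init__(self, slots: int):
        self.slots = [None] * slots

    def _index(self, rrn: int) -> int:
        if not 1 <= rrn <= len(self.slots):
            raise IndexError(f"no slot {rrn}")
        return rrn - 1

    def write(self, rrn: int, record):
        self.slots[self._index(rrn)] = record

    def read(self, rrn: int):
        return self.slots[self._index(rrn)]

ksds = ToyKSDS()
ksds.write("C003", "Carol")
ksds.write("A001", "Alice")
print(ksds.keys)            # ['A001', 'C003']: stored in key order
esds = ToyESDS()
esds.write(b"first")
print(esds.write(b"second"))  # 5: offset of the second record
rrds = ToyRRDS(slots=10)
rrds.write(7, "seven")
print(rrds.read(7))         # seven
```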

When to Use VSAM

Use VSAM when:

Random Access Needed: You need to access specific records directly by key without reading through all previous records. VSAM's indexed access is much more efficient than sequential datasets for random access.
Key-Based Access: You need to look up records by key values, such as customer numbers, account numbers, or other identifiers. VSAM's key-sequenced organization makes this efficient.
High Performance Required: You need high-performance data access for applications that frequently access specific records. VSAM is optimized for efficient access.
Alternate Indexes Needed: You need to access the same data by different keys using alternate indexes. VSAM supports multiple access paths to the same data.
Database-Like Access: You need database-like capabilities with indexing, key-based access, and sophisticated data management.
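
An alternate index can be thought of as a second lookup table that maps another field back to the primary key. The following Python sketch illustrates the idea with a hypothetical customer file. The class CustomerFile, its field names, and the use of dictionaries are all assumptions for illustration, not how VSAM stores its indexes.

```python
from collections import defaultdict

class CustomerFile:
    """Toy keyed file with one alternate index on a second field."""

    def __init__(self):
        self.by_number = {}                 # primary key -> record
        self.by_city = defaultdict(list)    # alternate key -> list of primary keys

    def add(self, number: str, name: str, city: str) -> None:
        if number in self.by_number:
            raise KeyError(f"duplicate key: {number}")
        self.by_number[number] = {"number": number, "name": name, "city": city}
        self.by_city[city].append(number)   # alternate keys need not be unique

    def get(self, number: str) -> dict:
        """Direct lookup by primary key."""
        return self.by_number[number]

    def find_by_city(self, city: str) -> list:
        """Lookup through the alternate index, then fetch each record by primary key."""
        return [self.by_number[n] for n in self.by_city.get(city, [])]

customers = CustomerFile()
customers.add("C001", "Alice", "Boston")
customers.add("C002", "Bob", "Denver")
customers.add("C003", "Carol", "Boston")
print(customers.get("C002")["name"])                          # Bob
print([c["name"] for c in customers.find_by_city("Boston")])  # ['Alice', 'Carol']
```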

Example Uses of VSAM

Common examples include:

Customer Master Files: Storing customer records accessed by customer number (key). Applications can quickly look up customer information by customer ID.
Transaction Files: Storing transaction records that need to be accessed by transaction ID or account number for processing or inquiry.
Reference Data: Storing reference data (like code tables) that applications need to access quickly by key.
Index Files: Creating indexes for other datasets to enable efficient lookup.
Application Data: Storing application data that requires efficient random access patterns.

Generation Data Groups (GDG)

GDGs organize related datasets by generation numbers, making it easy to work with versioned data.

What Are GDGs?

Generation Data Groups (GDG) are collections of related datasets organized by generation numbers:

Related Datasets: A GDG contains multiple related datasets that share a common base name but have different generation numbers. Each generation is a separate dataset, but they're organized as a group.
Generation Numbers: Each generation has a number like .G0001V00, .G0002V00, .G0003V00, etc. The number indicates the generation sequence.
Relative References: You can reference generations relatively: +1 for the next generation, 0 (or +0) for the current generation, -1 for the previous generation, -2 for two generations back, etc. This makes it easy to work with versions.
Version Management: GDGs are ideal for maintaining multiple versions of related data, such as daily backups, report generations, data snapshots, or historical versions.
Catalog Entry: GDGs have a catalog entry (the GDG base) that tracks all generations and their numbers.
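
Relative references are resolved against the generations currently in the group. The Python sketch below shows the arithmetic. The function names are hypothetical, the version suffix is assumed to be V00, and the sketch ignores what happens when generation numbers wrap around after G9999.

```python
def generation_name(base: str, number: int) -> str:
    """Build the absolute name of a generation, e.g. BASE.G0003V00."""
    return f"{base}.G{number:04d}V00"

def resolve_relative(base: str, existing: list[int], relative: int) -> str:
    """Resolve a relative reference (+1, 0, -1, ...) to an absolute dataset name.

    existing holds the generation numbers currently in the group, oldest first.
    """
    if relative > 0:
        # A positive reference names a generation that does not exist yet.
        newest = existing[-1] if existing else 0
        return generation_name(base, newest + relative)
    index = len(existing) - 1 + relative   # 0 is the newest, -1 the one before it
    if index < 0:
        raise LookupError(f"generation ({relative}) does not exist")
    return generation_name(base, existing[index])

generations = [1, 2, 3]
print(resolve_relative("USERID.REPORTS.DAILY", generations, 0))   # USERID.REPORTS.DAILY.G0003V00
print(resolve_relative("USERID.REPORTS.DAILY", generations, -1))  # USERID.REPORTS.DAILY.G0002V00
print(resolve_relative("USERID.REPORTS.DAILY", generations, 1))   # USERID.REPORTS.DAILY.G0004V00
```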

Characteristics of GDGs

GDGs have these characteristics:

Base Name: GDGs have a base name (like USERID.REPORTS.DAILY) that represents the group. Individual generations have names like USERID.REPORTS.DAILY.G0001V00.
Generation Tracking: The system tracks generations and their sequence numbers automatically. When you create a new generation, it gets the next number.
Relative References: You can reference generations using relative notation (+1, 0, -1, etc.) instead of absolute generation numbers, making it easy to work with "current", "next", or "previous" versions.
Generation Limits: Each GDG has a limit on the number of generations it keeps (for example, the last 10). When a new generation pushes the group past the limit, the oldest generation rolls off: it is removed from the group and, if the GDG was defined with the SCRATCH option, deleted.
Any Non-VSAM Dataset Type: Individual generations are usually sequential datasets, but they can also be partitioned (PDS or PDSE). Generations cannot be VSAM datasets. The GDG organization governs how generations are managed, not how each one is organized internally.
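
The effect of a generation limit can be simulated in a few lines. This Python sketch assumes the common behavior in which only the oldest generation leaves the group when the limit is exceeded. The function name add_generation is hypothetical, and what happens to a rolled-off generation afterward depends on how the GDG was defined.

```python
def add_generation(existing: list[int], limit: int) -> tuple[list[int], list[int]]:
    """Add the next generation; return (kept generations, rolled-off generations)."""
    newest = existing[-1] if existing else 0
    generations = existing + [newest + 1]
    overflow = max(0, len(generations) - limit)   # how many exceed the limit
    return generations[overflow:], generations[:overflow]

kept: list[int] = []
for _ in range(5):
    kept, rolled_off = add_generation(kept, limit=3)
print(kept)        # [3, 4, 5]
print(rolled_off)  # [2]
```

After five runs with a limit of 3, only the three newest generations remain in the group.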

When to Use GDGs

Use GDGs when:

Versioned Data: You need to maintain multiple versions of related data over time, such as daily reports, backups, or data snapshots.
Historical Tracking: You need to keep historical versions for reference, audit, or recovery purposes.
Automated Generation: You have automated processes that create new versions regularly (daily, weekly, etc.) and you want easy reference to current, previous, or next versions.
Relative References Needed: You want to reference versions relatively (like "today's report" or "yesterday's backup") without hardcoding specific generation numbers.
Automatic Cleanup: You want automatic management of old generations, keeping only a specified number of recent generations.

Example Uses of GDGs

Common examples include:

Daily Reports: Storing daily reports where each day creates a new generation. You can reference "today's report" (+0), "yesterday's report" (-1), etc.
Backup Files: Maintaining backup generations where each backup is a new generation. You can easily reference the latest backup or restore from a previous generation.
Data Snapshots: Taking periodic snapshots of data where each snapshot is a generation. This enables point-in-time data access.
Log Files: Organizing log files by generation, where each time period or event creates a new generation.
Archive Data: Maintaining archived data where each archive period is a generation, enabling historical data access.

Choosing the Right Dataset Type

Choosing the right dataset type depends on your requirements. Here's a decision guide:

Decision Guidelines

Use this decision process:

Need to store multiple related files together? → Use PDS or PDSE (prefer PDSE for new datasets).
Need simple sequential file storage? → Use Sequential (PS).
Need random access by key? → Use VSAM (KSDS).
Need to maintain multiple versions? → Use GDG (generations are typically sequential and cannot be VSAM).
Need indexed or direct access? → Use VSAM.
Need simple, straightforward storage? → Use Sequential (PS).
Creating new partitioned dataset? → Prefer PDSE over PDS for better features and performance.

Comparison Summary

Quick comparison of dataset types:

Sequential (PS): Simplest, sequential access only, single file, good for simple data storage and sequential processing.
PDS: Multiple members, traditional format, widely compatible, may need manual space management.
PDSE: Multiple members, enhanced PDS, better performance and automatic space management, recommended for new datasets.
VSAM: Advanced organization, indexed/direct access, high performance, suitable for database-like access patterns.
GDG: Version management, relative references, automatic generation tracking, works with sequential and partitioned datasets (not VSAM).

Explain Like I'm 5: Dataset Types

Think of dataset types like different ways of organizing your toys:

Sequential (PS) is like putting all your toys in one big box, one after another. To find a toy in the middle, you have to take out all the toys before it. It's simple, but not great for finding specific toys quickly!
PDS is like having a toy box with drawers. Each drawer has a label (like "Cars", "Blocks", "Dolls"), and you can open any drawer to get those specific toys. It's organized, and you can find things by their drawer name!
PDSE is like having a super-smart toy box that organizes itself. It's like a regular toy box with drawers, but it automatically cleans up, can hold more toys, and works faster. It's like having a magic toy organizer!
VSAM is like having a toy catalog with an index. You can look up a toy in the catalog (like "find all red cars") and the catalog tells you exactly where it is. You don't have to search through everything—the catalog helps you find things quickly!
GDG is like having photo albums for the same event. You have "Vacation Day 1", "Vacation Day 2", "Vacation Day 3", etc. Each album is numbered, and you can easily say "show me today's photos" or "show me yesterday's photos" without remembering the exact album number!

So dataset types are like different ways of organizing and storing your information, each designed for different needs—simple storage, organized cabinets, smart systems, or versioned collections!

Practice Exercises

Complete these exercises to reinforce your understanding of dataset types:

Exercise 1: Identifying Dataset Types

Practice identification: use ISPF Data Set Utility to view dataset attributes, identify DSORG values, determine dataset types, and understand what each type means. Build familiarity with recognizing different dataset types.

Exercise 2: Working with Sequential Datasets

Practice with sequential datasets: create a sequential dataset, write data to it, read data from it sequentially, and understand sequential access patterns. Compare this to other dataset types.

Exercise 3: Working with PDS and PDSE

Practice with partitioned datasets: create a PDS or PDSE, add members to it, list members, access individual members, and understand member organization. Compare PDS and PDSE characteristics if both are available.

Exercise 4: Understanding VSAM

Learn about VSAM: if VSAM is available, explore VSAM datasets, understand different VSAM types (KSDS, ESDS, RRDS), and learn about key-based access. Understand when VSAM is appropriate.

Exercise 5: Understanding GDGs

Learn about GDGs: explore existing GDGs if available, understand generation numbering, practice relative references (+1, 0, -1), and understand version management concepts.

Test Your Knowledge

1. What is the simplest dataset type?

PDS
PDSE
Sequential (PS)
VSAM

2. Which dataset type can contain multiple members (files)?

Sequential only
PDS and PDSE
VSAM only
GDG only

3. What is the main advantage of PDSE over PDS?

It is simpler
Better performance, automatic space management, and more features
It is older
It requires less space

4. When should you use VSAM?

For simple sequential files
When you need indexed or direct access by key
For storing source code
For simple text files

5. What is a GDG used for?

Storing single files
Organizing related datasets by generation numbers for versioning
Indexed access
Simple sequential storage

6. Can you change a dataset's type after it's created?

Yes, easily
No, DSORG is set at creation and cannot be changed
Only for PDS to PDSE
Only with special utilities
