Understanding dataset types is fundamental to working with mainframes. Datasets are the mainframe equivalent of files, but they come in different types, each designed for specific purposes and access patterns. This beginner-friendly guide explains the main dataset types you'll encounter: Sequential (PS), Partitioned Data Set (PDS), Partitioned Data Set Extended (PDSE), VSAM (Virtual Storage Access Method), and Generation Data Group (GDG).
Each dataset type has characteristics that suit particular scenarios. For each type, this tutorial explains what it is, how it works, and when to use it, so you can choose the right organization for your data.
What Are Datasets?
Before diving into specific types, it's important to understand what datasets are:
Mainframe Files: Datasets are the mainframe equivalent of files in other systems. They store data, programs, source code, and other information.
Organized Storage: Unlike simple files, datasets have organization attributes that determine how data is structured and accessed. This organization affects what you can do with the dataset.
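Named Storage: Datasets have names (like USERID.SOURCE.COBOL) that identify them. A name is made up of qualifiers separated by periods; each qualifier is 1 to 8 characters long and starts with a letter or a national character (@, #, $), and the whole name can be at most 44 characters.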
Attributes: Datasets have attributes (like record format, record length, organization) that define their characteristics. These attributes are set when the dataset is created.
Access Methods: Different dataset types support different access methods—some allow sequential access only, others support random or indexed access.
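The naming rules can be made concrete with a small check. The following is a minimal Python sketch, not a mainframe API: it assumes the standard z/OS rules (qualifiers of 1 to 8 characters separated by periods, each starting with a letter or national character, at most 44 characters in total), and the function name is_valid_dsname is hypothetical.

```python
import re

# One qualifier: 1-8 characters; the first is a letter or national character
# (@, #, $), the rest are letters, digits, national characters, or hyphens.
QUALIFIER = re.compile(r"[A-Z@#$][A-Z0-9@#$-]{0,7}")

MAX_DSNAME_LENGTH = 44  # total length, counting the periods


def is_valid_dsname(name: str) -> bool:
    """Return True if name follows the assumed dataset naming rules."""
    if not name or len(name) > MAX_DSNAME_LENGTH:
        return False
    # Every period-separated qualifier must match the pattern completely.
    return all(QUALIFIER.fullmatch(q) for q in name.split("."))


print(is_valid_dsname("USERID.SOURCE.COBOL"))  # True
print(is_valid_dsname("USERID.1SOURCE"))       # False: qualifier starts with a digit
print(is_valid_dsname("USERID.VERYLONGNAME"))  # False: qualifier longer than 8 characters
```

The sketch only accepts uppercase names; individual installations usually add their own conventions on top of these rules, such as requiring your user ID as the first qualifier.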
Sequential Datasets (PS)
Sequential datasets are the simplest dataset type—think of them as a single continuous file.
What Are Sequential Datasets?
Sequential datasets (DSORG=PS) are simple files where records are stored one after another in order:
Simple Structure: Records are stored sequentially from first to last, with no internal organization or structure. It's like a long scroll or a single continuous stream of data.
Sequential Access: You read sequential datasets from beginning to end. To read a record in the middle, you must read all records before it. You write records sequentially, adding them to the end.
No Members: Sequential datasets contain just the data—there are no separate "members" or internal files. It's one continuous file.
Common Uses: Sequential datasets are commonly used for data files, reports, log files, output files, and simple text files where you process data from start to finish.
Characteristics of Sequential Datasets
Sequential datasets have these characteristics:
DSORG=PS: The dataset organization attribute is PS (Physical Sequential), indicating sequential organization.
Simple Access: Access is straightforward—read from start to finish, write sequentially to the end. No complex access methods or indexing.
Efficient for Sequential Processing: Very efficient when you need to process all records in order, such as reading an entire file or writing output sequentially.
Not Suitable for Random Access: Not efficient for accessing specific records in the middle without reading everything before them. If you need random access, consider VSAM.
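Common Record Formats: Can use various record formats depending on your needs: fixed-length (F), variable-length (V), their blocked forms (FB, VB), and undefined (U). The record format determines whether all records share one length or each carries its own, and whether several records are grouped into each physical block.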
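To see what a fixed-length blocked (FB) format means in practice, here is a minimal Python sketch that splits one block into its logical records and then processes them in order. The helper name split_fixed_block and the sample data are hypothetical, and the example uses ASCII for readability, whereas real mainframe data is usually EBCDIC.

```python
def split_fixed_block(block: bytes, lrecl: int) -> list[bytes]:
    """Split one FB block into fixed-length logical records of lrecl bytes."""
    if lrecl <= 0:
        raise ValueError("lrecl must be positive")
    if len(block) % lrecl != 0:
        raise ValueError("block length is not a multiple of lrecl")
    return [block[i:i + lrecl] for i in range(0, len(block), lrecl)]


# Three 10-byte records packed into one 30-byte block (made-up data).
block = b"REC0000001REC0000002REC0000003"
records = split_fixed_block(block, lrecl=10)

# Sequential processing: records are visited from first to last.
for record in records:
    print(record.decode("ascii"))
```

Because every record has the same length, no delimiter is needed between records; the record length alone tells a program where each one starts.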
When to Use Sequential Datasets
Use sequential datasets when:
Simple Data Storage: You need simple storage for data files, reports, or text files without complex organization requirements.
Sequential Processing: You process data sequentially from start to finish, such as reading entire files or generating reports.
Output Files: You're creating output files that are written sequentially, such as program output, reports, or log files.
Data Transfer: You're transferring data between systems or applications where sequential format is appropriate.
Simple Requirements: Your requirements are simple and don't need the complexity of partitioned or VSAM datasets.
Example Uses of Sequential Datasets
Common examples include:
Data Files: Input data files for programs, such as transaction data, customer records, or business data.
Report Files: Output reports generated by programs, such as sales reports, inventory reports, or analysis reports.
Log Files: System or application log files that record events, errors, or activities sequentially.
Configuration Files: Simple configuration or parameter files that are read sequentially.
Backup Files: Backup copies of data stored in sequential format for simple restoration.
Partitioned Data Sets (PDS)
Partitioned Data Sets (PDS) are like filing cabinets—they contain multiple "members" (files) organized within one dataset.
What Are PDS Datasets?
PDS datasets (DSORG=PO) contain multiple members, each acting like a separate file:
Multiple Members: A PDS contains multiple members, each with its own name. Think of it like a folder containing multiple files, where each file is a "member".
Member Names: Each member has a name (up to 8 characters) that identifies it within the PDS. You access members by specifying both the dataset name and member name.
Common Uses: PDS datasets are commonly used for source code libraries (COBOL programs, JCL, etc.), where each program is a member. They're also used for organizing related files together.
Directory Structure: PDS datasets have a directory that lists all members and their locations. The directory helps locate members quickly.
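Space Management: A PDS does not reclaim space automatically. Space freed by deleting or replacing a member stays unusable until the dataset is compressed (for example, with the IEBCOPY utility), so when space runs out you need to compress the dataset or allocate a larger one.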
Characteristics of PDS Datasets
PDS datasets have these characteristics:
DSORG=PO: The dataset organization attribute is PO (Partitioned Organization), indicating partitioned structure.
Member Access: You access individual members by specifying both dataset name and member name, such as USERID.SOURCE.COBOL(MEMBER1) where MEMBER1 is the member name.
Directory: Contains a directory that lists all members. The directory helps locate members but takes up space in the dataset.
No Automatic Space Reuse: When you delete or replace a member, its old space is not reusable until the PDS is compressed, so a frequently updated PDS gradually fills with dead space.
Member Limits: The number of members is limited by the directory, whose size (in directory blocks) is fixed when the dataset is allocated.
Widely Used: Still very widely used, especially for source code libraries, due to compatibility and familiarity.
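A member reference such as USERID.SOURCE.COBOL(MEMBER1) combines two names. The Python sketch below separates them; it is an illustration only, the function name parse_member_ref is hypothetical, and it assumes member names of 1 to 8 characters that start with a letter or national character.

```python
import re

# DATASET.NAME(MEMBER): the member name is 1-8 characters and starts with
# a letter or national character (@, #, $).
MEMBER_REF = re.compile(
    r"(?P<dsname>[^()]+)\((?P<member>[A-Z@#$][A-Z0-9@#$]{0,7})\)"
)


def parse_member_ref(ref: str) -> tuple[str, str]:
    """Split 'DSNAME(MEMBER)' into (dataset name, member name)."""
    match = MEMBER_REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a valid member reference: {ref!r}")
    return match.group("dsname"), match.group("member")


dsname, member = parse_member_ref("USERID.SOURCE.COBOL(MEMBER1)")
print(dsname)  # USERID.SOURCE.COBOL
print(member)  # MEMBER1
```

The dataset name identifies the library as a whole, while the member name is looked up in that library's directory.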
When to Use PDS Datasets
Use PDS datasets when:
Source Code Libraries: You need to store multiple related source code files together, such as COBOL programs, JCL procedures, or other program source files.
Organizing Related Files: You want to organize multiple related files together in one dataset for easier management.
Compatibility Requirements: You need compatibility with systems or tools that specifically require PDS format.
Traditional Workflows: Your organization uses traditional PDS-based workflows and you want to maintain consistency.
Simple Member Management: You need simple member management without the advanced features of PDSE.
Example Uses of PDS Datasets
Common examples include:
COBOL Source Libraries: Storing multiple COBOL programs, where each program is a member (e.g., USERID.SOURCE.COBOL with members PROG1, PROG2, etc.).
JCL Procedure Libraries: Storing JCL procedures (PROCs) where each procedure is a member.
Copybook Libraries: Storing COBOL copybooks or other reusable code components.
Test Data Libraries: Organizing test data files as members in a PDS.
Documentation Libraries: Storing multiple documentation files as members.
Partitioned Data Set Extended (PDSE)
PDSE is an enhanced version of PDS with better performance and features.
What Are PDSE Datasets?
PDSE datasets (DSORG=PO, allocated with DSNTYPE=LIBRARY) are improved partitioned datasets:
Enhanced PDS: PDSE provides all PDS capabilities but with significant improvements in performance, space management, and features.
Automatic Space Management: PDSE automatically manages space, eliminating the fragmentation issues that can occur with PDS. When you delete a member, its space is immediately available for reuse.
Better Performance: PDSE generally provides better performance than PDS, especially for operations involving many members or frequent member additions and deletions.
More Members: A PDSE directory grows as needed and can hold up to 522,236 members, whereas a PDS is limited by the fixed number of directory blocks allocated when it was created. This makes PDSE suitable for large libraries.
Improved Recovery: PDSE provides better recovery capabilities and data integrity features compared to PDS.
Characteristics of PDSE Datasets
PDSE datasets have these characteristics:
DSORG=PO with DSNTYPE=LIBRARY: Uses PO organization, like a PDS. The system recognizes the dataset as a PDSE because it is allocated with DSNTYPE=LIBRARY.
Automatic Reorganization: Space is managed automatically—no need for manual reorganization. Deleted member space is immediately reusable.
No Directory Limitations: Directory can grow dynamically, supporting many more members than traditional PDS.
Better Concurrency: Improved support for multiple users accessing different members simultaneously.
Member Name Length: Member names are limited to 8 characters, the same as in a PDS.
Recommended for New Datasets: Generally recommended for new partitioned datasets due to superior features and performance.
When to Use PDSE Datasets
Use PDSE datasets when:
New Partitioned Datasets: You're creating new partitioned datasets and want the best performance and features available.
Large Libraries: You need to store many members (thousands or more) where PDS limitations might be an issue.
Frequent Changes: You frequently add and delete members, where PDSE's automatic space management provides significant advantages.
Performance Critical: Performance is important and you want the best performance available for partitioned datasets.
Modern Applications: You're building modern applications or libraries where PDSE's enhanced features are beneficial.
PDSE vs. PDS Comparison
Key differences between PDSE and PDS:
Space Management: PDSE automatically manages space; PDS may require manual reorganization. PDSE eliminates fragmentation issues common with PDS.
Performance: PDSE generally provides better performance, especially for operations involving many members or frequent changes.
Member Capacity: PDSE supports up to 522,236 members in a directory that grows as needed; PDS capacity is capped by the directory blocks allocated at creation.
Recovery: PDSE provides better recovery and data integrity features.
Compatibility: PDS is more widely compatible with older systems and tools; PDSE may not be supported in all environments.
Recommendation: Use PDSE for new datasets when possible; PDS is still widely used for compatibility.
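The space management difference is easier to see in a toy model. The Python sketch below is a deliberate simplification, not how either format is implemented: sizes are abstract "blocks", and the class names ToyPDS, ToyPDSE, and OutOfSpace are hypothetical. It models only one behavior: a PDS writes new members after the last one and leaves deleted space dead until a compress, while a PDSE reuses freed space right away.

```python
class OutOfSpace(Exception):
    """Raised by the toy models when no usable space is left."""


class ToyPDS:
    """Members are appended; deleted space stays dead until compress()."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.members: dict[str, int] = {}  # member name -> size in blocks
        self.high_water_mark = 0           # first unused block at the end

    def add(self, name: str, size: int) -> None:
        if self.high_water_mark + size > self.capacity:
            raise OutOfSpace("no room after the last member; compress needed")
        self.members[name] = size
        self.high_water_mark += size

    def delete(self, name: str) -> None:
        del self.members[name]  # the directory entry goes, the space does not

    def compress(self) -> None:
        # Pack the live members together, as a compress operation would.
        self.high_water_mark = sum(self.members.values())


class ToyPDSE:
    """Freed space returns to the pool as soon as a member is deleted."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.members: dict[str, int] = {}

    def used(self) -> int:
        return sum(self.members.values())

    def add(self, name: str, size: int) -> None:
        in_use = self.used() - self.members.get(name, 0)
        if in_use + size > self.capacity:
            raise OutOfSpace("dataset is full")
        self.members[name] = size

    def delete(self, name: str) -> None:
        del self.members[name]


pds, pdse = ToyPDS(capacity=10), ToyPDSE(capacity=10)
for library in (pds, pdse):
    library.add("A", 4)
    library.add("B", 4)
    library.delete("A")

pdse.add("C", 4)      # fits: the space freed by A is reused
try:
    pds.add("C", 4)   # fails: A's old space is dead until a compress
except OutOfSpace as err:
    print("PDS:", err)
pds.compress()
pds.add("C", 4)       # fits once the dead space has been reclaimed
```

In both models the library ends up holding members B and C; the difference is that the PDS needed an explicit compress step to get there.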
VSAM (Virtual Storage Access Method)
VSAM is an advanced dataset organization that provides sophisticated access methods and data management.
Multiple Organizations: VSAM supports different organization types: KSDS (Key Sequenced Data Set), ESDS (Entry Sequenced Data Set), RRDS (Relative Record Data Set), and LDS (Linear Data Set). Each type provides different access patterns.
Indexed Access: VSAM can provide indexed access, allowing you to access records directly by key without reading through all previous records. This is much more efficient than sequential access for random record retrieval.
Direct Access: VSAM supports direct access to specific records by key or relative record number, enabling efficient random access.
Advanced Features: VSAM provides features like alternate indexes, record-level sharing, and sophisticated data management capabilities.
Performance: VSAM is designed for high-performance data access, making it suitable for applications that need efficient random access.
VSAM Dataset Types
VSAM supports several organization types:
KSDS (Key Sequenced Data Set): Records are organized by key and can be accessed directly by key or sequentially in key order. KSDS provides both random and sequential access with excellent performance. This is the most commonly used VSAM type.
ESDS (Entry Sequenced Data Set): Records are stored in the order they were written (entry sequence). You can access them sequentially or by relative byte address, but not by key. ESDS is useful when you need to preserve write order.
RRDS (Relative Record Data Set): Records are accessed by relative record number (1, 2, 3, etc.). Each record has a slot number, and you can access records directly by number. RRDS is useful when you have a known record numbering scheme.
LDS (Linear Data Set): A byte-addressable dataset without record structure. LDS is used for special purposes like database data sets or when you need byte-level access.
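The access pattern of a KSDS can be sketched in a few lines of Python. This is a conceptual model only: a sorted key list stands in for the VSAM index, and the class name ToyKSDS and its methods are hypothetical. Real VSAM access goes through the access method from languages such as COBOL, or through utilities such as IDCAMS.

```python
import bisect
from typing import Iterator


class ToyKSDS:
    """Records kept in key order, with direct and sequential access."""

    def __init__(self) -> None:
        self._keys: list[str] = []     # sorted; plays the role of the index
        self._records: list[str] = []  # records, parallel to _keys

    def write(self, key: str, record: str) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._records[i] = record        # key exists: update in place
        else:
            self._keys.insert(i, key)        # new key: keep key order
            self._records.insert(i, record)

    def read(self, key: str) -> str:
        """Direct access: find one record by key without scanning the rest."""
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            raise KeyError(key)
        return self._records[i]

    def browse(self, start_key: str) -> Iterator[tuple[str, str]]:
        """Sequential access in key order, from the first key >= start_key."""
        i = bisect.bisect_left(self._keys, start_key)
        yield from zip(self._keys[i:], self._records[i:])


customers = ToyKSDS()
customers.write("C003", "Carol")
customers.write("C001", "Alice")
customers.write("C002", "Bob")

print(customers.read("C002"))          # Bob
print(list(customers.browse("C002")))  # [('C002', 'Bob'), ('C003', 'Carol')]
```

Records were written out of order but come back in key order, which is the property that separates a KSDS from an ESDS, where records stay in the order they were written.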
When to Use VSAM
Use VSAM when:
Random Access Needed: You need to access specific records directly by key without reading through all previous records. VSAM's indexed access is much more efficient than sequential datasets for random access.
Key-Based Access: You need to look up records by key values, such as customer numbers, account numbers, or other identifiers. VSAM's key-sequenced organization makes this efficient.
High Performance Required: You need high-performance data access for applications that frequently access specific records. VSAM is optimized for efficient access.
Alternate Indexes Needed: You need to access the same data by different keys using alternate indexes. VSAM supports multiple access paths to the same data.
Database-Like Access: You need database-like capabilities with indexing, key-based access, and sophisticated data management.
Example Uses of VSAM
Common examples include:
Customer Master Files: Storing customer records accessed by customer number (key). Applications can quickly look up customer information by customer ID.
Transaction Files: Storing transaction records that need to be accessed by transaction ID or account number for processing or inquiry.
Reference Data: Storing reference data (like code tables) that applications need to access quickly by key.
Index Files: Creating indexes for other datasets to enable efficient lookup.
Application Data: Storing application data that requires efficient random access patterns.
Generation Data Groups (GDG)
GDGs organize related datasets by generation numbers, making it easy to work with versioned data.
What Are GDGs?
Generation Data Groups (GDG) are collections of related datasets organized by generation numbers:
Related Datasets: A GDG contains multiple related datasets that share a common base name but have different generation numbers. Each generation is a separate dataset, but they're organized as a group.
Generation Numbers: Each generation has a number like .G0001V00, .G0002V00, .G0003V00, etc. The number indicates the generation sequence.
Relative References: You can reference generations relatively: +1 for the next generation, 0 (or +0) for the current generation, -1 for the previous generation, -2 for two generations back, etc. This makes it easy to work with versions.
Version Management: GDGs are ideal for maintaining multiple versions of related data, such as daily backups, report generations, data snapshots, or historical versions.
Catalog Entry: GDGs have a catalog entry (the GDG base) that tracks all generations and their numbers.
Characteristics of GDGs
GDGs have these characteristics:
Base Name: GDGs have a base name (like USERID.REPORTS.DAILY) that represents the group. Individual generations have names like USERID.REPORTS.DAILY.G0001V00.
Generation Tracking: The system tracks generations and their sequence numbers automatically. When you create a new generation, it gets the next number.
Relative References: You can reference generations using relative notation (+1, 0, -1, etc.) instead of absolute generation numbers, making it easy to work with "current", "next", or "previous" versions.
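Generation Limits: Every GDG is defined with a limit on the number of generations it keeps (for example, the last 10; the maximum is 255, or 999 for an extended GDG). When the limit is exceeded, the oldest generation is rolled off the group, and it is also deleted if the GDG was defined with the SCRATCH option.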
Supported Dataset Types: Individual generations are usually sequential datasets. They can also be partitioned (PDS or PDSE) or direct datasets, but not VSAM. The GDG organization applies to how generations are managed, not to the internal structure of each generation.
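Generation numbering, relative references, and the generation limit fit together as shown in the following Python sketch. It is a simplified model under stated assumptions: the class name ToyGDG and its methods are hypothetical, the version part is always V00, and generation numbers simply count up from 1.

```python
class ToyGDG:
    """Tracks generation numbers for one GDG base and applies a limit."""

    def __init__(self, base: str, limit: int) -> None:
        self.base = base
        self.limit = limit
        self.generations: list[int] = []  # absolute numbers, oldest first

    def _name(self, number: int) -> str:
        return f"{self.base}.G{number:04d}V00"

    def create_next(self) -> str:
        """Model a (+1) reference: add a generation, roll off the oldest."""
        number = self.generations[-1] + 1 if self.generations else 1
        self.generations.append(number)
        if len(self.generations) > self.limit:
            self.generations.pop(0)  # oldest generation leaves the group
        return self._name(number)

    def resolve(self, relative: int) -> str:
        """Turn 0, -1, -2, ... into an absolute generation dataset name."""
        if relative > 0:
            raise ValueError("positive references create new generations")
        index = len(self.generations) - 1 + relative
        if index < 0:
            raise LookupError(f"generation ({relative}) does not exist")
        return self._name(self.generations[index])


reports = ToyGDG("USERID.REPORTS.DAILY", limit=3)
for _ in range(4):
    reports.create_next()  # four daily runs, but only three are kept

print(reports.resolve(0))   # USERID.REPORTS.DAILY.G0004V00
print(reports.resolve(-1))  # USERID.REPORTS.DAILY.G0003V00
print(reports.generations)  # [2, 3, 4]
```

The job that reads "yesterday's report" always asks for (-1) and never needs to know that the actual dataset name changes with every run.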
When to Use GDGs
Use GDGs when:
Versioned Data: You need to maintain multiple versions of related data over time, such as daily reports, backups, or data snapshots.
Historical Tracking: You need to keep historical versions for reference, audit, or recovery purposes.
Automated Generation: You have automated processes that create new versions regularly (daily, weekly, etc.) and you want easy reference to current, previous, or next versions.
Relative References Needed: You want to reference versions relatively (like "today's report" or "yesterday's backup") without hardcoding specific generation numbers.
Automatic Cleanup: You want automatic management of old generations, keeping only a specified number of recent generations.
Example Uses of GDGs
Common examples include:
Daily Reports: Storing daily reports where each day creates a new generation. You can reference "today's report" (+0), "yesterday's report" (-1), etc.
Backup Files: Maintaining backup generations where each backup is a new generation. You can easily reference the latest backup or restore from a previous generation.
Data Snapshots: Taking periodic snapshots of data where each snapshot is a generation. This enables point-in-time data access.
Log Files: Organizing log files by generation, where each time period or event creates a new generation.
Archive Data: Maintaining archived data where each archive period is a generation, enabling historical data access.
Choosing the Right Dataset Type
Choosing the right dataset type depends on your requirements. Here's a decision guide:
Decision Guidelines
Use this decision process:
Need to store multiple related files together? → Use PDS or PDSE (prefer PDSE for new datasets).
Need simple sequential file storage? → Use Sequential (PS).
Need random access by key? → Use VSAM (KSDS).
Need to maintain multiple versions? → Use GDG (generations are typically sequential datasets; VSAM is not supported).
Need indexed or direct access? → Use VSAM.
Need simple, straightforward storage? → Use Sequential (PS).
Creating new partitioned dataset? → Prefer PDSE over PDS for better features and performance.
Comparison Summary
Quick comparison of dataset types:
Sequential (PS): Simplest, sequential access only, single file, good for simple data storage and sequential processing.
PDS: Multiple members, traditional format, widely compatible, may need manual space management.
PDSE: Multiple members, enhanced PDS, better performance and automatic space management, recommended for new datasets.
VSAM: Advanced organization, indexed/direct access, high performance, suitable for database-like access patterns.
GDG: Version management, relative references, automatic generation tracking, for non-VSAM datasets (typically sequential).
Explain Like I'm 5: Dataset Types
Think of dataset types like different ways of organizing your toys:
Sequential (PS) is like putting all your toys in one big box, one after another. To find a toy in the middle, you have to take out all the toys before it. It's simple, but not great for finding specific toys quickly!
PDS is like having a toy box with drawers. Each drawer has a label (like "Cars", "Blocks", "Dolls"), and you can open any drawer to get those specific toys. It's organized, and you can find things by their drawer name!
PDSE is like having a super-smart toy box that organizes itself. It's like a regular toy box with drawers, but it automatically cleans up, can hold more toys, and works faster. It's like having a magic toy organizer!
VSAM is like having a toy catalog with an index. You can look up a toy in the catalog (like "find all red cars") and the catalog tells you exactly where it is. You don't have to search through everything—the catalog helps you find things quickly!
GDG is like having photo albums for the same event. You have "Vacation Day 1", "Vacation Day 2", "Vacation Day 3", etc. Each album is numbered, and you can easily say "show me today's photos" or "show me yesterday's photos" without remembering the exact album number!
So dataset types are like different ways of organizing and storing your information, each designed for different needs—simple storage, organized cabinets, smart systems, or versioned collections!
Practice Exercises
Complete these exercises to reinforce your understanding of dataset types:
Exercise 1: Identifying Dataset Types
Practice identification: use ISPF Data Set Utility to view dataset attributes, identify DSORG values, determine dataset types, and understand what each type means. Build familiarity with recognizing different dataset types.
Exercise 2: Working with Sequential Datasets
Practice with sequential datasets: create a sequential dataset, write data to it, read data from it sequentially, and understand sequential access patterns. Compare this to other dataset types.
Exercise 3: Working with PDS and PDSE
Practice with partitioned datasets: create a PDS or PDSE, add members to it, list members, access individual members, and understand member organization. Compare PDS and PDSE characteristics if both are available.
Exercise 4: Understanding VSAM
Learn about VSAM: if VSAM is available, explore VSAM datasets, understand different VSAM types (KSDS, ESDS, RRDS), and learn about key-based access. Understand when VSAM is appropriate.
Exercise 5: Understanding GDGs
Learn about GDGs: explore existing GDGs if available, understand generation numbering, practice relative references (+1, 0, -1), and understand version management concepts.
Test Your Knowledge
1. What is the simplest dataset type?
PDS
PDSE
Sequential (PS)
VSAM
2. Which dataset type can contain multiple members (files)?
Sequential only
PDS and PDSE
VSAM only
GDG only
3. What is the main advantage of PDSE over PDS?
It is simpler
Better performance, automatic space management, and more features
It is older
It requires less space
4. When should you use VSAM?
For simple sequential files
When you need indexed or direct access by key
For storing source code
For simple text files
5. What is a GDG used for?
Storing single files
Organizing related datasets by generation numbers for versioning
Indexed access
Simple sequential storage
6. Can you change a dataset's type after it's created?
Yes, easily
No, DSORG is set at creation and cannot be changed