In COBOL, a string is a fixed-length sequence of characters: letters, digits, spaces, or other symbols. Unlike in many modern languages, COBOL strings have no null terminator and do not shrink or grow at runtime. You define them with a PICTURE clause (such as PIC X(n) or PIC A(n)), and every operation respects that fixed length—with padding when the value is shorter and truncation when it is longer. This page explains what strings are in COBOL, how to define them, how they behave, and how the main string operations fit together.
Imagine a row of boxes, each holding one letter or character. The number of boxes is fixed: if you have 10 boxes and only put "HELLO" in them, the other 5 boxes are filled with spaces. If you try to put 15 letters in 10 boxes, only the first 10 are kept. In COBOL, a string is exactly that: a fixed number of character boxes. The computer always knows how long the string is because you said so when you defined it. There is no special "end" character; the length is the rule.
A string in COBOL is an alphanumeric or national data item: a field that holds character data and has a fixed length defined at compile time. The most common way to define it is with PIC X(n), where n is the number of characters. The field always occupies n bytes (in the default character set, often EBCDIC on the mainframe). It can contain any character that fits in one byte (for PIC X) or, with PIC A, only letters and space. National strings (PIC N) support double-byte or Unicode when the compiler and runtime support it. What matters for "string" in everyday COBOL is: fixed length, no null terminator, and operations that work on the whole length (with padding and truncation rules).
You define a string in the Data Division using a PICTURE (or PIC) clause. The symbol and the length in parentheses determine the type and size.
| Picture | Allows | Typical use |
|---|---|---|
| PIC X(n) | Any character (letters, digits, spaces, symbols) | Names, codes, free-form text, IDs |
| PIC A(n) | Letters and space only | Alphabetic codes, names when digits not allowed |
| PIC N(n) | National (Unicode) characters | DBCS or Unicode text when supported |
PIC X(n) is the workhorse: it allows any character in the character set. Use it for names, addresses, codes, IDs, and any free-form text. PIC A(n) declares an alphabetic field (letters and space); some shops use it to document a "letters only" rule. Be aware that compilers generally do not validate content when data is moved into a PIC A field, so invalid data is normally caught with an explicit class test (IF ... IS ALPHABETIC) rather than by the MOVE itself. PIC N(n) is for national (often DBCS or Unicode) data when your compiler supports it; each character may use more than one byte. The length n is always in character positions (not necessarily bytes for N). All of these produce a fixed-length field: the program always allocates the same amount of storage.
       WORKING-STORAGE SECTION.
       01  WS-NAME     PIC X(30).
       01  WS-CODE     PIC A(6).
       01  WS-ADDRESS  PIC X(60) VALUE SPACES.
       01  WS-ID       PIC X(10) VALUE 'ID0000001'.
VALUE is optional. If you omit it, the initial value is undefined (or spaces in some implementations). VALUE SPACES initializes the whole field to spaces. A literal like 'ID0000001' is left-aligned; it is 9 characters long, so the tenth position of the PIC X(10) field is filled with a space. You cannot assign a "short" string and have the field shrink; the field is always the declared length.
COBOL strings are not null-terminated. In C or similar languages, a string might be "HELLO\0" and the length is found by scanning until the null. In COBOL, the length is always the declared length. A PIC X(10) field is 10 characters. If the meaningful data is only "HELLO", the remaining five positions are still part of the string—usually spaces. Comparisons and operations use the full 10 characters unless you use reference modification or delimiters (e.g. in STRING with DELIMITED BY SPACE) to work with only the significant portion. This design matches fixed-length records in files: every record has the same layout, and every field has a fixed position and length.
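The difference between declared length and significant length can be sketched as follows. This is a minimal illustration, not the only way to do it: all data names (WS-GREETING, WS-REVERSED, and so on) are hypothetical, and it assumes a compiler that supports the LENGTH and REVERSE intrinsic functions.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. STRLEN.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> All data names here are hypothetical.
       01  WS-GREETING     PIC X(10) VALUE 'HELLO'.
       01  WS-REVERSED     PIC X(10).
       01  WS-DECLARED     PIC 9(4)  VALUE 0.
       01  WS-TRAILING     PIC 9(4)  VALUE 0.
       01  WS-SIGNIFICANT  PIC 9(4)  VALUE 0.
       PROCEDURE DIVISION.
      *> LENGTH reports the declared size, not the visible text.
           COMPUTE WS-DECLARED = FUNCTION LENGTH(WS-GREETING)
           DISPLAY 'DECLARED LENGTH:    ' WS-DECLARED
      *> Trailing spaces of the value are leading spaces of its
      *> reversal, so tally those.
           MOVE FUNCTION REVERSE(WS-GREETING) TO WS-REVERSED
           INSPECT WS-REVERSED
               TALLYING WS-TRAILING FOR LEADING SPACES
           COMPUTE WS-SIGNIFICANT = WS-DECLARED - WS-TRAILING
           DISPLAY 'SIGNIFICANT LENGTH: ' WS-SIGNIFICANT
      *> Reference modification isolates the significant part.
           DISPLAY '[' WS-GREETING(1:WS-SIGNIFICANT) ']'
           STOP RUN.
```

With the value 'HELLO' the declared length is 10 and the significant length is 5. If the field could be entirely spaces, guard the last DISPLAY, because a reference-modification length of zero is not valid.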
When you move or combine string data, the rules are simple. If the receiving field is longer than the data being placed, MOVE fills the extra positions on the right with spaces (for alphanumeric). If the receiving field is shorter, the data is truncated on the right: only the leftmost characters that fit are kept. So MOVE 'ABCDEFGHIJ' to a PIC X(5) gives 'ABCDE'. MOVE 'ABC' to a PIC X(5) gives 'ABC' followed by two trailing spaces. STRING also fills the receiving field left to right and stops when the field is full (the ON OVERFLOW phrase lets you detect this), but unlike MOVE it does not space-fill the positions it never writes: they keep whatever they held before. For that reason, programs usually MOVE SPACES to the receiving field before a STRING. There is no automatic "trim" or "resize"; you control significant data with DELIMITED BY in STRING or with reference modification when you need a substring.
       01  WS-SHORT  PIC X(5).
       01  WS-LONG   PIC X(10).

           MOVE 'HELLO' TO WS-LONG        *> WS-LONG = 'HELLO     ' (5 spaces)
           MOVE 'HELLO WORLD' TO WS-SHORT *> WS-SHORT = 'HELLO' (truncated)
When you compare two alphanumeric strings with IF or EVALUATE, the comparison is character-by-character from left to right, according to the collating sequence. If the operands have different lengths, the shorter one is treated as if it were padded on the right with spaces to the length of the longer one. So 'ABC' in a PIC X(3) field and 'ABC' followed by two spaces in a PIC X(5) field compare as equal: trailing spaces do not change the result. Leading and embedded spaces are significant, though: ' ABC' and 'ABC' are not equal. Comparisons are also case-sensitive. For case-insensitive comparison, you often normalize with FUNCTION UPPER-CASE (or INSPECT CONVERTING) and then compare. Leading spaces can cause surprises; many programs use INSPECT or UNSTRING to normalize before comparing. Reference modification lets you compare only a portion (e.g. the first 10 characters) if needed.
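A short sketch of case normalization and partial comparison, using hypothetical data names and values. It assumes the UPPER-CASE intrinsic function is available.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. STRCMP.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> All data names and values here are hypothetical.
       01  WS-INPUT    PIC X(10) VALUE 'Smith'.
       01  WS-STORED   PIC X(10) VALUE 'SMITH'.
       01  WS-KEY-A    PIC X(10) VALUE 'ABC1234567'.
       01  WS-KEY-B    PIC X(10) VALUE 'ABC7654321'.
       PROCEDURE DIVISION.
      *> A plain comparison is case-sensitive.
           IF WS-INPUT = WS-STORED
               DISPLAY 'EXACT: EQUAL'
           ELSE
               DISPLAY 'EXACT: NOT EQUAL'
           END-IF
      *> Normalize both sides to upper case before comparing.
           IF FUNCTION UPPER-CASE(WS-INPUT) =
              FUNCTION UPPER-CASE(WS-STORED)
               DISPLAY 'UPPER-CASE: EQUAL'
           ELSE
               DISPLAY 'UPPER-CASE: NOT EQUAL'
           END-IF
      *> Reference modification compares only positions 1 to 3.
           IF WS-KEY-A(1:3) = WS-KEY-B(1:3)
               DISPLAY 'PREFIX: EQUAL'
           ELSE
               DISPLAY 'PREFIX: NOT EQUAL'
           END-IF
           STOP RUN.
```

The exact comparison reports NOT EQUAL because 'm' and 'M' differ; the other two report EQUAL.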
COBOL provides a set of verbs and functions for working with strings. They all assume fixed-length fields and the padding/truncation rules above.
| Operation | What it does |
|---|---|
| MOVE | Copy one string to another; padding or truncation by length |
| INSPECT | Count, replace, or convert characters in place |
| STRING | Concatenate several items into one receiving string |
| UNSTRING | Split one string into multiple receiving fields by delimiters |
| Reference modification | Refer to a substring by position and length |
| Intrinsic functions | LENGTH, UPPER-CASE, LOWER-CASE, REVERSE, etc. |
MOVE is the basic assignment: copy one string to another with padding or truncation. INSPECT lets you count occurrences (TALLYING), replace characters or strings (REPLACING), or translate characters (CONVERTING) in place. STRING concatenates two or more sending items into one receiving string; you control how much of each sending item is used with DELIMITED BY SIZE or DELIMITED BY delimiter. UNSTRING does the opposite: it splits one source string into multiple receiving fields using delimiters (e.g. comma or space). Reference modification (e.g. WS-NAME(1:10)) lets you refer to a substring by starting position and length. Intrinsic functions such as LENGTH, UPPER-CASE, LOWER-CASE, and REVERSE operate on string values and return new values or lengths. For full detail on INSPECT, STRING, and UNSTRING see the String Manipulation tutorial; for functions and reference modification see String Functions.
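As a rough illustration of how these operations combine, the sketch below joins two names with STRING, splits a comma-separated value with UNSTRING, and counts and replaces characters with INSPECT. All data names and values are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. STROPS.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> All data names and values here are hypothetical.
       01  WS-FIRST    PIC X(10) VALUE 'ADA'.
       01  WS-LAST     PIC X(10) VALUE 'LOVELACE'.
       01  WS-FULL     PIC X(25).
       01  WS-CSV      PIC X(20) VALUE 'RED,GREEN,BLUE'.
       01  WS-PART-1   PIC X(8).
       01  WS-PART-2   PIC X(8).
       01  WS-PART-3   PIC X(8).
       01  WS-COMMAS   PIC 9(2) VALUE 0.
       PROCEDURE DIVISION.
      *> Clear the target first so no old characters remain.
           MOVE SPACES TO WS-FULL
      *> DELIMITED BY SPACE stops at the first space in each name;
      *> DELIMITED BY SIZE takes the whole literal.
           STRING WS-FIRST DELIMITED BY SPACE
                  ' '      DELIMITED BY SIZE
                  WS-LAST  DELIMITED BY SPACE
               INTO WS-FULL
           END-STRING
           DISPLAY '[' WS-FULL ']'
      *> Split the comma-separated value into three fields.
           UNSTRING WS-CSV DELIMITED BY ','
               INTO WS-PART-1 WS-PART-2 WS-PART-3
           END-UNSTRING
           DISPLAY '[' WS-PART-1 '][' WS-PART-2 '][' WS-PART-3 ']'
      *> Count commas, then replace them in place.
           INSPECT WS-CSV TALLYING WS-COMMAS FOR ALL ','
           INSPECT WS-CSV REPLACING ALL ',' BY ';'
           DISPLAY WS-COMMAS ' ' WS-CSV(1:14)
           STOP RUN.
```

One design caveat: DELIMITED BY SPACE cuts a sending item at its first space, so a name such as 'MARY ANN' would contribute only 'MARY'. When names can contain embedded spaces, reference modification with a computed significant length is a common alternative.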
Use PIC X when the field can contain any character: part numbers, mixed names, addresses, codes that include digits or symbols. Use PIC A when the business rule is "letters and space only" and you want the declaration to document that rule. Do not rely on it as a guard, however: a MOVE into a PIC A field is generally not checked, so digits or symbols can still end up in the field. PIC A is uncommon in legacy applications; PIC X(n) is the default for general-purpose strings. To validate alphabetic-only input, declare the field as PIC X (or PIC A) and check it with a class test (ALPHABETIC, or NUMERIC for digits) in the procedure code.
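The class-test approach could look like the following sketch. WS-CODE-IN and the CHECK-CODE paragraph are hypothetical names chosen for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. STRCLASS.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> WS-CODE-IN is a hypothetical input field.
       01  WS-CODE-IN  PIC X(6).
       PROCEDURE DIVISION.
           MOVE 'ABCDEF' TO WS-CODE-IN
           PERFORM CHECK-CODE
           MOVE 'AB12EF' TO WS-CODE-IN
           PERFORM CHECK-CODE
           STOP RUN.
       CHECK-CODE.
      *> ALPHABETIC is true when every position is a letter or space.
           IF WS-CODE-IN IS ALPHABETIC
               DISPLAY WS-CODE-IN ': LETTERS AND SPACES ONLY'
           ELSE
               DISPLAY WS-CODE-IN ': CONTAINS OTHER CHARACTERS'
           END-IF.
```

The first value passes the test and the second does not, because it contains digits. Note that a field of all spaces also passes, so add a separate check for SPACES if an empty value is not acceptable.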
1. A COBOL string defined as PIC X(20) has how many characters?
2. When you MOVE a 10-character value into a PIC X(15) field, what happens?
3. To join first name and last name into one field you use: