Are Your Programs and Data Files Year 2000 Compliant?
Information Technologies staff have been assessing and correcting centrally supported application software and data files on the central UNIX systems and MVS systems so they are Y2K (Year 2000) compliant. However, it is your responsibility to assess and correct programs you have written or may be using, and to look at their input and output files for potential Y2K problems.
Introduction
The key question for you is whether you use 2-digit year-values or 4-digit year-values to represent or store dates. In these cases, the corresponding programs, input and output files need to be examined. The secondary question is whether the software you are using has Y2K limitations. This can often be answered by consulting our web page on Application Software Y2K Compliance. That web page includes links to simple workarounds supplied by several software vendors.
Problems can occur whether you're using commercial software with command files you've written (e.g., SAS, SPSS) or compiled programs such as home-grown Fortran, C, or Cobol programs. The problems may be compounded by the format of data files that you may be receiving from others.
To aid you, IT staff are preparing guidelines and procedures to do an initial assessment. Y2K issues are discussed at great length in print and on the web. Our suggestion is that you use our introductory descriptions to do the preliminary assessment. Then, if the effort seems warranted, consider reading more detailed discussions that emphasize the sometimes complex remediation techniques, as described in references given at the end of this document. Our discussion draws heavily upon Effective Remediation Strategies, by Michael Wheatley, IBM Corporation.
Risk Assessment
First, you should assess the risks the Y2K problem brings to bear on your data files and programs, and on your work in general.
- How important is it to fix a particular Fortran program, an SPSS command file, a spreadsheet macro or a database query in anticipation of a problem?
- Is it better to be cautiously watchful and to defer any corrective action until you find that you actually need to use the program? Will you have the time and knowledge to quickly perform deferred program remediation when the need does become critical?
- Or, can you conduct your work satisfactorily without that program?
Conversion Assessment
The second step is to determine whether a problem may exist. Remember: the source of problems is the presence and ambiguity of dates where the year is represented by 2 digits. Therefore, the assessment may involve testing the program with data whose dates span the century mark and carefully examining the results. Since the success of this testing depends on the quality and representativeness of the sample runs, you may not actually detect a potential problem. A more structured approach is generally advised, one that involves analysis of the data files and of the program files that are being processed (e.g., SPSS, SAS) or compiled (e.g., Fortran, C). The remainder of this document presents specific program and data characteristics that might lead to Y2K problems. The mere presence of these characteristics does not mean that there will be a problem. It does provide you, however, with a basis for deciding how much remediation may be needed.
Potential problems in data files
- Look for dates where the year is represented by 2 digits. These may appear as character strings having one of the following forms.
mmddyy mm/dd/yy ddmmyy ddmonyy monyy julyy (Julian date) qqyy (Quarter Year)
- Look for the use of the special codes to represent missing values in date fields. For example, does the data file use "00" or "99" to represent missing values such as "year not known" or "year not applicable" (rather than the actual value of the year's last 2 digits)? You may have once assumed that these were extremely unusual values that would never be needed as actual dates, and therefore could be used for special projects.
- Look for the use of the special codes to mean that something "never expires." For example, if the 2-digit year represents the expiration date of a license or credit card, it may no longer be acceptable to let "99" and "00" mean "never expire."
- In interactive programs that prompt a user to type a 2-digit year, check that "00", "99" or other 2-digit codes are not designed to allow the user to quit or terminate the program.
- Consider what your data providers are doing to prepare for the Year 2000, as well as what other recipients of your output data files are expecting from you as input to their programs. For example, suppose you acquire your data from a company that is planning to convert its data products so that all year-values in future data sets will be 4-digits, rather than 2-digits. You will then need to anticipate what changes will need to be made to your input programs.
- Evaluate how you transfer data files involving dates from one software package to another. For example, the approach you take transferring an Excel 7 file to Excel 5 or to LOTUS, dBase, or MS Access may have Y2K consequences. The straightforward CSV (Comma Separated Values) ASCII file is generally regarded as completely portable. However, it may not be if the data contains dates. The format used for the dates in the CSV file is whatever your default Windows date format is. So, if your default is dd/mm/yy, then correct 4-digit year-values in your spreadsheet will be output as 2-digit year-values and the century information will be missing from the CSV file. To complicate matters, if the software into which the CSV file is imported uses a different rule for assigning the century, the newly imported dates may be mis-assigned to a different century. Before exporting to a CSV file, be sure to change the properties of cells containing dates so that their display format uses 4-digit year-values. (In Excel, you can identify the cells and select Format/Cells to make this change.) For more details, see Patrick O'Beirne's Year 2000 and Spreadsheets discussion.
- Consider changing your PC's settings so that the default "short date" format uses 4-digit years. From the Windows desktop, select Start / Settings / Control Panel / Regional Settings / Date / Short date style. Then select either M/d/yyyy or MM/dd/yyyy. The default that Windows uses is a 2-digit year.
- Determine which version of the database software you are using. Database programs need to interpret (and store) each 2-digit year that you type by inferring which century you intended. The rule used depends on the version of the software, among other things. For example, MS Access 95 and earlier versions originally interpreted "00"-"29" as "1900"-"1929" and stored the 4-digit year in the database file. In contrast, MS Access 97 interprets this range of values as "2000"-"2029." To complicate things for you further, the rule is not linked only to the version number (95 vs. 97) of MS Access. MS Access 95 uses a separate file called the OLE Automation Library (OLEAUT32.DLL) to determine the rule. After MS Access 95 was released, a newer version of OLEAUT32.DLL was distributed by Microsoft, changing the century-assignment rule. If you have installed a newer version directly (or as part of installing some other Microsoft product), you may be using the latter rule. Specifically, if you have version 2.20.4029 of the OLEAUT32.DLL, then you are using the newer rule. (You can determine this by entering a date and seeing what is stored, or by using MS Explorer to examine the Properties of the OLEAUT32.DLL file.) More details are in Dan Haught's paper Solving the Year 2000 Problem in Microsoft Desktop Application Programs.
- Determine how your spreadsheet program interprets 2-digit year-values as you type them into the spreadsheet. MS Excel and other spreadsheets use yet a different rule for determining century. For example, MS Excel version 5 interprets "00"-"20" as "2000"-"2020", and "21"-"99" as "1921"-"1999." But MS Excel version 7 moved the cutpoint from 20 to 30. Hence "00"-"30" becomes "2000"-"2030" and "31"-"99" becomes "1931"-"1999". Similar behavior is exhibited in MS Word, versions 6-8. The Dan Haught paper provides more details.
- Look for embedded dates in database files. Some relational databases are constructed so that the primary key field, the field that links records in one table to records in another, is composed of several items including a date. For example, a record of experimental data received from Cornell University dated June 15, 1913 (06/15/13) could be assigned the primary key value CU130615. In order to properly link related data files, the primary key's values must be unique, which is obviously problematic if "00" can be used for both "1900" and "2000."
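The century-assignment rules described in the items above can be compared side by side with a small experiment. The following is a minimal sketch in Python (not one of the packages discussed in this document). The function name expand_year and the three pivot constants are hypothetical; the pivot values are modeled only on the rules as described above and should be checked against the software version you actually have.

```python
def expand_year(yy, pivot):
    """Expand a 2-digit year with a fixed window.

    Values below `pivot` are placed in the 2000s; all others in the 1900s.
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a 2-digit year, got %r" % (yy,))
    return 2000 + yy if yy < pivot else 1900 + yy


# Assumed pivots, taken from the rules described in the text above.
ALWAYS_1900S = 0   # every "xx" becomes "19xx" (the older rule)
EXCEL_5 = 21       # "00"-"20" become 2000-2020, "21"-"99" become 1921-1999
ACCESS_97 = 30     # "00"-"29" become 2000-2029

# The same typed value can land in different centuries under each rule.
for yy in (13, 20, 25, 29, 30):
    print(yy,
          expand_year(yy, ALWAYS_1900S),
          expand_year(yy, EXCEL_5),
          expand_year(yy, ACCESS_97))
```

A value such as "25" is the kind to test when moving data between packages, since it falls on different sides of the two windows shown here.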
Potential problems in program logic
There are several parts of the program logic that you should examine in light of the Y2K problem. In this section, the most likely categories or situations you will encounter are described.
- Date variables.
Look for program variables whose names suggest that they may be used to store date information, especially 2-digit years. These may include names that contain the following character strings: date, dt, year, yr, yob, yod, yymmdd, dmy, ymd, yrmonday, time, tm. While this is not an exhaustive list, searching for these name fragments may help locate many date-related variables in your program.
- Date constants.
Date constants may be character strings that are stored in special variables to represent dates. In Fortran, for example, these might appear in DATA statements, BLOCK DATA subprograms, or assignment statements, as illustrated below:
CHARACTER*6 START
START = '970615'
- Input-output (I/O) statements.
Since 2-digit year-values are often read or written by the programs, the corresponding variable names will appear in the I/O statements. Judicious inspection of I/O statements (e.g., read, write, print, printf) is therefore likely to reveal potentially problematic variables.
- Output formats.
Closely related to I/O statements are the formats used as input and output masks. In Fortran programs, for example, formatting codes such as "2H19,I2" or "'19',I2" may signal the presence of 2-digit year-variables where the year is assumed to be in the 1900s. Similar formats, such as %y, in languages like C and C++ are also potential markers of Y2K problems. Statistical packages such as SPSS and SAS have many date formats. The formats most likely to reveal potential problems are DATE, MMDDYY, DDMMYY, QYR, MONYY, EDATE and JDATE.
- Functions, including system date functions.
Most built-in time and date functions in programming languages and application programs are not a problem since they accept and properly process 4-digit year-values. However, many do accept 2-digit year-values and use a cutoff date to determine the century. In SAS, for example, the default cutoff value assigns the 2-digit year "xx" to "19xx" for all values of "xx." However, you can use the SAS system option YEARCUTOFF to change this rule. SAS functions, such as the Julian date function DATEJUL(), determine the century based on YEARCUTOFF. Excel and SPSS users should note that their date functions always interpret 2-digit year-values as being in the 1900s. Changing this requires that you use 4-digit years or write special code to assign certain "xx" values to "20xx."
In C and C++, use of the %y format yields a 2-digit year-value. When employed with the functions strftime(), cftime(), ascftime() and date(), check that the functions have not been used in a way that inappropriately forces the century to be the 21st century, regardless of the 2-digit date. Finally, although Sun's Pascal compiler's date() function uses a 2-digit year-value, Sun will soon provide a 4-digit year function fdate(). More details on Sun compilers are in Sun Microsystems' "Year 2000 Developer's Guide."
- Home-grown day of week routines.
A program that determines the day of the week corresponding to a user-supplied mm/dd/yy date must know the century as well. For example, 1/1/01 was a Tuesday in 1901 but a Monday in 2001. Since many routines do not properly identify leap years, check that your day-of-week routine knows that 2000 is a leap year. Good test dates include 1/1/2000 (Saturday), 2/29/2000 (Tuesday) and 3/1/2000 (Wednesday). Others are provided in Sun's "Year 2000 Developer's Guide."
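One way to check a home-grown routine is to compare its answers with a trusted calendar library. The sketch below uses Python's standard datetime module as that reference; the helper names is_leap_year and day_of_week are hypothetical. It illustrates the leap-year rule (years divisible by 4, except century years not divisible by 400) and is not a replacement for testing your own routine.

```python
import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def is_leap_year(year):
    """Gregorian rule: 1900 is not a leap year, 2000 is."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def day_of_week(year, month, day):
    """Reference answer from the standard library (4-digit year required)."""
    return DAY_NAMES[datetime.date(year, month, day).weekday()]


# Reference values to compare against your own routine's output.
for y, m, d in [(1901, 1, 1), (2001, 1, 1), (2000, 1, 1),
                (2000, 2, 29), (2000, 3, 1)]:
    print("%d/%d/%d" % (m, d, y), day_of_week(y, m, d))
```

The leap day itself, February 29, 2000, is a useful test because a routine that treats 2000 as a non-leap year will either reject it or shift every later date in the year by one day.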
- Sorting.
Two-digit years are likely to cause unexpected results when sorting is based on dates represented by character strings such as yymmdd. For example, the three dates July 12, 1997 (970712), November 5, 1998 (981105) and April 15, 2002 (020415) will not be sorted properly, resulting in the 2002 date being out-of-order.
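The sorting problem with these three dates can be reproduced in a few lines. This is a minimal Python sketch; the helper to_sortable and its pivot of 30 are assumptions chosen for illustration, and the right window for your data depends on the range of years it can contain.

```python
def to_sortable(yymmdd, pivot=30):
    """Return a yyyymmdd string for a yymmdd string.

    The pivot is an assumed window: years below it are placed in the
    2000s, all others in the 1900s.
    """
    yy = int(yymmdd[:2])
    century = 2000 if yy < pivot else 1900
    return "%04d%s" % (century + yy, yymmdd[2:])


dates = ["970712", "981105", "020415"]

# Plain character sort: the 2002 date lands ahead of 1997 and 1998.
naive_order = sorted(dates)

# Sorting on the expanded 4-digit key restores chronological order.
chronological_order = sorted(dates, key=to_sortable)

print(naive_order)
print(chronological_order)
```

Storing the 4-digit form in the file avoids the need for a window at all, which is why converting the data is often preferable to patching each sort.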
- Arithmetic operations.
Look for computations of date and time differences. If one 2-digit year-value is subtracted from another, the answer will only be correct if both years are on the same side of the century divider. Furthermore, you may need to understand the original intentions of your code when you encounter an expression such as YEAR-25. Does the expression mean a date 25 years earlier or does it mean the number of elapsed years since 1925?
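The subtraction problem is easy to see with concrete numbers. The following Python sketch uses a hypothetical helper, elapsed_years, and made-up year values purely for illustration.

```python
def elapsed_years(start_year, end_year):
    """Difference in years; only meaningful if both values use the same scale."""
    return end_year - start_year


# 2-digit values on opposite sides of the century mark (1997 to 2002).
across_century_2digit = elapsed_years(97, 2)       # gives -95

# The same interval with 4-digit years.
across_century_4digit = elapsed_years(1997, 2002)  # gives 5

# 2-digit values on the same side (1991 to 1997) still give the right answer.
same_side_2digit = elapsed_years(91, 97)           # gives 6

print(across_century_2digit, across_century_4digit, same_side_2digit)
```

Code that silently takes the absolute value of such a difference, or adds 100 when the result is negative, deserves a close look, since it encodes an assumption about which century is meant.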
- SQL (Structured Query Language) expressions.
SQL expressions are commonly found in database and statistical package command files. You should locate the SQL expressions in your programs to ensure that they are not dependent on logic that assumes that the dates are always in the 1900s. Watch for expressions using the LEFT and RIGHT functions to manipulate character-strings in which dates are stored. For example, an expression like
SELECT ... FROM table ... WHERE ( RIGHT( [datevariable],2) > 92 );
might be testing whether the year is later than 1992. This WHERE clause would obviously fail for years after 1999.
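The failure of a 2-digit comparison in a WHERE clause can be demonstrated with a small in-memory database. The sketch below uses Python's standard sqlite3 module, which is an assumption made for convenience and is not one of the database products discussed here. SQLite has no RIGHT function, so substr(received, -2) plays that role. The table name, column names, sample rows, and the window of 30 in the second query are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (label TEXT, received TEXT)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [("a", "06/15/91"), ("b", "03/02/95"), ("c", "01/20/01")],
)

# Naive test on the last two characters: intended to mean "after 1992".
naive_rows = conn.execute(
    "SELECT label FROM samples "
    "WHERE CAST(substr(received, -2) AS INTEGER) > 92 "
    "ORDER BY label"
).fetchall()

# Windowed test: expand the year first (assumed pivot of 30), then compare.
windowed_rows = conn.execute(
    "SELECT label FROM samples "
    "WHERE (CASE WHEN CAST(substr(received, -2) AS INTEGER) < 30 "
    "            THEN 2000 ELSE 1900 END "
    "       + CAST(substr(received, -2) AS INTEGER)) > 1992 "
    "ORDER BY label"
).fetchall()

conn.close()

naive_labels = [row[0] for row in naive_rows]
windowed_labels = [row[0] for row in windowed_rows]
print(naive_labels, windowed_labels)
```

The naive query silently drops the 2001 record rather than raising an error, which is why such expressions are best found by reading the query text and not by waiting for a failure.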
Techniques
There are several ways to locate potential Y2K problems. Unfortunately, no single approach is comprehensive or even guaranteed to work. In most cases, the data values, variable names, functions, and program logic identified by the procedure as potential problems are perfectly fine and harmless. In other cases, non-apparent real problems may not be identified at all. In the final analysis, knowing the details and assumptions of your program or source code is the real key to proper problem identification and remediation. Nonetheless, there are simple mechanical aids you should consider using to locate potential problems.
- Text editors and search routines (e.g., UNIX commands like grep, Excel commands like Edit / Find) can help identify locations of date variables with commonly used names or character string patterns (e.g., ??/??/??).
- The Fortran compiler used with the -Xlist option generates lists of variable names and cross-listings that identify characteristics of variables and the statements on which the variables appear. For example,
f77 -Xlist myprog.f
- Source browser environments such as Sun's Workshop Analyzer will allow you to step through a program and examine the results of executing every line (or user-specified lines) in a program. In cases where you are not familiar with the source code, these programming environments provide an inside look at the quantities being calculated. These can be used with Fortran, C, C++, and Pascal.
- Statistical packages typically have commands that list the variable names and their characteristics. For example, SAS uses "proc contents"; SPSS uses "display names" and "display dictionary".
- Sun Microsystems has written two system utilities, "y2000" and "y2000_usage", that scan object files (e.g., the executable files that result from compiling Fortran, C, C++, Pascal programs). These search for the presence of calls to system routines such as asctime, ctime, getdate, gmtime, localtime, mktime, strftime, strptime, time, printf, and gettext. If such a dependency exists, the binary is further examined for a %y or 19% string, whose presence may suggest a Y2K problem. This is not only the most difficult way to deal with Y2K remediation, it is the most likely to fail since object code remediation is like working in the dark. There are many other potential problems that will be impossible to uncover without examining the original source code. Consequently, object code remediation should be viewed as a measure of last resort. Send mail to consult@udel.edu for more information on these programs.
- COBOL code can be analyzed by a large number of commercial Y2K analysis programs. Although the University does not license any of these, there are many available. If you are concerned about COBOL programs that you are running on Strauss using the AcuCOBOL compiler, please contact the Information Technologies Help Center at 831-6000. The AcuCOBOL compiler itself is Y2K-compliant.
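The text-search approach from the first item in this list can also be scripted. The following Python sketch is a rough equivalent of a grep search. It flags lines that contain a name built from the fragments listed under "Date variables" or a ??/??/?? style pattern of digits. The function name find_suspects and the sample lines are hypothetical, and, as noted above, such a search will produce both false alarms and misses.

```python
import re

# Name fragments taken from the "Date variables" item in this document.
NAME_FRAGMENTS = ["date", "dt", "year", "yr", "yob", "yod", "yymmdd",
                  "dmy", "ymd", "yrmonday", "time", "tm"]

# A word containing any fragment, matched without regard to case.
NAME_RE = re.compile(r"\b\w*(?:%s)\w*\b" % "|".join(NAME_FRAGMENTS),
                     re.IGNORECASE)

# Two digits, slash, two digits, slash, two digits (e.g., 06/15/13).
DATE_RE = re.compile(r"\b\d{2}/\d{2}/\d{2}\b")


def find_suspects(lines):
    """Return (line number, text) pairs worth a manual look."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if NAME_RE.search(line) or DATE_RE.search(line):
            hits.append((number, line.rstrip()))
    return hits


sample = [
    "      CHARACTER*6 START",
    "      START = '970615'",
    "      READ(5,*) BIRTHYR",
    "      X = 1.0",
    "C     received 06/15/13",
]

for number, text in find_suspects(sample):
    print(number, text)
```

Note that the date constant '970615' on the second sample line is not flagged, since nothing in its name or format marks it as a date. This is the kind of non-apparent problem that only knowledge of the program will reveal.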
More on the Y2K Problem
There are two very clear introductory overviews to the Y2K issues. IBM has an on-line animated presentation that you can view on a Windows 95 system. The presentation takes approximately 1-2 hours to complete. Microsoft also has a 20-minute, on-line video presentation. K.C. Bourne's Year 2000 Solutions for Dummies (IDG Books Worldwide, 1997) is another short, approachable resource.
Microsoft maintains a web site providing Y2K-relevant information on each of its products.
Information on the compilers and application software available on the University's central UNIX systems is on the University's Year 2000 Compliance for Application Software page.
The article Effective Remediation Strategies, by Michael Wheatley, IBM Corporation, provides a good overview of typical ways in which Y2K problems can be resolved. It is especially pertinent to user-written application software, rather than microcomputer-based applications like spreadsheets and databases.
Copyright © University of Delaware, 1998.
Last Updated: July 7, 1998