Dr. Dobb's Journal May 1998
As the millennium looms, a computer pioneer has come out of retirement to create a product he claims will solve the Year 2000 problem. And he claims it will do so an order of magnitude faster than other approaches.
This veteran programmer knows all about the Year 2000 problem, because he helped create the Cobol language in which much of the problem code was written, he created many of the standards for data encoding that have existed since the early days of computing, and he wrote the Datamation article back in the 1970s that alerted many to the two-digit date problem that others are just waking up to now.
Bob Bemer is a programmer from the generation of legends: He worked with Fortran creator John Backus and Cobol creator Grace Murray Hopper and IBM 360 designer Fred Brooks. He was enjoying a well-earned retirement when he hit upon his idea for solving the Year 2000 problem. While another senior citizen might have been satisfied to pass the idea on to a younger programmer, Bob moved to Texas, set up a company, developed a software product, and started talking to the press. I recently talked with Bob about his past contributions to computer programming and his proposed solution to the Y2K problem. Here's what he had to say.
DDJ: When I think about the standards you are responsible for and the projects you have worked on in your career, it seems that the name "Bob Bemer" ought to be a household word among programmers. But I suspect that's not the case. Maybe after this interview, that will change a little.
You're known, among those who have heard of you, as the inventor of ASCII, the inventor of the escape sequence, the guy who named Cobol, a pioneer in word processing and time sharing and international standards for data processing. I want to hear about your Year 2000 solution, but I'd also like to know how you came to invent ASCII.
BB: I made a survey in 1960 [while working for IBM] and found out there were over 60 different ways the alphabet was coded for various computers. So I started to pick out the problem of interchanging files, and I started making proposals for a single code. Before that I did the character set for the Stretch machine at Los Alamos. That was the first eight-bit-byte computer that I know of. But I made a mistake. I put the alphabet in as capital A, lowercase a, capital B, lowercase b. And that was stupid. It was then that I wound up with the escape sequence idea that I published sometime in 1960.
DDJ: But ASCII became more than an IBM standard. How did the internationalization of ASCII come about?
BB: I was invited to talk to the British Standards Institution and I got to go to the electronic industry association. And finally I was called by two IBM vice presidents. They said they would like to revitalize what was then called the OEMI -- the Office Equipment Manufacturer's Institute -- and they wanted proposals from me for what should be done in the way of computer standards. We had a big meeting, the first meeting of X3 under ANSI auspices. And that later became an ISO committee when we went international with the standards for computers.
Then came the fateful day. We had ASCII going. We were about to sign off and the 360 was about to get out. And Freddie Brooks tells me that the printers and punches are not ready in ASCII.
DDJ: They were still designed for EBCDIC?
BB: Right. And one manager, now dead (but I won't say "God rest his soul") decided they were going to do both. They would put in a P bit, and if the P bit was 0, the machine would run in EBCDIC. If the P bit was 1, it would run in ASCII. He thought that was a reasonable way to solve the problem, because he had to announce the 360 in a hurry. So they did. Unfortunately nobody told the programmers, and they did all their systems programming in EBCDIC. As a result, they couldn't make the thing run in ASCII. So ASCII originated at IBM, but they didn't follow through with it. Isn't that a crazy story?
DDJ: Okay, I also have to ask, what was your involvement with Cobol?
BB: I came to IBM in late '55 to [write] a system that allowed you to use the 705 commercial computers for scientific work. It was a big success, but when I did that, I was in the same room as John Backus. And I was watching the Fortran work, and saying, hell, this has got to happen.
So I got hold of [A.J.] Perlis and he allowed us to use his compiler from Carnegie Tech. And we chucked that in under a thing we called "Fortransit," which is the first time we had a programming language that worked on different computers.
And in January of '57, I got to meet Dr. Grace Hopper of the Franklin Institute in Philadelphia. And I started working on commercial translators. And we sort of blended all those things together and came up with Cobol.
Probably it was a good thing in one way, but it was a mistake in other ways because not everybody who got in the act of programming computers was careful to annotate it. People made mistakes. Like in computing the year.
DDJ: What was your main contribution to Cobol?
BB: I guess the major thing I did for Cobol was the picture clause.
DDJ: Which is?
BB: The picture clause? That's where you say, I have a piece of data, this is its domain, and these are its characteristics. If you said S99, that was going to be a two-digit number, signed. And we used various other symbols to say this is all alphabetic, punctuation, and the like. This was the first data typing. If you look at some of the later languages like Ada and PL/I, everybody brags about strong typing, you know, very strict. It protects programmers against their own mistakes. Well, that was the first data typing.
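What a picture clause expresses can be sketched in Python. This is an illustrative model of a small subset of Cobol's picture symbols, not any real Cobol runtime; the function name and the subset chosen are my own.

```python
import re

def matches_picture(picture: str, text: str) -> bool:
    """Check text against a tiny subset of COBOL picture symbols.

    S - optional leading sign
    9 - one decimal digit
    A - one alphabetic character
    X - any single character
    """
    parts = []
    for ch in picture:
        if ch == "S":
            parts.append(r"[+-]?")
        elif ch == "9":
            parts.append(r"\d")
        elif ch == "A":
            parts.append(r"[A-Za-z]")
        elif ch == "X":
            parts.append(r".")
        else:
            raise ValueError(f"unsupported picture symbol: {ch}")
    return re.fullmatch("".join(parts), text) is not None

# "S99": a signed two-digit number, as in Bemer's example
print(matches_picture("S99", "-42"))   # True
print(matches_picture("S99", "7"))     # False: only one digit
```

The point of the sketch is the one Bemer makes: the picture clause declares the domain of a data item up front, so the system can reject values that don't fit.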
DDJ: So you invented data typing?
BB: I never thought of that. I didn't [just] create the picture clause, I [really] created data typing.
DDJ: All right, let's talk about the Year 2000 problem. I've heard all sorts of estimates for the magnitude of the problem. What's yours?
BB: I talked to a Gartner Group guy last November, and asked, how's your $600 billion estimate holding up? And he said, real fine. When people start talking about it getting into the trillions, I won't be surprised, except I'm going to try to bring it down. If the U.S. got our act together 100 percent, we'd still be in terrible shape if the Asians didn't and the Europeans didn't. And the Europeans are trying to do monetary conversion to the Euro and the Year 2000 [solution] at the same time.
I heard from a guy at Microsoft that they moved the clock ahead on the equipment up at the Hanford atomic energy place and they blew all the air conditioning, they've gotta get new air conditioning.
DDJ: That's scary.
BB: We know there's a lot of nuclear stuff that's gone haywire. A couple of months back, I talked to the gal who's running the Year 2000 program for the State of Texas, and they have a nuclear reactor, but she admitted she doesn't know what they're doing.
DDJ: The naïve question I sometimes hear from people is, how hard can it be to find references to dates in source code and change them? I guess one of the several ways that question is naïve is that it assumes the source program is even available.
BB: Source code is not around in about 30 percent of cases.
DDJ: And in some cases people may have the source but it's not the latest version. Or maybe the code hasn't been recompiled since the last time the compiler was revved and might break just as a result of recompiling.
BB: Yup. There are a lot of mismatches out there.
DDJ: And, of course, it's both the programs and the data.
BB: And when you try to stretch it out with expansion, you know, put in an extra 19 and an extra 20, everything gets shoved over. And those poor compiled programs, particularly the object programs, don't know where anything is anymore.
DDJ: So we are in this mess all because a bunch of Cobol programmers wanted to save a few bytes of memory?
BB: No. Everybody blames the programmers, but it wasn't their fault. It was people [like managers, customers]. They were happy they didn't have to use 19s. There were international standards, [but people wanted to use] month-day-year instead of year-month-day, like computers had to have it.
This complicated the problem tremendously. If you write the year first, you don't have to go through a lot of fancy computations. People say that they didn't have enough memory, that it was expensive. Well, if they'd used the proper order they would have had plenty of memory because they could have saved all the routines they had to have to jockey it around for compares and to write it on this or that report. I get pretty vehement about that. As a programmer, I don't think we're at fault at all.
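Bemer's point about digit order can be sketched in a few lines of Python: year-first dates sort correctly as plain text, while month-first dates have to be rearranged before every comparison. The sample dates here are invented for illustration.

```python
# Two-digit years, as stored at the time.
ymd = ["450301", "501225", "610704"]   # YYMMDD: text order is chronological
mdy = ["030145", "122550", "070461"]   # MMDDYY: text order is meaningless

# Year-first data compares correctly with no extra code:
print(sorted(ymd))

# Month-first data must be jockeyed into year-first form for every compare:
print(sorted(mdy, key=lambda d: d[4:6] + d[0:2] + d[2:4]))
```

Every report, sort, and comparison over month-first data needs a rearrangement routine like that `key` function, which is the memory cost Bemer says year-first storage would have avoided.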
DDJ: You have a solution to the Year 2000 problem, but your solution is different from anything else out there. Apparently you don't think much of approaches like date-field expansion and windowing.
BB: Date-field expansion will never make it. They thought so a year and a half ago, but now they don't have time.
It means you actually rewrite the Cobol program and say pic 9999 instead of pic 99 and if you're lucky you'll find all the things in the program that are dates. But as a practical matter you can't. I use the example of an insurance policy number where they've used the date of issue as part of the policy number. You can't insert a 19 in an insurance policy number.
DDJ: And windowing?
BB: Also flawed. For one thing it's evanescent. Another thing is a lot of people want sliding windows and by that time you don't know what's going on.
And then you've got these 28 guys. You know that one? You know what happens? Well on IBM equipment when you say pic 99, boy, that's always gonna be a positive number, right? Well, take my birthdate, 1920. That's stored as 20, you subtract 28 and you get minus eight and the compiler says, I know this value is always positive but just to be safe, I'm gonna overwrite it to make it positive, and I have every right to do that. And now I'm a plus eight, right? And on the way out, you add 28 back, and I was born in 1936. Well, I can use the extra years, but it's just not gonna work. And there are people out there selling that.
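The failure Bemer walks through can be simulated directly. The sketch below models an unsigned two-digit field (Cobol `pic 99`) as a value the runtime may legally coerce to a positive magnitude; the helper name is mine.

```python
def store_unsigned_2digit(value: int) -> int:
    """Model storing into an unsigned two-digit field (pic 99).

    An unsigned picture field cannot hold a sign, so the runtime
    is entitled to force the stored value positive.
    """
    return abs(value) % 100

year_stored = 20                                     # born 1920, stored "20"
shifted = store_unsigned_2digit(year_stored - 28)    # -8 is forced to +8
recovered = shifted + 28                             # add the offset back
print(recovered)                                     # 36 -- "born in 1936"
```

Exactly as Bemer describes: 20 minus 28 is minus eight, the unsigned field silently becomes plus eight, and adding 28 back yields 1936.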
DDJ: What you've developed is an object-code solution. You piggyback millennium and century marks onto the two bytes of the year field. As I understand it, you tweak the data in this way and then you sidetrack the object code that deals with dates into subroutines that understand this tweaked data. But maybe you'd better explain it.
BB: We've made a microprogram that will operate on Bigit arithmetic. Bigit is short for "Bemer digit."
DDJ: Bigitizing data is what I called tweaking it. How does that work?
BB: We put a millennium/century indicator over the decade digit so we can handle dates from 1600 up to 2099. The problem, of course, is that the hardware won't handle it. So whenever we think we're going to come upon one of these things we say, hey we have to handle that by a subroutine. And we examine the object program. First we examine the opcode and if it says drive the printer, we say, hey we're not doing any arithmetic on years. If it says add and it's a decimal add, we derail that. But only if we look at the operands and they're of appropriate length. If it's an add of two 16-digit operands, we say that has nothing to do with years and we don't derail it. But every time we come across an opcode that could possibly be an [appropriate] arithmetic instruction, we derail it to this subroutine and we put it itself in a table of derailed instructions.
So when the object code runs, it hits this derail, indexes over the instructions in there, plays like a regular accumulator in a machine, goes out and gets the operands, inspects them, and if one of them has a Bigit and you're making a comparison and the other doesn't have a Bigit in it, it says, wait a minute. How can you be comparing a year value to a nonyear value? Maybe that should be a Bigit, too.
When we get down to that point we expand it to either 16, 17, 18, 19, or 20, do the arithmetic, recompress it, put it back where the answer is supposed to go and then come back to the original machine. I describe this as following the Napoleonic code where you're guilty until proven innocent.
But when you are proven innocent, when we find that it's not [date arithmetic], we take it out of the loop and put the original instruction back where it was. And thus we've reduced the added time wherever we can. However we've got a lot of people who say we don't care if it does add time; just fix it. But, as a matter of fact, we don't add all that much time. Can't detect it.
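The expand/compute/recompress cycle can be modeled in Python. This is strictly an illustrative encoding, not Vertex 2000's actual Bigit format: here a century indicator 0 through 4 stands in for the centuries 16xx through 20xx and rides alongside the two-digit year.

```python
# Hypothetical Bigit model: (century_indicator, two_digit_year).
CENTURIES = (16, 17, 18, 19, 20)   # the 1600-2099 range Bemer cites

def expand(bigit_year):
    """(century_indicator, two_digit_year) -> full four-digit year."""
    indicator, yy = bigit_year
    return CENTURIES[indicator] * 100 + yy

def compress(full_year):
    """Full four-digit year -> (century_indicator, two_digit_year)."""
    return (CENTURIES.index(full_year // 100), full_year % 100)

# A "derailed" subtraction: expand both operands, do the arithmetic
# on full years, then recompress the result for storage.
age_in_2003 = expand((4, 3)) - expand((3, 20))   # 2003 - 1920
print(age_in_2003)                               # 83
print(compress(2003))                            # (4, 3)
```

The derail subroutine Bemer describes does this at the instruction level: intercept the decimal add or compare, expand the operands to full years, compute, and put the compressed result back where the object program expects it.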
DDJ: And doesn't the Bigitized data look the same as the old data to old unmodified programs?
BB: That's another nice thing about this, as it applies to IBM equipment at least. They use packed decimal instructions, and when you pack a number only [one] zone gets [retained]. All the other zones disappear. If you run the old program against the old data, that's what you had, right? If you run the enabled program against the old data, there isn't any Bigit data there, so the program runs exactly as before. Now here's the kicker: When you run the old program against the new data, it doesn't see the decade zone because it gets tossed away in the packed decimal operation. And you still get the same wrong answer. Or if you were getting correct answers, no problem.
What that means is that you can put up one application at a time. All the other approaches make you change all your applications and databases simultaneously.
So, even if the first application Bigitizes the data, the other applications can still run as they were. They may be wrong but at least they're running.
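Why the Bigit mark is invisible to unmodified programs can be sketched as well. In IBM zoned decimal, each byte holds a zone nibble and a digit nibble, and the pack operation keeps only the last zone (as the sign), discarding the rest. The function below is a simplified illustration, not the real PACK instruction, and the 0xC marker value is invented.

```python
def pack(zoned):
    """Simplified packed-decimal conversion.

    zoned: list of (zone_nibble, digit_nibble) byte pairs.
    Returns the digit nibbles plus the final zone as the sign nibble;
    every other zone nibble is thrown away.
    """
    digits = [d for _, d in zoned]
    sign = zoned[-1][0]            # only the last zone survives, as sign
    return digits + [sign]

# "98" with ordinary zones (0xF) on both digits:
plain = [(0xF, 9), (0xF, 8)]
# "98" with a hypothetical Bigit marker (0xC) in the decade zone:
bigit = [(0xC, 9), (0xF, 8)]

print(pack(plain))   # [9, 8, 15]
print(pack(bigit))   # [9, 8, 15] -- the marker vanished; old code sees "98"
```

Both versions pack to the same value, which is the property Bemer relies on: an old program running against Bigitized data behaves exactly as it did against the old data.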
DDJ: So where does the order of magnitude advantage over other approaches come from?
BB: We don't touch the source program and we don't cause any errors to be made in the source program. So we don't have this 50 percent testing time that everybody else has. We can test statistically. If it ran before there's good odds it'll run again if our device works properly. And instead of m users times n programs, we only have to check one program -- ours.
Now if your testing time goes down from 50 percent to 5 percent, you've got time before 2000 to go through [the code] in case there's any really weird stuff from programmer stupidity or clever tricks.
DDJ: Thanks Bob.
Bob's approach won't help you if you're working on PCs or with embedded systems: It's strictly a mainframe fix, at least so far. But that's where Bob's expertise is. There were a lot of things Bob and I didn't have time to get into, such as how his software figures out what data needs to be Bigitized. You can read Bob's many articles on the Year 2000 problem and his product, Vertex 2000, at the BMR Software web site (http://www.bmrsoftware.com/).
By the way, it's now a little easier to remember the URL for my web site, where I keep updated links, corrections, and reader feedback for this column. It's http://www.swaine.com/. And my new e-mail address is firstname.lastname@example.org.