Atari BASIC: A High-Level Language Translator
The programming language which has become the de facto
standard for the Atari Home Computer is the Atari 8K BASIC
Cartridge, known simply as Atari BASIC. It was designed to
serve the programming needs of both the computer novice and
the experienced programmer who is interested in developing
sophisticated applications programs. In order to meet such a
wide range of programming needs, Atari BASIC was designed
with some unique features.
In this chapter we will introduce the concepts of high level
language translators and examine the design features of Atari
BASIC that allow it to satisfy such a wide variety of needs.
Language Translators
Atari BASIC is what is known as a high level language translator.
A language, as we ordinarily think of it, is a system for
communication. Most languages are constructed around a set
of symbols and a set of rules for combining those symbols.
The English language is a good example. The symbols are
the words you see on this page. The rules that dictate how to
combine these words are the patterns of English grammar.
Without these patterns, communication would be very
difficult, if not impossible: Out sentence this believe, of make
don't this trying if sense you to! If we don't use the proper
symbols, the results are also disastrous: @twu2 yeggopt
gjsiem, keorw?
In order to use a computer, we must somehow
communicate with it. The only language that our machine
really understands is that strange but logical sequence of ones
and zeros known as machine language. In the case of the Atari,
this is known as 6502 machine language.
When the 6502 central processing unit (CPU) "sees" the
sequence 01001000 in just the right place according to its rules
of syntax, it knows that it should push the current contents of
Chapter One
the accumulator onto the CPU stack. (If you don't know what
an "accumulator" or a "CPU stack" is, don't worry about it.
For the discussion which follows, it is sufficient that you be
aware of their existence.)
Language translators are created to make it simpler for
humans to communicate with computers. There are very few
6502 programmers, even among the most expert of them, who
would recognize 01001000 as the push-the-accumulator
instruction. There are more 6502 programmers, but still not
very many, who would recognize the hexadecimal form of
01001000, $48, as the push-the-accumulator instruction.
However, most, if not all, 6502 programmers will recognize
the symbol PHA as the instruction which will cause the 6502
to push the accumulator.
PHA, $48, and even 01001000, to some extent, are
translations from the machine's language into a language that
humans can understand more easily. We would like to be able
to communicate to the computer in symbols like PHA; but if
the machine is to understand us, we need a language translator
to translate these symbols into machine language.
The Debug Mode of Atari's Assembler/Editor cartridge, for
example, can be used to translate the symbols $48 and PHA to
the ones and zeros that the machine understands. The
debugger can also translate the machine's ones and zeros to
$48 and PHA. The assembler part of the Assembler/Editor
cartridge can be used to translate entire groups of symbols like
PHA to machine code.
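The three forms of the push-the-accumulator instruction can be laid side by side in a short sketch. It is written in Python purely for illustration. The opcode value $48 for PHA comes from the text above; the table and the helper names (assemble, disassemble, as_bits, as_hex) are assumptions made for this example and are not part of any Atari product.

```python
# One-entry opcode table: PHA = $48, as given in the text.
# The table and helper names are illustrative assumptions.
MNEMONIC_TO_OPCODE = {"PHA": 0x48}
OPCODE_TO_MNEMONIC = {code: name for name, code in MNEMONIC_TO_OPCODE.items()}

def assemble(mnemonic):
    """Translate a human-readable symbol into its one-byte opcode."""
    return MNEMONIC_TO_OPCODE[mnemonic]

def disassemble(opcode):
    """Translate a one-byte opcode back into its symbol."""
    return OPCODE_TO_MNEMONIC[opcode]

def as_bits(opcode):
    """Show the opcode as the ones and zeros the machine sees."""
    return format(opcode, "08b")

def as_hex(opcode):
    """Show the opcode in the $-prefixed hexadecimal form."""
    return "$" + format(opcode, "02X")

opcode = assemble("PHA")
print(as_bits(opcode), as_hex(opcode), disassemble(opcode))
# prints: 01001000 $48 PHA
```

A real assembler or debugger does the same kind of table lookup, only with an entry for every instruction the 6502 understands.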
Assemblers
An assembler - for example, the one contained in the
Assembler/Editor cartridge - is a program which is used to
translate symbols that a human can easily understand into the
ones and zeros that the machine can understand. In order for
the assembler to know what we want it to do, we must
communicate with it by using a set of symbols arranged
according to a set of rules. The assembler is a translator, and
the language it understands is 6502 assembly language.
The purpose of 6502 assembly language is to aid program
authors in writing machine language code. The designers of
the 6502 assembly language created a set of symbols and rules
that matches 6502 machine language as closely as possible.
This means that the assembler retains some of the
disadvantages of machine language. For instance, the process
of adding two large numbers takes dozens of instructions in
6502 machine language. If human programmers had to code
those dozens of instructions in the ones and zeros of machine
language, there would be very few human programmers.
But the process of adding two large numbers in 6502
assembly language also takes dozens of instructions. The
assembly language instructions are easier for a programmer to
read and remember, but they still have a one-to-one
correspondence with the dozens of machine language
instructions. The programming is easier, but the process
remains the same.
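To see why a large addition costs so many instructions, consider how an eight-bit processor has to do it: one byte at a time, passing a carry from each byte to the next. The sketch below imitates that process in Python. The function name add_multibyte and the low-byte-first ordering of the lists are assumptions made for this illustration. On the 6502, each pass through the loop would typically stand for several separate instructions (a load, an add with carry, and a store).

```python
def add_multibyte(a_bytes, b_bytes):
    """Add two equal-length numbers stored as lists of bytes, low byte first.

    Each byte pair is added separately, and the overflow from one
    position is carried into the next, as an 8-bit CPU must do.
    """
    result = []
    carry = 0
    for a, b in zip(a_bytes, b_bytes):
        total = a + b + carry
        result.append(total & 0xFF)  # keep only the low eight bits
        carry = total >> 8           # 1 if the byte overflowed, else 0
    return result, carry

# $01FF + $0001 = $0200, written low byte first
print(add_multibyte([0xFF, 0x01], [0x01, 0x00]))
# prints: ([0, 2], 0)
```

The longer the numbers, the more byte positions there are to process, and the instruction count grows with them whether the code is written in machine language or in assembly language.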
High Level Languages
High level languages, like Atari BASIC, Atari PILOT, and Atari
Pascal, are simpler for people to use because they more closely
approximate human speech and thought patterns. However,
the computer still understands only machine language. So the
high level languages, while seeming simple to their users, are
really much more complex in their internal operations than
assembly language.
Each high level language is designed to meet the specific
need of some group of people. Atari Pascal is designed to
implement the concept of structured programming. Atari
PILOT is designed as a teaching tool. Atari BASIC is designed
to serve both the needs of the novice who is just learning to
program a computer and the needs of the expert programmer
who is writing a sophisticated application program, but wants
the program to be accessible to a large number of users.
Each of these languages uses a different set of symbols and
symbol-combining rules. But all these language translators
were themselves written in assembly language.
Language Translation Methods
There are two different methods of performing language
translation - compilation and interpretation. Languages which
translate via interpretation are called interpreters. Languages
which translate via compilation are called compilers.
Interpreters examine the program source text and simulate
the operations desired. Compilers translate the program source
text into machine language for direct machine execution.
The compilation method tends to produce faster, more
efficient programs than does the interpretation method.
However, the interpretation method can make programming
easier.
Problems with the Compiler Method
The compiler user first creates a program source file on a disk,
using a text editing program. Then the compiler carefully
examines the source program text and generates the machine
language as required. Finally, the machine language code is
loaded and executed. While this three-step process sounds
fairly simple, it has several serious "gotchas."
Language translators are very particular about their
symbols and symbol-combining rules. If a symbol is
misspelled, if the wrong symbol is used, or if the symbol is not
in exactly the right place, the language translator will reject it.
Since a compiler examines the entire program in one gulp, one
misplaced symbol can prevent the compiler from
understanding any of the rest of the program - even though
the rest of the program does not violate any rules! The result is
that the user often has to make several trips between the text
editor and the compiler before the compiler successfully
generates a machine language program.
But this does not guarantee that the program will work. If
the programmer is very good or very lucky, the program will
execute perfectly the very first time. Usually, however, the user
must debug the program.
This nearly always involves changing the source program,
usually many times. Each change in the source program sends
the user back to step one: after the text editor changes the
program, the compiler still has to agree that the changes are
valid, and then the machine code version must be tested again.
This process can be repeated dozens of times if the program is
very complex.
Faster Programming or Faster Programs?
The interpretation method of language translation avoids many
of these problems. Instead of translating the source code into
machine language during a separate compiling step, the
interpreter does all the translation while the program is running.
This means that whenever you want to test the program you're
writing, you merely have to tell the interpreter to run it. If
things don't work right, stop the program, make a few
changes, and run the program again at once.
You must pay a few penalties for the convenience of using
the interpreter's interactive process, but you can generally
develop a complex program much more quickly than the
compiler user can.
However, an interpreter is similar to a compiler in that the
source code fed to the interpreter must conform to the rules of
the language. The difference between a compiler and an
interpreter is that a compiler has to verify the symbols and
symbol-combining rules only once - when the program is
compiled. No such verification goes on while the program is
running. The interpreter, however, must verify the symbols
and symbol-combining rules every time it attempts to run the
program. If two identical programs are written, one for a
compiler and one for an interpreter, the compiled program will
generally execute at least ten to twenty times faster than the
interpreted program.
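The difference in verification work can be made visible with a toy example. The sketch below is written in Python and uses an invented two-instruction language (ADD and MUL); the language, the function names, and the stats counter are all assumptions made for this illustration. The "compiler" here produces a reusable list of checked steps rather than real machine language, so it shows only where the checking happens and says nothing about actual execution speeds.

```python
import operator

# Invented two-symbol language for this sketch.
OPS = {"ADD": operator.add, "MUL": operator.mul}
stats = {"checks": 0}  # how many times a source line has been verified

def parse(line):
    """Verify one source line and return (operation, operand)."""
    stats["checks"] += 1
    name, arg = line.split()
    if name not in OPS:
        raise SyntaxError("unknown symbol: " + name)
    return OPS[name], int(arg)

def interpret(source, value):
    """Pure interpreter: every line is verified again on every run."""
    for line in source:
        op, arg = parse(line)
        value = op(value, arg)
    return value

def compile_program(source):
    """Compiler stand-in: verify every line once, return a runnable program."""
    steps = [parse(line) for line in source]

    def run(value):
        for op, arg in steps:
            value = op(value, arg)
        return value

    return run

source = ["ADD 2", "MUL 10"]

for _ in range(3):
    interpret(source, 1)
print(stats["checks"])  # prints: 6

stats["checks"] = 0
program = compile_program(source)
for _ in range(3):
    program(1)
print(stats["checks"])  # prints: 2
```

Three runs of a two-line program cost the interpreter six verifications, while the compiled version pays for two and never checks again.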
Pre-compiling Interpreter
Atari BASIC has been incorrectly called an interpreter. It does
have many of the advantages and features of an interpretive
language translator, but it also has some of the useful features
of a compiler. A more accurate term for Atari's BASIC
Language Translator is pre-compiling interpreter.
Atari BASIC, like an interpreter, has a text editor built into
it. When the user enters a source line, though, the line is not
stored in text form, but is translated into an intermediate code,
a set of symbols called tokens. The program is stored by the
editor in token form as each program line is entered. Syntax
and symbol errors are weeded out at that time.
Then, when you run the program, these tokens are examined
and their functions simulated; but because much of
the evaluation has already been done, the execution of an Atari
BASIC program is faster than that of a pure interpreter. Yet
Atari BASIC's program-building process is much simpler than
that of a compiler.
Atari BASIC has advantages over compilers
and interpreters alike. With Atari BASIC, every time you enter a
line it is verified for language correctness. You don't have to
wait until compilation; you don't even have to wait until a test
run. When you type RUN you already know there are no
syntax errors in your program.
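The pre-compiling idea can be sketched in a few lines of Python. Everything in the sketch is an assumption made for illustration: the two-statement language (PRINT with one number, and END), the token values, and the names enter_line and run. It does not reproduce Atari BASIC's actual tokens or routines. What it shows is the division of labor: lines are checked and tokenized as they are entered, and running the program only walks the stored tokens.

```python
# Hypothetical token values chosen for this sketch; not Atari BASIC's real codes.
TOKENS = {"PRINT": 0x01, "END": 0x02}

def enter_line(program, text):
    """Check one source line as it is entered and store it in token form."""
    parts = text.split()
    if len(parts) < 2 or not parts[0].isdigit() or parts[1] not in TOKENS:
        raise SyntaxError("rejected at entry: " + text)
    number, keyword, operands = int(parts[0]), parts[1], parts[2:]
    wanted = 1 if keyword == "PRINT" else 0  # PRINT takes one number, END none
    if len(operands) != wanted or not all(x.isdigit() for x in operands):
        raise SyntaxError("rejected at entry: " + text)
    program[number] = (TOKENS[keyword], [int(x) for x in operands])

def run(program):
    """Execute the stored tokens in line-number order; no text is re-examined."""
    output = []
    for number in sorted(program):
        token, operands = program[number]
        if token == TOKENS["END"]:
            break
        if token == TOKENS["PRINT"]:
            output.append(operands[0])
    return output

program = {}
enter_line(program, "20 END")
enter_line(program, "10 PRINT 7")
try:
    enter_line(program, "15 PRNT 7")  # misspelled symbol
except SyntaxError as error:
    print(error)                      # prints: rejected at entry: 15 PRNT 7
print(run(program))                   # prints: [7]
```

The misspelled line never reaches the stored program, so by the time run is called there is nothing left to reject.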
Internal Design Overview
Atari BASIC is divided into two major functional areas: the
Program Constructor and the Program Executor. The Program
Constructor is used when you enter and edit a BASIC program.
The source line pre-compiler, also part of the Program
Constructor, translates your BASIC program source text lines
into tokenized lines. The Program Executor is used to execute
the tokenized program - when you type RUN, the Program
Executor takes over.
Both the Program Constructor and the Program Executor
are designed to use data tables. Some of these tables are
already contained in BASIC's ROM (read-only memory).
Others are constructed by BASIC in the user RAM (random-
access memory). Understanding these various tables is an
important key to understanding the design of Atari BASIC.
Tokens
In Atari BASIC, tokens are the intermediate code into which
the source text is translated. They represent source-language
symbols that come in various lengths - some as long as 100
characters (a long variable name) and others as short as one
character ("+" or "-"). Every token, however, is exactly one
eight-bit byte in length.
Since most BASIC Language Symbols are more than one
character long, the representation of a multi-character BASIC
Language Symbol with a single-byte token can mean a
considerable saving of program storage space.
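A rough count shows the size of the saving. The Python sketch below compares the bytes needed to store a handful of BASIC symbols as text with the bytes needed to store them as one-byte tokens. The token codes are hypothetical values picked for this example, and the helper names text_size and token_size are likewise assumptions; only the one-byte-per-token rule comes from the text.

```python
# Hypothetical one-byte token codes; Atari BASIC's real values are not assumed.
TOKEN_FOR = {"PRINT": 0x01, "GOSUB": 0x02, "RETURN": 0x03, "+": 0x04}

def text_size(symbols):
    """Bytes needed to store the symbols as plain characters."""
    return sum(len(symbol) for symbol in symbols)

def token_size(symbols):
    """Bytes needed to store the same symbols as one-byte tokens."""
    return len(bytes(TOKEN_FOR[symbol] for symbol in symbols))

symbols = ["PRINT", "GOSUB", "RETURN", "+"]
print(text_size(symbols), token_size(symbols))
# prints: 17 4
```

A one-character symbol such as "+" gains nothing, but every longer symbol shrinks to a single byte.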
A single-byte token symbol is also easier for the Program
Executor to recognize than a multi-character symbol, since it
can be evaluated by machine language routines much more
quickly. The SEARCH routine - 76 bytes long - located at
$A462 is a good example of how much assembly language it
takes to recognize a multi-character symbol. On the other
hand, the two instructions located at $AB42 are enough to