A Simple Scripting Language – Part 2 

Part 1 | Part 2 | Part 3 | Part 4 | Part 5

The Look-ahead Reader

The first stage of any compiler is  simply being able to read characters from a text file.   While not the most glamorous part, what we will do in this post will set the foundation for everything that comes later.   For the sake of simplicity and maintaining cross platform compatibility I chose to use C and read script files using the stdio file operators (fopen, fread, fclose).

If you need to brush up on your file access functions  this site should get you up to speed.

Reading text from a file, that doesn’t seem too hard?

Not really, but there are several nuances in addition to reading characters.

As we all have been taught, calling a system interrupt repeatedly (fread is an abstraction on one) is hazardous to your program’s performance.  Instead of reading 1 character at a time, we will read chunks of 1024 characters a time.    The naive approach would be to load 1024 bytes from the file.  Once all that text is processed, load the next 1024 until we reach the end of file.

At this point we would have a BufferedReader.

In order to facilitate next weeks step, Lexical Analysis, we also need the ability to peek (look at without reading) several characters in advance (thus the term Look-ahead Reader).  Lexical Analysis is the act of taking characters and transforming them into meaningful symbols. (ie the characters “i” “n” “t” form “int” which defines a type of data).

However a simple BufferedReader is not sufficient to support peek operations.  For example, if we are currently examining the last character in the buffer, the next character cannot be peeked at without overwriting all the data in the buffer.  However if we split our buffer into 2 halves (bufferA and bufferB) we can easily support look-ahead.

  1. initially read both bufferA and bufferB
  2. when  you’ve processed all the data in bufferA and have started reading bufferB, populate bufferA from the file
  3. when you’ve processed all the data in bufferB and have started reading bufferA, populate bufferB from the file
  4. when you’ve reached the end of file stop populating the buffers.

This will guarantee that there is always half a buffer size of text available to peek at at all times.  Special care must be taken in the cast that a user tries to peek beyond the end of the file.   A meaningful character should be returned to indicate this condition.  In this implementation I chose to return a ‘\0′ (NULL character).

Also to allow for meaningful error messages later on, the Look-ahead Reader will keep track of the current line and column in the file. Nothing is more frustrating than knowing there is an error but having no idea what code caused it.

What does the interface look like?

I’ve taken a pseudo object-oriented approach to structuring my C, borrowing slightly from Objective-C.  The following are the functions contained within LookaheadReader.h.

int LookaheadReader_init(LookaheadReader * reader, const char * path);

Initialize the passed “reader” to read from the file indicated by “path”.   It opens the file, and populates the internal buffer with 1024 characters.

char LookaheadReader_read (LookaheadReader * reader);

Returns the next character, and reads chunks of 512 bytes from the file when needed into the internal buffer.  If you try to read past the end of the file ‘\0′ will be returned.

char LookaheadReader_peek(LookaheadReader * reader, int index);

Peeks at the next nth character.  0 indicates the next character to be read, 1 the one after that… etc.  If you try to peek past the end of the file ‘\0′ will be returned.

void LookaheadReader_eatWhitespace(LookaheadReader * reader);

Discards characters until it peeks a non white-space character.

void LookaheadReader_dealloc(LookaheadReader * reader);

Closes the file.

Lets see some code!

For this project I will be supplying the C code under the MIT license.  If there is enough interest in porting the code to another language besides C, I may make additional versions available.


Next week, we’ll start planning the syntax of our language and use the Look-ahead Reader to create meaningful symbols out of a text file.

You can follow myself or Incubator Games on twitter at @igtristan or @incubatorgames respectively.

See you next week!

Posted by Tristan Campbell
@igtristan, programmer at Incubator Games
IncubatorGames Says:

[Post] A Simple Scripting Language – Part 2 – http://www.incubatorgames.com/index.php/

igtristan Says:

RT @IncubatorGames: [Post] A Simple Scripting Language – Part 2 – http://www.incubatorgames.com/index.php/

Tweets that mention A Simple Scripting Language – Part 2 | Incubator Games -- Topsy.com Says:

[...] This post was mentioned on Twitter by Radek Koncewicz and Tristan Campbell, Incubator Games. Incubator Games said: [Post] A Simple Scripting Language – Part 2 – http://bit.ly/fk3Bfe [...]

infey Says:

RT @IncubatorGames: [Post] A Simple Scripting Language – Part 2 – http://www.incubatorgames.com/index.php/

justradek Says:

RT @IncubatorGames: [Post] A Simple Scripting Language – Part 2 – http://www.incubatorgames.com/index.php/

raketenpaul Says:

The C stream stuff is already buffered, so no need to do this on your own. There is also ungetc() which basically allows to peek at the input stream.

Tristan Says:

Good point! Generally I don’t like to tie functionality closely with underlying system calls.
If I swapped in non buffered file access from a different platform I’d have to rewrite the whole implementation.

A Simple Scripting Language – Part 3 | Incubator Games Says:

[...] this post we’ll be looking at Lexical Analysis.  In the previous article in the series, I created an interface for reading and peeking characters from a file, now it is [...]

A Simple Scripting Language – Part 4 | Incubator Games Says:

[...] 1 | Part 2 | Part 3 | Part 4 | Part [...]

A Simple Scripting Language – Part 5 | Incubator Games Says:

[...] 1 | Part 2 | Part 3 | Part 4 | Part [...]