More Detailed Tech Information for my Binary Star PC-Engine musicdisk / ROM

Several people have asked that I explain some of the music or graphic techniques that I used in my demo.  I'm a bit surprised, but I guess the demoscene and retro console (homebrew) scene don't overlap as much as I had thought, so knowledge of what the PCE can and cannot do is not widespread enough yet.

(Perhaps this write-up is in a bit too much detail.  Well, you asked for it.  Feel free to read as much or as little as you want of this doc.)

Music Conversion Details

Graphic Effect Details

 

First, a comparison between the PC-Engine's sound capabilities and the MSX with SCC cartridge:

| | PC-Engine | MSX | (MSX) SCC cartridge |
|---|---|---|---|
| CPU | 8-bit 65C02 variant with enhancements | 8-bit Z-80A | - |
| sound channels | 6 | 3 | 5 |
| sound hardware type | 32-byte wavetable memory per channel | AY-8910: fixed square-wave PSG | 32-byte wavetable memory for first 4 channels |
| noise generation | last 2 channels have independent switchable noise generators | 1 switchable noise generator can take over any or all channels | - |
| wave bit depth | 5 | - | 8 |
| frequency bit depth | 12 | 12 | 12 |
| volume bit depth | 5-bit logarithmic + 4-bit individual L/R stereo panning | 4-bit logarithmic (?) | 4-bit linear |
| other features | 2nd channel can use its waveform as an LFO modulating 1st channel. Any or all channels can be switched into "direct digital output" for CPU-streamed samples. | Single-hit/looped 'Envelope Generator' can be triggered for any channel, producing (high-voltage) 'pops' and percussion effects. | |
| CPU memory space bankswitching | CPU instructions to bankswitch eight 8K slots | Convoluted slot and subslot nonsense.  You f#*%ing figure it out! | |

 

Waveform Bit Depth

As we can see, the SCC sound itself is quite comparable to the PC-Engine's, though being monaural, lacking one channel compared to the PCE, and having no noise capabilities.  Its higher bit-depth for wave samples is a little enviable, although at 32 bytes per sample, such depth is negated slightly.  Such primitive waveforms need to have sharp transitions and extreme values (rather than subtle tones) in their waves anyway, otherwise they'd be rather inaudible at any volume.  Here is a plot of some "instruments" used by Konami on the SCC, and when mapped down to the PCE's 5-bit waveforms:

 

The PSG Envelope

Most SCC games use the built-in AY PSG unit of the MSX for extra square wave channels, noise hits, and the volume envelope.  Although the AY envelope has several cycling modes and an adjustable cycle frequency, most Konami game soundtracks use it as a percussive hit.  And this is unlike that of the NES or SID; the AY envelope seems to produce a logarithmic voltage SPIKE that drowns out all other channels.  At any rate, it would be impractical to emulate it exactly on the PC-Engine, even if it were electrically possible.

 

Getting MSX SCC music on the PC-Engine

Usually, transplanting music from one platform to the other is done in one of several ways:

Guess which one I chose?  Yes, the last one, mostly fun and not very difficult to do.  The downsides of a logged format are the sometimes very large data size and the lack of (or errors with) loop points.  Songs all have to be logged manually as well.

 

VGMs

The VGM format is a rather popular music container format spanning many different types of sound hardware, and fortunately including register logs of the AY PSG as well as the SCC chip, here called "K051649".  There are not a lot of MSX emulators that record VGMs, though I did manage to get headache-inducing OpenMSX to record VGMs for some games, and then for the remainder I used the VGM tunes at VGMrips.net.

Let's look at an example VGM, the stage 8 music to Space Manbow.  Downloading the song, you get a GZipped file Brilliance.vgz, but that needs to be un-GZ'ed, to its .VGM file, which comes in at 224 KBytes.  Hmm, much too large to use as-is.  Why?  Let's look into the file, at an arbitrary point:

| VGM Bytes | Meaning |
|---|---|
| ...0A | |
| 77 | |
| A0 06 00 | write 00 to AY register 6 |
| 75 | wait 6 "samples" |
| A0 07 B5 | write $B5 to AY register 7 |
| 61 2A 00 | wait $2A "samples" |
| D2 01 02 E0 | write $E0 to SCC register 2 via port 1 |
| 73 | wait 4 "samples" |
| D2 01 03 05 | write 05 to SCC register 3 via port 1 |
| 72 | |
| D2... | |

The VGM format defines a "sample" as 1/44100th of a second.  That precision suits things like digital sample streaming, but it is a bit troublesome for grouping register writes, and it is much finer than a VBlank-based music player needs.  So, while an ideal VBlank wait would be (44100/60=) 735 samples, it never is exactly this, due to the delay of the game's music routine, as well as all the register writes that precede said perfect VBlank wait.  So, I had to interpret a wider range of "sample" waits (for example, between 310 and 1000) as a single VBlank, with correspondingly longer waits counting as 2 VBlanks, etc.  But I'm getting ahead of myself a little bit here...
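A minimal sketch of this kind of wait grouping in C.  The 735-sample frame and the 310-sample lower bound come from the text above; the function name and the exact rounding rule are assumptions made for illustration, not the converter's actual code.

```c
#include <stdint.h>

#define SAMPLES_PER_VBLANK 735u  /* 44100 Hz / 60 Hz */
#define MIN_VBLANK_WAIT    310u  /* assumed: shortest wait still counted as one VBlank */

/* Convert an accumulated VGM wait (in 1/44100 s "samples") to whole VBlanks.
   Waits shorter than MIN_VBLANK_WAIT are treated as jitter (0 VBlanks). */
unsigned vgm_wait_to_vblanks(uint32_t samples)
{
    if (samples < MIN_VBLANK_WAIT)
        return 0;
    return 1u + (unsigned)((samples - MIN_VBLANK_WAIT) / SAMPLES_PER_VBLANK);
}
```

Under this rule, waits of 43, 641 and 2942 samples come out as 0, 1 and 4 VBlanks, which agrees with the example converter log shown further down.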

So, the reason that any VGM file is quite large is that it has to accommodate all possible sound chips at the same time, via its first command byte.  Realistically, a computer or game system may have 2, or maybe 3 or 4 sound chips wired in at a single time, but more than this is rare.  As we can see, AY PSG register writes take up three bytes, and SCC writes take up four.  Writes to the SCC's wavetable memory are done one byte write after another in this way, meaning a (4 x 32) = 128-byte sequence of data in the VGM file for changing a single channel's waveform -- and this waveform might be written to multiple channels one after the other, or even the same waveform to the same channel, and this is stored blindly each time in the VGM file despite mass redundancy.

We can also see that the AY chip has fewer than 16 addressable registers, so a byte value in the VGM for this is a bit wasteful.  Volume and High Frequency register writes are often 4 bits only, so more waste with an 8-bit data value each time.  In total, this means a lot of wasted and empty bits, repeated / redundant writes, and on and on.  There are many things that can be optimized out with even a cursory examination of the format, while thinking we need only SCC + AY + VBl Timing + Sample writes.

 

Optimizing the byte / nybble / bit stream

So, I started writing a program in C to parse a VGM file and write out a much shorter, more optimized custom file that my PCE demo would use.  At this point an Information Theory guru would probably come in and call for automated binary Huffman coding of the VGM command stream.  And, in a sense, that's sorta what I did.  However, to keep things somewhat human readable (one of my goals, in case a sound or timing or loop point needed, for some reason, human tweaking) I have made the granularity of most commands nybble-based, that is, a stream of 4 bits at a time.

Both the SCC and AY have register addresses that fit into 4 bits (mostly) and many registers have only 4 or 5 necessary bits each, so most SCC register writes can be simplified into a single byte write, a savings of 75% in the best case.  In the worst case, since I give the SCC priority, the writes in the AY register sub-groups may take 2 or 2.5 bytes, still shorter overall than the 3-byte VGM data.  My converter also keeps track of the data that gets written to each register (and waveform sample RAM) and if the new byte matches what was previously written, it will not include the repeated write in the nybble file.
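The repeated-write filter can be pictured as a "shadow" copy of the chip's registers.  This is only a sketch under assumptions: the struct, the function names and the 256-slot table size are hypothetical, not taken from the converter.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SHADOW_REGS 256  /* hypothetical: one slot per 8-bit register address */

typedef struct {
    uint8_t value[SHADOW_REGS];
    bool    known[SHADOW_REGS];  /* false until the register is first written */
} RegShadow;

void shadow_reset(RegShadow *s)
{
    memset(s, 0, sizeof *s);
}

/* Returns true if the write changes the chip state and must be emitted;
   false if it repeats the value that register already holds. */
bool shadow_write(RegShadow *s, uint8_t reg, uint8_t data)
{
    if (s->known[reg] && s->value[reg] == data)
        return false;
    s->known[reg] = true;
    s->value[reg] = data;
    return true;
}
```

One caveat for anything built on this idea: a register whose write has a side effect must bypass the filter.  The AY envelope shape register, for instance, restarts the envelope every time it is written, even with the same value.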

As far as Konami's usage of these sound chips, for a lot of title screen and intro music, all 5 SCC channels are being used, along with 1 or 2 AY channels, typically.  Unfortunately, this means that there just aren't enough free PCE channels (6 in total, remember) to reproduce all the music.  I had to make the difficult choice of dropping one of the lesser PSG channels to give priority to the richer SCC.  For in-game tunes, many games left SCC channel 0 unused, and AY channel C unused.  I presume they get utilized for sound effects.  This means 4 SCC + 2 PSG = 6 channels that can be mapped to the PC-Engine's sound hardware with little fuss.  Usually, AY channel A was the one used the most for envelope hits and noise triggers for percussion.  Thus, I gave AY channel A priority in shortening the nybble length in my output file.  I even wrote in some options in the converter to ignore any number of PSG channels, or to slide down to channel A the PSG channel which gets the most use / has noise triggers, to make the optimized file a little bit shorter.

==> Here is the eventual Nybble Stream format that I came up with as I developed my PCE playing code. <==

So now, let's look at the same spot in the VGM above, converted into my format:

| Nybbles | Meaning |
|---|---|
| ... | |
| F A 0 | write 00 to AY register 6 |
| F C 5 | write 05 to AY register 7 |
| 2 E 0 | write $E0 to SCC register 2 via port 1 |
| 3 5 | write 05 to SCC register 3 via port 1 |
| ... | |

So the 18 bytes of the VGM get pared down to 5.5 bytes in my nybble format, a savings of 69%.  Space Manbow's stage 8 music comes out in the end to be 24.5 KBytes, down from its original 224K!  This makes putting multiple songs into one PCE ROM more feasible.

The VGM spec has a field for a loop start point in a song, specified in its header.  I made my program simply insert this loop start point as a special command in my nybble stream.  While my PCE player is reading in nybbles, when it encounters this command ($FE00) it'll just store the current file pointer and bank number, ready for use later when it encounters an EOF marker ($FEF).  If the music data ends part-way through a byte, ie: an odd number of nybbles, the EOF marker is extended to $FEFF.  I also added a "fade out from here" command ($FF) for songs that don't loop, which can be added in manually by me.  These three commands are an even number of nybbles to allow for manual insertion in a hex editor.

 

What about SCC Waveform writes?

Every Konami SCC game updates its 4 wave memory areas of 32 bytes each in the VGM file, one register at a time and one command at a time.  Simply setting all wave areas to the same sawtooth wave takes up 576 bytes in the VGM file. This will not do.  So, my conversion program has four 32-byte buffers where waveforms are stored as the VGM file is parsed.  When the final register of any of these buffers is written to, the whole waveform is compared to past waveforms (from a library built up by my program) to see if it is unique.  If it isn't, then my code writes a waveform command at that point, pointing to a previous waveform (see the Nybble Stream doc, command $F4XX).  If the waveform was unique, it is added to the end of this library, the command written in my nybble stream, and this library of samples is written out to a new file when my conversion program exits.
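The library lookup described above could look something like the following sketch.  The type, the function name and the 256-entry capacity are assumptions for illustration; only the 32-byte waveform size and the compare-then-append behaviour come from the text.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define WAVE_LEN  32
#define MAX_WAVES 256  /* hypothetical capacity */

typedef struct {
    int8_t   wave[MAX_WAVES][WAVE_LEN];
    unsigned count;
} WaveLibrary;

/* Look a completed waveform up in the library, appending it if it is new.
   Returns its index, or -1 if the library is full.
   *was_new tells the caller whether a new entry was added. */
int wave_library_intern(WaveLibrary *lib, const int8_t wave[WAVE_LEN], bool *was_new)
{
    for (unsigned i = 0; i < lib->count; i++) {
        if (memcmp(lib->wave[i], wave, WAVE_LEN) == 0) {
            *was_new = false;
            return (int)i;
        }
    }
    if (lib->count == MAX_WAVES)
        return -1;
    memcpy(lib->wave[lib->count], wave, WAVE_LEN);
    *was_new = true;
    return (int)lib->count++;
}
```

The returned index is what a waveform command in the stream would carry, whether the waveform was already known or has just been added.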

So, the more VGM files that are passed through the converter, the more samples get built up into a single library of unique SCC waveforms -- many of which are shared among multiple different Konami games.  The other alternative I had considered was for me to search through each Konami game ROM to find their sample banks, but this would take time, and would not sift out unused and repeated waveforms among all game soundtracks.  At the same time as this waveform library gets written, the wave data gets reduced from the SCC (8-bit signed) format to one that the PCE will use (5-bit unsigned).
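For reference, one simple way to do the 8-bit signed to 5-bit unsigned reduction is to re-centre the sample and drop the low three bits.  This sketch truncates rather than rounds; which of the two the converter really does is not stated here, so treat that choice (and the function names) as assumptions.

```c
#include <stdint.h>

/* Map one signed 8-bit SCC sample (-128..127) to an unsigned 5-bit value (0..31). */
uint8_t scc_to_pce_sample(int8_t s)
{
    return (uint8_t)((s + 128) >> 3);
}

/* Convert a whole 32-byte SCC waveform for the PCE's wave RAM. */
void scc_to_pce_wave(const int8_t in[32], uint8_t out[32])
{
    for (int i = 0; i < 32; i++)
        out[i] = scc_to_pce_sample(in[i]);
}
```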

example output from out-TEXT.txt

...
PSG: REG $9 DATA $9
PSG: REG $4 DATA $20
PSG: REG $A DATA $D
Wait ROUGHLY 0 VBLANKS (added 43 clock ticks).
SCC: PORT 2 CHAN 3 VOLUME 4
SCC: PORT 2 CHAN 4 VOLUME 4
Wait ROUGHLY 1 VBLANKS (added 641 clock ticks).
PSG: REG $A DATA $B
Wait ROUGHLY 4 VBLANKS (added 2942 clock ticks).
PSG: REG $A DATA $A
...

example output to console (showing sample waveform matching):

2 PSG channels will be converted.
Looping Point reads: 131
1 samples (32 bytes) in sample input file.
No matches, added to bank 1, Chan 1 DATA $F409
Match at 1, Chan 2 DATA $F40A
No matches, added to bank 2, Chan 3 DATA $F413
No matches, added to bank 3, Chan 1 DATA $F419
Match at 3, Chan 2 DATA $F41A
No matches, added to bank 4, Chan 1 DATA $F421
No matches, added to bank 5, Chan 2 DATA $F42A
Match at 1, Chan 1 DATA $F409
Match at 1, Chan 2 DATA $F40A

...

 

Adding Hexion and OKI MSM 6295 "emulation"

One game that uses the SCC happens not to be on the MSX at all, but rather is a 1992 arcade game, named Hexion.  It had some pretty groovy music but uses an ADPCM chip for percussion.  It has under a dozen samples, mostly tom-toms, cymbal crashes, synth hisses and a TB-303 'tonk' sound.  Now, while I could have used one of the PCE channels to stream these samples in 5-bit at about 7 kHz, I chose not to.  One, it would make the rest of my interrupt timing and code more complex and fragile; and two, I had meant this to be a simple player, not some all-encompassing rabbit-hole where I chase ever-increasing features.  It probably would have sounded kinda cool, but the SCC melody was the main star, anyway.

So, I made a new command in my nybble format just for the OKI sample hit, and added processing of these sample commands to my converter.  The OKI ADPCM chip has settings for playing a specific sample with an adjustable volume (attenuation) setting.  However, Hexion used a fixed volume setting for each sample, I discovered, meaning even less data in the nybble stream to convey.  I could hardcode the relative volumes in my PCE player.  Thus I did the simple and straightforward thing: listen to each sample, and with a little frequency analysis, approximate it in the 6th PCE channel either with noise or a square wave at a certain frequency.  For example, cymbal crashes are pure noise, while a tom-tom has noise at the start, then a square sweep downwards to a bassy tone.  This noise/tone/frequency is controlled 60 times a second in my PCE music/sound code, while the starting volume for each "ADPCM" instrument is set at the start of a fading volume envelope.

 

Music Player on the PC-Engine Side

I wrote all of the PC-Engine code for my musicdisk in HuC6280 assembly.  The HuC6280 CPU is a superset of the 6502: powerful, but still 8-bit.

This is more-or-less straightforward, since a lot of the processing (and hard decisions) was made at the VGM conversion stage.  First, the nybble data is handled as a streaming bank in CPU address space $4000-$5FFF.  The music code need only call Get_Nybble, in which case the file handling code reads the data pointed to, advances (or not) the pointer, discards either the top or bottom nybble, and returns it in the low nybble of the accumulator.  Get_2_Nybbles, for when the music code knows it needs a whole byte of data, returns a byte.  The music code then interprets the commands (as described) in the nybble stream, and writes to the PCE frequency registers, passes a volume command to a linear-to-logarithmic volume table, enables/disables a channel, etc.
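The real player is HuC6280 assembly, but the nybble-fetching logic can be modelled in a few lines of C.  This is a sketch only: it assumes the high nybble of each byte comes first and leaves out the bank switching, and the struct and function names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *data;  /* nybble stream */
    size_t pos;           /* index of the current byte */
    int    low_next;      /* 0: next nybble is the high half, 1: the low half */
} NybbleReader;

/* Return the next 4-bit value in the low nybble of the result. */
uint8_t get_nybble(NybbleReader *r)
{
    uint8_t b = r->data[r->pos];
    if (r->low_next) {
        r->low_next = 0;
        r->pos++;            /* byte fully consumed, advance the pointer */
        return b & 0x0F;
    }
    r->low_next = 1;         /* stay on this byte for the second half */
    return b >> 4;
}

/* Return the next two nybbles as one byte (works at any alignment). */
uint8_t get_2_nybbles(NybbleReader *r)
{
    uint8_t hi = get_nybble(r);
    return (uint8_t)((hi << 4) | get_nybble(r));
}
```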

To map each song's potential 8 SCC+PSG channels down to the PCE's 6, for each song I have a channel mapping table, which looks something like .db $01,$FF,$FF,$FF,$FF,$00, meaning AY channel B gets mapped to PCE channel 0 (SCC channel 0 is unused in that song); SCC channels 1-4 are mapped normally to PCE channels 1-4; and AY channel A is mapped to PCE channel 5.  Each time an SCC command is interpreted, my code checks to see if that channel's mapping in the table is $FF (i.e. normal) and if it isn't, that command is ignored.  Similarly, when an AY PSG command is read, there is a routine that searches each byte in the mapping table for a matching channel byte ($00, $01, or $02) and writes PSG notes into the PCE channel where it found a match.  This way, SCC and PSG channels can be disabled and moved around to wherever is most convenient.
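In C terms, the two lookups on the mapping table might be sketched as below.  The table layout and the $FF convention are as described above; the function names and the -1 "dropped" return value are assumptions for the example.

```c
#include <stdint.h>

#define PCE_CHANNELS 6
#define MAP_SCC      0xFF  /* this PCE channel carries the SCC channel of the same number */
#define NO_CHANNEL   (-1)  /* hypothetical marker: the source channel is dropped */

/* PCE channel for SCC channel `scc` (0-4), or NO_CHANNEL if it is ignored. */
int pce_channel_for_scc(const uint8_t map[PCE_CHANNELS], int scc)
{
    return (map[scc] == MAP_SCC) ? scc : NO_CHANNEL;
}

/* PCE channel for AY channel `ay` (0 = A, 1 = B, 2 = C), or NO_CHANNEL. */
int pce_channel_for_ay(const uint8_t map[PCE_CHANNELS], int ay)
{
    for (int i = 0; i < PCE_CHANNELS; i++)
        if (map[i] == ay)
            return i;
    return NO_CHANNEL;
}
```

With the example table $01,$FF,$FF,$FF,$FF,$00, SCC channel 0 and AY channel C are dropped, AY channel B lands on PCE channel 0, and AY channel A lands on PCE channel 5.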

The AY envelope emulation that I implemented is rudimentary enough, but it is good enough to add a "POP" to games such as Solid Snake that use it for percussion both with square waves and noise.  The envelope is a simple decaying volume slide that overrides the normal AY volume slides for channels that use the envelope.

For the undulating & shining box graphic effect seen in Binary Star, my music code also counts how many times the AY envelope is triggered (a simple "BPM" counter) and uses that value to scroll the sine wave slowly or quickly, so that the speed of the graphic effect matches the tempo of the music a bit.  The X-compression of the sine wave is also determined by the note pitch of one of the sound channels, while the Y-compression varies linearly over time, so that even a slow-tempoed song has the shining effect rotating and alternating between moving downwards and upwards.


Graphic Effects

The few effects in Binary Star are not that special, but I'll explain them anyway in reference to the PC-Engine's capabilities.

The PCE's VDC (graphics chip) has a single tile-based background layer that can be scrolled horizontally & vertically each scanline (using line interrupts), 64 sprites with a maximum size of 32x64 pixels each, and 4bpp tiles (15 colours + transparent per tile).  It is attached to 64K of VRAM.  Background and Sprite tiles use a different format, so they can't be shared.  The VCE (colour encoder) chip has 512 palette entries, 256 each for BG and sprites, and each palette entry is 9-bit (3 bits each of R,G,B).  Thus each 8x8-pixel background tile can choose from one of 16 palettes, and each sprite can be assigned one of its 16 palettes.  The PC-Engine has no shadow/highlight effects in hardware (like the Genesis/Megadrive has) nor any translucency, colour math, HDMA, Mode 7, 8bpp, etc, like the SNES/SFC has.  It has a single background layer with no tile flipping, like the Famicom, but 4 times larger.

The VCE drives the pixel clock of the VDC, and its clock is selectable from one of three rates: 10 MHz, 7.16 MHz, and 5 MHz.  These give usual screen resolutions of 512, 352, and 256 pixels wide respectively, although this can be expanded further using overscan.  Unlike the SNES, all graphics (backgrounds and sprites) on the PCE share the same clock rate and thus pixel width.  (The SNES has a 512-pixel-wide pseudo-hires mode for backgrounds, but its sprites are still low-res, as though on a 256-pixel-wide screen.)  I had decided to use the 512-pixel resolution mode for my demo, at first just to get more text on-screen, and then to allow for more detailed box art for the MSX games.  The PCE can have a screen height of varying sizes up to 242 lines, but I chose 224 lines just to give myself a little more VBlank time.

The main effect of my demo is a 184x128-pixel image of the box art from the game whose music is currently playing, with a moving & undulating highlight & shadow pattern overlaid.

Here's the screen layout for my demo:

In addition to the box art, the associated palette for the picture needs to be loaded in both for a sprite as well as for the background.  The obvious strength of the PCE's graphics hardware (besides large sprites) is its generous pair of 16-palette arrays of 15 colours each, far more than the SNES & MD, and comparable to arcade machines of the '80s.  (The obvious weakness of the PCE's hardware, the 9-bit colour depth making for a global palette of only 512 colours, kind of diminishes this strength a bit...)  But anyway, I made use of this surfeit of palettes by having my code fade the box art palette a step at a time, both darker and lighter, to create shading and shining variations (seen below.)  And by changing only the palette number in the background map for each tile, I can have all these tonal variations on-screen simultaneously.

My palette fading up / down code uses a 512-entry lookup table (to point to the next darker / lighter shade for each current colour) to make the code run faster (faster than if splitting each RGB bit triplet and running math on them individually.)  The table for fading to black has a slight blue bias in it that I also used for HuZero; it kinda looks nicer, more "spacy".  Two of the 16 BG palettes are used for the text display, coloured VU meters, PCE & SCC logo, and a couple other colours for cycling effects; the remaining 14 are taken up by the box art palettes.
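A table of this kind can be generated offline.  The sketch below builds a plain "one step darker" table by decrementing each 3-bit field of the 9-bit colour; it does not reproduce the blue bias mentioned above, and the function name is an assumption.  Because all three fields are treated alike, it does not depend on which bits hold R, G or B.

```c
#include <stdint.h>

#define PCE_COLOURS 512  /* 9-bit colour: three 3-bit fields */

/* Fill darker[c] with colour c moved one step towards black:
   each 3-bit field that is not already 0 is decremented. */
void build_darker_table(uint16_t darker[PCE_COLOURS])
{
    for (unsigned c = 0; c < PCE_COLOURS; c++) {
        unsigned out = 0;
        for (unsigned shift = 0; shift < 9; shift += 3) {
            unsigned field = (c >> shift) & 7u;
            if (field > 0)
                field--;
            out |= field << shift;
        }
        darker[c] = (uint16_t)out;
    }
}
```

One fade step over a palette is then a single lookup per entry (pal[i] = darker[pal[i]]), and full white reaches black after 7 steps.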

It's a little wasteful, but I store all graphics uncompressed in my demo for quick VRAM loading.  When my player moves on to the next song, the graphics code looks up the box art number of the next song, and loads up the image into VRAM (about 12K of data).  As there are 3 or 4 sprites on-screen showing the previous and next songs with mini-versions of the box art, one of the sprites not on-screen at the moment also has to have its box art updated in VRAM.  Once the next box art begins loading in VRAM, the BG is disabled in the middle of the screen and the mini-box sprites begin sliding in unison to advance to the next or previous song.  By the time the sprites reach their new position, the large box art tiles will have been loaded into VRAM and the BG will get re-enabled.

Fortunately, the PCE (unlike many game systems) allows for VRAM writing (& reading) at any time, whether during VBlank, or while the screen is being drawn by the VDC, so there is no strict VRAM writing bottleneck.  The only danger is that if a horizontal/vertical interrupt triggers in the middle of a VRAM write, the VDC register number may be suddenly changed, and thus that VRAM write may be lost or redirected to a different register.  So, usually in a game with mixed VRAM writing and lots of interrupts, the code manages (and retries) a VRAM write if it was interrupted.  Mine does not.  :]

Undulating Effect

As seen in the screen layout picture above, the left half of the 128-tile-wide background map is normally shown on-screen (512 pixels / 8 = 64 tiles across) and contains one copy of the box art.  The right side of the map is blank except for a scrolltext space, and another copy of the same box art.  Each field (1/60 sec.) my code runs through either the top half or bottom half of the box art by reading from the map in VRAM and offsetting its palette pointer by a sine wave value, both in the X and Y direction (similar to a plasma effect popular in oldschool demos) to brighten or darken each tile in that map.  The code does a single horizontal line of the box art, jumps over to the right side box art for another line, then goes down one line to the left side box art, etc.  The math is simple because on the map, the right side box art is 64 words after the left side box art, and the next line is 64 words after that.  (Oh, yeah, the PCE's VDC accesses everything using 16-bit registers and word addresses, hence the '16' in "Turbografx-16.")
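The per-tile palette change can be sketched in C as follows.  It assumes the usual PCE background map word layout (palette number in the top 4 bits, tile number in the low 12 bits); the wave tables, the function names and the choice of palettes 2-15 for the box art shades are hypothetical, since the text only says that 14 of the 16 palettes hold the box art.

```c
#include <stdint.h>

#define BOX_W 23              /* 184 pixels / 8 */
#define BOX_H 16              /* 128 pixels / 8 */
#define FIRST_ART_PALETTE 2   /* hypothetical: palettes 0-1 kept for text etc. */
#define ART_PALETTES      14

/* Replace the palette field (top 4 bits) of one background map word. */
uint16_t set_tile_palette(uint16_t entry, unsigned palette)
{
    return (uint16_t)((entry & 0x0FFFu) | ((palette & 0xFu) << 12));
}

/* Pick a palette for tile (x, y): the middle shade plus an X and a Y
   wave offset, clamped to the available range of shades. */
unsigned shade_for_tile(const int8_t wave_x[BOX_W], const int8_t wave_y[BOX_H],
                        int x, int y)
{
    int shade = ART_PALETTES / 2 + wave_x[x] + wave_y[y];
    if (shade < 0)
        shade = 0;
    if (shade > ART_PALETTES - 1)
        shade = ART_PALETTES - 1;
    return FIRST_ART_PALETTE + (unsigned)shade;
}
```

Scrolling the effect then amounts to changing which part of the sine data ends up in wave_x and wave_y from one field to the next.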

In addition to applying an undulating brightness map, my code simultaneously makes the top & left edges of the box brighter, and the bottom & right edges darker, to simulate some kind of beveled depth.  You can see this effect by itself by putting the music into MONO (the scrolltext also goes back to being "MSX-like" :-D).

Normally the right-hand-side map is not seen on-screen, so I have set up raster interrupts through the VDC to switch between the left and right maps every 4 scanlines, in order to make the right side visible and thus give the undulating effect twice the definition.  Since the horizontal resolution of the screen is double the vertical resolution, the effect will now appear to have square granularity.

That's about it.  Below you can see how the PCE's CPU time is allocated:

 

Sixteen Candles

I had wanted to put a cake celebrating 30 years of the PCE and SCC games into my demo, so I decided to make a rather wide sprite that hovers near the top of the screen, moving back and forth between the PCE and SCC logos.  And, since all too many video games that feature cakes bake up the usual (and boring) strawberry shortcake, I wanted to try to draw my favourite, chocolate cake.  But this is all neither here nor there.

The flames of the candles flicker and waver as the cake moves back and forth, as though disturbed by turbulence (...in... the void of space) and the animation of this effect was accomplished by a trick that's as old as video games themselves.  As can be seen below, I reserved two colour indices in the cake palette to use for each direction that the flame leans in.  When the cake is decelerating at the vertex of its movement, both indices are black and the candle flame appears to be still. At other times, only one of the indices is painted red, illuminating the flame against the direction the cake is travelling.

Incidentally, the movement of the cake was described using a sine-wave lookup table (as zillions of oldschool demos have in the past) but I actually used a differential table, where the data byte in one value of the table describes the X-offset from the previous frame's position.  This created a table of mostly -2s, -1s, 0s, 1s, and 2s (roughly).  I could then just use the sign (or zero) of the value in the table to choose the direction the candle flame was pointing in.  The flickery transitions at the vertices of the cake's path were an accidental, nice byproduct of using this table.
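To make the idea concrete, here is a small C sketch of a differential table driving both the position and the flame direction.  The 16-entry table and all names are invented for the example (the real table is derived from a sine wave and is longer).

```c
#include <stdint.h>

#define PATH_LEN 16

/* Hypothetical per-frame X deltas for one back-and-forth cycle (they sum to 0). */
static const int8_t cake_dx[PATH_LEN] = {
    0, 1, 1, 2, 2, 2, 1, 1, 0, -1, -1, -2, -2, -2, -1, -1
};

typedef enum { FLAME_STILL = 0, FLAME_LEAN_LEFT, FLAME_LEAN_RIGHT } Flame;

/* Advance the cake by one frame and report which way the flame leans:
   against the direction of travel, or upright when the delta is 0. */
Flame cake_step(int *x, unsigned frame)
{
    int8_t dx = cake_dx[frame % PATH_LEN];
    *x += dx;
    if (dx > 0)
        return FLAME_LEAN_LEFT;   /* moving right */
    if (dx < 0)
        return FLAME_LEAN_RIGHT;  /* moving left */
    return FLAME_STILL;
}
```

The returned value would select which of the two reserved palette indices gets painted red for that frame.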

 

Fading Scanline Transition

Between my "Tongueman / C.COVELL" logo, which is in low-resolution (256 pixels across), and my main demo screen, I had wanted a somewhat smooth transition.  So I decided to make an effect where scanlines fade progressively from white through to blue and then to black, in random order.  It is, again, a rather simple effect.  First, right as the Tongueman logo disappears by clearing the top half of the BG map, I change resolutions to 512 pixels across, and write the initial table of star positions into sprite RAM and start them scrolling.  I then set up an HSync interrupt routine on each scanline that reads a colour value (or just a simple index) for the corresponding scanline from a table in RAM.  This table starts out with all colour values set to 21 (white).  My main code then reads (on each VBlank) a successive value from a table of random numbers from 0 to 223, and decrements the colour value in the scanline table indexed by that random number.  <breathe>

Then a routine in my VBlank code runs through all 224 scanline table values.  If the value is 0 already, or 21, it does nothing.  Otherwise, it decrements the value once and goes on to the next entry in the table.  The HSync routine mentioned above then loads the colour value, uses it as an index to an RGB value from a table of black->blue->light blue->white values, and writes that to the background colour for that scanline.  Simple, nice-looking (??) fading in 22 steps.
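The table-updating side of this effect, modelled in C.  This is a sketch: the two steps described above are folded into one function, and their order within a frame, like the names used, is an assumption.  The HSync side would simply use level[line] as an index into the colour ramp.

```c
#include <stdint.h>

#define SCANLINES  224
#define STEP_WHITE 21   /* ramp index of white; 0 is black */

/* Once per VBlank: advance every line that is already part-way through its
   fade, then start the next line from the shuffled order table. */
void fade_tick(uint8_t level[SCANLINES], const uint8_t order[SCANLINES],
               unsigned *next)
{
    for (int i = 0; i < SCANLINES; i++)
        if (level[i] != 0 && level[i] != STEP_WHITE)
            level[i]--;

    if (*next < SCANLINES) {
        uint8_t line = order[(*next)++];
        if (level[line] > 0)
            level[line]--;   /* nudge it off white so it keeps fading */
    }
}
```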

Okay, so the routine and idea are not unique or difficult to do, but beginners might be wondering how to build a table of random numbers (like in this case) that has no repeated values.  Well, one, you could probably write some script to do it, or do what I did and use a spreadsheet program.  In one column of the spreadsheet, make a sequence of numbers counting up (0,1,2,..) to the final number that you want.  In the next column, fill the same number of cells with random numbers (using the RND function).  Then sort both columns (ascending, descending, it doesn't matter) using only the random number column as the sorting criterion.  The ascending numbers in the first column will now be jumbled up.  Useful?  Useless?  I don't know.
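For those who would rather script it than use a spreadsheet, a Fisher-Yates shuffle gives the same kind of table.  This is a small C sketch using the standard library's rand(); the function name and the seed parameter are just for the example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Fill table[0..n-1] with the values 0..n-1 in random order, each exactly
   once (Fisher-Yates shuffle).  n must be at most 256 for uint8_t entries. */
void make_shuffled_table(uint8_t *table, unsigned n, unsigned seed)
{
    srand(seed);
    for (unsigned i = 0; i < n; i++)
        table[i] = (uint8_t)i;
    for (unsigned i = n; i > 1; i--) {
        unsigned j = (unsigned)rand() % i;  /* 0 <= j < i; slight modulo bias is harmless here */
        uint8_t tmp = table[i - 1];
        table[i - 1] = table[j];
        table[j] = tmp;
    }
}
```

Calling make_shuffled_table(table, 224, some_seed) produces one entry per scanline with no repeats, ready to be dumped as .db lines.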


That is about all!

I hope this write-up has been a little informative for oldschool demo makers, or for people who didn't know much about the PC-Engine's technical details.  I hope the tone of my writing didn't come off as too boastful, or anything, since the techniques used in its making are not new or revolutionary.  I was just having a good time while programming, and enjoying the little challenges that popped up here and there, and that's the happy & fun feeling I hope has carried over in my text.  Programming for retro consoles is often pure pleasure!
