We have reached the third and final part of the project, in which the goal is to optimize a piece of open source software. In part 1 we selected and benchmarked the software, and in part 2 we did some profiling to see what the slow parts of the program were. Building on those two steps, in this part I will look for ways to better optimize FFmpeg.
This task, as I have learned, is not an easy one. The first step was to decide which method would best speed up the program. Looking back at my first test, where the gcc optimization level failed to accomplish anything, I knew this would not be easy. The next thing I did was check for architecture-specific optimization files, and this led to some more promising results. On the aarch64 machine aarchie, FFmpeg has 147 files in the libavcodec/aarch64 folder, and the only reason the number is this high is that many of them are output files generated during compilation. This is in contrast to the 194 files in the x86 directory, none of which are output files. This suggests that many more optimizations have been written specifically for x86. Additionally, when researching the SIMD optimization of FFmpeg I found this page, https://github.com/FFmpeg/FFmpeg/blob/master/doc/optimization.txt, which explicitly states that the best way to look to optimize for non-x86 systems is to look at what has already been done for x86.
So I went through all the files, looking for .S or .s extensions in the aarch64 directory and for .asm in the x86 directory. This gave me two lists which I could compare to see what is missing from either. Since my profiling showed that a function called flac_encode_frame was taking up most of the time, I started with the only two files in the x86 directory which reference flac, flacdsp.asm and flac_dsp_gpl.asm. At the time I could not work out what gpl stands for, and I assumed dsp stood for display, which would not be helpful considering I am testing with audio files. (As far as I can tell now, dsp stands for digital signal processing and gpl refers to the GPL license, so these files are relevant to audio after all.) Though this seemed like a dead end, it still set me down the right path of checking out these files.
At this point I went back to my profiling and decided to find where in the code this function is called. I found the overarching function in flacenc.c. This file handles encoding to FLAC, but the file itself does not contain the offending code, so I did some digging. Luckily there are some excellent online resources for searching FFmpeg's functions and files, namely https://ffmpeg.org/doxygen/0.11/index.html, which although out of date still let me go back into the up-to-date files and find the offending functions, which turned out to be in golomb.h.
golomb.h is an interesting file. After some googling I found that Golomb coding is a type of encoding which is optimized for small input numbers. There is a subtype of this encoding known as Rice coding which is popular in audio data compression, which is exactly what I am looking for. Looking at the code which takes up more time than any other function in my test cases reveals the following two functions.
/**
 * write unsigned golomb rice code (jpegls).
 */
static inline void set_ur_golomb_jpegls(PutBitContext *pb, int i, int k,
                                        int limit, int esc_len)
{
    int e;
    av_assert2(i >= 0);
    e = (i >> k) + 1;
    if (e < limit) {
        while (e > 31) {
            put_bits(pb, 31, 0);
            e -= 31;
        }
        put_bits(pb, e, 1);
        if (k)
            put_sbits(pb, k, i);
    } else {
        while (limit > 31) {
            put_bits(pb, 31, 0);
            limit -= 31;
        }
        put_bits(pb, limit, 1);
        put_bits(pb, esc_len, i - 1);
    }
}
/**
 * write signed golomb rice code (ffv1).
 */
static inline void set_sr_golomb(PutBitContext *pb, int i, int k, int limit,
                                 int esc_len)
{
    int v;
    v = -2 * i - 1;
    v ^= (v >> 31);
    set_ur_golomb(pb, v, k, limit, esc_len);
}
These functions do the encoding of the audio. As we have learned, this type of coding is popular because it is optimized for smaller input values, so seeing int as the datatype for both v and e suggests to me that there is room for improvement.
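To make the shape of a Rice code concrete, here is a minimal sketch of the common case in set_ur_golomb_jpegls (the branch without the escape). It is not FFmpeg code: rice_encode_bits is a hypothetical helper that writes '0' and '1' characters into a string instead of using PutBitContext, and it ignores limit and esc_len entirely.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of FFmpeg: write the Rice code of `value`
 * with parameter k into `out` as '0'/'1' characters and return the number
 * of bits. `out` needs room for (value >> k) + 1 + k characters plus a
 * terminating '\0'.
 */
size_t rice_encode_bits(unsigned value, unsigned k, char *out)
{
    size_t n = 0;
    unsigned q;

    assert(k < 32);                   /* keep the shifts below well defined */
    q = value >> k;                   /* quotient, sent in unary */
    while (q--)
        out[n++] = '0';               /* q zero bits ... */
    out[n++] = '1';                   /* ... closed by a single one bit */
    for (unsigned b = k; b-- > 0; )   /* then the k low bits, MSB first */
        out[n++] = ((value >> b) & 1u) ? '1' : '0';
    out[n] = '\0';
    return n;
}
```

For value 9 and k = 2 this produces 00101: the quotient 2 as two zeros and a one, then the two low bits 01. A small value costs only a few bits, which is why this coding suits data that is mostly small numbers.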
ldr w19,[x23],#4
mvn w19, w19, lsl #1
eor w19, w19, w19, asr #31
These three instructions (a load followed by mvn and eor) correspond to these two lines of C:
v = -2 * i - 1;
v ^= (v >> 31);
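What these two lines do is fold a signed value into an unsigned one so that values near zero stay small. The sketch below wraps the same two statements in a function; the name fold_signed is mine, not FFmpeg's, and it assumes that >> on a negative int32_t is an arithmetic shift, which GCC and Clang provide but which ISO C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical wrapper around the two statements from set_sr_golomb.
 * Maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
 */
uint32_t fold_signed(int32_t i)
{
    int32_t v;

    /* keep -2 * i - 1 inside the int32_t range */
    assert(i > INT32_MIN / 2 && i <= INT32_MAX / 2);
    v  = -2 * i - 1;   /* i >= 0 gives a negative odd number, i < 0 a positive one */
    v ^= (v >> 31);    /* v >> 31 is -1 for negative v, so this inverts all bits */
    return (uint32_t)v;
}
```

The first statement is also why a single mvn with a shifted operand is enough in the assembly: inverting the bits of i << 1 gives -2 * i - 1 in two's complement.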
It is with confidence, then, that I can say what my strategy would be for optimizing this code. First I would test whether I could turn that int v into an int16_t without any issues. If so, it would be possible to manually vectorize this code: mvn and eor both have forms that work on vectorized data, so this has the potential to cut these two operations, which in testing accounted for about 8 percent of the runtime, down to a fraction of that.
This would work by changing the registers from holding full words to quarter words, allowing us to run each instruction on four times as many values at once. This should in theory quarter the time the program spends on these instructions, which for our test would add up to approximately 6 seconds shaved off our time, a roughly 6% improvement in performance. This assumes two key things which I would have to test. First, it assumes that I can turn the ints into 16-bit ints. If this failed there may still be ways to use 16-bit ints only some of the time, but then the question becomes whether doing the check and making the adjustment is slower than the current method. The second assumption is that these lines of code will still be faster after converting between word sizes, considering these values are used later in the program after being encoded.
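The first test could be as simple as running the values through a range check. This is only a sketch under my own assumptions: folded_fits_u16 and count_too_wide are hypothetical helpers, and the array of residuals would have to be collected from the encoder. One thing the arithmetic shows is that the folded value is never negative and is about twice the magnitude of the input, so an unsigned 16-bit lane covers every input from -32768 to 32767, while a signed int16_t would only cover about half of that range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 1 if the folded form of i fits in an unsigned 16-bit lane. */
int folded_fits_u16(int32_t i)
{
    int64_t v = -2 * (int64_t)i - 1;  /* 64-bit so the check itself cannot overflow */

    if (v < 0)
        v = ~v;                       /* same effect as v ^= (v >> 31) */
    return v <= UINT16_MAX;
}

/* Hypothetical helper: count how many of n values would not fit. */
size_t count_too_wide(const int32_t *res, size_t n)
{
    size_t bad = 0;

    assert(res != NULL || n == 0);
    for (size_t j = 0; j < n; j++)
        if (!folded_fits_u16(res[j]))
            bad++;
    return bad;
}
```

If count_too_wide came back as zero over the whole benchmark file, the 16-bit path would be worth trying; a non-zero count would mean a fallback to 32-bit is needed for at least some frames.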
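The estimate itself is a small piece of Amdahl-style arithmetic, sketched here so the inputs are explicit. The function name is my own, and the inputs are the figures used in these posts: a run of roughly 110.7 seconds on aarchie (from the Part 1 benchmark), about 8 percent of it in these instructions, and an assumed four-fold speedup of that part.

```c
#include <assert.h>

/*
 * Hypothetical helper: seconds saved when a fraction `hot` of a run lasting
 * `total_s` seconds is made `factor` times faster and the rest is unchanged.
 */
double seconds_saved(double total_s, double hot, double factor)
{
    assert(hot >= 0.0 && hot <= 1.0 && factor > 0.0);
    return total_s * hot * (1.0 - 1.0 / factor);
}
```

Calling seconds_saved(110.7, 0.08, 4.0) gives a little over 6.6 seconds, which is about 6% of the run. That is a best case, since it leaves out any cost of converting between word sizes.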
In conclusion I believe I have found a viable line of thinking towards improving performance of the audio encoding in FFmpeg on aarch64 systems, though I would need to test the viability of my approaches before being sure they are either possible or more efficient in practice.
Saturday, April 18, 2020
Wednesday, April 15, 2020
Project Part 2: Profiling
In this part of the project our goal was to take our program and profile it, seeing where the program was spending most of its time. Towards this goal we learned to use two tools which I used for my own profiling, gprof and perf.
gprof is a profiling tool which is built into the program by compiling with the -pg flag; running the program then creates a gmon.out file that lets us see how our program runs. This file can be read directly, though it isn't the easiest to understand and is better represented graphically. The data can be converted into a graph with:
gprof ./ffmpeg_g | gprof2dot | dot
This pipes the gprof output into a program which converts it into a form that can then be rendered graphically. I was having issues getting a graph to appear in its own window using the added parameter -T x11, so I took the output and pasted it into a web interpreter, http://www.webgraphviz.com/. This website took that output and generated the graph seen in figure 5, which shows that flac_encode_frame is where the program ends up spending a lot of its time. This function is most likely the part of the program where the file is re-encoded into the new format.
After finding this I ran perf. This tool lets us see which lines of the assembly code take the most time to run on each system, and so gives us an idea of ways to speed them up.
To run perf, I first reconfigured the build to remove -pg, then ran make clean followed by make. After this I ran the program with the extra call to perf, which looked like:
perf record ./ffmpeg_g -i ../OneHourBenchmark.mp3 OneHourBenchmarkOut.ogg
which gave the following 2 images.
Figure 1
Figure 2
The first of these images is a screenshot of the functions where the program spends the most time, along with the percentage of time spent in each and which file they are part of. The second image is what you get when you annotate a function: a breakdown of both the source code and the assembly generated from it, with each instruction labeled with the percentage of time spent on it. This can be a bit misleading, however, especially with smaller sample sizes, because what is actually happening is that perf interrupts the program every so often to sample which instruction is being run, so near misses can occur and some lines may be missed entirely despite clearly being run. My particular test took around 1 minute and so produced enough data to give a reasonable assessment of the code. Comparing the results above with the results below, the program spends its time within this function doing different things depending on the hardware, with ARM above and x86 below. Because of this, different types of optimization will only work on certain machines, and we will have to look into what specifically we want to change in the next part.
Figure 3
Figure 4
Figure 5
Wednesday, April 8, 2020
Project Part 0: The Before
In part 1 of the project blogs I discussed how I picked FFmpeg for the project but glossed over the process of deciding on that software. In general I don't use Linux a lot in my daily life. I have Windows on my laptop and desktop and prefer it for my usual use, relying on things like WinSCP and PuTTY to deal with Linux tasks for school. For this task, though, it was obviously necessary to find software that works on Linux.
Luckily there are many tools on Windows for running Linux applications, either with a virtual machine or with some sort of Linux compatibility layer, and these generally work, so that was not the hardest part. The hardest part, for me anyway, was picking the software. Since I didn't use Linux much I was not very aware of Linux software, let alone open source software (though plenty of it is, so that's helpful). I ended up looking at both the most popular GitHub repositories and GNU open source software such as gzip, and after a while of digging, which also involved looking at previous years' projects, I decided on FFmpeg. This is because tasks such as changing the format of an audio or video file are CPU intensive, and because I found it easiest to get test data for such tasks.
Lab 5 Simple Loop Program
For this lab we were given a very simple task: loop through some numbers and print them out. In any programming language this is one of the first things you learn to do and is super easy. Assembly is not so simple, though, and so we had to work at it.
The first problem was getting the number displayed. The counting was easy (start at zero and loop until you reach a number), but getting that number displayed was a bit tricky. Luckily the digit characters are offset from their numeric values by a fixed amount, so as long as we could split the number into digits we could turn the digits into characters. This was far easier on x86 and aarch64 than on the 6502, as we had a division operation we could use to separate the digits and thus get the characters.
After properly getting the characters we had to figure out how to add them to the output. After some trial and error and a bit of googling we discovered that we could simply give the loop's string extra blank characters and overwrite the memory at those addresses with our characters, giving us the proper output of a loop printing out numbers.
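The same idea written in C rather than assembly looks roughly like this. It is only an illustration of the technique, not the lab code: put_two_digits and the message layout used with it are made up for the example.

```c
#include <assert.h>

/*
 * Hypothetical helper: overwrite two placeholder characters at `slot`
 * with the decimal digits of n (0 to 99).
 */
void put_two_digits(char *slot, unsigned n)
{
    assert(n < 100);
    slot[0] = (char)('0' + n / 10);   /* quotient is the tens digit */
    slot[1] = (char)('0' + n % 10);   /* remainder is the ones digit */
}
```

With a message such as "Loop:   " followed by a newline, where the last two blanks are the placeholders, calling put_two_digits on the address of the first placeholder before each print gives the numbered output. Adding '0' is the fixed offset between a digit's value and its character code.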
Wednesday, April 1, 2020
Project Part 1
For this project we are tasked with selecting a piece of open source software and trying to optimize some part of it. I chose FFmpeg because it is an open source file conversion tool, I know how to get a large data input for benchmarking, and it is CPU reliant.
So the first thing I did was build the source code. This was notably more difficult on Windows since it is not made to be run on Windows natively, but with some third-party downloads, most importantly MSYS2, I was able to get it working. On the Linux machine aarchie I had far fewer problems, as the tools required for the build were already installed and ready to go. So I built the software on both x86_64 and aarch64 and tested them out.
In testing, the first thing I did was run it with the default configuration. After the build completed I ran it on my test data, a download of a YouTube video just over an hour long which had already been converted to an 83.3MB mp3, which I was now converting to an ogg file. Here are the results:
X86_64 Ryan desktop benchmark
Test1:
real 0m14.516s
user 0m0.015s
sys 0m0.000s
Test2:
real 0m14.087s
user 0m0.000s
sys 0m0.015s
Test3:
real 0m14.235s
user 0m0.000s
sys 0m0.000s
aarch64 aarchie benchmark
Test1:
real 1m50.783s
user 1m48.931s
sys 0m1.415s
Test2:
real 1m50.571s
user 1m49.255s
sys 0m0.897s
Test3:
real 1m50.683s
user 1m49.034s
sys 0m1.236s
This is with minimal background tasks running on my home Windows computer, and then on aarchie. As can be seen, my computer is a bit more powerful for this purpose, but the results of the benchmark seem consistent.
After this was done I cleaned the build and rebuilt using -O3 to see if tweaking the compiler optimization would speed up the program. Here are those results.
X86_64 Ryan desktop O3
Test1:
real 0m14.047s
user 0m0.000s
sys 0m0.015s
Test2:
real 0m14.014s
user 0m0.015s
sys 0m0.000s
Test3:
real 0m14.044s
user 0m0.000s
sys 0m0.000s
aarch64 aarchie O3
Test1:
real 1m50.490s
user 1m48.994s
sys 0m1.086s
Test2:
real 1m50.401s
user 1m48.777s
sys 0m1.196s
Test3:
real 1m50.386s
user 1m48.742s
sys 0m1.236s
As can be seen, the difference was negligible: averaged over the three runs, real time dropped by under 2% on x86_64 and by about 0.2% on aarch64. I am unsure if this is due to an improper use of the optimization settings with the makefile or due to the makefile already optimizing the output. Either way, if I am going to find a way to considerably speed up FFmpeg I am going to have to do more than change a compiler optimization level.
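To put a number on "negligible", the three real times for each build can be averaged and compared. The sketch below shows that arithmetic; the two function names are my own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: arithmetic mean of n timings in seconds. */
double mean_seconds(const double *t, size_t n)
{
    double sum = 0.0;

    assert(t != NULL && n > 0);
    for (size_t i = 0; i < n; i++)
        sum += t[i];
    return sum / (double)n;
}

/* Hypothetical helper: how much faster `after` is than `before`, in percent. */
double percent_faster(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}
```

Feeding in the aarch64 real times above (110.783, 110.571 and 110.683 seconds before, 110.490, 110.401 and 110.386 seconds with -O3) gives an improvement of roughly 0.2%. The x86_64 times (14.516, 14.087 and 14.235 before, 14.047, 14.014 and 14.044 with -O3) give roughly 1.7%, though with only three runs each that could easily be noise.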
Sunday, March 15, 2020
Lab 4 - Pick Two pt.2
In part 1 we looked at making an adder in 6502 assembly. This was one of the two tasks we chose for this lab; the other was a screen colour selector. I personally found the screen colour selector a lot easier to code than the adder, and it took far less code to implement. So without any delay, here is the whole source code before we dive into how it works.
; ROM routines
define SCINIT $ff81 ; initialize/clear screen
define CHRIN $ffcf ; input character from keyboard
define CHROUT $ffd2 ; output character to screen
define SCREEN $ffed ; get screen size
define PLOT $fff0 ; get/set cursor coordinates
jsr SCINIT
ldy #$00
initColours:
lda colours,y
beq doneInit
jsr CHROUT
iny
bne initColours
doneInit:
ldy #$00
ldx #$00
CLC
jsr PLOT
SEC
jsr PLOT
jsr flipSelect
checkIn:
SEC
jsr PLOT
jsr CHRIN
cmp #$80
beq up
cmp #$82
bne checkIn
down:
cpy #$0f
beq checkIn
jsr flipSelect
iny
jsr flipSelect
jsr drawScreen
jmp checkIn
up:
cpy #$00
beq checkIn
jsr flipSelect
dey
jsr flipSelect
jsr drawScreen
jmp checkIn
flipSelect:
ldx #$00
CLC
jsr PLOT
SEC
jsr PLOT
flipLoop:
cmp #$20
beq doneFlip
eor #$80
jsr CHROUT
SEC
jsr PLOT
clc
bcc flipLoop
doneFlip:
rts
drawScreen:
tya
pha
lda #$00 ; set pointer at $10 to $0200
sta $10
lda #$02
sta $11
pla
ldx #$06 ; max value for $11
ldy #$00 ; index
drawLoop:
sta ($10),y ; store colour
iny ; increment index
bne drawLoop ; branch until page done
inc $11 ; increment high byte of pointer
cpx $11 ; compare with max value
bne drawLoop ; continue if not done
rts
colours:
dcb "B","L","A","C","K",10
dcb "W","H","I","T","E",10
dcb "R","E","D",10
dcb "C","Y","A","N",10
dcb "P","U","R","P","L","E",10
dcb "G","R","E","E","N",10
dcb "B","L","U","E",10
dcb "Y","E","L","L","O","W",10
dcb "O","R","A","N","G","E",10
dcb "B","R","O","W","N",10
dcb "L","I","G","H","T",95,"R","E","D",10
dcb "D","A","R","K",95,"G","R","E","Y",10
dcb "G","R","E","Y",10
dcb "L","I","G","H","T",95,"G","R","E","E","N",10
dcb "L","I","G","H","T",95,"B","L","U","E",10
dcb "L","I","G","H","T",95,"G","R","E","Y",00
This code, similar to the code in pt. 1, uses a main loop as the body of the program, though this one is a bit different and also isn't named main. But that is getting ahead of ourselves. The first thing the program does is initialize the screen by going through the colour names stored in memory and printing them to the screen, which was fairly easy to do considering the ROM routine CHROUT handles newlines properly, allowing the whole list to be one block of memory.
The main loop of this program is called checkIn. It waits for an input and, when it receives one, updates the screen accordingly. The up and down inputs work about the same way, so let's just look at up. When the up arrow is pressed, checkIn branches to up. This code checks whether moving up is valid (i.e. we are not at the top of the screen), and if so it removes the selection, selects the proper line, changes the screen colour and then returns to the checkIn loop.
Of course up and down both call their own subroutines, which I will explain now. flipSelect is a pretty interesting subroutine: it simply flips the high bit of every character on whatever line is in y, which will be the currently selected line, stopping at the first space. So the first time it is called it flips the bit off to deselect the line, and the second time it is called it flips the bit on, selecting the new line. The drawScreen subroutine is one we have used a lot in the course so far. It takes what is currently in y, which as established is the currently selected element, and fills the screen with that colour. We made sure to align the colours on the screen with their places in memory so that their y values line up with their colour values. This results in the correct colour being displayed.
Putting these few simple subroutines together with the ROM routines, we have a very compact and easy to understand bit of code which lets you select a colour and display it on the screen.
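For readers more comfortable with C than 6502 assembly, here is a rough sketch of what the two subroutines do. It is a translation of the idea, not of the exact code: the buffers stand in for the screen, the function names are borrowed from the labels, and the check for a terminating zero is my own addition so the loop is safe on an ordinary C string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toggle bit 7 of each character until the first space, like the
 * cmp #$20 / eor #$80 pair in flipLoop. Calling it twice restores the line.
 */
void flip_select(unsigned char *line)
{
    assert(line != NULL);
    for (size_t i = 0; line[i] != 0x20 && line[i] != 0; i++)
        line[i] ^= 0x80;
}

/*
 * Fill four 256-byte pages with one colour value, like drawLoop does
 * for $0200 to $05ff.
 */
void draw_screen(unsigned char *screen, unsigned char colour)
{
    assert(screen != NULL);
    memset(screen, colour, 4 * 256);
}
```

Because flip_select stops at the first space, the colour names in the table use character 95 (an underscore) instead of a space, which keeps a name like LIGHT_RED in one piece.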
Lab 4 - Pick Two pt.1
For this lab we were tasked with creating two programs from a list in 6502 assembly code. These tasks ranged in difficulty, though some things were made far easier by the introduction of ROM routines, which I will explain a bit later. The two tasks which our group decided on were the calculator and the colour selector. We chose these two for a couple of reasons, the largest being that we all felt most confident in our own ability to get these two tasks done over any others. So with our tasks in hand we set out to get an understanding of ROM routines.
ROM routines are basically snippets of code saved in the memory of the chip. To access one you simply call a subroutine at the right address and it runs as if it were code which you wrote. The routines given to us do in one line things which previously required many lines of code, with smart use of the various registers to supply input to them.
Before getting to the code for the adder, it is important to note that we were never able to get the blinking cursor to work properly, though the rest of the program works for sure. Here is the full source code, which will be explained below.
; ROM routines
define SCINIT $ff81 ; initialize/clear screen
define CHRIN $ffcf ; input character from keyboard
define CHROUT $ffd2 ; output character to screen
define SCREEN $ffed ; get screen size
define PLOT $fff0 ; get/set cursor coordinates
define NUMBERA $10;
define NUMBERB $20;
jsr SCINIT
mainLoop:
ldy #$00
jsr char1
jsr input
jsr storeA
ldy #$00
jsr char2
jsr input
jsr storeB
ldy #$00
jsr charR
jsr printAdd
jmp mainLoop
input:
SEC
jsr PLOT
ldx #$15
CLC
jsr PLOT
inLoop:
SEC
jsr PLOT
jsr CHRIN
charCheck:
cmp #$00
beq inLoop
cmp #$81
beq right
cmp #$83
beq left
cmp #$0d
beq next
drawNum:
cmp #$30
bcc inLoop
clc
cmp #$3a
bcs inLoop
jsr CHROUT
SEC
jsr PLOT
cpx #$17
bne inLoop
dex
CLC
jsr PLOT
jmp inLoop
left: cpx #$15
beq inLoop
jsr CHROUT
jmp inLoop
right: cpx #$16
beq inLoop
jsr CHROUT
jmp inLoop
next:
SEC
jsr PLOT
ldx #$15
CLC
jsr PLOT
SEC
jsr PLOT
CLC
SBC #$2F
ASL
ASL
ASL
ASL
PHA
ldx #$16
CLC
jsr PLOT
SEC
jsr PLOT
CLC
SBC #$2F
PHA
ldx #$00
iny
CLC
jsr PLOT
SEC
jsr PLOT
PLA
TAX
PLA
rts
storeA:
sta NUMBERA
txa
eor NUMBERA
sta NUMBERA
rts
storeB:
sta NUMBERB
txa
eor NUMBERB
sta NUMBERB
rts
printAdd:
SEC
jsr PLOT
ldx #$15
CLC
jsr PLOT
SEC
jsr PLOT
SED
lda NUMBERA
adc NUMBERB
CLD
pha
bcc outputAddition
ldx #$14
CLC
jsr PLOT
SEC
jsr PLOT
lda #$31
jsr CHROUT
outputAddition:
pla
pha
LSR
LSR
LSR
LSR
clc
adc #$30
jsr CHROUT
pla
and #$0F
clc
adc #$30
jsr CHROUT
SEC
jsr PLOT
ldx #$00
iny
CLC
jsr PLOT
rts
char1: lda firstDigit,y
beq charRet
jsr CHROUT
iny
bne char1
char2: lda secondDigit,y
beq charRet
jsr CHROUT
iny
bne char2
charR: lda result,y
beq charRet
jsr CHROUT
iny
bne charR
charRet:
rts
firstDigit:
dcb "E","N","T","E","R",32,"F","I","R","S","T",32,"D","I","G","I","T",":",32,32,32,"0","0"
dcb 00
secondDigit:
dcb "E","N","T","E","R",32,"S","E","C","O","N","D",32,"D","I","G","I","T",":",32,32,"0","0"
dcb 00
result:
dcb "R","E","S","U","L","T",":"
dcb 00
The adder is actually quite a simple program, especially thanks to ROM routines. The code went through a few revisions before settling on the final result, due to poor use of subroutines in previous incarnations as well as unfamiliarity with the ROM routines. Speaking of the ROM routines, let's discuss what they accomplish for us here.
There are three ROM routines that are important to making this code as compact as it is. The first is CHROUT, which puts a character on the screen and then moves the cursor over by one; very simple but extremely useful. It does this by taking the value in the accumulator and putting it at the current cursor location. The next routine is the complement to CHROUT, CHRIN. As the name implies, CHRIN takes a character of input, which is stored in the accumulator, meaning it can be used in conjunction with CHROUT to print input to the screen. The last important ROM routine is PLOT. This routine has two functions depending on the state of the carry flag: with the carry set (SEC) it gets the current cursor position and returns the value of the character there, and with the carry clear (CLC) it sets the cursor position based on what is currently in x and y.
Using these tools, the code goes through a main loop which uses a few subroutines to keep it as readable as possible. This main loop is only 12 lines but does all the work of the program via the subroutines it calls. Let's take a closer look at it.
mainLoop:
ldy #$00
jsr char1
jsr input
jsr storeA
ldy #$00
jsr char2
jsr input
jsr storeB
ldy #$00
jsr charR
jsr printAdd
jmp mainLoop
So the first step is to set y to a known value, this allows char1 to work properly as it will print onto the screen the instructions to enter the first input and the start location is whatever y is. Next we get the first user input. This is done through quite complex code which I will explain further down. following this it stores the first number then it does it all again for the second number. After this it prints the result text followed by doing the actual addition and printing those results, this whole thing then loops allowing the program to keep taking inputs.
Now let's break down a couple of the more important components there, namely input, storeX and printAdd. input is how we get out user input and as we are simply making a calculator only certain characters are allowed we do this by limiting the values that can be read by CHRIN, if it gets anything else we simply ask it to try again. This ensures we are either getting a number, an arrow key, or the enter key. When a character is input, it will also do a check on whether or not the input field is on the second digit in order to ensure you can only write two digits. Next storeA and storeB which take the numbers provided to them and store them in different addresses, fairly simple. Lastly printAdd which does the actual adding. This subroutine switches to decimal mode before adding the two values stored in the previous subroutines together. It then checks to see if there was a carry and if there was it draws a 1 before the number, before then drawing the output of the addition.
All of this put together and we have a working, though not perfect Adder in 6502 assembly.
ROM routines are basically snippets of code saved in the memory of the chip. In order to access one, you simply jump to a subroutine at the right address and it runs as if it were code which you wrote. The routines given to us do in just one line things which previously required many lines of code, with smart use of the various registers to supply input to them.
Now, before getting to the code for the Adder, it is important to note that we were never able to get the blinking cursor to work properly, though the rest of the program works. Here is the full source code, which is explained below.
; ROM routines
define SCINIT $ff81 ; initialize/clear screen
define CHRIN $ffcf ; input character from keyboard
define CHROUT $ffd2 ; output character to screen
define SCREEN $ffed ; get screen size
define PLOT $fff0 ; get/set cursor coordinates
define NUMBERA $10;
define NUMBERB $20;
jsr SCINIT
mainLoop:
ldy #$00
jsr char1
jsr input
jsr storeA
ldy #$00
jsr char2
jsr input
jsr storeB
ldy #$00
jsr charR
jsr printAdd
jmp mainLoop
input:
SEC
jsr PLOT
ldx #$15
CLC
jsr PLOT
inLoop:
SEC
jsr PLOT
jsr CHRIN
charCheck:
cmp #$00
beq inLoop
cmp #$81
beq right
cmp #$83
beq left
cmp #$0d
beq next
drawNum:
cmp #$30
bcc inLoop
clc
cmp #$3a
bcs inLoop
jsr CHROUT
SEC
jsr PLOT
cpx #$17
bne inLoop
dex
CLC
jsr PLOT
jmp inLoop
left: cpx #$15
beq inLoop
jsr CHROUT
jmp inLoop
right: cpx #$16
beq inLoop
jsr CHROUT
jmp inLoop
next:
SEC
jsr PLOT
ldx #$15
CLC
jsr PLOT
SEC
jsr PLOT
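CLC
SBC #$2F ; carry is clear, so this subtracts $30: ASCII digit -> value
ASL ; shift the tens digit into the high nibble
ASL
ASL
ASL
PHA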
ldx #$16
CLC
jsr PLOT
SEC
jsr PLOT
CLC
SBC #$2F ; carry is clear, so this subtracts $30: ASCII digit -> value (ones digit)
PHA
ldx #$00
iny
CLC
jsr PLOT
SEC
jsr PLOT
PLA
TAX
PLA
rts
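storeA:
sta NUMBERA ; A holds the tens digit in the high nibble
txa ; X holds the ones digit
eor NUMBERA ; the nibbles do not overlap, so EOR merges them
sta NUMBERA
rts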
storeB:
sta NUMBERB
txa
eor NUMBERB
sta NUMBERB
rts
printAdd:
SEC
jsr PLOT
ldx #$15
CLC
jsr PLOT
SEC
jsr PLOT
SED
CLC ; make sure no stray carry is added to the sum
lda NUMBERA
adc NUMBERB
CLD
pha
bcc outputAddition
ldx #$14
CLC
jsr PLOT
SEC
jsr PLOT
lda #$31
jsr CHROUT
outputAddition:
pla
pha
LSR
LSR
LSR
LSR
clc
adc #$30
jsr CHROUT
pla
and #$0F
clc
adc #$30
jsr CHROUT
SEC
jsr PLOT
ldx #$00
iny
CLC
jsr PLOT
rts
char1: lda firstDigit,y
beq charRet
jsr CHROUT
iny
bne char1
char2: lda secondDigit,y
beq charRet
jsr CHROUT
iny
bne char2
charR: lda result,y
beq charRet
jsr CHROUT
iny
bne charR
charRet:
rts
firstDigit:
dcb "E","N","T","E","R",32,"F","I","R","S","T",32,"D","I","G","I","T",":",32,32,32,"0","0"
dcb 00
secondDigit:
dcb "E","N","T","E","R",32,"S","E","C","O","N","D",32,"D","I","G","I","T",":",32,32,"0","0"
dcb 00
result:
dcb "R","E","S","U","L","T",":"
dcb 00
The Adder is actually quite a simple program, especially thanks to the ROM routines. The code went through a few revisions before settling on the final result, due to poor use of subroutines in previous incarnations as well as unfamiliarity with the ROM routines. Speaking of the ROM routines, let's discuss what they are accomplishing for us here.
There are three ROM routines that are important to making this code as compact as it is. The first is CHROUT, which prints a character onto the screen and then moves the cursor over one; very simple but extremely useful. It does this by taking the value in the accumulator and putting it at the current cursor location. The next routine is the complement to CHROUT, CHRIN. As the name implies, CHRIN takes a character of input and stores it in the accumulator, which means it can be used in conjunction with CHROUT to print input to the screen. The last important ROM routine is PLOT, which has two functions depending on the state of the carry flag. With the carry set, it gets the current cursor position and returns the value of the character there; with the carry clear, it sets the cursor position based on what is currently in X and Y.
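As a rough illustration of how the three routines fit together, here is a minimal sketch that places the cursor and then echoes whatever is typed. It assumes, as the Adder listing does, that CHRIN returns $00 when no key is waiting and that PLOT uses X for the column and Y for the row. The `echo` label is made up for this example.

```asm
define SCINIT $ff81 ; initialize/clear screen
define CHRIN  $ffcf ; input character from keyboard
define CHROUT $ffd2 ; output character to screen
define PLOT   $fff0 ; get/set cursor coordinates

        jsr SCINIT      ; start from a cleared screen
        ldx #$05        ; column 5 (assumed: X = column)
        ldy #$02        ; row 2 (assumed: Y = row)
        clc             ; carry clear = set the cursor position
        jsr PLOT
echo:   jsr CHRIN       ; A = key code
        cmp #$00        ; the listing treats $00 as "no key yet"
        beq echo        ; nothing typed, poll again
        jsr CHROUT      ; print the key and advance the cursor
        sec             ; carry set = read the cursor position back
        jsr PLOT        ; X and Y now hold the new position
        jmp echo
```

The final `sec` / `jsr PLOT` pair is not needed for echoing; it is only there to show the "get" half of PLOT, which the Adder uses to check which digit the cursor is sitting on.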
Using these tools, the code goes through a main loop which uses a few subroutines in order to keep it as readable as possible. This main loop is only 12 lines, but it does all the work of the program via the subroutines it calls. Let's take a closer look at it.
mainLoop:
ldy #$00
jsr char1
jsr input
jsr storeA
ldy #$00
jsr char2
jsr input
jsr storeB
ldy #$00
jsr charR
jsr printAdd
jmp mainLoop
So the first step is to set Y to a known value. This allows char1 to work properly, as it prints the instructions to enter the first input onto the screen, starting from whatever index is in Y. Next we get the first user input. This is done through quite complex code which I will explain further down. Following this, it stores the first number and then does it all again for the second number. After this it prints the result text, followed by doing the actual addition and printing those results. The whole thing then loops, allowing the program to keep taking inputs.
Now let's break down a couple of the more important components there, namely input, storeA/storeB and printAdd. input is how we get our user input. As we are simply making a calculator, only certain characters are allowed. We do this by limiting the values that are accepted from CHRIN; if it gets anything else, we simply ask it to try again. This ensures we are getting either a number, an arrow key, or the enter key. When a character is input, it also checks whether the cursor is on the second digit, in order to ensure you can only write two digits. When enter is pressed, input returns the tens digit in the high nibble of the accumulator and the ones digit in X. Next come storeA and storeB, which combine those two digits into a single byte and store it in one of two addresses, fairly simple. Lastly there is printAdd, which does the actual adding. This subroutine switches to decimal mode before adding together the two values stored by the previous subroutines. It then checks to see if there was a carry and, if there was, it draws a 1 before the number, before then drawing the output of the addition.
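To make the digit packing and the decimal-mode addition concrete, here is a small worked sketch using fixed values instead of keyboard input. It is not the Adder's exact code: it uses SEC / SBC #$30 and ORA where the listing uses CLC / SBC #$2F and EOR, which give the same result here. The `done` label is made up for this example.

```asm
define NUMBERA $10
define NUMBERB $20

        lda #$35        ; ASCII "5", the tens digit
        sec
        sbc #$30        ; ASCII -> value 5
        asl
        asl
        asl
        asl             ; move it to the high nibble: $50
        sta NUMBERA
        lda #$38        ; ASCII "8", the ones digit
        sec
        sbc #$30        ; ASCII -> value 8
        ora NUMBERA     ; combine the nibbles: $58
        sta NUMBERA

        lda #$46        ; second operand, already packed as decimal 46
        sta NUMBERB

        sed             ; decimal mode on
        clc             ; no carry going into the addition
        lda NUMBERA
        adc NUMBERB     ; 58 + 46 = 104: A = $04, carry set
        cld             ; decimal mode off again
        bcc done        ; carry clear would mean the sum fit in two digits
        ; carry set: this is where printAdd draws the leading "1"
done:   brk
```

Because each decimal digit lives in its own nibble, the result can be printed with the same shift-and-mask steps printAdd uses, with no division by ten needed.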
Put all of this together and we have a working, though not perfect, Adder in 6502 assembly.
Friday, January 31, 2020
Lab 3 - Pong pt.2
After the initial dive into the Pong problem, and with a new understanding from the experiments done, a tough choice had to be made: I started over. My first goal with this new start was just to get a bouncing ball.
The code for the bouncing ball iteration no longer exists, but it moved the ball in a two-step process: the first step decided whether the ball should move, the second decided which way it should move. This was accomplished by first splitting the movement into X and Y and then either incrementing or decrementing each, depending on which surface the ball last bounced off of. The first version of this was quite basic: it did not clear the old ball position and only worked at a 45 degree angle, resulting in a diamond being drawn on the screen, but it was a start.
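Since that first version is gone, the following is only a sketch of the two-step idea for one axis, modelled on the VELX / DELTAX / BOUNCEX pattern that appears in the later listings. The variable names and addresses are taken from those listings; the `step`, `up` and `noMove` labels are made up for this example.

```asm
define ROW     $20 ; current row
define DELTAX  $30 ; running total that decides when to move
define BOUNCEX $35 ; zero = row increases, non-zero = row decreases
define VELX    $38 ; amount added to DELTAX on every pass

step:   clc
        lda VELX
        adc DELTAX      ; step 1: add the velocity to the running total
        sta DELTAX
        bcc noMove      ; no overflow yet, the ball stays put this pass
        lda BOUNCEX     ; step 2: overflow, so move one pixel
        bne up
        inc ROW
        jmp noMove
up:     dec ROW
noMove: rts
```

With VELX set to $20 and DELTAX starting at zero, the carry is set on every eighth call, so the ball moves one row every eight passes. A larger VELX overflows sooner and makes the ball faster on that axis, which is what allows angles other than 45 degrees once the two axes use different velocities.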
The screen not being cleared was beginning to bother me, so I implemented a very simple solution. First, I copied the screen clear code from the etch-a-sketch program and had it run on every game loop.
clear: lda table_low ; clear the screen
sta POINTER
lda table_high
sta POINTER_H
ldy #$00
tya
c_loop: sta (POINTER),y
iny
bne c_loop
inc POINTER_H
ldx POINTER_H
cpx #$06
bne c_loop
Second, I added a delay so that the ball was drawn for longer than it was not, making it less likely to blink out of existence. This had a few issues, but I finally had a single ball bouncing around the screen.
Finally the game was coming together, and it only needed a few more elements to make it functional. The first and most important was the paddle. For this I added a new collision procedure which checks whether the ball is on a pixel which counts as being on the paddle and, if so, makes it bounce. I also made it into more of a game by changing the bottom collision to game over and by randomizing the X and Y velocity whenever the paddle is hit. Here is the code for that iteration:
; zero-page variable locations
define ROW $20 ; current row
define COL $21 ; current column
define DELTAX $30 ; current Delta X
define DELTAY $31 ; current Delta Y
define BOUNCEX $35 ; checks if X has bounced
define BOUNCEY $36 ; checks if Y has bounced
define VELX $38
define VELY $39
define POINTER $10 ; ptr start of row
define POINTER_H $11
define PADDLEL $40
define PADDLER $41
; constants
define DOT $01 ; dot colour
define PADDLE $07 ; paddle colour
ldy #$00 ; put help text on screen
print: lda help,y
beq setup
sta $f000,y
iny
bne print
setup: lda #$0f ; set initial ROW,COL
sta ROW
lda #$00
sta COL
lda #$20
sta VELX
lda #$20
sta VELY
lda #$0C
sta PADDLEL
lda #$14
sta PADDLER
draw: lda ROW ; ensure ROW is in range 0:31
and #$1f
sta ROW
lda COL ; ensure COL is in range 0:31
and #$1f
sta COL
ldy ROW ; load POINTER with start-of-row
lda table_low,y
sta POINTER
lda table_high,y
sta POINTER_H
ldy COL ; store CURSOR at POINTER plus COL
lda #DOT
sta (POINTER),y
drawPaddle:
ldy #$1f ; load POINTER with start-of-row
lda table_low,y
sta POINTER
lda table_high,y
sta POINTER_H
ldy PADDLEL ; store CURSOR at POINTER plus COL
lda #PADDLE
paddleLoop:
sta (POINTER),y
iny
cpy PADDLER
bne paddleLoop
colidR: lda COL
cmp #$1F
bne colidL
sta BOUNCEY
colidL: lda COL
cmp #$00
bne colidD
sta BOUNCEY
colidD: lda ROW
cmp #$1F
bne colidU
CLC
jmp gameover
colidU: lda ROW
cmp #$00
bne colidP
sta BOUNCEX
colidP: CLC
lda ROW
cmp #$1E
bne ballX
lda COL
cmp PADDLEL
bcc ballX
cmp PADDLER
bcs ballX
sta BOUNCEX
lda $fe ;randomize vel when hitting paddle
cmp #$80 ;ensure vel isn't too high
bcc velx
adc #$81
velx:
sta VELX
lda $fe
sta VELY
ballX: lda VELX
adc DELTAX
sta DELTAX
bcc ballY
CLC
lda BOUNCEX
cmp #$00
bne decROW
incROW: inc ROW
CLC
bcc ballY
decROW: dec ROW
ballY:
lda VELY
adc DELTAY
sta DELTAY
bcc getkey
CLC
lda BOUNCEY
cmp #$00
bne decCOL
incCOL: inc COL
CLC
bcc getkey
decCOL: dec COL
getkey: lda $ff ; get a keystroke
ldx #$00 ; clear out the key buffer
stx $ff
cmp #$83 ; check key == LEFT
bne checkR
ldy PADDLEL
cpy #$00
beq checkR
dec PADDLEL
dec PADDLER
jmp delaya
checkR: cmp #$81 ; check key == RIGHT
bne delaya
ldy PADDLER
cpy #$20
beq delaya
inc PADDLEL
inc PADDLER
delaya: ldy #$00 ; Delay processor so that ball doesn't flash at top of screen
ldx #$00
delay: iny
cpy #$FF
bne delay
ldy #$00
inx
cpx #$06
bne delay
clear: lda table_low ; clear the screen
sta POINTER
lda table_high
sta POINTER_H
ldy #$00
tya
c_loop: sta (POINTER),y
iny
bne c_loop
inc POINTER_H
ldx POINTER_H
cpx #$06
bne c_loop
done: clc ; repeat
jmp draw
gameover:
brk
; these two tables contain the high and low bytes
; of the addresses of the start of each row
table_high:
dcb $02,$02,$02,$02,$02,$02,$02,$02
dcb $03,$03,$03,$03,$03,$03,$03,$03
dcb $04,$04,$04,$04,$04,$04,$04,$04
dcb $05,$05,$05,$05,$05,$05,$05,$05,
table_low:
dcb $00,$20,$40,$60,$80,$a0,$c0,$e0
dcb $00,$20,$40,$60,$80,$a0,$c0,$e0
dcb $00,$20,$40,$60,$80,$a0,$c0,$e0
dcb $00,$20,$40,$60,$80,$a0,$c0,$e0
; help message on character screen
help:
dcb "A","r","r","o","w",32,"k","e","y","s"
dcb 32,"d","r","a","w",32,"/",32,"'","C","'"
dcb 32,"k","e","y",32,"c","l","e","a","r","s"
dcb 00
As you can probably tell, there are still some leftover fragments from the etch-a-sketch which need to be removed, but all in all this code makes a working game of Pong on the 6502. It draws and moves a ball, receives keyboard input and moves the paddle. This is not where I decided to stop with this program, though.
; zero-page variable locations
define ROW $20 ; current row
define COL $21 ; current column
define DELTAX $30 ; current Delta X
define DELTAY $31 ; current Delta Y
define BOUNCEX $35 ; checks if X has bounced
define BOUNCEY $36 ; checks if Y has bounced
define VELX $38
define VELY $39
define POINTER $10 ; ptr start of row
define POINTER_H $11
define PADDLEL $40
define PADDLER $41
define SCORE $24
define HIT $23
; constants
define DOT $01 ; dot colour
define PADDLE $07 ; paddle colour
ldy #$00 ; put help text on screen
print: lda help,y
beq setup
sta $f000,y
iny
bne print
setup: lda #$0f ; set initial ROW,COL
sta ROW
lda #$00
sta COL
lda #$20
sta VELX
lda #$20
sta VELY
lda #$0B
sta PADDLEL
lda #$15
sta PADDLER
lda #$00
sta SCORE
draw: lda ROW ; ensure ROW is in range 0:31
and #$1f
sta ROW
lda COL ; ensure COL is in range 0:31
and #$1f
sta COL
ldy ROW ; load POINTER with start-of-row
lda table_low,y
sta POINTER
lda table_high,y
sta POINTER_H
ldy COL ; store CURSOR at POINTER plus COL
lda #DOT
sta (POINTER),y
drawPaddle:
ldy #$1f ; load POINTER with start-of-row
lda table_low,y
sta POINTER
lda table_high,y
sta POINTER_H
ldy PADDLEL ; store CURSOR at POINTER plus COL
lda #PADDLE
paddleLoop:
sta (POINTER),y
iny
cpy PADDLER
bne paddleLoop
colidR: lda COL
cmp #$1F
bne colidL
sta BOUNCEY
colidL: lda COL
cmp #$00
bne colidD
sta BOUNCEY
colidD: lda ROW
cmp #$1F
bne colidU
CLC
jmp gameover
colidU: lda ROW
cmp #$00
bne colidP
sta BOUNCEX
colidP: CLC
lda ROW
cmp #$1E
bne incScore
lda COL
cmp PADDLEL
bcc incScore
cmp PADDLER
bcs incScore
sta BOUNCEX
inc HIT
lda $fe ;randomize vel when hitting paddle
cmp #$80 ;ensure vel isn't too high
bcc velx
adc #$81
velx:
sta VELX
lda $fe
sta VELY
incScore:
CLC
lda ROW
cmp #$1D
bne delaya
lda HIT
cmp #$00
beq delaya
lda #$00
sta HIT
SED
CLC
lda SCORE
adc #$01
sta SCORE
CLD
delaya: ldy #$00 ; Delay processor to slow down game
ldx #$00
delay: iny
cpy #$FF
bne delay
ldy #$00
inx
cpx #$08
bne delay
ballX: lda VELX
adc DELTAX
sta DELTAX
bcc ballY
ldy ROW ; load POINTER with start-of-row
lda table_low,y
sta POINTER
lda table_high,y
sta POINTER_H
ldy COL ; store CURSOR at POINTER plus COL
lda #00
sta (POINTER),y
CLC
lda BOUNCEX
cmp #$00
bne decROW
incROW: inc ROW
CLC
bcc ballY
decROW: dec ROW
ballY:
lda VELY
adc DELTAY
sta DELTAY
bcc getkey
ldy ROW ; load POINTER with start-of-row
lda table_low,y
sta POINTER
lda table_high,y
sta POINTER_H
ldy COL ; store CURSOR at POINTER plus COL
lda #00
sta (POINTER),y
CLC
lda BOUNCEY
cmp #$00
bne decCOL
incCOL: inc COL
CLC
bcc getkey
decCOL: dec COL
getkey: lda $ff ; get a keystroke
ldx #$00 ; clear out the key buffer
stx $ff
cmp #$83 ; check key == LEFT
bne checkR
ldy PADDLEL
cpy #$00
beq checkR
ldy #$1f ; load POINTER with start-of-row
lda table_low,y
sta POINTER
lda table_high,y
sta POINTER_H
ldy PADDLER ; store CURSOR at POINTER plus COL
dey
lda #00
sta (POINTER),y
dec PADDLEL
dec PADDLER
jmp done
checkR: cmp #$81 ; check key == RIGHT
bne done
ldy PADDLER
cpy #$20
beq done
ldy #$1f ; load POINTER with start-of-row
lda table_low,y
sta POINTER
lda table_high,y
sta POINTER_H
ldy PADDLEL ; store CURSOR at POINTER plus COL
lda #00
sta (POINTER),y
inc PADDLEL
inc PADDLER
done:
ldy #$0
scorePrint:
lda score,y
beq scoreNum
sta $f0F0,y
iny
bne scorePrint
scoreNum:
lda SCORE
and #$F0
LSR
LSR
LSR
LSR
TAY
lda number,y
sta $f0f8
lda SCORE
and #$0F
TAY
lda number,y
sta $f0f9
lda SCORE
clc ; repeat
jmp draw
gameover:
brk
clear: lda table_low ; clear the screen
sta POINTER
lda table_high
sta POINTER_H
ldy #$00
tya
c_loop: sta (POINTER),y
iny
bne c_loop
inc POINTER_H
ldx POINTER_H
cpx #$06
bne c_loop
jmp setup
; these two tables contain the high and low bytes
; of the addresses of the start of each row
table_high:
dcb $02,$02,$02,$02,$02,$02,$02,$02
dcb $03,$03,$03,$03,$03,$03,$03,$03
dcb $04,$04,$04,$04,$04,$04,$04,$04
dcb $05,$05,$05,$05,$05,$05,$05,$05,
table_low:
dcb $00,$20,$40,$60,$80,$a0,$c0,$e0
dcb $00,$20,$40,$60,$80,$a0,$c0,$e0
dcb $00,$20,$40,$60,$80,$a0,$c0,$e0
dcb $00,$20,$40,$60,$80,$a0,$c0,$e0
; help message on character screen
help:
dcb "A","r","r","o","w",32,"k","e","y","s"
dcb 32,"C","o","n","t","r","o","l",32,"p","a","d"
dcb "d","l","e"
dcb 00
score:
dcb "S","C","O","R","E",":",32
dcb 00
number:
dcb "0","1","2","3","4","5","6","7"
dcb "8","9","A","B","C","D","E","F"
dcb 00
This is where I decided to cut off coding for this task, after quite a few tweaks and improvements to the code. The first big change is the removal of the flicker. I did this by only erasing the ball when it moves and only erasing one pixel of the paddle when it moves. Next I added a score feature, which was an interesting task to tackle. Firstly, I had to find a way to increment the score. This was accomplished by checking first whether the paddle had been hit and second whether the ball had moved off the paddle; if both of those things were true, the score could be incremented. Secondly, I chose to make the score decimal instead of hex, since that is the number system most people are used to. Luckily, through some research I was able to find out about decimal mode on the 6502, which allows numbers to be stored in a byte as two decimal digits taking up 4 bits per digit, thus allowing the score to be properly relayed to the player. Altogether this made the Pong app both easier on the eyes and more enjoyable, since progress was tracked.
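To show why decimal mode matters for the score, here is a small sketch that bumps a score of 9 and writes both digits to the character screen. It is a simplified variant of the listing above: it converts each digit by setting the $30 bits rather than using the `number` lookup table. The starting value of 9 is just an example, and the screen addresses $f0f8 and $f0f9 are the ones the listing uses.

```asm
define SCORE $24

        lda #$09
        sta SCORE       ; pretend the score is currently 9
        sed             ; decimal mode on
        clc
        lda SCORE
        adc #$01        ; decimal mode: $09 + 1 = $10, not $0A
        sta SCORE
        cld             ; decimal mode off again
        lsr
        lsr
        lsr
        lsr             ; high nibble -> tens digit (1)
        ora #$30        ; ASCII "1"
        sta $f0f8
        lda SCORE
        and #$0f        ; low nibble -> ones digit (0)
        ora #$30        ; ASCII "0"
        sta $f0f9
        brk
```

Without SED the same addition would leave $0A in SCORE and the ones digit would need the "A" to "F" entries of the lookup table. One limit of a single decimal byte is that $99 + 1 wraps to $00 with the carry set, so the score as stored tops out at 99.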
The process of building this app has furthered my understanding of assembly programming quite a bit. I feel that, in general, many of the concepts just have to be played with in order to get a grasp of them. Some of the odd quirks, and how the computer actually works with the bits, are difficult to learn without experiencing them, and I feel this task accomplished that.
Thursday, January 30, 2020
Lab 3 - Pong pt.1
For this lab we have begun to build on our knowledge of 6502 assembly in order to make a more robust program. We were given five tasks to choose from and, with very little extra help, had to figure out how to achieve an effective result. The five options were: to create a bouncing graphic (think DVD logo); to create a numeric display which displays two digits; to create the game Pong; to create a kaleidoscope where one quadrant is mirrored in the other three; and lastly, and most challenging, to draw a line between two points that can be moved around in real time.
Our group chose to work on Pong since it seemed like an enjoyable program to create, and at least a couple of us had some idea of how we wanted to tackle the problem. We started off by looking at some example code that was provided for a fairly unrelated program, which gave us some good ideas for how to write our own code (Link to Example Code). This code helped us with three things in particular:
- How to turn a screen made of pages into coordinates
- How to use those coordinates to draw on the screen
- How to take keyboard input
; zero-page variable locations
define DOTROW $20 ; current row
define DOTCOL $21 ; current column
define DOTDELTAX $30 ; current Delta X
define DOTDELTAY $31 ; current Delta Y
define POINTER $10 ; ptr: start of row
define POINTER_H $11
; constants
define DOT $01 ; dot colour
define CURSOR $04 ; purple colour (unused here)
ldy #$00 ; put help text on screen
print: lda help,y
beq setup
sta $f000,y
iny
bne print
setup: lda #$0f ; set initial ROW,COL
sta DOTROW
lda #$02
sta DOTCOL
lda #$20 ;set angle to 45
sta DOTDELTAX
sta DOTDELTAY
game: lda DOTROW ; ensure ROW is in range 0:31
and #$1f
sta DOTROW
lda DOTCOL ; ensure COL is in range 0:31
and #$1f
sta DOTCOL
ldy DOTROW ; load POINTER with start-of-row
lda table_low,y
sta POINTER
lda table_high,y
sta POINTER_H
ldy DOTCOL ; index within the row by column
lda #DOT ; set current position to DOT
sta (POINTER),y
DotMovA:inc DOTCOL ; move one column to the right
DotMovB:inc DOTROW ; move one row down
done: clc ; repeat
bcc game
; these two tables contain the high and low bytes
; of the addresses of the start of each row
table_high:
dcb $02,$02,$02,$02,$02,$02,$02,$02
dcb $03,$03,$03,$03,$03,$03,$03,$03
dcb $04,$04,$04,$04,$04,$04,$04,$04
dcb $05,$05,$05,$05,$05,$05,$05,$05
table_low:
dcb $00,$20,$40,$60,$80,$a0,$c0,$e0
dcb $00,$20,$40,$60,$80,$a0,$c0,$e0
dcb $00,$20,$40,$60,$80,$a0,$c0,$e0
dcb $00,$20,$40,$60,$80,$a0,$c0,$e0
; help message on character screen
help:
dcb "A","r","r","o","w",32,"k","e","y","s"
dcb 32,"d","r","a","w",32,"/",32,"'","C","'"
dcb 32,"k","e","y",32,"c","l","e","a","r","s"
dcb 00
The code above is largely unaltered from the etch-a-sketch example. All it does is draw a line across the screen continuously, but it taught us a lot about how to get a ball moving: a moving ball is practically the same thing, and it is only because the previous position is never erased that a line appears. From that base it is fairly simple to work out how a ball should move in Pong, which will be covered in the next blog post.
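To make the "erase the previous position" idea concrete, here is a minimal sketch in the same emulator dialect. It is not our Pong code: the names ROW, COL, BALL, BLANK, plot, frame and wait are made up for this example, and the delay loop is only a crude assumption about how long the dot needs to stay visible. The row tables turn a row number into the address $0200 + 32 * row, and the column is then used as the index within that row.

```asm
; zero-page variable locations (hypothetical names for this sketch)
define ROW       $20    ; current row, 0..31
define COL       $21    ; current column, 0..31
define POINTER   $10    ; ptr: start of row (low byte)
define POINTER_H $11    ; ptr: start of row (high byte)

; constants
define BALL      $01    ; white
define BLANK     $00    ; black

        lda #$0f        ; set initial ROW,COL
        sta ROW
        lda #$02
        sta COL

frame:  lda #BALL
        jsr plot        ; draw the ball at ROW,COL
        ldx #$00
wait:   dex             ; crude delay so the ball stays visible
        bne wait
        lda #BLANK
        jsr plot        ; erase the old position before moving
        inc ROW         ; step one pixel down
        inc COL         ; and one pixel right
        lda ROW         ; wrap ROW into the range 0:31
        and #$1f
        sta ROW
        lda COL         ; wrap COL into the range 0:31
        and #$1f
        sta COL
        jmp frame

; plot: store the colour held in A at position ROW,COL
plot:   pha             ; keep the colour while A builds the pointer
        ldy ROW
        lda table_low,y
        sta POINTER
        lda table_high,y
        sta POINTER_H
        ldy COL         ; index within the row
        pla             ; get the colour back
        sta (POINTER),y
        rts

; high and low bytes of the address of the start of each row
table_high:
        dcb $02,$02,$02,$02,$02,$02,$02,$02
        dcb $03,$03,$03,$03,$03,$03,$03,$03
        dcb $04,$04,$04,$04,$04,$04,$04,$04
        dcb $05,$05,$05,$05,$05,$05,$05,$05
table_low:
        dcb $00,$20,$40,$60,$80,$a0,$c0,$e0
        dcb $00,$20,$40,$60,$80,$a0,$c0,$e0
        dcb $00,$20,$40,$60,$80,$a0,$c0,$e0
        dcb $00,$20,$40,$60,$80,$a0,$c0,$e0
```

Removing the second jsr plot brings back the line-drawing behaviour of the listing above, which is really the only difference between a line and a ball.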
Sunday, January 26, 2020
Lab 2 - 6502 Experiments
The next few entries on this blog are going to be detailing our look into learning assembly code on the 6502 processor.
We were provided the following code:
lda #$00 ; set a pointer at $40 to point to $0200
sta $40
lda #$02
sta $41
lda #$07 ; colour
ldy #$00 ; set index to 0
loop: sta ($40),y ; set pixel
iny ; increment index
bne loop ; continue until done the page
inc $41 ; increment the page
ldx $41 ; get the page
cpx #$06 ; compare with 6
bne loop ; continue until done all pages
This code fills the screen with yellow by looping through each address of a
page and setting it to yellow, then incrementing the page until the whole screen is filled.
When we insert the command tya at the start of the loop, the screen fills with vertical
stripes of colour. This is because tya transfers the value of y into a, so the index itself
becomes the colour. There are only 16 colour values, so the colour repeats every 16 pixels, and
since the screen is 32 pixels wide each row holds exactly two runs of the 16 colours and the rows line up.
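As a rough sketch of that change, this is the provided code with tya added at the top of the loop. It assumes, as described above, that the display only looks at the low four bits of the stored value. The lda #$07 is left in place even though tya now overwrites it.

```asm
        lda #$00        ; set a pointer at $40 to point to $0200
        sta $40
        lda #$02
        sta $41
        lda #$07        ; colour (overwritten by tya below)
        ldy #$00        ; set index to 0
loop:   tya             ; copy the index into A so it becomes the colour
        sta ($40),y     ; set pixel
        iny             ; increment index
        bne loop        ; continue until done the page
        inc $41         ; increment the page
        ldx $41         ; get the page
        cpx #$06        ; compare with 6
        bne loop        ; continue until done all pages
```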
Adding the lsr command after tya shifts the bits of the colour value to the right, which
drops the least significant bit. This is effectively a division by 2, so each stripe
becomes twice as wide. Each additional lsr divides again and widens the stripes further.
If we use asl instead, the value is multiplied by two. This reduces the number of distinct
colours that appear, but the stripes remain 1 pixel wide.
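A sketch of the shifted version, under the same assumptions as before. Only one line in the loop differs between the two experiments, so the asl alternative is noted in a comment rather than repeated.

```asm
        lda #$00        ; set a pointer at $40 to point to $0200
        sta $40
        lda #$02
        sta $41
        ldy #$00        ; set index to 0
loop:   tya             ; copy the index into A so it becomes the colour
        lsr             ; A = Y / 2, so each colour covers 2 pixels
                        ; (swap in asl here to get A = Y * 2 instead)
        sta ($40),y     ; set pixel
        iny             ; increment index
        bne loop        ; continue until done the page
        inc $41         ; increment the page
        ldx $41         ; get the page
        cpx #$06        ; compare with 6
        bne loop        ; continue until done all pages
```

With asl the lowest bit of the colour is always zero, which is why only the even colour values show up.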
Next we see what happens when we add more iny instructions. With five of them, y skips
ahead by 5 on each pass through the loop. Since 5 does not divide 256, y wraps around past the
exit value of zero several times before finally landing on it, so the page fills in an interesting grainy way.
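Here is a sketch of the five-iny version, assuming the plain yellow fill is used again. Because 5 and 256 share no common factor, y visits every one of the 256 addresses in the page exactly once before it returns to zero, so each page still ends up completely filled. Only the order changes.

```asm
        lda #$00        ; set a pointer at $40 to point to $0200
        sta $40
        lda #$02
        sta $41
        lda #$07        ; colour
        ldy #$00        ; set index to 0
loop:   sta ($40),y     ; set pixel
        iny             ; advance the index by 5 in total
        iny
        iny
        iny
        iny
        bne loop        ; y only returns to 0 after 256 passes
        inc $41         ; increment the page
        ldx $41         ; get the page
        cpx #$06        ; compare with 6
        bne loop        ; continue until done all pages
```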
The final experiment in this lab was to see if we could draw 4 lines along the edges of the
screen:
lda #$00 ; set a pointer at $40 to point to $0200
sta $40
lda #$02
sta $41
lda #$05 ; colour
ldy #$00 ; set index to 0
loopa:
sta ($40),y ; set pixel
iny ; increment index
cpy #$20 ; compare with 32 (one row)
bne loopa ; continue until done the top row
ldy #$00
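loops:
clc ; clear carry so adc adds exactly 31
lda #$07
sta ($40),y ; set pixel on the left edge
tya
adc #$1f ; move the index to the last pixel of the row
tay
lda #$04
sta ($40),y ; set pixel on the right edge
iny ; step to the first pixel of the next row
bne loops ; continue until done the page (y wraps to 0)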
inc $41 ; increment the page
ldx $41 ; get the page
cpx #$06 ; compare with 6
bne loops ; continue until done all pages
ldy #$e0 ; index of the first pixel of the last row
lda #$0e ; colour
dec $41 ; step back from page 6 to page 5, the last page
loopb:
sta ($40),y ; set pixel
iny ; increment index
bne loopb ; continue until y wraps to 0, the end of the page
This code draws 4 lines along the 4 edges of the screen. It does so with 3 loops.
The first loop walks across the addresses at the top of the screen, storing the colour in each
of them.
The second loop, which is also the most involved, stores a pixel at the first address of a
row, then adds 31 to the index to reach the last pixel of that row and draws a
different colour there, then adds one more to land at the beginning of the next row. Once it
reaches the end of a page it increments to the next page, resulting in two vertical lines.
Lastly, after stepping the page pointer back to the last page, we draw the final line at the
bottom by starting at the first pixel of the last row and looping through to the end of the row.
These three loops result in all four lines being drawn.
Friday, January 17, 2020
Lab 1 - Open Source Research
In my search for open source software I decided to look into two projects which I have used in the past and continue to use to this day: Firefox and GIMP.
Firefox, the software I'm currently using to display this page, has come a long way in large part thanks to its open source community. There is a vast number of people bug hunting and bug fixing, as well as a fairly understandable code review process. The example which I looked at can be seen here:
https://phabricator.services.mozilla.com/D48202
This is a simple bug fix which was approved back in October of 2019. It was written by a single contributor and reviewed by a single reviewer, either the module owner or a designated peer, before being accepted onto the main branch.
The other piece of software, an image editing tool known as GIMP, also takes the open source approach. To get a contribution added to GIMP you must, as with Firefox, make a fork and then open a merge request. This request is then reviewed by a GIMP developer, who asks for any tweaks that need to be made, and it can also receive community feedback. Once the code has been finalized or approved in its current state by the developer, it is merged.
https://gitlab.gnome.org/GNOME/gimp/merge_requests/195
These two ways of merging contributions are similar but show the difference in scale between the two projects. Firefox most likely receives far more merge requests than GIMP, which forces it to spread commit privileges out across the community, whereas GIMP can afford to let only its core developers merge.