6502 Linux Logo
Michael 'AppleWin Debugger Dev'
2017-07-17 08:15:58 UTC
I wrote my own 6502 Linux Logo in 703 bytes
https://github.com/Michaelangel007/6502_linux_logo

Features:

* Detects Apple ][, ][+, //e, //e+, //c, //c+
* Detects 48K/64K/128K
* Cleaned up fugly Linux Logo
* Replaced bloated LZSS + data (283 bytes) with simple 2-bit per character (210 bytes)

Inspired by this non-optimized version, which has a size of 1,573 ($625) bytes.
* https://github.com/deater/linux_logo
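The 2-bit-per-character idea mentioned in the feature list can be sketched roughly as follows. This is a minimal Python illustration, not the code from the repository: the four-symbol `PALETTE`, the low-bits-first order, and the names `pack` and `unpack` are all assumptions made for the example.

```python
# Hypothetical four-symbol palette; the real logo's symbols and bit order
# are not given in the post, so these are assumptions for illustration.
PALETTE = " #O."

def pack(text):
    """Pack PALETTE symbols four to a byte, first symbol in the low two bits."""
    codes = [PALETTE.index(ch) for ch in text]
    codes += [0] * (-len(codes) % 4)          # pad to a multiple of four symbols
    out = bytearray()
    for i in range(0, len(codes), 4):
        a, b, c, d = codes[i:i + 4]
        out.append(a | (b << 2) | (c << 4) | (d << 6))
    return bytes(out)

def unpack(data, count):
    """Expand packed bytes back into `count` symbols."""
    chars = []
    for byte in data:
        for shift in (0, 2, 4, 6):
            chars.append(PALETTE[(byte >> shift) & 3])
    return "".join(chars[:count])

row = "  ##O..#" * 5                           # one 40-column row
packed = pack(row)
```

With this layout a 40-column row takes 10 bytes, and the decoder needs only shifts and a 4-entry lookup, which is why it can be much smaller than an LZSS decoder.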
v***@pianoman.cluster.toy
2017-07-17 21:16:41 UTC
Post by Michael 'AppleWin Debugger Dev'
I wrote my own 6502 Linux Logo in 703 bytes
https://github.com/Michaelangel007/6502_linux_logo
* Detects Apple ][, ][+, //e, //e+, //c, //c+
* Detects 48K/64K/128K
* Cleaned up fugly Linux Logo
* Replaced bloated LZSS + data (283 bytes) with simple 2-bit per character (210 bytes)
Inspired from this non-optimized version which has a size of 1,573 ($625) bytes.
* https://github.com/deater/linux_logo
Sorry for the delay in responding to this. After initially getting no replies,
I hadn't been checking in on c.s.a that regularly.

As I said in the other thread, I would like to stick with LZSS just for
the sake of cross-platform comparison, but I'll definitely go through
this and see what improvements I can carry over.

Is it OK to take code from your implementation if I give you credit?
I didn't see a specific license listed.

I really need to be practicing my 6502 assembly. I'm actually working
on a low-res game in assembly right now, which is slowly making progress.
I have too many other job-related things going on right now dividing my time.

Vince
Michael 'AppleWin Debugger Dev'
2017-07-17 21:35:29 UTC
Post by v***@pianoman.cluster.toy
Is it OK to take code from your implementation if I give you credit?
I didn't see a specific license listed.
Sure! Updated readme with WTFPL so there are no restrictions.

Feel free to borrow as much/little as you want!
Michael 'AppleWin Debugger Dev'
2017-07-19 17:18:08 UTC
On Monday, July 17, 2017 at 2:20:39 PM UTC-7, ***@pianoman.cluster.toy wrote:

Vince

I cleaned up the code paths for the CPU detection -- removed a redundant branch and a redundant STA ROMIN, so it is a little smaller now.

detect_model
    lda MACHINEID1 ; FBB3: $38 = ][, $EA = ][+, $06 = //e //c IIgs
    cmp #$38 ; '8'
    beq apple_ii

apple_iiplus
    pha
    jsr AS_HGR ; HGR on Apple ][+ or newer
    pla
    cmp #$EA ; 'j' apple ][+?
    bne apple_iie_iic ; if not, go check for //e, //c

    ; if we get here we're a ii+ or iii in emulation mode
    lda #"+"
    bne set_apple_ii ; always taken

apple_ii
    jsr IB_HGR ; HGR on original ][ only!
    lda #" " ; no model suffix on the original ][
set_apple_ii
    ldx #"]"
    ldy #"["
    stx ModType-2
    sty ModType-1
    sta ModType ; replace last 'e' in 'Apple IIe' with ' ' or '+'

    lda #" " ; "_6502"
    sta CpuType ; ^^^
    ldx #"6"
    ldy #"5"
    stx CpuType+1
    sty CpuType+2

detect_langcard
    sta RAMIN ; Detect 16K RAM / Language Card
    sta RAMIN ; Read RAM

    lda $D000
    eor #$FF
    sta $D000
    cmp $D000
    bne apple_ii_48K
    eor #$FF ; restore original byte
    sta $D000

RAM_64K
    ldx #"6" ; "64K"
    ldy #"4"
    bne RAM_size ; always taken
apple_ii_48K
    ldx #"4"
    ldy #"8"
RAM_size
    lda #" "
    sta RamSize ; erase '1' in '128'
    stx RamSize+1
    sty RamSize+2
    bne done_detecting ; always taken

; Detect //e //e+ //c
apple_iie_iic
    lda MACHINEID2 ; FBC0: $00 = //c, $EA = //e, $E0 = //e+
    beq apple_iic ; check for apple //c
    cmp #$E0 ; if we're an Apple IIe (original)
    bne RAM_64K ; then use 64K and finish
    beq apple_iie_enhanced ; always taken
apple_iic
    lda #"c"
    sta CpuType+1
    lda MACHINEID3
    cmp #$05 ; //c+
    bne done_detecting
apple_iie_enhanced ; //c+
    ldx #1 ; //e+
    jsr ModelPlus ; ^
done_detecting
    sta ROMIN ; Turn off Language Card
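The decision tree in the listing above can be summarized with a small model. This is only a Python sketch of the branch structure, using the ID byte values given in the listing's comments; the function name `classify` and the returned strings are made up for illustration, and the RAM size detection is left out.

```python
def classify(id1, id2, id3):
    """Mirror the branches of detect_model for the three ROM ID bytes.

    id1, id2, id3 stand for the values read from MACHINEID1, MACHINEID2 and
    MACHINEID3 in the listing; the name and return strings are hypothetical.
    """
    if id1 == 0x38:
        return "Apple ]["
    if id1 == 0xEA:
        return "Apple ][+"          # or an Apple /// in emulation mode
    # Anything else is treated as the //e, //c family, as in the listing.
    if id2 == 0x00:
        return "Apple //c+" if id3 == 0x05 else "Apple //c"
    if id2 == 0xE0:
        return "Apple //e+"
    return "Apple //e"
```

For example, `classify(0x06, 0xE0, 0x00)` reports an enhanced //e, while `classify(0x06, 0x00, 0x05)` reports a //c+.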
barrym95838
2017-07-24 17:52:04 UTC
Post by Michael 'AppleWin Debugger Dev'
I cleaned up the code paths for the CPU detection -- removed a redundant
branch and a redundant STA ROMIN, so it is a little smaller now.
clc ; every 8 HGR scanline address
adc #$1c ; is Text Page $04 + $1C = HGR Page $20

For this limited address range, you should be able to save a byte with

eor #$24

instead, right?

Mike B.
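The equivalence Mike is relying on can be checked exhaustively for the text page high bytes $04..$07. The following is a minimal Python check, with hypothetical helper names `adc_1c` and `eor_24`; it models only the 8-bit arithmetic, not the CPU flags.

```python
def adc_1c(hi):
    """CLC / ADC #$1C on the high byte (3 bytes of code)."""
    return (hi + 0x1C) & 0xFF

def eor_24(hi):
    """EOR #$24 on the high byte (2 bytes of code)."""
    return hi ^ 0x24

TEXT_PAGE_HI = range(0x04, 0x08)     # text page 1 occupies $0400..$07FF

same_on_text_page = all(adc_1c(hi) == eor_24(hi) for hi in TEXT_PAGE_HI)
same_everywhere = all(adc_1c(hi) == eor_24(hi) for hi in range(0x100))
```

The two agree on $04..$07 (both give $20..$23) but not in general, for example at $08, which is why the trick is only valid "for this limited address range".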
Michael 'AppleWin Debugger Dev'
2017-07-24 18:44:47 UTC
Post by barrym95838
For this limited address range, you should be able to save a byte with
eor #$24
instead, right?
Indeed! Haven't seen that one in a while! Nice optimization.

One day I'm going to start collating all these tricks. I keep forgetting about half of them!

664 bytes now.
barrym95838
2017-07-24 20:54:29 UTC
Post by Michael 'AppleWin Debugger Dev'
One day I'm going to start collating all these tricks. I keep forgetting
about half of them!
664 bytes now.
lda zTxtPtr+0 ; every 8 HGR scanline address
sta zHgrPtr+0 ; is exactly same as Text low byte

This proposal is slightly risky, but have you considered the notion of
eliminating these four bytes of code by merging zHgrPtr into zTxtPtr
and using zTxtPtr to plot your logo? I think that COUT might
write one char to an address outside the text page before righting
itself, but it shouldn't cause any harm ... maybe it might trigger an
unwanted scroll on return to the command line?

Mike B.
Michael 'AppleWin Debugger Dev'
2017-07-24 21:42:39 UTC
Post by barrym95838
This proposal is slightly risky, but have you considered the notion of
eliminating these four bytes of code by merging zHgrPtr into zTxtPtr
and using zTxtPtr to plot your logo?
I hadn't! Interesting idea.
Post by barrym95838
I think that COUT might
write one char to an address outside the text page before righting
itself, but it shouldn't cause any harm ... maybe it might trigger an
unwanted scroll on return to the command line?
I took another look at the code path. I'm happy to report this is perfectly safe in _this_ instance.

Why?

I call BASCALC @ PrintText before doing _any_ text output so re-using the zTxtPtr as an HGR Ptr works perfectly. :-)

Down to 660 bytes now; GitHub has been updated.
barrym95838
2017-07-27 06:54:38 UTC
Post by Michael 'AppleWin Debugger Dev'
Down to 660 bytes now; GitHub has been updated.
I think I may have a way to squeeze another three bytes, but it's
not convenient for me to test it. Please forgive me if it sucks:

; ----------------------------------------------------------------
; Update the Text Address
; Update the HGR scanline Address
    inc zCursorY
    lda zCursorY
    jsr BASCALC

; ----------------------------------------------------------------
; Copy unpacked buffer to 8 HGR scanlines

    ldy #39 ; 280/7 = 40 bytes/scanline
    lda zTxtPtr+1 ; Translate text page row address to
    eor #$24 ; every eighth HGR row address
    tax

Copy7x8
    lda UnpackAddr,Y
    stx zTxtPtr+1
    sta (zTxtPtr),Y
    inx ; y = y+1
    inx ; HGR addr_y+1 = addr_y + $0400
    inx
    inx
    cpx #$40
    bcc Copy7x8 ; Loop until 7x8 pixel block is complete
    txa
    sbc #$20 ; Reset pointer for top of next 7x8 block
    tax
    lda #0
    sta UnpackAddr,Y ; Clear source on last scanline copy
    dey
    bpl Copy7x8 ; Loop until all 280x8 pixels are copied

    ldx zCursorY ; (1) X=14, see (2)
    cpx #$14 ; Y=$40 .. $A0, Rows $8..$13 (inclusive)
FitSameByte
    sta zDstShift
    ldy zSrcOffset ; NOTE: C=0 from CMPs above
    rts

I'm pretty sure that you could save about a dozen more bytes by
eliminating the UnpackAddr buffer and unpacking straight to the
top HGR scan line for each row, then ripple-copying that line to
the next seven scan lines, but that's a bit more ambitious than
this attempt.

Mike B.
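The address arithmetic behind the Copy7x8 loop (text row base, EOR #$24 on the high byte, then four INX per scanline) can be checked against the usual Apple II address formulas. This is a Python sketch for page 1 of text and HGR only; the function names are invented for the example.

```python
def text_base(row):
    """Base address of text page 1 row 0..23 (what BASCALC computes)."""
    return 0x0400 + (row & 7) * 0x80 + (row >> 3) * 0x28

def hgr_base(y):
    """Base address of HGR page 1 scanline 0..191."""
    return 0x2000 + (y & 7) * 0x400 + ((y >> 3) & 7) * 0x80 + (y >> 6) * 0x28

def scanlines_for_row(row):
    """Walk the eight scanline addresses the way the loop steps X."""
    lo = text_base(row) & 0xFF
    hi = (text_base(row) >> 8) ^ 0x24    # EOR #$24: $04..$07 -> $20..$23
    addrs = []
    while hi < 0x40:                      # CPX #$40 / BCC Copy7x8
        addrs.append((hi << 8) | lo)
        hi += 4                           # four INX: next scanline is +$0400
    return addrs

all_rows_match = all(
    scanlines_for_row(row) == [hgr_base(8 * row + k) for k in range(8)]
    for row in range(24)
)
```

Because the loop exits with X in $40..$43 and the carry set, the following SBC #$20 lands back on the starting high byte, ready for the next column.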
Michael 'AppleWin Debugger Dev'
2017-07-27 16:24:37 UTC
Post by barrym95838
Post by Michael 'AppleWin Debugger Dev'
Down to 660 bytes now; GitHub has been updated.
I think I may have a way to squeeze another three bytes, but it's
not convenient for me to test it.
Actually only able to save 1 byte.

When we are done copying all 8 scan lines, we need to reset where in the unpack buffer the next byte should go.

lda #0
sta UnpackAddr,Y ; Clear source on last scanline copy
sta zDstOffset ; reset to start of unpack buffer
dey
bpl Copy7x8 ; Loop until all 280x8 pixels are copied

Still, a byte is a byte! Updated git repo.
I love your out-of-the-box thinking! Instead of copying the 8 scanlines horizontally, copy them vertically. :-)
Post by barrym95838
I'm pretty sure that you could save about a dozen more bytes by
eliminating the UnpackAddr buffer and unpacking straight to the
top HGR scan line for each row, then ripple-copying that line to
the next seven scan lines, but that's a bit more ambitious than
this attempt.
I think that might be a win. Currently we use 3 bytes for the absolute indexed store:
    sta UnpackAddr,Y

This would change to the 2-byte zero-page indirect indexed version:
    sta (zTxtPtr),Y

I need to juggle the x and y regs around (again) but I think we should still be able to get a net win. I'm swamped today and on the weekend but I might get a chance tomorrow.
barrym95838
2017-08-14 00:59:12 UTC
Post by Michael 'AppleWin Debugger Dev'
Actually only able to save 1 byte.
; ------------------------------------------------------------------------

    DO CONFIG_PRINT_CPUINFO
    txa ; (2) X=$14 from (1) Unpack2Bits
    ldx #0 ; This means we have an extra "row" of HGR garbage at Y=160
PrintText ; but we can't see it since we are in mixed mode
    jsr BASCALC
    ldy #0
CopyTextLine
    lda TextLine,X
    sta (zTxtPtr),Y
    inx ; 3*40 = 120 chars max
    iny
    cpy #40
    bne CopyTextLine

    inc zCursorY
    lda zCursorY
    cmp #$17 ; Rows $14..$16 (inclusive)
    bne PrintText

    dec zCursorY

; ------------------------------------------------------------------------

If COUT doesn't mess with X, you can let it do most of the dirty work
for you and get rid of the "jsr HOME" at the beginning of "Main":

; ------------------------------------------------------------------------

    DO CONFIG_PRINT_CPUINFO
    lda #$17 ; bottom line
    jsr BASCALC
PrintText
    ldx #137 ; -119
CopyTextLine
    lda TextLine-137,X
    jsr $FDED ; COUT
    inx
    bne CopyTextLine
    rts

; ------------------------------------------------------------------------

What's the deal with that "ModelPlus" subroutine? It looks like
you were going to try something a bit more sophisticated with
your model detection, then backed off. Why don't you just replace
the "jsr ModelPlus" with a simplified in-line version of the same
(I can't see any current need to involve X either)?

Mike B.
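The `ldx #137` / `lda TextLine-137,X` / `inx` / `bne` pattern in the COUT version counts X up to zero so that no CPX is needed. A small Python model of just the index arithmetic (the function name is hypothetical, and COUT itself is not modeled) shows which offsets into TextLine get read:

```python
def visited_offsets(start_x=137):
    """Offsets from TextLine read by a loop that counts X up until it wraps."""
    offsets = []
    x = start_x
    while True:
        offsets.append(x - start_x)   # TextLine-137,X reads TextLine + (X - 137)
        x = (x + 1) & 0xFF            # INX wraps from 255 to 0
        if x == 0:                    # BNE falls through once X reaches zero
            break
    return offsets

offsets = visited_offsets()
```

Starting at 137 (that is, -119 as a signed byte) the loop reads offsets 0 through 118, so exactly 119 characters are sent to COUT.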
Michael 'AppleWin Debugger Dev'
2017-08-17 04:54:35 UTC
Post by barrym95838
What's the deal with that "ModelPlus" subroutine?
Left-over code from the ][+, //e+, //c+ ... it is only used for the //e+ and //c+ now.
Post by barrym95838
It looks like
you were going to try something a bit more sophisticated with
your model detection, then backed off.
Correct.
Post by barrym95838
Why don't you just replace
the "jsr ModelPlus" with a simplified in-line version of the same
Indeed it can be inlined.
Post by barrym95838
(I can't see any current need to involve X either)?
Yup, good eye!

Down to 654 now.
Michael 'AppleWin Debugger Dev'
2017-08-17 05:49:00 UTC
Post by barrym95838
If COUT doesn't mess with X, you can let it do most of the dirty work for you
Yes, COUT saves X and Y, which means we can greatly clean up the 3-line text printing.
We can also trim the last line: "APPLE //e " down to 26 cols instead of the full 40. :-)
Post by barrym95838
and get rid of the "jsr HOME" at the beginning of "Main":
NAK. I like the clear screen when someone types TEXT.

Down to 625 bytes!

James Davis
2017-07-28 01:13:07 UTC
Post by barrym95838
I'm pretty sure that you could save about a dozen more bytes by
eliminating the UnpackAddr buffer and unpacking straight to the
top HGR scan line for each row, then ripple-copying that line to
the next seven scan lines, but that's a bit more ambitious than
this attempt.
Hi All,

Michael P., this question is really for you:

Wouldn't ripple-copying take longer? Is one byte worth the time lost?

James Davis
Michael 'AppleWin Debugger Dev'
2017-07-28 01:24:40 UTC
James,

1. We are optimizing for (disk) space not run time,
2. Decoding time is measured in milliseconds or less -- a few extra milliseconds don't matter,
3. I need a hobby when I'm not working on AppleWin's debugger,
4. The only way to stay a good optimizer is to keep practicing,
5. Practical code golf is a good way to think outside the box,
6. I'm mostly done with this project for now so it isn't taking up much time,
7. It's about the journey and results -- the few minutes spent here and there aren't a big deal.

:-)
James Davis
2017-07-28 03:49:05 UTC
Post by Michael 'AppleWin Debugger Dev'
James,
1. We are optimizing for (disk) space not run time,
2. Decoding time is measured in milliseconds or less -- no care about a few milliseconds
3. I need a hobby when I'm not working on AppleWin's debugger,
4. The only way to stay a good optimizer is to keep practicing,
5. Practical code golf is a good way to think outside the box,
6. I'm mostly done with this project for now so it isn't taking up much time,
7. Its about the journey and results -- the few minutes spent here and there aren't a big deal.
:-)
OK Michael,

"Code Golf" is a new term for me. I keep reading all this CG here on GG/CSA2 and have not understood why you all do it until now. Back in the day, I was concerned more with time optimization than with code length, always wanting to speed things up, but I tried to keep things short, too; just not to the extent that you all do. You guys are really HARDCORE programmers! I gave that all up 21 years ago when I bought my first IBM/Windows machine. Worst thing I ever did, giving way to, and wasting so long in, the Windows World. It's hard to remember Applesoft, INT, and 6502 Assembly anymore.

Sincerely,

James Davis
Zellyn
2017-07-28 14:00:11 UTC
"Worst thing I ever did, giving way to, and wasting so long in the Windows World. It's hard remember Applesoft, INT, and 6502 Assembly, anymore."
Trust me, it will come back to you with just a small amount of practice :-)

Welcome (back) to the 6502 side…

Zellyn
Anthony Lawther
2017-07-29 03:52:42 UTC
Post by James Davis
Post by Michael 'AppleWin Debugger Dev'
James,
1. We are optimizing for (disk) space not run time,
2. Decoding time is measured in milliseconds or less -- no care about a few milliseconds
3. I need a hobby when I'm not working on AppleWin's debugger,
4. The only way to stay a good optimizer is to keep practicing,
5. Practical code golf is a good way to think outside the box,
6. I'm mostly done with this project for now so it isn't taking up much time,
7. Its about the journey and results -- the few minutes spent here and
there aren't a big deal.
:-)
OK Michael,
"Code Golf" is a new term for me. I keep reading all this CG here an
GG/CSA2 and have not understood why you all do it until now. Back in the
day, I was concerned more with time optimization than with code lengh,
always wanting to speed things up, but I tried to keep things short, too;
but not to the extent that you all do. You guys are really HARDCORE
programmers! I gave that all up 21 years ago when I bought my first
IBM/Windows machine. Worst thing I ever did, giving way to, and wasting
so long in the Windows World. It's hard remember Applesoft, INT, and
6502 Assembly, anymore.
Sincerely,
James Davis
No need for the "GG/"; CSA2 is sufficient and more accurate, especially
given that a significant number of regular contributors don't use Google
Groups to access this Usenet group.

Regards, Anthony.
James Davis
2017-07-29 06:27:25 UTC
Post by Anthony Lawther
No need for the "GG/"; CSA2 is sufficient and more accurate, especially
given that a significant number of regular contributors don't use Google
Groups to access this Usenet group.
In MY statement above, "GG/CSA2" refers to where I am reading it from, not where everyone else is reading it from!
Anthony Lawther
2017-07-29 10:26:05 UTC
Post by James Davis
Post by Anthony Lawther
No need for the "GG/"; CSA2 is sufficient and more accurate, especially
given that a significant number of regular contributors don't use Google
Groups to access this Usenet group.
In MY statement above, "GG/CSA2" refers to where I am reading it from,
not where everyone else reading it from!
Understood. My suggestion is related to the origin of the group, and where
it is hosted (Usenet).
Michael 'AppleWin Debugger Dev'
2017-07-29 22:10:11 UTC
Post by James Davis
"Code Golf" is a new term for me.
Yeah, optimization keeps getting new words. "Code Golf", "Minification", etc.
Post by James Davis
Back in the day, I was concerned more with time optimization than with code length, always wanting to speed things up, but I tried to keep things short, too; but not to the extent that you all do.
Respecting the _user's time_ is STILL important for modern systems in spite of almost everyone forgetting that.

It's why we end up with crap like this: Photoshop takes 7 seconds to display _your_ picture even though computers are over 1,000 times faster.

But for old systems we tend to focus more on "code density", which usually enables more features.
Post by James Davis
It's hard remember Applesoft, INT, and 6502 Assembly, anymore.
/Oblg. "Memory is the second thing to go. I'd tell you the first but I forgot it!"

Yeah, like any skill, use it or lose it. But a little bit of practice goes a long way.

Thankfully newsgroups like this can be a gold mine, as there are usually enough knowledgeable people around who can answer the question.
Michael 'AppleWin Debugger Dev'
2017-07-18 03:44:08 UTC
After optimizing Unpack2Bits, etc. 6502 Linux Logo is down to 682 bytes. :-)
b***@gmail.com
2017-08-11 21:26:25 UTC
Post by Michael 'AppleWin Debugger Dev'
I wrote my own 6502 Linux Logo in 703 bytes
https://github.com/Michaelangel007/6502_linux_logo
* Detects Apple ][, ][+, //e, //e+, //c, //c+
* Detects 48K/64K/128K
* Cleaned up fugly Linux Logo
* Replaced bloated LZSS + data (283 bytes) with simple 2-bit per character (210 bytes)
Inspired from this non-optimized version which has a size of 1,573 ($625) bytes.
* https://github.com/deater/linux_logo
I think there is a problem with the kernel version detection code.
barrym95838
2017-08-12 06:05:06 UTC
Post by b***@gmail.com
I think there is a problem with the kernel version detection code.
:-)

detect_langcard
    sta RAMIN ; Detect 16K RAM / Language Card
    sta RAMIN ; Read RAM

    lda $D000
    eor #$FF
    sta $D000
    cmp $D000
    bne apple_ii_48K
    eor #$FF
    sta $D000

Is it safe to squeeze four more bytes by replacing this with:

detect_langcard
    sta RAMIN ; Detect 16K RAM / Language Card
    sta RAMIN ; Read RAM

    lda $D000
    inc $D000
    cmp $D000
    beq apple_ii_48K
    dec $D000

Mike B.
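The two probes can be compared with a toy model in which $D000 is either writable RAM or ROM that ignores writes. This Python sketch only covers that simple case (it says nothing about bus behaviour on real hardware), and the `Bank` class and function names are invented for the example. The byte counts assume the standard 6502 instruction sizes: 3 bytes for absolute addressing, 2 for immediate and for branches.

```python
class Bank:
    """One byte at $D000: writable (language card RAM) or not (ROM)."""
    def __init__(self, value, writable):
        self.value = value
        self.writable = writable
    def read(self):
        return self.value
    def write(self, v):
        if self.writable:
            self.value = v & 0xFF

def probe_eor(bank):
    a = bank.read() ^ 0xFF          # lda $D000 / eor #$FF
    bank.write(a)                   # sta $D000
    if bank.read() != a:            # cmp $D000 / bne apple_ii_48K
        return False
    bank.write(a ^ 0xFF)            # eor #$FF / sta $D000 (restore)
    return True

def probe_inc(bank):
    a = bank.read()                 # lda $D000
    bank.write(a + 1)               # inc $D000
    if bank.read() == a:            # cmp $D000 / beq apple_ii_48K
        return False
    bank.write(bank.read() - 1)     # dec $D000 (restore)
    return True

EOR_BYTES = 3 + 2 + 3 + 3 + 2 + 2 + 3   # lda, eor, sta, cmp, bne, eor, sta
INC_BYTES = 3 + 3 + 3 + 2 + 3           # lda, inc, cmp, beq, dec
```

In this model both probes agree for every starting byte value and both leave RAM unchanged, and the INC/DEC form is four bytes shorter, matching the saving Mike describes.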
Michael 'AppleWin Debugger Dev'
2017-08-17 05:00:00 UTC
Nice optimization!

650 bytes.