Discussion:
6502 Print U16 as decimal
John Brooks
2017-07-08 02:09:21 UTC
In 2015 I made very compact assembly routines to convert binary words to decimal.

The 65816 binary-to-BCD conversion routine is $11 bytes and is posted here:

https://groups.google.com/forum/#!searchin/comp.sys.apple2.programmer/woz$20hextodec%7Csort:relevance/comp.sys.apple2.programmer/NpfRXsf2T0s/9swgn8FMCwAJ

I found that the 6502 version was much larger and more complex than the 65816 version due to the inefficiency of handling BCD-packed digits.

I ended up avoiding BCD and created a compact 6502 routine which could be configured to:
1) display all digits
2) left justify (by skipping leading zeroes)
3) right justify (by replacing leading zeroes with spaces)
4) print 2 to 5 digit numbers

The configurations range between 53 & 61 bytes. I've also added a 4 byte 'Demo' to each config which prints #$1234 as decimal 4660.

len=$35 Print leading zeroes
0800:A2 12 A9 34 48 8A A2 04 A0 FF 84 3D C8 85 3F 68
0810:85 3E 38 FD 30 08 48 A5 3F FD 34 08 B0 EE 68 98
0820:D0 00 49 B0 20 ED FD A0 00 A5 3E CA F0 F4 10 E2
0830:60 0A 64 E8 10 00 00 03 27

len=$3B Left justify (skip leading zeroes)
0800:A2 12 A9 34 48 8A A2 04 A0 FF 84 3D C8 85 3F 68
0810:85 3E 38 FD 36 08 48 A5 3F FD 3A 08 B0 EE 68 98
0820:D0 04 E6 3D 10 07 49 B0 20 ED FD A0 00 C6 3D A5
0830:3E CA F0 F2 10 DC 60 0A 64 E8 10 00 00 03 27

len=$3D Right justify (spaces instead of leading zeroes)
0800:A2 12 A9 34 48 8A A2 04 A0 FF 84 3D C8 85 3F 68
0810:85 3E 38 FD 38 08 48 A5 3F FD 3C 08 B0 EE 68 98
0820:D0 06 E6 3D 30 02 A9 10 49 B0 20 ED FD A0 00 C6
0830:3D A5 3E CA F0 F2 10 DA 60 0A 64 E8 10 00 00 03 27

I'm interested in making the code smaller if anyone has ideas. Code golf anyone? qkumba?

Below is the source, compatible with Merlin 8, Merlin 16, and Merlin32:

-JB
@JBrooksBSI

1 *-------------------------------
2 * DecPrint - 6502 print 16 bits
3 *
4 * 11/8/2015 by John Brooks
5 *-------------------------------
6 org $800
7
8 Demo ldx #$12 ;X=H
9 lda #$34 ;A=L
10 * Fall into DecPrintU16
11
12 *-------------------------------
13
14 DEC_SKIP0 = 1 ;Set to 1 to skip leading zeroes (left justify)
15 DEC_SPACE0 = 0 ;Set to 1 to print leading spaces (right justify)
16 DEC_DIGITS = 5 ;# of digits to print (2-5)
17 DEC_VARS = $3D
18
19 dum DEC_VARS ;Uses 3 temp bytes (ZP or ABS)
20 DecCtr ds 1 ;Leading zero ctr
21 DecWord ds 2 ;U16 being printed
22 dend
23
24 RomCOut = $FDED
25
26 *-------------------------------
27 * Print U16 as decimal via COUT
28 * IN: X=hi, A=Lo
29 * OUT: X=$FF, Y=$00
30 *-------------------------------
31 DecPrintU16
32 pha
33 txa
34 DecModLen = *+1
35 :MOD ldx #DEC_DIGITS-1
36 ldy #-1
37 sty DecCtr
38
39 :Loop iny
40 sta DecWord+1
41 pla
42 sta DecWord
43
44 :DoDigit sec
45 sbc Power10L-1,x
46 pha
47 lda DecWord+1
48 sbc Power10H-1,x
49 bcs :Loop
50
51 :GotDigit
52 pla
53 tya
54 bne :PrDigit ;Print all non-zero digits
55
56 do DEC_SKIP0
57 inc DecCtr
58 bpl :NoDigit ;Skip leading zeroes
59 else
60 do DEC_SPACE0
61 inc DecCtr
62 bmi :PrDigit ;Print digit if we've seen a non-zero
63 lda #$10 ;Else print a space
64 fin
65 fin
66
67 :PrDigit eor #"0"
68 jsr RomCOut
69 ldy #0
70 :NoDigit
71 do DEC_SKIP0+DEC_SPACE0
72 dec DecCtr
73 fin
74 lda DecWord
75 dex
76 beq :PrDigit
77 bpl :DoDigit
78 rts
79
80 Power10L db <10,<100,<1000,<10000
81 Power10H db >10,>100,>1000,>10000
82
83 lst off
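
For readers who do not read 6502 assembly, here is a rough C sketch of the table-driven approach in the listing above: subtract each power of ten until it no longer fits, then apply the leading-zero policy. The function name, the `dec_mode` enum and the output buffer are hypothetical stand-ins for the DEC_SKIP0/DEC_SPACE0 build options and COUT; this is a model of the idea, not a translation of the exact byte sequence.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical names; they mirror the three build configurations. */
enum dec_mode { DEC_ALL_DIGITS, DEC_LEFT_JUSTIFY, DEC_RIGHT_JUSTIFY };

/* Writes the decimal form of `value` into `out` (at least 6 bytes) and
 * returns the number of characters written, excluding the NUL. */
size_t dec_format_u16(uint16_t value, enum dec_mode mode, char *out)
{
    static const uint16_t pow10[] = { 10000, 1000, 100, 10 };
    size_t len = 0;
    int seen_nonzero = 0;

    for (size_t i = 0; i < 4; i++) {
        int digit = 0;
        /* Subtract the power of ten until it no longer fits. */
        while (value >= pow10[i]) {
            value = (uint16_t)(value - pow10[i]);
            digit++;
        }
        if (digit != 0)
            seen_nonzero = 1;
        if (seen_nonzero || mode == DEC_ALL_DIGITS)
            out[len++] = (char)('0' + digit);
        else if (mode == DEC_RIGHT_JUSTIFY)
            out[len++] = ' ';       /* leading zero becomes a space */
        /* DEC_LEFT_JUSTIFY: a leading zero produces no output. */
    }
    out[len++] = (char)('0' + value);   /* the ones digit always prints */
    out[len] = '\0';
    return len;
}
```

With `0x1234` as input the three modes produce `04660`, `4660` and ` 4660` respectively.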
Michael 'AppleWin Debugger Dev'
2017-07-08 03:01:34 UTC
Post by John Brooks
In 2015 I made very compact assembly routines to convert binary words to decimal.
Nice hat trick!
qkumba is going to have a very hard time optimizing that table-driven approach.

Might want to add two comments for readability:

Line 63

lda #$10 ;Else print a space: $10 ^ $30 = $20

Line 76
beq :PrDigit ;Print last digit
qkumba
2017-07-08 03:54:29 UTC
Post by Michael 'AppleWin Debugger Dev'
qkumba is going to have a very hard time optimizing that table-driven approach.
And yet...

:PrDigit eor #"0"
jsr RomCOut

->

:PrDigit jsr PRHEXZ ;FDE5
qkumba
2017-07-08 04:02:43 UTC
Post by qkumba
:PrDigit jsr PRHEXZ ;FDE5
Not for the right-justified version, though.
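
A small C model may make the reason clearer. It assumes the standard monitor behaviour at PRHEXZ ($FDE5), which ORs the value with $B0 before printing, whereas the original code EORs with "0" ($B0 in high-bit ASCII). The two agree for digits 0-9, but the right-justify trick of loading $10 to get a space only works with EOR. The function names are made up for this sketch.

```c
#include <stdint.h>

/* Model of `eor #"0"`, where Merlin's "0" is high-bit ASCII $B0. */
uint8_t eor_zero(uint8_t a)
{
    return (uint8_t)(a ^ 0xB0);
}

/* Model of the monitor's PRHEXZ entry (assumed from the standard
 * listing): ORA #$B0, then adjust values above '9' up to 'A'-'F'. */
uint8_t prhexz_char(uint8_t a)
{
    uint8_t c = (uint8_t)(a | 0xB0);
    if (c >= 0xBA)              /* CMP #$BA / BCC COUT */
        c = (uint8_t)(c + 7);   /* ADC #$06 with carry set */
    return c;
}
```

Here `eor_zero(0x10)` gives $A0 (a space), while `prhexz_char(0x10)` gives $B0 (a zero), so the right-justified build has to keep the EOR and the call to COUT.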
John Brooks
2017-07-08 05:18:38 UTC
Post by qkumba
Post by qkumba
:PrDigit jsr PRHEXZ ;FDE5
Not for the right-justified version, though.
I cleaned up the source, added comments and moved the configuration-specific code into macros as Bredon intended (shout out to Big Mac).

The macro refactor saved a few bytes in the smallest config, and qkumba's mod saved 2 bytes in the two smallest configs:

len=$2F Print leading zeroes
0800:A2 12 A9 34 48 8A A0 FF A2 04 C8 85 3F 68 85 3E
0810:38 FD 2A 08 48 A5 3F FD 2E 08 B0 EE 68 98 20 E5
0820:FD A0 00 A5 3E CA F0 F6 10 E6 60 0A 64 E8 10 00
0830:00 03 27

len=$39 Left justify (skip leading zeroes)
0800:A2 12 A9 34 48 8A A0 FF 84 3D A2 04 C8 85 3F 68
0810:85 3E 38 FD 34 08 48 A5 3F FD 38 08 B0 EE 68 98
0820:D0 04 E6 3D 10 05 20 E5 FD A0 00 C6 3D A5 3E CA
0830:F0 F4 10 DE 60 0A 64 E8 10 00 00 03 27

len=$3D Right justify (spaces instead of leading zeroes)
0800:A2 12 A9 34 48 8A A0 FF 84 3D A2 04 C8 85 3F 68
0810:85 3E 38 FD 38 08 48 A5 3F FD 3C 08 B0 EE 68 98
0820:D0 06 E6 3D 30 02 A9 10 49 B0 20 ED FD A0 00 C6
0830:3D A5 3E CA F0 F2 10 DA 60 0A 64 E8 10 00 00 03 27

Revised, commented source below.

-JB
@JBrooksBSI

*-------------------------------
* DecPrint - 6502 print 16 bits
*
* 11/8/2015 by John Brooks
*-------------------------------
         org $800

Demo     ldx #$12 ;X=H
         lda #$34 ;A=L
* Fall into DecPrintU16

*-------------------------------

DEC_SKIP0 = 0 ;Set to 1 to skip leading zeroes (left justify)
DEC_SPACE0 = 0 ;Set to 1 to print leading spaces (right justify)
DEC_DIGITS = 5 ;# of digits to print (1-5)
DEC_VARS = $3D

         dum DEC_VARS ;Uses 3 temp bytes (ZP or ABS)
DecCtr   ds 1 ;Leading zero ctr
DecWord  ds 2 ;U16 being printed
         dend

RomPrHexZ = $FDE5
RomCOut  = $FDED

DEC_INIT mac
         do DEC_SKIP0+DEC_SPACE0
         sty DecCtr ;-1 means no non-zeroes seen yet
         fin
         <<<

DEC_ZERO mac
         do DEC_SKIP0
         bne :PrDigit ;Print all non-zero digits
         inc DecCtr ;Mark that a zero was found
         bpl :NoDigit ;Skip leading zeroes
         else

         do DEC_SPACE0
         bne :PrDigit ;Print all non-zero digits
         inc DecCtr ;Mark that a zero was found
         bmi :PrDigit ;Print digit if we've seen a non-zero
         lda #$10 ;Else print a space. $10="0" EOR " "
         fin
         fin
         <<<

DEC_PRINT mac
         do DEC_SPACE0
         eor #"0" ;Print 0-9 or space
         jsr RomCOut
         else
         jsr RomPrHexZ ;qkumba opt
         fin
         <<<

DEC_CTR  mac
         do DEC_SKIP0+DEC_SPACE0
         dec DecCtr ;Mark that a digit was printed
         fin
         <<<

*-------------------------------
* Print U16 as decimal via COUT
* IN: X=hi, A=Lo
* OUT: X=$FF, Y=$00
*-------------------------------
DecPrintU16
         pha ;ArgL
         txa ;ArgH
         ldy #0-1 ;Start digit is 0 (-1 for extra iny)
         DEC_INIT ;Init leading zero ctr if needed
DecModLen = *+1
:MOD     ldx #DEC_DIGITS-1 ;# of digits to print

:Loop    iny ;Inc current digit 0-9
         sta DecWord+1 ;Store remainderH
         pla
         sta DecWord ;Store remainderL

:DoDigit sec ;Try next larger digit
         sbc :Pow10L-1,x ;Subtract power-of-10 lo
         pha ;Save in case digit is good
         lda DecWord+1
         sbc :Pow10H-1,x ;Subtract power-of-10 hi
         bcs :Loop ;If sub didn't borrow, try higher digit

:GotDigit
         pla ;Subtract failed. Discard remainder low
         tya ;A=digit 0-9
         DEC_ZERO ;Handle leading zero case if needed
:PrDigit DEC_PRINT ;Print digit 0-9 (or space if right-justify)

         ldy #0 ;Next smaller digit starts at 0
:NoDigit
         DEC_CTR ;Update leading-zero ctr if needed
         lda DecWord ;Get remainderL (prepare for subtract)
         dex ;Now calc value of smaller digit
         beq :PrDigit ;Last digit prints as-is (no leading-zero logic)
         bpl :DoDigit ;Calc digits 2-DEC_DIGITS
         rts

:Pow10L  db <10,<100,<1000,<10000
:Pow10H  db >10,>100,>1000,>10000

         lst off
p***@gmail.com
2017-07-08 04:07:36 UTC
~35 bytes with leading zeros. I can't test it, but I think it's good, albeit very inefficient. It works by counting the binary value down while counting a BCD value up in decimal mode. I store the decimal bytes in $01, $02, $03 from least to most significant, so when I print I can use a count-down loop with a bne and no compare.

tax
stx $00
lda #0
sta $01
sta $02
sta $03
loop:
cld
ldx $00
bne .1
cpy #0
beq printer
.1 dex
bne .2
dey
.2 sed
stx $00
ldx #1
addloop:
clc
lda $01,x
adc #1
bcc loop
inx
cpx #4
beq loop
bne addloop
printer:
ldy #3
.1 lda $00,y
jsr $fdda
dey
bne .1
rts
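
The idea described in this post (as opposed to the posted code, which is untested) can be sketched in C as follows: count the binary value down to zero while incrementing a three-byte packed-BCD counter, propagating the decimal carry by hand since C has no decimal mode. The function names and the array layout are assumptions made for this sketch.

```c
#include <stdint.h>

/* Adds 1 to a packed-BCD byte; returns 1 if it wrapped from 0x99 to 0x00. */
static int bcd_inc(uint8_t *b)
{
    if ((*b & 0x0F) != 9) {
        *b = (uint8_t)(*b + 1);
        return 0;
    }
    if (*b != 0x99) {
        *b = (uint8_t)((*b & 0xF0) + 0x10);
        return 0;
    }
    *b = 0;
    return 1;
}

/* Counts `value` down to zero while counting bcd[0..2] up.
 * bcd[0] is the least significant byte, as in the post. */
void u16_to_bcd_by_counting(uint16_t value, uint8_t bcd[3])
{
    bcd[0] = bcd[1] = bcd[2] = 0;
    while (value-- != 0) {
        for (int i = 0; i < 3 && bcd_inc(&bcd[i]); i++)
            ;   /* carry into the next more significant byte */
    }
}
```

Printing `bcd[2]`, `bcd[1]`, `bcd[0]` as hex bytes then yields six decimal digits with leading zeroes. The run time grows with the value being printed, which is the inefficiency the post mentions.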
qkumba
2017-07-08 04:14:53 UTC
This one does not work for me. It prints all zeroes, no matter the input.
Also the expected input is not X:A as the others. It looks like Y:A but even so...
barrym95838
2017-07-08 05:16:47 UTC
Post by qkumba
This one does not work for me. It prints all zeroes, no matter the input.
Also the expected input is not X:A as the others. It looks like Y:A but even so...
How about:

300:86 A0 85 A1 A9 00 48 A9 00 A2 10 C9 05 90 02 E9
310:05 26 A0 26 A1 2A CA D0 F2 09 B0 48 A5 A0 05 A1
320:D0 E5 68 20 ED FD 68 D0 FA 60

Left justified, 42 bytes, source available on request.

Mike B.
John Brooks
2017-07-08 06:53:13 UTC
Post by barrym95838
Post by qkumba
This one does not work for me. It prints all zeroes, no matter the input.
Also the expected input is not X:A as the others. It looks like Y:A but even so...
300:86 A0 85 A1 A9 00 48 A9 00 A2 10 C9 05 90 02 E9
310:05 26 A0 26 A1 2A CA D0 F2 09 B0 48 A5 A0 05 A1
320:D0 E5 68 20 ED FD 68 D0 FA 60
Left justified, 42 bytes, source available on request.
Mike B.
Very cool Mike. It looks like the routine calculates low-digit to high-digit, storing digits on the stack. I don't get how the constant ROLing preserves upper digits while the lower digits are being calculated though.

I'm interested in the source and more info on the algorithm. I don't think I've run across this approach before.

-JB
@JBrooksBSI
barrym95838
2017-07-08 07:17:15 UTC
Post by John Brooks
I'm interested in the source and more info on the algorithm.
I don't think I've run across this approach before.
-JB
@JBrooksBSI
; Output 16-bit unsigned integer to stdout
; by Michael T. Barry 2017.07.07. Free to
; copy, use and modify, but without warranty
org 768
iout:
stx $a0 ; low-order half
sta $a1 ; high-order half
lda #0 ; null delimiter for print
pha ; repeat {
iout2: ; divide by 10
lda #0 ; remainder
ldx #16 ; loop counter
iout3:
cmp #5 ; partial remainder >= 10 (/2)?
bcc iout4
sbc #5 ; yes: update partial
; remainder, set carry
iout4:
rol $a0 ; gradually replace dividend
rol $a1 ; with the quotient
rol ; A is gradually replaced
dex ; with the remainder
bne iout3 ; loop 16 times
ora #$b0 ; convert remainder to ASCII
pha ; stack digits in ascending
lda $a0 ; order ('0' for zero)
ora $a1
bne iout2 ; } until quotient is 0
pla
iout5:
jsr $fded ; print digits in descending
pla ; order until delimiter is
bne iout5 ; encountered
rts

See http://forum.6502.org/viewtopic.php?f=2&t=3051
for some background info. I might be wrong, but I
think that I stumbled into an original idea there.

Mike B.
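
A C model of the inner loop may help show why the repeated ROLs do not lose the upper digits. It assumes the loop behaves as the comments in the listing say: A is compared with 5 before each shift, the resulting carry is rotated into the bottom of the 16-bit word, and the word's top bit is rotated into A. After 16 passes the word holds the quotient and A holds the remainder. The function name is hypothetical.

```c
#include <stdint.h>

/* Divides *word by 10 in place and returns the remainder (0-9).
 * Mirrors: cmp #5 / bcc / sbc #5 / rol lo / rol hi / rol a, 16 times. */
uint8_t divmod10(uint16_t *word)
{
    uint8_t a = 0;                      /* partial remainder */
    for (int i = 0; i < 16; i++) {
        unsigned carry = 0;
        if (a >= 5) {                   /* 2*a would reach 10 after the shift */
            a = (uint8_t)(a - 5);
            carry = 1;                  /* this becomes a quotient bit */
        }
        unsigned top = (*word >> 15) & 1u;
        *word = (uint16_t)((*word << 1) | carry);   /* quotient bit in */
        a = (uint8_t)((a << 1) | top);              /* dividend bit out */
    }
    return a;
}
```

Comparing with 5 before the shift is equivalent to comparing with 10 after it, so the top 15 bits are divided by 5 and the last bit shifted into A doubles the remainder and adds the low bit. Each call therefore strips one decimal digit and leaves the remaining digits in the word for the next pass.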
qkumba
2017-07-08 15:43:09 UTC
I confess that I don't understand how it works, but it's beautiful.
Now let's do 40 bytes:

iout:
stx $a0 ; low-order half
sta $a1 ; high-order half
ldy #0 ; counter for print
; repeat {
iout2: ; divide by 10
lda #0 ; remainder
ldx #16 ; loop counter
iout3:
cmp #5 ; partial remainder >= 10 (/2)?
bcc iout4
sbc #5 ; yes: update partial
; remainder, set carry
iout4:
rol $a0 ; gradually replace dividend
rol $a1 ; with the quotient
rol ; A is gradually replaced
dex ; with the remainder
bne iout3 ; loop 16 times
pha ; stack digits in ascending
iny ; increase digits counter
lda $a0 ; order ('0' for zero)
ora $a1
bne iout2 ; } until quotient is 0
iout5:
pla
jsr $fde5 ; print digits
dey ; in descending order
bne iout5 ; while count remains
rts
John Brooks
2017-07-08 16:28:44 UTC
Post by qkumba
I confess that I don't understand how it works, but it's beautiful.
It looks like a divide with dual outputs - the remainder is in A and the quotient replaces the original input.

Running it repeatedly effectively strips digits from low to high.

Nicely done Mike!

So ironically, the smallest 6502 decimal conversion so far is a clever implementation of the canonical 6502 divide:

int iter=0;
while (input)
{
    digits[iter++] = input % 10;
    input = input / 10;
}

:)

-JB
@JBrooksBSI
Michael 'AppleWin Debugger Dev'
2017-07-08 16:47:07 UTC
Post by John Brooks
int iter=0;
while (input)
{
digits[iter++] = input % 10;
input = input / 10;
}
That doesn't handle the edge case when input == 0. :-/
Trivial enough to fix. :-)

#include <stdio.h>
#include <stdlib.h>

void printu16( unsigned x )
{
    char digits[6];
    int len = 0;

    do
    {
        digits[len++] = x % 10;
        x /= 10;
    }
    while( x );

    while( len )
        putchar( '0' | digits[ --len ] );
}

int main( const int nArg, const char *aArg[] )
{
    printu16( nArg > 1 ? strtol(aArg[1],NULL,16) : 0x1234 );
    putchar( '\n' );
    return 0;
}
Michael 'AppleWin Debugger Dev'
2017-07-08 16:50:27 UTC
P.S.
I prefer to write the output like this -- just to confuse the C noobs who are left wondering what operator '-->' is. :-)

while( len --> 0 )
    putchar( '0' | digits[ len ] );
John Brooks
2017-07-08 18:02:42 UTC
Post by qkumba
I confess that I don't understand how it works, but it's beautiful.
40 bytes is too easy. Let's do 39 bytes! (+4 for demo)

0300:A9 12 A2 34 86 A0 85 A1 A9 00 48 A9 00 A8 A2 10
0310:C9 05 90 03 E9 05 C8 26 A0 26 A1 2A CA D0 F1 09
0320:B0 88 10 E6 20 ED FD 68 D0 FA 60

Merlin src below.

-JB
@JBrooksBSI


*-------------------------------
* DecPrint - 6502 print 16 bits
*
* by Mike B (barrym95838) 7/7/2017
*
* Disassembly, comments, and
* optimization by John Brooks 7/8/2017
*-------------------------------
lst off

org $0300

Demo lda #$12
ldx #$34
* Fall into DecPrintU16

*-------------------------------

DEC_VARS = $A0

dum DEC_VARS ;Uses 2 temp bytes (ZP or ABS)
DecWord ds 2 ;U16 being printed
dend

RomCOut = $FDED

*-------------------------------
* Print U16 as decimal via COUT
* IN: A=hi, X=lo
* OUT: A=$00, X=$00, Y=$FF
*-------------------------------
DecPrintU16
stx DecWord
sta DecWord+1
lda #0 ;Flag end-of-digits
:DoDigit pha ;Save previous low-order digit
lda #0 ;Remainder=0
tay ;DivDone=true
ldx #16 ;16-bit divide
:Loop cmp #10/2 ;Calc DecWord/10
bcc :Mul2
sbc #10/2 ;Remove high-order digit & shift 1 into DecWord
iny ;DivDone=false
:Mul2 rol DecWord ;Shift /10 result into DecWord
rol DecWord+1
rol ;Shift bits of input into acc (input mod 10)
dex
bne :Loop ;Continue 16-bit divide
ora #"0" ;Convert mod 10 result to ascii
dey
bpl :DoDigit ;If result of /10 was not zero, do next digit
:Print jsr RomCOut ;Print highest digit
pla ;Pull next-highest ascii digit
bne :Print ;If not end-of-digits $00, print more
rts

lst off
qkumba
2017-07-08 18:24:33 UTC
Post by John Brooks
40 bytes is too easy. Let's do 39 bytes! (+4 for demo)
37.

0300:A9 12 A2 34 86 A0 85 A1 A9 FF 48 A9 00 A8 A2 10
0310:C9 05 90 03 E9 05 C8 26 A0 26 A1 2A CA D0 F1 88
0320:10 E8 20 E5 FD 68 10 FA 60
g***@sasktel.net
2017-07-08 19:35:57 UTC
Post by qkumba
Post by John Brooks
40 bytes is too easy. Let's do 39 bytes! (+4 for demo)
37.
0300:A9 12 A2 34 86 A0 85 A1 A9 FF 48 A9 00 A8 A2 10
0310:C9 05 90 03 E9 05 C8 26 A0 26 A1 2A CA D0 F1 88
0320:10 E8 20 E5 FD 68 10 FA 60
I have to say that this one bears a few similarities to Woz's last screen calculator.


ASL
TAY
AND #$F0
BPL *+2
ORA #5
BCC *+2
ORA #$A
ASL
ASL
STA $26
TYA
AND #$E
ADC #$10
ASL $26
ROL
STA $27
RTS
Michael 'AppleWin Debugger Dev'
2017-07-08 20:09:43 UTC
Topic is U16, but easy enough to extend to S16 version:

0300:A9 FE A2 00 C9 80 90 12
0308:A8 A9 AD 20 ED FD 8A 49
0310:FF 18 69 01 AA 98 49 FF
0318:69 00 86 A0 85 A1 A9 FF
0320:48 A9 00 A8 A2 10 C9 05
0328:90 03 E9 05 C8 26 A0 26
0330:A1 2A CA D0 F1 88 10 E8
0338:20 E5 FD 68 10 FA 60

org $0300

Demo lda #$FE ; 65536 - 65024 = (-)512
ldx #$00
* Fall into DecPrintS16

*-------------------------------

DecWord = $A0
PRHEXZ = $FDE5
RomCOut = $FDED

*-------------------------------
* Print S16 as decimal via COUT
* IN: A=hi, X=lo
*-------------------------------
DecPrintS16
cmp #$80 ;HiLo
bcc DecPrintU16 ;C=(Hi > $7F)
tay ;C=1
lda #"-"
jsr RomCOut ; but destroys C
txa ;Calc 2's complement
eor #$FF
clc
adc #1
tax
tya
eor #$FF
adc #0
; *** Intentional fall into DecPrintU16

*-------------------------------
* Print U16 as decimal via COUT
* IN: A=hi, X=lo
* OUT: A=$00, X=$00, Y=$FF
*-------------------------------
DecPrintU16
stx DecWord
sta DecWord+1
lda #$FF ;Flag end-of-digits
:DoDigit pha ;Save previous low-order digit
lda #0 ;Remainder=0
tay ;DivDone=true
ldx #16 ;16-bit divide
:Loop cmp #10/2 ;Calc DecWord/10
bcc :Mul2
sbc #10/2 ;Remove high-order digit & shift 1 into DecWord
iny ;DivDone=false
:Mul2 rol DecWord ;Shift /10 result into DecWord
rol DecWord+1
rol ;Shift bits of input into acc (input mod 10)
dex
bne :Loop ;Continue 16-bit divide
; ora #"0" ;Convert mod 10 result to ascii
dey
bpl :DoDigit ;If result of /10 was not zero, do next digit
:Print jsr PRHEXZ ;Print highest digit
pla ;Pull next-highest ascii digit
bpl :Print ;If not end-of-digits $FF, print more
rts
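
In C terms, the signed wrapper above amounts to the following sketch: if the high bit is set, emit a minus sign, take the two's complement, and fall into the unsigned digit loop. The function name and buffer interface are assumptions for illustration, and the digit loop uses ordinary `%` and `/` rather than the shift-and-subtract loop.

```c
#include <stdint.h>
#include <stddef.h>

/* Formats a 16-bit two's-complement bit pattern as signed decimal.
 * `out` needs at least 7 bytes ("-32768" plus NUL). Returns the length. */
size_t dec_format_s16(uint16_t bits, char *out)
{
    char tmp[5];
    size_t n = 0, len = 0;

    if (bits & 0x8000u) {                /* cmp #$80: is the value negative? */
        out[len++] = '-';
        bits = (uint16_t)(~bits + 1u);   /* eor #$FF on both bytes, then +1 */
    }
    do {                                  /* same digit peeling as DecPrintU16 */
        tmp[n++] = (char)('0' + bits % 10);
        bits /= 10;
    } while (bits != 0);
    while (n > 0)
        out[len++] = tmp[--n];
    out[len] = '\0';
    return len;
}
```

For the demo value $FE00 this gives `-512`. The pattern $8000 negates to itself, which the unsigned loop then prints as 32768, so the most negative value comes out as `-32768` without special handling.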
Michael J. Mahon
2017-07-08 20:55:14 UTC
Post by Michael 'AppleWin Debugger Dev'
So while we're at it, drop the initial "cmp #$80" and move the
conditional branch after the "tay" and change it to a "bpl".

That saves two bytes and two cycles in DecPrintS16.
--
-michael

NadaNet 3.1 for Apple II parallel computing!
Home page: http://michaeljmahon.com

"The wastebasket is our most important design
tool--and it's seriously underused."
John Brooks
2017-07-08 21:51:07 UTC
Post by qkumba
Post by John Brooks
40 bytes is too easy. Let's do 39 bytes! (+4 for demo)
37.
0300:A9 12 A2 34 86 A0 85 A1 A9 FF 48 A9 00 A8 A2 10
0310:C9 05 90 03 E9 05 C8 26 A0 26 A1 2A CA D0 F1 88
0320:10 E8 20 E5 FD 68 10 FA 60
Now 35 bytes (+4 for demo code):

0300:A9 12 A2 34 86 A0 85 A1 A9 00 A8 A2 10 C9 05 90
0310:03 E9 05 C8 26 A0 26 A1 2A CA D0 F1 48 A9 FD 48
0320:A9 E1 48 88 10 E2 60

-JB
@JBrooksBSI

*-------------------------------
* DecPrint - 6502 print 16 bits
*
* by Mike B (barrym95838) 7/7/2017
*
* Optimized by J.Brooks & qkumba 7/8/2017
*-------------------------------
lst off
org $0300

dum $A0 ;Uses 2 temp bytes (ZP or ABS)
DecWord ds 2 ;U16 being printed
dend

RomPlaPrHex = $FDE2 ;PLA then PrHexZ

*-------------------------------

Demo lda #$12
ldx #$34
* 4 byte demo falls into DecPrintU16

*-------------------------------
* Print U16 as decimal via COUT
* IN: A=hi, X=lo
* OUT: X=$00, Y=$FF
*-------------------------------
DecPrintU16
stx DecWord
sta DecWord+1

:DoDigit lda #0 ;Remainder=0
tay ;DivDone=true
ldx #16 ;16-bit divide

:Div10 cmp #10/2 ;Calc DecWord/10
bcc :Under10
sbc #10/2 ;Remove high-order digit & shift 1 into DecWord
iny ;DivDone=false
:Under10 rol DecWord ;Shift /10 result into DecWord
rol DecWord+1
rol ;Shift bits of input into acc (input mod 10)
dex
bne :Div10 ;Continue 16-bit divide
pha ;Push low digit 0-9 to print
lda #>RomPlaPrHex-1
pha ;Push address of ROM nibble print
lda #<RomPlaPrHex-1
pha
dey ;Chk DivDone > 0
bpl :DoDigit ;If result of /10 was not zero, do next digit
rts
lst off
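
The return-address trick in this version can be simulated in C. The sketch below models the 6502 stack as a byte array: for each digit it pushes the digit followed by the address of the ROM's "PLA then print nibble" entry minus one, and the final RTS then unwinds through those entries, printing the most significant digit first. The caller address $1000, the array-based stack and the function names are invented for the simulation; only the $FDE2 entry point comes from the listing.

```c
#include <stdint.h>
#include <stddef.h>

#define ROM_PLA_PRHEX 0xFDE2u   /* PLA, print low nibble, RTS */
#define CALLER        0x1000u   /* hypothetical address the caller resumes at */

static uint8_t stack[32];
static size_t sp;

static void push(uint8_t b) { stack[sp++] = b; }
static uint8_t pull(void)   { return stack[--sp]; }

/* Writes the digits in print order to `out` (at least 6 bytes). */
size_t dec_format_via_rts(uint16_t value, char *out)
{
    size_t len = 0;

    sp = 0;
    push((uint8_t)((CALLER - 1) >> 8));     /* what the caller's JSR left */
    push((uint8_t)((CALLER - 1) & 0xFF));

    do {
        push((uint8_t)(value % 10));                    /* pha: digit 0-9 */
        push((uint8_t)((ROM_PLA_PRHEX - 1) >> 8));      /* lda #$FD / pha */
        push((uint8_t)((ROM_PLA_PRHEX - 1) & 0xFF));    /* lda #$E1 / pha */
        value /= 10;
    } while (value != 0);

    for (;;) {
        /* RTS: pull low byte, pull high byte, add one, jump there. */
        unsigned lo = pull();
        unsigned hi = pull();
        unsigned pc = ((hi << 8) | lo) + 1;
        if (pc != ROM_PLA_PRHEX)
            break;                               /* back in the caller */
        out[len++] = (char)('0' + pull());       /* ROM pulls and prints */
    }
    out[len] = '\0';
    return len;
}
```

Because the last digit computed is the most significant one and sits on top of the stack, no separate print loop or digit counter is needed; the chain of return addresses is the loop.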
s***@gmail.com
2017-07-08 22:10:14 UTC
Post by John Brooks
You know what makes it extra useful is that you should be able to leverage the same routine to print octal and binary, just by changing the divisor, or by keeping it (Radix/2) in zero-page somewhere. :-)
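
A sketch of that generalisation in C, assuming the same compare-before-shift loop with the constant 5 replaced by Radix/2. It only covers even radices, since the loop relies on halving the divisor. The function names and the digit table are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Divides *word by 2*half_radix in place and returns the remainder. */
uint8_t divmod_even_radix(uint16_t *word, uint8_t half_radix)
{
    uint8_t a = 0;
    for (int i = 0; i < 16; i++) {
        unsigned carry = 0;
        if (a >= half_radix) {
            a = (uint8_t)(a - half_radix);
            carry = 1;
        }
        unsigned top = (*word >> 15) & 1u;
        *word = (uint16_t)((*word << 1) | carry);
        a = (uint8_t)((a << 1) | top);
    }
    return a;
}

/* Formats `value` in an even radix from 2 to 16; `out` needs 17 bytes. */
size_t format_even_radix(uint16_t value, uint8_t radix, char *out)
{
    static const char digits[] = "0123456789ABCDEF";
    char tmp[16];
    size_t n = 0, len = 0;

    do {
        tmp[n++] = digits[divmod_even_radix(&value, (uint8_t)(radix / 2))];
    } while (value != 0);
    while (n > 0)
        out[len++] = tmp[--n];
    out[len] = '\0';
    return len;
}
```

With `0x1234` this gives `11064` in octal and `1001000110100` in binary.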
John Brooks
2017-07-08 23:41:49 UTC
Post by John Brooks
Post by qkumba
Post by John Brooks
40 bytes is too easy. Let's do 39 bytes! (+4 for demo)
37.
Another byte bites the dust.

Now 34 bytes (+4 demo):

0300:A2 12 A9 34 20 4A FF A9 00 A8 A2 10 C9 05 90 03
0310:E9 05 C8 26 45 26 46 2A CA D0 F1 48 A9 FD 48 A9
0320:E1 48 88 10 E2 60

-JB
@JBrooksBSI
r***@gmail.com
2017-07-09 03:23:54 UTC
Post by John Brooks
pha ;Push low digit 0-9 to print
lda #>RomPlaPrHex-1
pha ;Push address of ROM nibble print
lda #<RomPlaPrHex-1
pha
Hehe...this is an excellent way of getting out of the right to left display.

Also nice abuse of an unlabeled monitor address. (I would have gone with prhex-2.)

Impressive - and tacky - at the same time!
John Brooks
2017-07-09 03:37:08 UTC
Post by r***@gmail.com
Post by John Brooks
pha ;Push low digit 0-9 to print
lda #>RomPlaPrHex-1
pha ;Push address of ROM nibble print
lda #<RomPlaPrHex-1
pha
Hehe...this is an excellent way of getting out of the right to left display.
Also nice abuse of an unlabeled monitor address. (I would have gone with prhex-2.)
Impressive - and tacky - at the same time!
It was only a question of whether I found it before qkumba. He is quite the code golfer.

Did you see my 34 byte version? It has another unorthodox use of the monitor ROM to save a byte.

-JB
@JBrooksBSI
barrym95838
2017-07-09 04:27:43 UTC
... It was only a question of whether I found it before qkumba.
He is quite the code golfer.
...
I am not in the same league, but I'm proud that I found a fresh
path which was deserving of your golfing attention. That super-
compact UM/MOD just popped up during one of my rare bursts of
inspiration. If only they weren't so rare and random ... I wish
I could save a few of those up, and use them at a time and place
of my choosing once in a while ...

Mike B.
John Brooks
2017-07-09 05:51:56 UTC
Post by barrym95838
... It was only a question of whether I found it before qkumba.
He is quite the code golfer.
...
I am not in the same league, but I'm proud that I found a fresh
path which was deserving of your golfing attention. That super-
compact UM/MOD just popped up during one of my rare bursts of
inspiration. If only they weren't so rare and random ... I wish
I could save a few of those up, and use them at a time and place
of my choosing once in a while ...
Mike B.
As a lifelong programmer ('79 to present), I've written a ton of decimal print routines and have never come across this exact method.

I've seen calls to DIV/MOD routines which are typically huge and slow. The go-to strategies are typically BCD on older CPUs and power-of-10 lookups on newer CPUs (or hardware divide).

I think I may have seen and possibly written a similar in-place-result 'DIV & MOD' loop in the 1980s when I was writing 3D math routines for my flight simulators on the //e & IIGS (Tomahawk), but I just considered the in-place-result to be a minor space savings rather than a compelling feature.

IMO the dual-result 'DIV & MOD' shift/sub loop is not nearly as well known in computer science as the dual-result 'Sin & Cos' commonly used in shader programming.

For the application of printing decimal numbers, your approach fits like a glove. The ability to peel off low digits while retaining the high digits allows a compact recursive-like reprocessing of the input number.

It's not a speed-demon, but I don't see how it can be beat for compact code size.

Nicely done!

-JB
@JBrooksBSI
John Brooks
2017-07-09 06:04:25 UTC
Post by barrym95838
... It was only a question of whether I found it before qkumba.
He is quite the code golfer.
...
I am not in the same league, but I'm proud that I found a fresh
path which was deserving of your golfing attention. That super-
compact UM/MOD just popped up during one of my rare bursts of
inspiration. If only they weren't so rare and random ... I wish
I could save a few of those up, and use them at a time and place
of my choosing once in a while ...
Mike B.
Here is my last optimization idea: use the overflow flag to determine when to conclude the divide.

So now it's 33 bytes (+4 for demo):

300:A2 12 A9 34 20 4A FF A9 00 B8 A2 10 C9 05 90 03
310:E9 85 38 26 45 26 46 2A CA D0 F1 48 A9 FD 48 A9 E1 48 70 E3 60

*-------------------------------
* DecPrint - 6502 print 16 bits
* Merlin 8/16/32 assembler
*
* by Michael T. Barry 2017.07.07. Free to
* copy, use and modify, but without warranty
*
* Optimized by J.Brooks & qkumba 7/8/2017
*-------------------------------
lst off
org $0300

ZpDecWord = $45 ;U16 being printed

RomPlaPrHex = $FDE2 ;PLA then PrHexZ
RomSave = $FF4A ;A->$45, X->$46, Y->$47

*-------------------------------

Demo ldx #$12
lda #$34
* 4 byte demo falls into DecPrintU16

*-------------------------------
* Print U16 as decimal via COUT
* IN: X=hi, A=lo
* OUT: X=$00, Y=$FF
*-------------------------------
DecPrintU16
jsr RomSave ;Save A,X to $45,$46
:DoDigit lda #0 ;Remainder=0
clv ;V=0 means div result = 0
ldx #16 ;16-bit divide
:Div10 cmp #10/2 ;Calc ZpDecWord/10
bcc :Under10
sbc #10/2+$80 ;Remove digit & set V=1 to show div result > 0
sec ;Shift 1 into div result
:Under10 rol ZpDecWord ;Shift /10 result into ZpDecWord
rol ZpDecWord+1
rol ;Shift bits of input into acc (input mod 10)
dex
bne :Div10 ;Continue 16-bit divide
pha ;Push low digit 0-9 to print
lda #>RomPlaPrHex-1
pha ;Push address of ROM nibble print
lda #<RomPlaPrHex-1
pha
bvs :DoDigit ;If V=1, result of /10 was > 0 & do next digit
rts
lst off
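
To see why the overflow flag works as a "quotient is non-zero" marker, here is a C model of the loop that applies the usual 6502 SBC overflow rule by hand. It assumes A is always 5-9 when the subtract runs, so subtracting $85 is a positive-minus-negative operation whose result has bit 7 set, which sets V every time; the stray bit 7 is then shifted out by the ROL of A. The struct and function names are made up for this sketch.

```c
#include <stdint.h>

struct div10_result {
    uint8_t remainder;  /* value left in A: 0-9 */
    int v_flag;         /* model of the 6502 overflow flag after the loop */
};

struct div10_result div10_with_overflow(uint16_t *word)
{
    struct div10_result res;
    uint8_t a = 0;
    int v = 0;                                  /* clv */

    for (int i = 0; i < 16; i++) {
        unsigned carry = 0;
        if (a >= 5) {                           /* cmp #5 leaves carry set */
            uint8_t m = 0x85;                   /* sbc #10/2+$80 */
            uint8_t r = (uint8_t)(a - m);       /* $80 + (a - 5) */
            /* SBC sets V when A and the operand differ in sign and the
             * result's sign differs from A's. */
            v = (((a ^ m) & (a ^ r) & 0x80) != 0);
            a = r;
            carry = 1;                          /* sec: quotient bit */
        }
        unsigned top = (*word >> 15) & 1u;
        *word = (uint16_t)((*word << 1) | carry);
        a = (uint8_t)((a << 1) | top);          /* rol discards bit 7 */
    }
    res.remainder = a;
    res.v_flag = v;
    return res;
}
```

Since no other instruction in the loop touches V, it ends up set exactly when at least one quotient bit was shifted in, which is the condition `bvs :DoDigit` tests.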
barrym95838
2017-07-09 07:54:49 UTC
Post by John Brooks
300:A2 12 A9 34 20 4A FF A9 00 B8 A2 10 C9 05 90 03
310:E9 85 38 26 45 26 46 2A CA D0 F1 48 A9 FD 48 A9 E1 48 70 E3 60
You didn't even trash Y, AFAICT. I'm not sure, but I think that it
might be the ultimate in 6502 coding bad-assery when it's not even
remotely obvious how to improve an optimization by throwing another
register at it.

Did you put the high-half in $45 and the low half in $46? If so,
do you need to "rol ZpDecWord+1" before you "rol ZpDecWord", or is
it just way past my bed-time?

Mike B.
g***@sasktel.net
2017-07-09 20:55:31 UTC
Post by barrym95838
Post by John Brooks
300:A2 12 A9 34 20 4A FF A9 00 B8 A2 10 C9 05 90 03
310:E9 85 38 26 45 26 46 2A CA D0 F1 48 A9 FD 48 A9 E1 48 70 E3 60
You didn't even trash Y, AFAICT. I'm not sure, but I think that it
might be the ultimate in 6502 coding bad-assery when it's not even
remotely obvious how to improve an optimization by throwing another
register at it.
Did you put the high-half in $45 and the low half in $46? If so,
do you need to "rol ZpDecWord+1" before you "rol ZpDecWord", or is
it just way past my bed-time?
Mike B.
Personally, I am confused by the whole thing, since ROM routines are being used anyway:

300:A2 34 A9 12 4C 24 ED

300:LDX #$34
LDA #$12
JMP $ED24

does exactly the same thing in only 3 bytes (+4 for demo)
g***@sasktel.net
2017-07-09 21:04:44 UTC
Post by g***@sasktel.net
Post by barrym95838
Post by John Brooks
300:A2 12 A9 34 20 4A FF A9 00 B8 A2 10 C9 05 90 03
310:E9 85 38 26 45 26 46 2A CA D0 F1 48 A9 FD 48 A9 E1 48 70 E3 60
You didn't even trash Y, AFAICT. I'm not sure, but I think that it
might be the ultimate in 6502 coding bad-assery when it's not even
remotely obvious how to improve an optimization by throwing another
register at it.
Did you put the high-half in $45 and the low half in $46? If so,
do you need to "rol ZpDecWord+1" before you "rol ZpDecWord", or is
it just way past my bed-time?
Mike B.
300:A2 34 A9 12 4C 24 ED
300:LDX #$34
LDA #$12
JMP $ED24
does exactly the same thing in only 3 bytes (+4 for demo)
For anyone wanting a quick 3-way conversion, hex, decimal, binary.

Here is my 16-bit ampersand utility.

Just launch it at the Applesoft prompt, then type:

]&$FFFF or,
]&#65535 or,
]&%1111111111111111

9000: a9 10 8d f6 03 a9 90 8d f7 03 60 00 00 00 00 00
9010: a2 01 a0 02 c9 24 d0 1d bd 00 02 f0 0b 09 80 9d
9020: 00 02 20 b1 00 e8 d0 f0 20 a7 ff a6 3e a5 3f 20
9030: 24 ed 4c 50 90 c9 23 d0 3c 20 b1 00 a9 a4 20 ed
9040: fd 20 67 dd 20 52 e7 85 3f a6 50 86 3e 20 41 f9
9050: a2 01 20 8e fd a2 11 06 3e 26 3f b0 03 a9 b0 2c
9060: a9 b1 2c a9 a0 20 ed fd ca e0 09 f0 f6 e8 ca d0
9070: e6 4c 95 d9 60 c9 25 d0 fb a9 00 85 3e 85 3f a2
9080: 02 bd 00 02 f0 0f 18 29 01 f0 01 38 26 3e 26 3f
9090: e8 e0 20 90 ec 20 8e fd a9 a3 20 ed fd a6 3e a5
90a0: 3f 20 24 ed 20 8e fd a9 a4 20 ed fd a6 3e a5 3f
90b0: 20 41 f9 20 8e fd 4c 95 d9 00 00 00 00 00 00 00
barrym95838
2017-07-09 22:01:37 UTC
Post by g***@sasktel.net
300:A2 34 A9 12 4C 24 ED
300:LDX #$34
LDA #$12
JMP $ED24
does exactly the same thing in only 3 bytes (+4 for demo)
I can't speak for the others, but I suffer from a very specific
form of mental illness, and even though I'm not very good at it,
6502 code golfing temporarily relieves some of my symptoms.

I offer the following compromise, in 38 bytes, using only COUT
and two bytes of zero-page. It should be easily adaptable to
any 65xx-based system which doesn't have that fat hog $ED24 or
any cool undocumented monitor entry points. It can be easily
adapted to print in bases 2, 4, 6, and 8 as well, and any even
numbered base from 12 to 36 with just a few more bytes of code.

300:86 A0 85 A1 A9 00 48 A9 00 B8 A0 10 C9 05 90 03
310:E9 85 38 26 A0 26 A1 2A 88 D0 F1 09 B0 70 E7 20
320:ED FD 68 D0 FA 60
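
[Editor's sketch] Two ideas in the routine above can be modeled in a few lines of Python: the zero byte pushed as an end-of-digits sentinel, and the claim that the same loop works for any even base by comparing against base/2. This is an illustration under assumptions, not Mike's code: print_even_base and the DIGITS lookup string are hypothetical, and the lookup stands in for the "few more bytes" a 6502 version would need for digits above 9.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed digit alphabet


def print_even_base(word, base=10):
    """Return a 16-bit value as text in an even base from 2 to 36."""
    if base % 2 or not 2 <= base <= 36:
        raise ValueError("base must be even and between 2 and 36")
    half = base // 2
    stack = [0]                          # lda #0 / pha: sentinel, never a valid character
    while True:
        rem, nonzero = 0, False          # lda #0 / clv
        for _step in range(16):
            carry = 0
            if rem >= half:              # 2*rem + next bit >= base exactly when rem >= base/2
                rem -= half
                carry = 1
                nonzero = True           # the V flag in the 6502 version
            word = (word << 1) | carry
            carry, word = word >> 16, word & 0xFFFF
            rem = (rem << 1) | carry
        ch = ord(DIGITS[rem]) | 0x80     # high-bit ASCII, so never zero
        if not nonzero:                  # quotient was zero: this is the top digit
            break
        stack.append(ch)                 # pha, then divide again
    out = []
    while ch:                            # jsr COUT / pla / bne
        out.append(chr(ch & 0x7F))
        ch = stack.pop()
    return "".join(out)


print(print_even_base(0x1234))       # decimal
print(print_even_base(0x1234, 16))   # hexadecimal
```

The halved compare only works for even bases, which matches the restriction stated in the post.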

Mike B.
qkumba
2017-07-10 02:24:26 UTC
Post by g***@sasktel.net
300:A2 34 A9 12 4C 24 ED
300:LDX #$34
LDA #$12
JMP $ED24
does exactly the same thing in only 3 bytes (+4 for demo)
Except when it doesn't.
You are assuming that Applesoft is present. We are not.
Michael 'AppleWin Debugger Dev'
2017-07-10 04:06:10 UTC
Post by g***@sasktel.net
300:A2 34 A9 12 4C 24 ED
300:LDX #$34
LDA #$12
JMP $ED24
does exactly the same thing in only 3 bytes (+4 for demo)
Except ...

1. It requires Applesoft to have been initialized,
2. Which in turn stomps over memory all over the place,
3. Ergo its memory usage is non-deterministic
4. Calling Applesoft COLD start or WARM start @ $E000 or $E003 respectively _doesn't_ return to the caller -- HOW does one initialize Applesoft properly and yet allow LINPRT to be called safely?
5. How could you call this in a boot sector?
6. It's D-O-G slow, and inefficient.

Digit peeling doesn't have any of these disadvantages and "just works."
The goal here is to minimize coupling and maximize locality.

It is always good to have different, orthogonal tools in the proverbial toolbox.

For a different problem LINPRT might make more sense.
g***@sasktel.net
2017-07-10 05:52:27 UTC
Post by Michael 'AppleWin Debugger Dev'
Post by g***@sasktel.net
300:A2 34 A9 12 4C 24 ED
300:LDX #$34
LDA #$12
JMP $ED24
does exactly the same thing in only 3 bytes (+4 for demo)
Except ...
1. It requires Applesoft to have been initialized,
2. Which in turn stomps over memory all over the place,
3. Ergo its memory usage is non-deterministic
5. How could you call this in a boot sector?
6. Its D-O-G slow, and inefficient.
Digit peeling doesn't have any of these disadvantages and "just works."
The goal here is to minimize coupling and maximize locality.
It is always good to have different, orthogonal tools in the proverbial toolbox.
For a different problem LINPRT might make more sense.
Applesoft does not have to be initialized, just the I/O vectors ($36.39) set up to use $ED24, which answers questions 1, 4 and 5.

As far as speed goes, I am trying to find a use for this highly efficient tool that requires a lot of converting hex digits and printing them to the screen in decimal format where speed would be a factor.

What are some general use scenarios that could be used with this utility?
Michael J. Mahon
2017-07-10 07:06:02 UTC
Post by Michael 'AppleWin Debugger Dev'
Post by g***@sasktel.net
300:A2 34 A9 12 4C 24 ED
300:LDX #$34
LDA #$12
JMP $ED24
does exactly the same thing in only 3 bytes (+4 for demo)
Except ...
1. It requires Applesoft to have been initialized,
2. Which in turn stomps over memory all over the place,
3. Ergo its memory usage is non-deterministic
4. Calling Applesoft COLD start or WARM start @ $E000 or $E003
respectively _doesn't_ return to the caller -- HOW does one initialize
Applesoft properly and yet allow LINPRT to be called safely?
5. How could you call this in a boot sector?
6. Its D-O-G slow, and inefficient.
Digit peeling doesn't have any of these disadvantages and "just works."
The goal here is to minimize coupling and maximize locality.
It is always good to have different, orthogonal tools in the proverbial toolbox.
For a different problem LINPRT might make more sense.
The way you retain control after jumping to the Applesoft cold-start
address is to set the COUT vector to branch to your return point.

When Applesoft completes its initialization, it tries to print a "]"
prompt, and that's where you grab control back.
--
-michael - NadaNet 3.1 and AppleCrate II: http://michaeljmahon.com
Antoine Vignau
2017-07-09 07:56:32 UTC
You guys are crazy!

Why write such a small routine when we have 64KB of RAM space ;-)

Antoine
Michael 'AppleWin Debugger Dev'
2017-07-09 13:37:47 UTC
Post by Antoine Vignau
You, guys, are crazy!
Why write a so small routine when we have 64KB of RAM space ;-)
/sarcasm Oh hush with your 32 GB of system ram. Oh wait, that's my i7 dev box. =P :-)

Does this mean we can coin a new quote?

"64KB aught to be enough for anybody" -- Antoine

:-)
Antoine Vignau
2017-07-09 21:25:28 UTC
Post by Michael 'AppleWin Debugger Dev'
Post by Antoine Vignau
You, guys, are crazy!
Why write a so small routine when we have 64KB of RAM space ;-)
/sarcasm Oh hush with your 32 GB of system ram. Oh wait, that's my i7 dev box. =P :-)
Does this mean we can coin a new quote?
"64KB aught to be enough for anybody" -- Antoine
:-)
Yeah, you got it right! Thank you, I am so honored :-)
Antoine
Anthony Lawther
2017-07-09 21:59:39 UTC
Post by Michael 'AppleWin Debugger Dev'
Post by Antoine Vignau
You, guys, are crazy!
Why write a so small routine when we have 64KB of RAM space ;-)
/sarcasm Oh hush with your 32 GB of system ram. Oh wait, that's my i7 dev box. =P :-)
Does this mean we can coin a new quote?
"64KB aught to be enough for anybody" -- Antoine
:-)
Michael,

I grit my teeth in silence at your use of then instead of than when used
for comparison, but I can't let this slide: there are multiple definitions
of aught but not one of them means the same as ought.
Michael 'AppleWin Debugger Dev'
2017-07-09 23:59:38 UTC
Post by Anthony Lawther
Post by Michael 'AppleWin Debugger Dev'
"64KB aught to be enough for anybody" -- Antoine
Michael,
I grit my teeth in silence at your use of then instead of than when used
for comparison,
I don't see any use of "then" or "than" in any of my posts in _this_ thread ... but if I have buggered them up in _other_ threads then yup, guilty as charged -- I can never seem to remember which is which -- even though I keep looking at the Oatmeal's cheatsheet from time to time.
http://theoatmeal.com/comics/misspelling
Post by Anthony Lawther
there are multiple definitions
of aught but not one of them means the same as ought.
Ah, TIL'd about ought vs aught. I always assumed they meant the same thing. Mea culpa. :-/

Bloody English and its billion homonyms already. :-)
Anthony Lawther
2017-07-10 13:14:42 UTC
Post by Michael 'AppleWin Debugger Dev'
Post by Anthony Lawther
Post by Michael 'AppleWin Debugger Dev'
"64KB aught to be enough for anybody" -- Antoine
Michael,
I grit my teeth in silence at your use of then instead of than when used
for comparison,
I don't see any use "then" or "than" in any of my posts in _this_ thread
... but if I have buggered them up in _other_ threads then yup, guilty as
charged -- I can never seem to remember which is which -- even though I
keep looking at the Oatmean's cheatsheet from time to time.
http://theoatmeal.com/comics/misspelling
Post by Anthony Lawther
there are multiple definitions
of aught but not one of them means the same as ought.
Ah, TIL'd about ought vs aught. I always assumed the meant the same thing. Mea culpa. :-/
Bloody English and its billion homonyms already. :-)
I'm a self confessed spelling and grammar nerd, but I'm often reminded that
I'm not supposed to hold others to the same high standard.

I'm glad I helped you learn today, and you didn't take offense.

Now back to 6502 assembly discussions :-)
James Davis
2017-07-10 19:30:34 UTC
I'm a self confessed spelling and grammar nerd, ....
Ditto that! So am I.

For all those who have trouble remembering whether to use 'then' or 'than,' think: "'then' it will happen, rather 'than,' 'then' it will not happen." And: "this rather 'than' that." In other words, equate 'then' with something 'happening' and 'than' with 'rather than' (comparing two things).

Definitions:

Then:

1. then - adv
1 : at that time
2 : soon after that : next
3 : in addition : besides
4 : in that case
5 : consequently

2. then - n : that time <since ~>

3. then - adj : existing or acting at that time
<the then attorney general>

Than:

1. than - conj
1 — used after a comparative adjective or adverb to introduce the second part of a comparison expressing inequality <older than I am>
2 — used after other or a word of similar meaning to express a difference of kind, manner, or identity <adults other than parents>

2. than - prep : in comparison with <older than me>

(c)2000 Zane Publishing, Inc. and Merriam-Webster, Incorporated. All rights reserved
Michael 'AppleWin Debugger Dev'
2017-07-10 20:55:14 UTC
Post by James Davis
I'm a self confessed spelling and grammar nerd, ....
Ditto that! So am I.
TL;DR:

then - compare time
than - compare things
James Davis
2017-07-10 22:32:51 UTC
Post by Michael 'AppleWin Debugger Dev'
Post by James Davis
I'm a self confessed spelling and grammar nerd, ....
Ditto that! So am I.
TL:DR;
then - compare time
than - compare things
OK, Michael. So now that you really understand it, you won't make that mistake again, right?
Michael 'AppleWin Debugger Dev'
2017-07-11 02:46:38 UTC
Post by James Davis
OK, Michael. So now that you really understand it, you won't make that mistake again, right?
Rather than make promises that I have no hope of living up to I will remain silent then to give false impressions that I am cured. :-)

And yes, it is ironic if I made any above. :-)
Harry Potter
2017-07-10 14:54:00 UTC
Post by John Brooks
300:A2 12 A9 34 20 4A FF A9 00 B8 A2 10 C9 05 90 03
310:E9 85 38 26 45 26 46 2A CA D0 F1 48 A9 FD 48 A9 E1 48 70 E3 60
Uhh...may I use your code in my CBM/Apple2SimpleIO libraries? I tested it out and got good results. If I may, how would you like to be credited?
John Brooks
2017-07-11 00:29:12 UTC
Post by Harry Potter
Post by John Brooks
300:A2 12 A9 34 20 4A FF A9 00 B8 A2 10 C9 05 90 03
310:E9 85 38 26 45 26 46 2A CA D0 F1 48 A9 FD 48 A9 E1 48 70 E3 60
Uhh...may I use your code in my CBM/Apple2SimpleIO libraries? I tested it out and got good results. If I may, how would you like to be credited?
Yes, it is a public release with no restrictions. There are multiple authors, so credit Mike, qkumba, and me, or credit this CSA2 thread "6502 Print U16 as decimal" so interested readers can learn more.

-JB
@JBrooksBSI
barrym95838
2017-07-11 00:43:13 UTC
Post by John Brooks
Yes, it is a public release with no restrictions. There are multiple
authors, so credit Mike, qkumba, and me, or credit this CSA2 thread
"6502 Print U16 as decimal" so interested readers can learn more.
-JB
@JBrooksBSI
I second that.

Mike B.
Harry Potter
2017-07-11 13:29:15 UTC
Post by John Brooks
Yes, it is a public release with no restrictions. There are multiple authors, so credit Mike, qkumba, and me, or credit this CSA2 thread "6502 Print U16 as decimal" so interested readers can learn more.
Done! :)
Harry Potter
2017-07-11 14:02:40 UTC
Does anybody here want me to post *my* version of the U16Print routine here? :)
Michael 'AppleWin Debugger Dev'
2017-07-11 18:30:02 UTC
Post by Harry Potter
Does anybody here want me to post *my* version of the U16Print routine here? :)
No.
v***@pianoman.cluster.toy
2017-07-09 20:09:51 UTC
If anyone wants to continue optimizing 6502 code for size, I should
point out my "ll_asm" code density project.

http://www.deater.net/weave/vmwprod/asm/ll/ll.html

where I optimize the same program for size in assembly language on 30+
architectures. The code most of interest is LZSS decompression, but
integer printing and strcat() are involved as well.

I get a lot of complaints about my lack of assembly skills. Now that
I finally got m68k reoptimized (after a lot of badgering by
Amiga/m68k diehards) the next most complained about arch is 6502.

I also have some partially done 65c816 code, but I was developing it
on a SNES not on a GS.

Vince
barrym95838
2017-07-13 15:23:44 UTC
Post by v***@pianoman.cluster.toy
If anyone wants to continue optimizing 6502 code for size, I should
point out my "ll_asm" code density project.
http://www.deater.net/weave/vmwprod/asm/ll/ll.html
where I optimize the same program for size in assembly language on 30+
architectures. The code most of interest is LZSS decompression, but
integer printing and strcat() are involved as well.
...
Thanks for reminding me, Vince. There was a murmur about you over on
6502.org about a year or so ago, and I started to do a rewrite. I was
making decent progress, but multiple distractions caused me to push it
to the back burner, and it finally fell off the back of my stove. I
will try to retrieve it and finish, but the distractions keep piling
up at an alarming rate, so I can't even begin to guess about an ETA.

http://forum.6502.org/viewtopic.php?f=1&t=3044&hilit=weaver

Mike B.
John Brooks
2017-07-13 17:08:32 UTC
Post by barrym95838
Post by v***@pianoman.cluster.toy
If anyone wants to continue optimizing 6502 code for size, I should
point out my "ll_asm" code density project.
http://www.deater.net/weave/vmwprod/asm/ll/ll.html
where I optimize the same program for size in assembly language on 30+
architectures. The code most of interest is LZSS decompression, but
integer printing and strcat() are involved as well.
...
Thanks for reminding me, Vince. There was a murmur about you over on
6502.org about a year or so ago, and I started to do a rewrite. I was
making decent progress, but multiple distractions caused me to push it
to the back burner, and it finally fell off the back of my stove. I
will try to retrieve it and finish, but the distractions keep piling
up at an alarming rate, so I can't even begin to guess about an ETA.
http://forum.6502.org/viewtopic.php?f=1&t=3044&hilit=weaver
Mike B.
Hi Vince. I took a quick look at the linux logo project and there's a lot of room to improve it, as Mike mentioned.

I see that the current smallest exe is 68k. Around 1988, while working at Datasoft, we were creating the same game on 6502 and 68k computers and ran friendly competitions among the assembly programs to see which version would have the smallest code.

We found that for game logic (if/else heavy) and ascii/byte operations, the 6502 code would be 0.75x to 0.5x the size of 68k code.

For parts of the games which required 16 bit or higher math or ptrs, the 6502 would be 1.5x to 4x larger than 68k.

Anyway, I'm pretty sure the 6502 version of ll could be much smaller than the current 68k version, if coded 'the 6502 way'.

I'm pretty sure the ascii art of the logo and the 6502 code to unpack and print it as text on the Apple II would fit in about 256 bytes if the compression was switched to RLE and an efficient 6502 implementation was used.

If I get time I'll make an example as these refactors are easier to see than describe.

-JB
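
[Editor's sketch] To make the RLE suggestion above concrete, here is a minimal run-length codec in Python. It only illustrates the general idea. The (count, character) pair format and the max_run limit are assumptions chosen for readability, not the encoding John had in mind, and nothing here measures the real logo data.

```python
def rle_encode(text, max_run=16):
    """Collapse runs of one character into (count, character) pairs."""
    runs = []
    i = 0
    while i < len(text):
        j = i
        # Extend the run while the character repeats and the count still fits.
        while j < len(text) and text[j] == text[i] and j - i < max_run:
            j += 1
        runs.append((j - i, text[i]))
        i = j
    return runs


def rle_decode(runs):
    """Expand (count, character) pairs back into text."""
    return "".join(ch * count for count, ch in runs)


row = "#####   ###      ########"
encoded = rle_encode(row)
print(encoded)
print(rle_decode(encoded) == row)
```

Short runs of a few distinct characters, as in ASCII art, are where a scheme like this tends to do well. Capping the run length keeps the count within a fixed number of bits.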
Michael 'AppleWin Debugger Dev'
2017-07-14 16:12:39 UTC
Post by John Brooks
Hi Vince. I took a quick look at the linux logo project and there's a lot of room to improve it, as Mike mentioned.
That's a bit of an understatement. :-)

The whole HGR y to address calculation takes up $40 bytes which can be simplified down to $1B bytes.

File: no-os/6502_apple/ll_6502.s
Func: y_to_addr
Lines: 585-642

Replace with Woz's code:
        ASL             ;A--BCDEFGH0
        TAX             ;TAX...TXA could be TAY...TYA
        AND #$F0        ;A--BCDE0000
        BPL _1          ;B=0
        ORA #$05        ;A--BCDE0B0B
_1      BCC _2          ;A=0
        ORA #$0A        ;A--BCDEABAB
_2      ASL             ;B--CDEABAB0
        ASL             ;C--DEABAB00
        STA YADDRL
        TXA             ;C--BCDEFGH0
        AND #$0E        ;C--0000FGH0
        ADC #$10        ;0--00xxFGHC ; HPAG2 = $10 for base $2000, $20 for base $4000
        ASL YADDRL      ;D--00xxFGHC GBASL=EABAB000
        ROL             ;0--0xxFGHCD
        STA YADDRH
        RTS


= How it works =

The pseudocode to convert a y coordinate to an HGR address is:

return 0x2000 + (y&7)*0x400 + ((y/8)&7)*0x80 + (y/64)*0x28;


That is, given:

y = 0 ... 191

Or in binary

y = abcdefgh

Determine the start of the HGR scanline address it corresponds to.


We can break the terms of the address calculation into 3 parts:

(y/64)*0x28
y = abcdefgh
y/64 = 000000ab
y>>6 = 000000ab
ab * $28 = 0aba b000
00 = $00 = 0000_0000
01 = $28 = 0010_1000
10 = $50 = 0101_0000
11 = $78 = 0111_1000

(y%8)*0x400
y = abcdefgh
y%8 = 00000fgh
y&7 = 00000fgh
fgh * $400 = 000f gh00 00000000
000 = $0000 = 0000_0000 00000000
001 = $0400 = 0000_0100 00000000
010 = $0800 = 0000_1000 00000000
011 = $0C00 = 0000_1100 00000000
100 = $1000 = 0001_0000 00000000
101 = $1400 = 0001_0100 00000000
110 = $1800 = 0001_1000 00000000
111 = $1C00 = 0001_1100 00000000

((y/8)&7)*0x80;
y = abcdefgh
y/8 = 000abcde
&7 = 111
= 00000cde
cde * $80 = cd_e000_0000
000 = $000 = 00_0000_0000
001 = $080 = 00_1000_0000
010 = $100 = 01_0000_0000
011 = $180 = 01_1000_0000
100 = $200 = 10_0000_0000
101 = $280 = 10_1000_0000
110 = $300 = 11_0000_0000
111 = $380 = 11_1000_0000

Our address is of the form:
addr = 0000_0000 0aba b000 ((y/64) )*0x028
addr = 000f gh00 0000_0000 ((y%8) )*0x400
addr = 0000 00cd e000 0000 ((y/8)&7)*0x080
addr = 000f ghcd eaba b000
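
[Editor's sketch] The bit layout above can be checked mechanically. The Python below compares the pseudocode formula with a step-by-step model of the register operations in the listing, for every scanline 0..191. The function names are made up for this illustration, and the model assumes page 1 (base $2000), as the listing does.

```python
def hgr_addr_formula(y):
    """The pseudocode formula, using integer division."""
    return 0x2000 + (y & 7) * 0x400 + ((y // 8) & 7) * 0x80 + (y // 64) * 0x28


def hgr_addr_woz(y):
    """Model of the accumulator, X and carry through the listing."""
    carry, a = y >> 7, (y << 1) & 0xFF       # ASL
    x = a                                    # TAX
    a &= 0xF0                                # AND #$F0 (N flag = bit 7 of result)
    if a & 0x80:                             # BPL _1 not taken
        a |= 0x05                            # ORA #$05
    if carry:                                # BCC _2 not taken
        a |= 0x0A                            # ORA #$0A
    carry, a = a >> 7, (a << 1) & 0xFF       # ASL
    carry, a = a >> 7, (a << 1) & 0xFF       # ASL
    lo = a                                   # STA YADDRL
    a = x & 0x0E                             # TXA / AND #$0E
    a = a + 0x10 + carry                     # ADC #$10 (result is at most $1F, so C=0 after)
    carry, lo = lo >> 7, (lo << 1) & 0xFF    # ASL YADDRL
    a = ((a << 1) | carry) & 0xFF            # ROL
    return (a << 8) | lo                     # YADDRH:YADDRL


mismatches = [y for y in range(192) if hgr_addr_formula(y) != hgr_addr_woz(y)]
print(mismatches)
print(hex(hgr_addr_woz(191)))
```

An empty mismatch list means the two agree on all 192 scanlines.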


Reference:
https://github.com/AppleWin/AppleWin/wiki
Michael 'AppleWin Debugger Dev'
2017-07-14 16:17:47 UTC
Also, the old code, y_to_addr:, requires Y to be zero prior to calling due to the STY:


;==================================================
; y_to_addr - convert y value to address in mem
;==================================================
; this is needlessly complicated. Blame Steve Wozniak
; apparently it was a clever hack to avoid the need
; for dedicated memory refresh circuitry
...
sty YADDRL
Michael J. Mahon
2017-07-14 16:39:27 UTC
Post by Michael 'AppleWin Debugger Dev'
;==================================================
; y_to_addr - convert y value to address in mem
;==================================================
; this is needlessly complicated. Blame Steve Wozniak
; apparently it was a clever hack to avoid the need
; for dedicated memory refresh circuitry
...
sty YADDRL
It's a lot more than refresh circuitry--he also avoided all DRAM refresh
interference with the processor, which is crucial in permitting
deterministic processor timing.
--
-michael - NadaNet 3.1 and AppleCrate II: http://michaeljmahon.com
s***@gmail.com
2017-07-14 19:21:24 UTC
Post by John Brooks
Post by barrym95838
Post by v***@pianoman.cluster.toy
If anyone wants to continue optimizing 6502 code for size, I should
point out my "ll_asm" code density project.
http://www.deater.net/weave/vmwprod/asm/ll/ll.html
where I optimize the same program for size in assembly language on 30+
architectures. The code most of interest is LZSS decompression, but
integer printing and strcat() are involved as well.
...
Thanks for reminding me, Vince. There was a murmur about you over on
6502.org about a year or so ago, and I started to do a rewrite. I was
making decent progress, but multiple distractions caused me to push it
to the back burner, and it finally fell off the back of my stove. I
will try to retrieve it and finish, but the distractions keep piling
up at an alarming rate, so I can't even begin to guess about an ETA.
http://forum.6502.org/viewtopic.php?f=1&t=3044&hilit=weaver
Mike B.
Hi Vince. I took a quick look at the linux logo project and there's a lot of room to improve it, as Mike mentioned.
I see that the current smallest exe is 68k. Around 1988, while working at Datasoft, we were creating the same game on 6502 and 68k computers and ran friendly competitions among the assembly programs to see which version would have the smallest code.
We found that for game logic (if/else heavy) and ascii/byte operations, the 6502 code would be 0.75x to 0.5x the size of 68k code.
For parts of the games which required 16 bit or higher math or ptrs, the 6502 would be 1.5x to 4x larger than 68k.
Anyway, I'm pretty sure the 6502 version of ll could be much smaller than the current 68k version, if coded 'the 6502 way'.
I'm pretty sure the ascii art of the logo and the 6502 code to unpack and print it as text on the Apple II would fit in about 256 bytes if the compression was switched to RLE and an efficient 6502 implementation was used.
If I get time I'll make an example as these refactors are easier to see than describe.
-JB
To provide a "fair" comparison, your "refactoring" should use the same LZSS input as the versions for the other CPUs.
John Brooks
2017-07-14 20:57:51 UTC
Post by s***@gmail.com
Post by John Brooks
Post by barrym95838
Post by v***@pianoman.cluster.toy
If anyone wants to continue optimizing 6502 code for size, I should
point out my "ll_asm" code density project.
http://www.deater.net/weave/vmwprod/asm/ll/ll.html
where I optimize the same program for size in assembly language on 30+
architectures. The code most of interest is LZSS decompression, but
integer printing and strcat() are involved as well.
...
Thanks for reminding me, Vince. There was a murmur about you over on
6502.org about a year or so ago, and I started to do a rewrite. I was
making decent progress, but multiple distractions caused me to push it
to the back burner, and it finally fell off the back of my stove. I
will try to retrieve it and finish, but the distractions keep piling
up at an alarming rate, so I can't even begin to guess about an ETA.
http://forum.6502.org/viewtopic.php?f=1&t=3044&hilit=weaver
Mike B.
Hi Vince. I took a quick look at the linux logo project and there's a lot of room to improve it, as Mike mentioned.
I see that the current smallest exe is 68k. Around 1988, while working at Datasoft, we were creating the same game on 6502 and 68k computers and ran friendly competitions among the assembly programs to see which version would have the smallest code.
We found that for game logic (if/else heavy) and ascii/byte operations, the 6502 code would be 0.75x to 0.5x the size of 68k code.
For parts of the games which required 16 bit or higher math or ptrs, the 6502 would be 1.5x to 4x larger than 68k.
Anyway, I'm pretty sure the 6502 version of ll could be much smaller than the current 68k version, if coded 'the 6502 way'.
I'm pretty sure the ascii art of the logo and the 6502 code to unpack and print it as text on the Apple II would fit in about 256 bytes if the compression was switched to RLE and an efficient 6502 implementation was used.
If I get time I'll make an example as these refactors are easier to see than describe.
-JB
To provide a "fair" comparison, your "refactoring" should use the same LZSS input as the versions for the other CPUs.
I would if LZSS were an efficient codec for this use case, but unfortunately it is not. The ASCII art logo is short-length-RLE in nature, which is ill-suited to LZSS's design of handling large files of English text with repeating words and phrases.

My philosophy is rather than do garbage-in, garbage-out, solve the problem the best way possible and then let the other platforms see how much they can improve to match or beat the 'optimal' version.

-JB
@JBrooksBSI
Michael J. Mahon
2017-07-14 22:40:56 UTC
Post by John Brooks
Post by s***@gmail.com
Post by John Brooks
Post by barrym95838
Post by v***@pianoman.cluster.toy
If anyone wants to continue optimizing 6502 code for size, I should
point out my "ll_asm" code density project.
http://www.deater.net/weave/vmwprod/asm/ll/ll.html
where I optimize the same program for size in assembly language on 30+
architectures. The code most of interest is LZSS decompression, but
integer printing and strcat() are involved as well.
...
Thanks for reminding me, Vince. There was a murmur about you over on
6502.org about a year or so ago, and I started to do a rewrite. I was
making decent progress, but multiple distractions caused me to push it
to the back burner, and it finally fell off the back of my stove. I
will try to retrieve it and finish, but the distractions keep piling
up at an alarming rate, so I can't even begin to guess about an ETA.
http://forum.6502.org/viewtopic.php?f=1&t=3044&hilit=weaver
Mike B.
Hi Vince. I took a quick look at the linux logo project and there's a
lot of room to improve it, as Mike mentioned.
I see that the current smallest exe is 68k. Around 1988, while working
at Datasoft, we were creating the same game on 6502 and 68k computers
and ran friendly competitions among the assembly programs to see which
version would have the smallest code.
We found that for game logic (if/else heavy) and ascii/byte operations,
the 6502 code would be 0.75x to 0.5x the size of 68k code.
For parts of the games which required 16 bit or higher math or ptrs,
the 6502 would be 1.5x to 4x larger than 68k.
Anyway, I'm pretty sure the 6502 version of ll could be much smaller
than the current 68k version, if coded 'the 6502 way'.
I'm pretty sure the ascii art of the logo and the 6502 code to unpack
and print it as text on the Apple II would fit in about 256 bytes if
the compression was switched to RLE and an efficient 6502 implementation was used.
If I get time I'll make an example as these refactors are easier to see than describe.
-JB
To provide a "fair" comparison, your "refactoring" should use the same
LZSS input as the versions for the other CPUs.
I would if LZSS was an efficient codec for this use-case, but
unfortunately it is not. The ascii art logo is short-length-RLE in nature
which is ill-suited to LZSS's design of handling large files of english
text with repeating words and phrases.
My philosophy is rather than do garbage-in, garbage-out, solve the
problem the best way possible and then let the other platforms see how
much they can improve to match or beat the 'optimal' version.
-JB
@JBrooksBSI
Exactly.

When doing cross-platform, cross-architecture comparisons, only the
function need remain constant. The best algorithms and data
representations are architecture-specific (unless the external data
representation is a part of the specification, e.g.: GIF).
--
-michael - NadaNet 3.1 and AppleCrate II: http://michaeljmahon.com
Michael 'AppleWin Debugger Dev'
2017-07-17 08:26:39 UTC
Post by John Brooks
I would if LZSS was an efficient codec for this use-case, but unfortunately it is not. The ascii art logo is short-length-RLE in nature which is ill-suited to LZSS's design of handling large files of english text with repeating words and phrases.
Indeed. LZSS is horribly inefficient for this data set.

I replaced the bloated LZSS + data (283 bytes) with simple 2-bit per character data (70 chars * 12 rows) = 210 bytes packed.

For decompression unpack 2 bits to 4 bits (2 pixels)
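
[Editor's sketch] The arithmetic behind the 210-byte figure is 70 * 12 = 840 symbols at 2 bits each, or 1680 bits. A minimal Python model of such packing follows. The bit order (first symbol in the low two bits) and the function names are assumptions for illustration only. The linked source may order the bits differently, and the 2-bit to 4-bit pixel expansion is not modeled here.

```python
def pack_2bit(symbols):
    """Pack values 0..3 four to a byte, first symbol in the low bits (assumed order)."""
    packed = bytearray()
    for i in range(0, len(symbols), 4):
        byte = 0
        for slot, value in enumerate(symbols[i:i + 4]):
            byte |= (value & 3) << (2 * slot)
        packed.append(byte)
    return bytes(packed)


def unpack_2bit(packed, count):
    """Recover `count` 2-bit symbols from the packed bytes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 3 for i in range(count)]


logo = [(row + col) % 4 for row in range(12) for col in range(70)]  # stand-in data
blob = pack_2bit(logo)
print(len(logo), len(blob))
print(unpack_2bit(blob, len(logo)) == logo)
```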

I'm pretty sure the decompression could be optimized. Maybe if qkumba is bored he can take a look. :-)
https://github.com/Michaelangel007/6502_linux_logo/blob/master/linuxlogo.s#L233-L334
David Schmenk
2017-07-17 14:06:05 UTC
Post by Michael 'AppleWin Debugger Dev'
Post by John Brooks
I would if LZSS was an efficient codec for this use-case, but unfortunately it is not. The ascii art logo is short-length-RLE in nature which is ill-suited to LZSS's design of handling large files of english text with repeating words and phrases.
Indeed. LZSS is horribly inefficient for this data set.
I replaced the bloated LZSS + data (283 bytes) with simple 2-bit per character data (70 chars * 12 rows) = 210 bytes packed.
For decompression unpack 2 bits to 4 bits (2 pixels)
I'm pretty sure the decompression could be optimized. Maybe if qkumba is bored he can take a look. :-)
https://github.com/Michaelangel007/6502_linux_logo/blob/master/linuxlogo.s#L233-L334
This was the same approach I took for the Apple 1 30th birthday slideshow: 10 40x23 slides for hours of entertainment in only 3.5 K. No doubt it could have been smaller, but you can't beat the simplicity.
qkumba
2017-07-17 16:59:12 UTC
Post by Michael 'AppleWin Debugger Dev'
I'm pretty sure the decompression could be optimized. Maybe if qkumba is bored he can take a look. :-)
Without focusing on the algorithm in Unpack2Bits, here are some quick suggestions:

        lda zUnpackBits
        and #3          ; A=000000ba
        pha
        asl             ; A=00000ba0
        asl             ; A=0000ba00
        sta zMask
        pla
        ora zMask       ; A=0000baba

->

        lda zUnpackBits
        asl             ; A=00000ba0
        asl             ; A=0000ba00
        eor zUnpackBits ;A=0000xxyy
        and #$0c        ; A=0000xx00
        eor zUnpackBits ;A=0000baba

--

NoShiftSherlock
        cmp #$80        ; msb of byte0 set?
        rol zMask       ; shift in to lsb of byte1

        ldx zDstShift   ; x={0,1,2} + 4 < 7
        cpx #3          ; all bits fit into dest byte?

        ldx zSaveX
        ora #$80

->

NoShiftSherlock
        asl             ; msb of byte0 set?
        rol zMask       ; shift in to lsb of byte1

        sec
        ror

        ldx zDstShift   ; x={0,1,2} + 4 < 7
        cpx #3          ; all bits fit into dest byte?

        ldx zSaveX
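
[Editor's sketch] The NoShiftSherlock rewrite replaces CMP #$80 ... ORA #$80 (4 bytes) with ASL ... SEC / ROR (3 bytes). The Python model below checks that both sequences leave the same accumulator and zMask for every input. Only A, zMask and the carry are modeled, and the function names are made up for this illustration.

```python
def old_sequence(a, mask):
    """cmp #$80 / rol zMask / ora #$80"""
    carry = 1 if a >= 0x80 else 0            # cmp #$80: C = msb of A
    mask = ((mask << 1) | carry) & 0xFF      # rol zMask
    a |= 0x80                                # ora #$80
    return a, mask


def new_sequence(a, mask):
    """asl / rol zMask / sec / ror"""
    carry, a = a >> 7, (a << 1) & 0xFF       # asl: msb moves into C
    mask = ((mask << 1) | carry) & 0xFF      # rol zMask
    carry = 1                                # sec
    a = (carry << 7) | (a >> 1)              # ror: low 7 bits restored, msb set
    return a, mask


same = all(old_sequence(a, m) == new_sequence(a, m)
           for a in range(256) for m in range(256))
print(same)
```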

--

Draw8Rows
        ldy #0
CopyScanLine
        lda UnpackAddr,Y
        sta (zHgrPtr),Y

        cpx #0          ; Clear source on last scanline copy
        bne CopyNextByte
        txa
        sta UnpackAddr,Y
CopyNextByte
        iny
        cpy #40         ; 280/7 = 40 bytes/scanline
        bne CopyScanLine

->

Draw8Rows
        ldy #39
CopyScanLine
        lda UnpackAddr,Y
        sta (zHgrPtr),Y

        txa             ; Clear source on last scanline copy
        bne CopyNextByte
        sta UnpackAddr,Y
CopyNextByte
        dey             ; 280/7 = 40 bytes/scanline
        bpl CopyScanLine

--

        dex
        bpl Draw8Rows

        ldy zSaveY

        lda zCursorY
        cmp #$14        ; Y=$40 .. $A0, Rows $8..$13 (inclusive)
        bcs OuputDone

        ldx #0
        stx zSaveX
        beq LineNotDone

->

        stx zSaveX
        dex
        bpl Draw8Rows

        ldy zSaveY

        lda zCursorY
        cmp #$14        ; Y=$40 .. $A0, Rows $8..$13 (inclusive)
        bcs OuputDone

        bcc LineNotDone

--

LineNotDone
        stx zDstShift
        ;;clc           ;clear already in all paths
Michael 'AppleWin Debugger Dev'
2017-07-17 21:17:23 UTC
Post by qkumba
Post by Michael 'AppleWin Debugger Dev'
I'm pretty sure the decompression could be optimized. Maybe if qkumba is bored he can take a look. :-)
lda zUnpackBits
asl ; A=00000ba0
asl ; A=0000ba00
eor zUnpackBits ;A=0000xxyy
and #$0c ; A=0000xx00
eor zUnpackBits ;A=0000baba
Sorry, my comments were incomplete!
This needs a trailing AND #$F
I've updated the annotation of A reg

lda zUnpackBits
asl ; A=?????ba0
asl ; A=????ba00
eor zUnpackBits ; A=????xxyy
and #%00001100 ; A=????xx00
eor zUnpackBits ; A=????baba
and #%00001111 ; A=0000baba
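
A quick way to convince yourself that the trailing AND is needed is to model both sequences and compare them for every byte value. The Python below is only a sketch of the register arithmetic (the function names are made up for illustration; `u` stands in for the contents of zUnpackBits):

```python
def original(u):
    a = u & 0x03              # and #3           -> 000000ba
    mask = (a << 2) & 0xFF    # asl / asl        -> 0000ba00 (stored in zMask)
    return a | mask           # ora zMask        -> 0000baba

def merged(u, final_and=True):
    a = (u << 2) & 0xFF       # asl / asl        -> ????ba00
    a ^= u                    # eor zUnpackBits  -> ????xxyy
    a &= 0x0C                 # and #%00001100   -> 0000xx00
    a ^= u                    # eor zUnpackBits  -> ????baba
    if final_and:
        a &= 0x0F             # and #%00001111   -> 0000baba
    return a

# With the trailing AND the two sequences agree for all 256 inputs.
print(all(original(u) == merged(u) for u in range(256)))
# Without it, the high nibble of zUnpackBits leaks through.
print(hex(original(0xF1)), hex(merged(0xF1, final_and=False)))
```

The EOR / AND / EOR pattern copies only the masked bits from the shifted value and leaves every other bit of the source byte in place, which is why the upper four bits survive unless they are masked off afterwards.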
Post by qkumba
--
NoShiftSherlock
Nice compact way to set the MSB :-)
Post by qkumba
--
Draw8Rows
Ah, yes, count down instead of up to save a redundant CPY
Post by qkumba
--
stx zSaveX
dex
bpl Draw8Rows
Sweet trick of setting X=0 on last iteration.
Definitely going to have to remember that one.
Post by qkumba
--
LineNotDone
stx zDstShift
We need to restore X, so we need:

dex
bpl Draw8Rows
inx
:
ldx zSaveX
Post by qkumba
;;clc ;clear already in all paths
Nice eyes!


Down to 696 bytes -- again ;-)
(I added a pretty-print of the models.)
Michael 'AppleWin Debugger Dev'
2017-07-18 20:40:11 UTC
Thanks for all the help Peter!

I'm constantly impressed and amazed at your ability to squeeze out a few more bytes. I guess that's one for the CV. Hobbies: 6502 optimization / minification. :-)

Down to 681 bytes after juggling the X and Y regs in Unpack2Bits. I'm very happy with the size. Since I don't see any more obvious minification / code golf opportunities (the few I tried didn't pan out) and need to get to my other projects this is "good enough."

Michael
qkumba
2017-07-19 15:46:37 UTC
Post by Michael 'AppleWin Debugger Dev'
Thanks for all the help Peter!
You're welcome.
Post by Michael 'AppleWin Debugger Dev'
this is "good enough."
In case you return to it:

pha
cmp #$38 ;
bne apple_iiplus

apple_ii
pla
jsr IB_HGR ; HGR on ][
beq apple_ii_normal ; always, ends with BNE $D01B RTS

apple_iiplus

->

cmp #$38 ;
bne apple_iiplus

apple_ii
jsr IB_HGR ; HGR on ][
beq apple_ii_normal ; always, ends with BNE $D01B RTS

apple_iiplus
pha

--

RAM_64K
;; ldx #"6" ;x is already "6"

--

lda #0
tay
tax ; SrcShift=0

->

ldx #0
ldy #0
A is destroyed right after the STAs anyway.

--

ldy zDstShift ; which 280 px column is next pixel writing to?
beq NoShiftSherlock
MakeShiftMask
asl
rol zMask
dey
bne MakeShiftMask
NoShiftSherlock

asl ; msb of byte0 set?
rol zMask ; shift in to lsb of byte1

->

ldy zDstShift ; which 280 px column is next pixel writing to?
MakeShiftMask
asl ; msb of byte0 set?
rol zMask ; shift in to lsb of byte1
dey
bpl MakeShiftMask

--

lda zDstShift ; x={0,1,2} + 4 < 7
;; clc ;carry is cleared by ror after asl above
adc #4
cmp #7 ; all bits fit into dest byte?
bcc FitSameByte
Michael 'AppleWin Debugger Dev'
2017-07-19 17:15:30 UTC
:-)
Post by qkumba
cmp #$38 ;
bne apple_iiplus
apple_ii
jsr IB_HGR ; HGR on ][
beq apple_ii_normal ; always, ends with BNE $D01B RTS
apple_iiplus
pha
Ah, yes, no point doing PHA before it is needed.
Post by qkumba
--
RAM_64K
;; ldx #"6" ;x is already "6"
NAK.

Code path for //e (original) is:

apple_iiplus
apple_iie
- branches back up to RAM_64K.

I tried moving the LDX #"6" above, but IB_HGR and AS_HGR trash X -- need to see if Y is available ...
Post by qkumba
--
ldx #0
ldy #0
A is destroyed right after the STAs anyway.
Same size -- but ooooh, the LDX #0 can be completely removed now since it is set in NextSrcShift !
Post by qkumba
->
ldy zDstShift ; which 280 px column is next pixel writing to?
MakeShiftMask
asl ; msb of byte0 set?
rol zMask ; shift in to lsb of byte1
dey
bpl MakeShiftMask
Ah, yes! That extra ASL ROL was bugging me. Don't know why I didn't think of the BPL for an extra loop iteration.
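
For anyone following along, here is a small Python model of why the two loop shapes are equivalent. It is a sketch only: `shift16`, `before` and `after` are invented names, the 16-bit value is split as A (low byte) and zMask (high byte), and it assumes zDstShift stays small (well below $80) so that DEY only goes negative when Y wraps from 0 to $FF.

```python
def shift16(lo, hi):
    """One ASL A / ROL zMask step on a 16-bit value split across two bytes."""
    carry = lo >> 7
    return (lo << 1) & 0xFF, ((hi << 1) | carry) & 0xFF

def before(lo, hi, dst_shift):
    y = dst_shift                 # ldy zDstShift
    if y != 0:                    # beq NoShiftSherlock
        while True:
            lo, hi = shift16(lo, hi)
            y -= 1                # dey
            if y == 0:            # bne MakeShiftMask
                break
    return shift16(lo, hi)        # the extra asl / rol after the loop

def after(lo, hi, dst_shift):
    y = dst_shift                 # ldy zDstShift
    while True:
        lo, hi = shift16(lo, hi)
        y -= 1                    # dey
        if y < 0:                 # bpl falls through once Y wraps to $FF
            break
    return lo, hi

print(before(0x81, 0x00, 2), after(0x81, 0x00, 2))
```

Both versions shift zDstShift + 1 times; the BPL form simply folds the unconditional final shift into the loop and drops the BEQ guard.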
Post by qkumba
--
lda zDstShift ; x={0,1,2} + 4 < 7
;; clc ;carry is cleared by ror after asl above
Nice eyes!

667 bytes now.

v***@pianoman.cluster.toy
2017-07-17 21:10:12 UTC
Post by John Brooks
I would if LZSS was an efficient codec for this use-case, but
unfortunately it is not. The ascii art logo is short-length-RLE in
nature which is ill-suited to LZSS's design of handling large files of
english text with repeating words and phrases.
Interesting. When I originally started on this project 10+ years ago,
I used RLE but it turned out that LZSS won by a large margin at least
on x86. The uncompress code was smaller for RLE but the data input
was much smaller.

Maybe that doesn't hold on 6502. This is code I wrote a while ago and
my 6502 assembly skills have never been that great, things fall apart
anytime I need to deal with values wider than 8-bits.

Although the current top size decrease is amazing, the purpose of
the project long ago has morphed from "how small can I print an ascii
art logo" to "how small can I run lzss, followed by some text parsing
of a file read from disk".

Vince
Michael 'AppleWin Debugger Dev'
2017-07-17 21:32:54 UTC
Post by v***@pianoman.cluster.toy
Although the current top size decrease is amazing, the purpose of
the project long ago has morphed from "how small can I print an ascii
art logo" to "how small can I run lzss, followed by some text parsing
of a file read from disk".
Vince
1. Since we're using compression / packing anyways is it "cheating" if we store a packed 2-bits/char instead of the ANSI text? :-)

As Mike Acton would say in his 3 Big Lies -- Lie #1 Software is the Platform.


No, the *Platform* IS the problem. The problem is:
"How do we store in an efficient manner and display the Linux Logo on an Apple 2?"

The first issue we run into is that there is no 80-column mode on the Apple ][ and ][+.

The second issue is how do we store the HGR data efficiently?

This is a different problem from x86, where no one cares if we waste 80*12 = 960 bytes. Storing the data in an optimal format per platform is a valid solution. Why waste time and space storing it as compressed ASCII when we will only ever show it in HGR mode on the Apple 2?

2. I added 48K / 64K detection. Feel free to borrow.

i.e.
detect_langcard
sta RAMIN ; Detect 16K RAM / Language Card
sta RAMIN ; Read RAM

lda $D000
eor #$FF
sta $D000
cmp $D000
bne apple_ii_48K
eor #$FF
sta $D000

RAM_64K
ldx #"6"
ldy #"4"
bne RAM_size
apple_ii_48K
ldx #"4"
ldy #"8"
RAM_size

done_detecting
sta ROMIN ; Turn off Language Card
sta ROMIN ; if it was probed
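
The probe itself is just "flip a byte, see if the write stuck, then put it back". A rough Python model of that logic follows; the `Bank` class and `detect_ram` function are hypothetical stand-ins for the memory at $D000, and the soft-switch handling (RAMIN / ROMIN) is deliberately not modeled.

```python
class Bank:
    """Hypothetical one-byte stand-in for whatever is mapped at $D000."""
    def __init__(self, value, writable):
        self.value = value
        self.writable = writable

    def read(self, addr):
        return self.value

    def write(self, addr, v):
        if self.writable:         # ROM silently ignores writes
            self.value = v & 0xFF

def detect_ram(read, write, addr=0xD000):
    """Return "64" if a write to addr sticks (RAM), else "48" (ROM)."""
    a = read(addr) ^ 0xFF         # lda $D000 / eor #$FF
    write(addr, a)                # sta $D000
    if read(addr) != a:           # cmp $D000 / bne apple_ii_48K
        return "48"
    write(addr, a ^ 0xFF)         # eor #$FF / sta $D000 (restore original byte)
    return "64"

ram = Bank(0x4C, writable=True)
rom = Bank(0x4C, writable=False)
print(detect_ram(ram.read, ram.write), detect_ram(rom.read, rom.write))
```

Inverting the byte rather than writing a fixed value means the test cannot pass by accident when ROM happens to contain the probe value, and the second EOR restores the original contents on the RAM path.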


3. I dropped the disk read because on x86 we could simply call INT 13h to read a Track/Sector which isn't exactly "fair."
https://en.wikipedia.org/wiki/INT_13H

It wouldn't be _that_ hard to re-use the P5 firmware $C600 code to read a Sector of data for the Apple 2 version.
Michael J. Mahon
2017-07-17 22:15:16 UTC
Post by v***@pianoman.cluster.toy
Post by John Brooks
I would if LZSS was an efficient codec for this use-case, but
unfortunately it is not. The ascii art logo is short-length-RLE in
nature which is ill-suited to LZSS's design of handling large files of
english text with repeating words and phrases.
Interesting. When I originally started on this project 10+ years ago,
I used RLE but it turned out that LZSS won by a large margin at least
on x86. The uncompress code was smaller for RLE but the data input
was much smaller.
Maybe that doesn't hold on 6502. This is code I wrote a while ago and
my 6502 assembly skills have never been that great, things fall apart
anytime I need to deal with values wider than 8-bits.
Although the current top size decrease is amazing, the purpose of
the project long ago has morphed from "how small can I print an ascii
art logo" to "how small can I run lzss, followed by some text parsing
of a file read from disk".
Vince
Of course, the compression ratio does not depend on the processor, but only
on the input data and the compression algorithm.

I suspect that the statistics of the input data have changed significantly
from when you did your early experiments.

There is no such thing as a universally "best" compression algorithm. As
the input changes, the best compression algorithm changes.
--
-michael - NadaNet 3.1 and AppleCrate II: http://michaeljmahon.com
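
The point that the best scheme depends on the input can be illustrated with two toy inputs of the same length. This Python sketch is an illustration only: it compares a naive (count, symbol) byte-pair RLE against fixed 2-bits-per-symbol packing, not LZSS, and the inputs are made up rather than taken from the logo data.

```python
def rle_size(symbols):
    """Bytes used by a naive (count, symbol) byte-pair RLE, runs capped at 255."""
    size, i = 0, 0
    while i < len(symbols):
        j = i
        while j < len(symbols) and symbols[j] == symbols[i] and j - i < 255:
            j += 1
        size += 2                 # one count byte + one symbol byte per run
        i = j
    return size

def packed_2bpp_size(symbols):
    """Bytes used when each symbol (one of at most four) takes 2 bits."""
    return (len(symbols) * 2 + 7) // 8

long_runs = "#" * 400 + " " * 440     # 840 symbols in two long runs
short_runs = "# O." * 210             # 840 symbols, every run has length 1

for name, data in (("long runs", long_runs), ("short runs", short_runs)):
    print(name, "RLE:", rle_size(data), "2bpp:", packed_2bpp_size(data))
```

The fixed 2-bit packing costs the same for both inputs, while the RLE cost swings from far smaller to far larger depending only on the run structure of the data.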
g***@sasktel.net
2017-07-18 05:22:19 UTC
Post by Michael J. Mahon
There is no such thing as a universally "best" compression algorithm. As
the input changes, the best compression algorithm changes.
Is it not also possible that the "best compression algorithm" can also change according to the input data size?

For example, when using LZ4 to compress text, I found it was not worth compressing if the input size was 10 blocks or less, but at 20 blocks I was getting about 40% compression, at 30 blocks I was averaging over 50%, and one 60-block text file got close to 75% compression.

So the larger the file the better the compression, whereas at smaller sizes, another compressor would be better.

Since I started playing with LZ4, it has become pretty close to a universal compressor for me.

I have tried it on system files, hi-res and double hi-res graphics, and text files, all with pretty good results.
g***@sasktel.net
2017-07-18 20:59:30 UTC
Post by Michael J. Mahon
There is no such thing as a universally "best" compression algorithm. As
the input changes, the best compression algorithm changes.
Never mind, just re-read that and it had a different meaning when I read it the first time. :)