Discussion:
Sieve of Eratosthenes benchmark
(too old to reply)
John Brooks
2018-03-18 18:56:21 UTC
Permalink
I wrote an optimized Apple IIGS program to calculate all the prime numbers up to 16834. Finding primes via the SoE method is a common CPU benchmark and I wanted to see how fast the GS could compute primes.

The fastest previous versions for the 6502/65c02/65802/65816 was 140 usec @ 1 MHz.

This version is over 2x faster, finding the 1899 primes in ~59 usec @ 1 MHz, and ~24 usec @ 2.8 MHz.

The program is 245 bytes long, consisting of the sieve routine at 192 bytes, and a 53 byte test program which runs the sieve 100 times and prints out the number of primes found: 1899.

00/2000:A2 64 18 FB C2 30 86 40 20 35 20 C6 40 D0 F9 38
00/2010:FB 20 3A FF 18 FB C2 30 78 BA 9B A9 00 60 1B 7B
00/2020:AB D0 01 1A BA 10 F9 BB 9A 4B AB AA EB 58 FB 20
00/2030:24 ED 4C 8E FD 0B 3B 8D EC 20 A9 FF 7F 1B A2 4E
00/2040:00 86 42 7B 1A AA EB A8 7B 3A 5A 8B 48 48 0B 5A
00/2050:DA DA 5A DA 5A 5A DA DA 5A 48 0B 48 DA 0B 48 DA
00/2060:DA 5A 48 5A 5A DA 0B 5A 48 5A 48 DA DA 5A DA DA
00/2070:5A DA 5A 48 DA 0B 5A 48 0B 48 DA 0B 5A DA 48 C6
00/2080:42 D0 C7 7A 0B 0B E2 14 A9 00 20 5B A9 05 60 1B
00/2090:A9 3C 60 A0 0A 85 E2 C8 84 AE 84 B3 84 B8 84 BD
00/20A0:84 C2 84 C7 84 CC 84 D1 C2 10 BA 1B 08 69 00 00
00/20B0:1B 08 69 00 00 1B 08 69 00 00 1B 08 69 00 00 1B
00/20C0:08 69 00 00 1B 08 69 00 00 1B 08 69 00 00 1B 08
00/20D0:69 00 00 10 D6 9A E2 10 C8 10 04 C8 C8 30 0C 98
00/20E0:0A 69 00 00 85 E2 AB D0 F2 80 AC A9 00 00 1B 2B
00/20F0:4B AB C2 34 60

The sieve routine at $2035 is called in native 65816 mode with 16-bit AXY, B=0, D=0.

Enjoy.

-JB
@JBrooksBSI

00/2000: A2 64 LDX #64
00/2002: 18 CLC
00/2003: FB XCE
00/2004: C2 30 REP #30
00/2006: 86 40 STX 40
00/2008: 20 35 20 JSR 2035
00/200B: C6 40 DEC 40
00/200D: D0 F9 BNE 2008 {-07}
00/200F: 38 SEC
00/2010: FB XCE
00/2011: 20 3A FF JSR FF3A
00/2014: 18 CLC
00/2015: FB XCE
00/2016: C2 30 REP #30
00/2018: 78 SEI
00/2019: BA TSX
00/201A: 9B TXY
00/201B: A9 00 60 LDA #6000
00/201E: 1B TCS
00/201F: 7B TDC
00/2020: AB PLB
00/2021: D0 01 BNE 2024 {+01}
00/2023: 1A INC
00/2024: BA TSX
00/2025: 10 F9 BPL 2020 {-07}
00/2027: BB TYX
00/2028: 9A TXS
00/2029: 4B PHK
00/202A: AB PLB
00/202B: AA TAX
00/202C: EB XBA
00/202D: 58 CLI
00/202E: FB XCE
00/202F: 20 24 ED JSR ED24
00/2032: 4C 8E FD JMP FD8E
00/2035: 0B PHD
00/2036: 3B TSC
00/2037: 8D EC 20 STA 20EC
00/203A: A9 FF 7F LDA #7FFF
00/203D: 1B TCS
00/203E: A2 4E 00 LDX #004E
00/2041: 86 42 STX 42
00/2043: 7B TDC
00/2044: 1A INC
00/2045: AA TAX
00/2046: EB XBA
00/2047: A8 TAY
00/2048: 7B TDC
00/2049: 3A DEC
00/204A: 5A PHY
00/204B: 8B PHB
00/204C: 48 PHA
00/204D: 48 PHA
00/204E: 0B PHD
00/204F: 5A PHY
00/2050: DA PHX
00/2051: DA PHX
00/2052: 5A PHY
00/2053: DA PHX
00/2054: 5A PHY
00/2055: 5A PHY
00/2056: DA PHX
00/2057: DA PHX
00/2058: 5A PHY
00/2059: 48 PHA
00/205A: 0B PHD
00/205B: 48 PHA
00/205C: DA PHX
00/205D: 0B PHD
00/205E: 48 PHA
00/205F: DA PHX
00/2060: DA PHX
00/2061: 5A PHY
00/2062: 48 PHA
00/2063: 5A PHY
00/2064: 5A PHY
00/2065: DA PHX
00/2066: 0B PHD
00/2067: 5A PHY
00/2068: 48 PHA
00/2069: 5A PHY
00/206A: 48 PHA
00/206B: DA PHX
00/206C: DA PHX
00/206D: 5A PHY
00/206E: DA PHX
00/206F: DA PHX
00/2070: 5A PHY
00/2071: DA PHX
00/2072: 5A PHY
00/2073: 48 PHA
00/2074: DA PHX
00/2075: 0B PHD
00/2076: 5A PHY
00/2077: 48 PHA
00/2078: 0B PHD
00/2079: 48 PHA
00/207A: DA PHX
00/207B: 0B PHD
00/207C: 5A PHY
00/207D: DA PHX
00/207E: 48 PHA
00/207F: C6 42 DEC 42
00/2081: D0 C7 BNE 204A {-39}
00/2083: 7A PLY
00/2084: 0B PHD
00/2085: 0B PHD
00/2086: E2 14 SEP #14
00/2088: A9 00 20 LDA #2000
00/208B: 5B TCD
00/208C: A9 05 60 LDA #6005
00/208F: 1B TCS
00/2090: A9 3C 60 LDA #603C
00/2093: A0 0A LDY #0A
00/2095: 85 E2 STA E2
00/2097: C8 INY
00/2098: 84 AE STY AE
00/209A: 84 B3 STY B3
00/209C: 84 B8 STY B8
00/209E: 84 BD STY BD
00/20A0: 84 C2 STY C2
00/20A2: 84 C7 STY C7
00/20A4: 84 CC STY CC
00/20A6: 84 D1 STY D1
00/20A8: C2 10 REP #10
00/20AA: BA TSX
00/20AB: 1B TCS
00/20AC: 08 PHP
00/20AD: 69 00 00 ADC #0000
00/20B0: 1B TCS
00/20B1: 08 PHP
00/20B2: 69 00 00 ADC #0000
00/20B5: 1B TCS
00/20B6: 08 PHP
00/20B7: 69 00 00 ADC #0000
00/20BA: 1B TCS
00/20BB: 08 PHP
00/20BC: 69 00 00 ADC #0000
00/20BF: 1B TCS
00/20C0: 08 PHP
00/20C1: 69 00 00 ADC #0000
00/20C4: 1B TCS
00/20C5: 08 PHP
00/20C6: 69 00 00 ADC #0000
00/20C9: 1B TCS
00/20CA: 08 PHP
00/20CB: 69 00 00 ADC #0000
00/20CE: 1B TCS
00/20CF: 08 PHP
00/20D0: 69 00 00 ADC #0000
00/20D3: 10 D6 BPL 20AB {-2A}
00/20D5: 9A TXS
00/20D6: E2 10 SEP #10
00/20D8: C8 INY
00/20D9: 10 04 BPL 20DF {+04}
00/20DB: C8 INY
00/20DC: C8 INY
00/20DD: 30 0C BMI 20EB {+0C}
00/20DF: 98 TYA
00/20E0: 0A ASL
00/20E1: 69 00 00 ADC #0000
00/20E4: 85 E2 STA E2
00/20E6: AB PLB
00/20E7: D0 F2 BNE 20DB {-0E}
00/20E9: 80 AC BRA 2097 {-54}
00/20EB: A9 00 00 LDA #0000
00/20EE: 1B TCS
00/20EF: 2B PLD
00/20F0: 4B PHK
00/20F1: AB PLB
00/20F2: C2 34 REP #34
00/20F4: 60 RTS
Antoine Vignau
2018-03-18 20:24:54 UTC
Permalink
You are crazy, John ;-)
b***@gmail.com
2018-03-18 20:57:30 UTC
Permalink
Yes, crazy in a totally great, awesome way!

I love these optimized benchmark programs. How about giving us a native 6502/6502c version where we can specify the number of times the sieve is run, and therefore, the ultimate number of primes desired? My rusty old skills are not up the the task, although I think it is another great exercise in demonstrating speed of execution vs size of code issue, and here, of course, we're going for ultimate speed in calculating the sieve.
Antoine Vignau
2018-03-18 22:01:40 UTC
Permalink
See https://rosettacode.org/wiki/Sieve_of_Eratosthenes
Antoine Vignau
2018-03-18 22:04:14 UTC
Permalink
In John's code, all # must be read as #$. Hex code, not decimal,
av
John Brooks
2018-03-19 00:24:14 UTC
Permalink
Post by John Brooks
I wrote an optimized Apple IIGS program to calculate all the prime numbers up to 16834. Finding primes via the SoE method is a common CPU benchmark and I wanted to see how fast the GS could compute primes.
The program is 245 bytes long, consisting of the sieve routine at 192 bytes, and a 53 byte test program which runs the sieve 100 times and prints out the number of primes found: 1899.
00/2000:A2 64 18 FB C2 30 86 40 20 35 20 C6 40 D0 F9 38
00/2010:FB 20 3A FF 18 FB C2 30 78 BA 9B A9 00 60 1B 7B
00/2020:AB D0 01 1A BA 10 F9 BB 9A 4B AB AA EB 58 FB 20
00/2030:24 ED 4C 8E FD 0B 3B 8D EC 20 A9 FF 7F 1B A2 4E
00/2040:00 86 42 7B 1A AA EB A8 7B 3A 5A 8B 48 48 0B 5A
00/2050:DA DA 5A DA 5A 5A DA DA 5A 48 0B 48 DA 0B 48 DA
00/2060:DA 5A 48 5A 5A DA 0B 5A 48 5A 48 DA DA 5A DA DA
00/2070:5A DA 5A 48 DA 0B 5A 48 0B 48 DA 0B 5A DA 48 C6
00/2080:42 D0 C7 7A 0B 0B E2 14 A9 00 20 5B A9 05 60 1B
00/2090:A9 3C 60 A0 0A 85 E2 C8 84 AE 84 B3 84 B8 84 BD
00/20A0:84 C2 84 C7 84 CC 84 D1 C2 10 BA 1B 08 69 00 00
00/20B0:1B 08 69 00 00 1B 08 69 00 00 1B 08 69 00 00 1B
00/20C0:08 69 00 00 1B 08 69 00 00 1B 08 69 00 00 1B 08
00/20D0:69 00 00 10 D6 9A E2 10 C8 10 04 C8 C8 30 0C 98
00/20E0:0A 69 00 00 85 E2 AB D0 F2 80 AC A9 00 00 1B 2B
00/20F0:4B AB C2 34 60
The sieve routine at $2035 is called in native 65816 mode with 16-bit AXY, B=0, D=0.
Enjoy.
-JB
@JBrooksBSI
00/2000: A2 64 LDX #64
00/2002: 18 CLC
00/2003: FB XCE
00/2004: C2 30 REP #30
00/2006: 86 40 STX 40
00/2008: 20 35 20 JSR 2035
00/200B: C6 40 DEC 40
00/200D: D0 F9 BNE 2008 {-07}
00/200F: 38 SEC
00/2010: FB XCE
00/2011: 20 3A FF JSR FF3A
00/2014: 18 CLC
00/2015: FB XCE
00/2016: C2 30 REP #30
00/2018: 78 SEI
00/2019: BA TSX
00/201A: 9B TXY
00/201B: A9 00 60 LDA #6000
00/201E: 1B TCS
00/201F: 7B TDC
00/2020: AB PLB
00/2021: D0 01 BNE 2024 {+01}
00/2023: 1A INC
00/2024: BA TSX
00/2025: 10 F9 BPL 2020 {-07}
00/2027: BB TYX
00/2028: 9A TXS
00/2029: 4B PHK
00/202A: AB PLB
00/202B: AA TAX
00/202C: EB XBA
00/202D: 58 CLI
00/202E: FB XCE
00/202F: 20 24 ED JSR ED24
00/2032: 4C 8E FD JMP FD8E
00/2035: 0B PHD
00/2036: 3B TSC
00/2037: 8D EC 20 STA 20EC
00/203A: A9 FF 7F LDA #7FFF
00/203D: 1B TCS
00/203E: A2 4E 00 LDX #004E
00/2041: 86 42 STX 42
00/2043: 7B TDC
00/2044: 1A INC
00/2045: AA TAX
00/2046: EB XBA
00/2047: A8 TAY
00/2048: 7B TDC
00/2049: 3A DEC
00/204A: 5A PHY
00/204B: 8B PHB
00/204C: 48 PHA
00/204D: 48 PHA
00/204E: 0B PHD
00/204F: 5A PHY
00/2050: DA PHX
00/2051: DA PHX
00/2052: 5A PHY
00/2053: DA PHX
00/2054: 5A PHY
00/2055: 5A PHY
00/2056: DA PHX
00/2057: DA PHX
00/2058: 5A PHY
00/2059: 48 PHA
00/205A: 0B PHD
00/205B: 48 PHA
00/205C: DA PHX
00/205D: 0B PHD
00/205E: 48 PHA
00/205F: DA PHX
00/2060: DA PHX
00/2061: 5A PHY
00/2062: 48 PHA
00/2063: 5A PHY
00/2064: 5A PHY
00/2065: DA PHX
00/2066: 0B PHD
00/2067: 5A PHY
00/2068: 48 PHA
00/2069: 5A PHY
00/206A: 48 PHA
00/206B: DA PHX
00/206C: DA PHX
00/206D: 5A PHY
00/206E: DA PHX
00/206F: DA PHX
00/2070: 5A PHY
00/2071: DA PHX
00/2072: 5A PHY
00/2073: 48 PHA
00/2074: DA PHX
00/2075: 0B PHD
00/2076: 5A PHY
00/2077: 48 PHA
00/2078: 0B PHD
00/2079: 48 PHA
00/207A: DA PHX
00/207B: 0B PHD
00/207C: 5A PHY
00/207D: DA PHX
00/207E: 48 PHA
00/207F: C6 42 DEC 42
00/2081: D0 C7 BNE 204A {-39}
00/2083: 7A PLY
00/2084: 0B PHD
00/2085: 0B PHD
00/2086: E2 14 SEP #14
00/2088: A9 00 20 LDA #2000
00/208B: 5B TCD
00/208C: A9 05 60 LDA #6005
00/208F: 1B TCS
00/2090: A9 3C 60 LDA #603C
00/2093: A0 0A LDY #0A
00/2095: 85 E2 STA E2
00/2097: C8 INY
00/2098: 84 AE STY AE
00/209A: 84 B3 STY B3
00/209C: 84 B8 STY B8
00/209E: 84 BD STY BD
00/20A0: 84 C2 STY C2
00/20A2: 84 C7 STY C7
00/20A4: 84 CC STY CC
00/20A6: 84 D1 STY D1
00/20A8: C2 10 REP #10
00/20AA: BA TSX
00/20AB: 1B TCS
00/20AC: 08 PHP
00/20AD: 69 00 00 ADC #0000
00/20B0: 1B TCS
00/20B1: 08 PHP
00/20B2: 69 00 00 ADC #0000
00/20B5: 1B TCS
00/20B6: 08 PHP
00/20B7: 69 00 00 ADC #0000
00/20BA: 1B TCS
00/20BB: 08 PHP
00/20BC: 69 00 00 ADC #0000
00/20BF: 1B TCS
00/20C0: 08 PHP
00/20C1: 69 00 00 ADC #0000
00/20C4: 1B TCS
00/20C5: 08 PHP
00/20C6: 69 00 00 ADC #0000
00/20C9: 1B TCS
00/20CA: 08 PHP
00/20CB: 69 00 00 ADC #0000
00/20CE: 1B TCS
00/20CF: 08 PHP
00/20D0: 69 00 00 ADC #0000
00/20D3: 10 D6 BPL 20AB {-2A}
00/20D5: 9A TXS
00/20D6: E2 10 SEP #10
00/20D8: C8 INY
00/20D9: 10 04 BPL 20DF {+04}
00/20DB: C8 INY
00/20DC: C8 INY
00/20DD: 30 0C BMI 20EB {+0C}
00/20DF: 98 TYA
00/20E0: 0A ASL
00/20E1: 69 00 00 ADC #0000
00/20E4: 85 E2 STA E2
00/20E6: AB PLB
00/20E7: D0 F2 BNE 20DB {-0E}
00/20E9: 80 AC BRA 2097 {-54}
00/20EB: A9 00 00 LDA #0000
00/20EE: 1B TCS
00/20EF: 2B PLD
00/20F0: 4B PHK
00/20F1: AB PLB
00/20F2: C2 34 REP #34
00/20F4: 60 RTS
.
Post by John Brooks
I think it is another great exercise in demonstrating speed of execution vs size of code issue
Below is a version of sieve without unrolled code or self-modified code.

The sieve routine is just 66 bytes at $2035. The test code is 53 bytes. The code is simpler to read, but is 2x slower: ~125 usec to calculate the 1899 primes up to 16384.

00/2000:A2 64 18 FB C2 30 86 40 20 35 20 C6 40 D0 F9 38
00/2010:FB 20 3A FF 18 FB C2 30 78 BA 9B A9 00 60 1B 7B
00/2020:AB D0 01 1A BA 10 F9 BB 9A 4B AB AA EB 58 FB 20
00/2030:24 ED 4C 8E FD 3B 85 44 A2 FF 7F 9A A9 AA 0A E2
00/2040:14 DA 0B 3A D0 FB 0B 68 A9 02 60 1B A9 0C 60 85
00/2050:48 A0 04 C2 10 C8 84 46 BA 1B 08 65 46 10 FA 9A
00/2060:88 C8 C8 98 0A 65 48 30 07 85 48 AB D0 F3 80 E5
00/2070:A5 44 1B 4B AB 58 60
Post by John Brooks
In John's code, all # must be read as #$. Hex code, not decimal
Yes, these listings are from the GS monitor, so all numbers are in hex

-JB
@JBrooksBSI

00/2000: A2 64 LDX #64
00/2002: 18 CLC
00/2003: FB XCE
00/2004: C2 30 REP #30
00/2006: 86 40 STX 40
00/2008: 20 35 20 JSR 2035
00/200B: C6 40 DEC 40
00/200D: D0 F9 BNE 2008 {-07}
00/200F: 38 SEC
00/2010: FB XCE
00/2011: 20 3A FF JSR FF3A
00/2014: 18 CLC
00/2015: FB XCE
00/2016: C2 30 REP #30
00/2018: 78 SEI
00/2019: BA TSX
00/201A: 9B TXY
00/201B: A9 00 60 LDA #6000
00/201E: 1B TCS
00/201F: 7B TDC
00/2020: AB PLB
00/2021: D0 01 BNE 2024 {+01}
00/2023: 1A INC
00/2024: BA TSX
00/2025: 10 F9 BPL 2020 {-07}
00/2027: BB TYX
00/2028: 9A TXS
00/2029: 4B PHK
00/202A: AB PLB
00/202B: AA TAX
00/202C: EB XBA
00/202D: 58 CLI
00/202E: FB XCE
00/202F: 20 24 ED JSR ED24
00/2032: 4C 8E FD JMP FD8E
00/2035: 3B TSC
00/2036: 85 44 STA 44
00/2038: A2 FF 7F LDX #7FFF
00/203B: 9A TXS
00/203C: A9 AA 0A LDA #0AAA
00/203F: E2 14 SEP #14
00/2041: DA PHX
00/2042: 0B PHD
00/2043: 3A DEC
00/2044: D0 FB BNE 2041 {-05}
00/2046: 0B PHD
00/2047: 68 PLA
00/2048: A9 02 60 LDA #6002
00/204B: 1B TCS
00/204C: A9 0C 60 LDA #600C
00/204F: 85 48 STA 48
00/2051: A0 04 C2 LDY #C204
00/2054: 10 C8 BPL 201E {-38}
00/2056: 84 46 STY 46
00/2058: BA TSX
00/2059: 1B TCS
00/205A: 08 PHP
00/205B: 65 46 ADC 46
00/205D: 10 FA BPL 2059 {-06}
00/205F: 9A TXS
00/2060: 88 DEY
00/2061: C8 INY
00/2062: C8 INY
00/2063: 98 TYA
00/2064: 0A ASL
00/2065: 65 48 ADC 48
00/2067: 30 07 BMI 2070 {+07}
00/2069: 85 48 STA 48
00/206B: AB PLB
00/206C: D0 F3 BNE 2061 {-0D}
00/206E: 80 E5 BRA 2055 {-1B}
00/2070: A5 44 LDA 44
00/2072: 1B TCS
00/2073: 4B PHK
00/2074: AB PLB
00/2075: 58 CLI
00/2076: 60 RTS
barrym95838
2018-03-19 06:38:57 UTC
Permalink
Post by John Brooks
I wrote an optimized Apple IIGS program to calculate all the prime numbers
up to 16834. Finding primes via the SoE method is a common CPU benchmark
and I wanted to see how fast the GS could compute primes.
I know you're pretty darn good, but getting a 1 MHz anything to find 1899
anythings in ~59 microseconds is pure sorcery. Unless you really meant to
say "msec" ... ;-)

Mike B.
James Davis
2018-03-19 07:42:47 UTC
Permalink
Post by barrym95838
Post by John Brooks
I wrote an optimized Apple IIGS program to calculate all the prime numbers
up to 16834. Finding primes via the SoE method is a common CPU benchmark
and I wanted to see how fast the GS could compute primes.
I know you're pretty darn good, but getting a 1 MHz anything to find 1899
anythings in ~59 microseconds is pure sorcery. Unless you really meant to
say "msec" ... ;-)
Mike B.
He is talking usec, You-Seconds; they are totally subjective, like 1-mississippi, 2-mississippi, 3-mississippi, etc., or at the speed of Your sub-articulation. ;-D & LOL!
barrym95838
2018-03-19 15:25:43 UTC
Permalink
Post by James Davis
Post by barrym95838
Post by John Brooks
I wrote an optimized Apple IIGS program to calculate all the prime numbers
up to 16834. Finding primes via the SoE method is a common CPU benchmark
and I wanted to see how fast the GS could compute primes.
I know you're pretty darn good, but getting a 1 MHz anything to find 1899
anythings in ~59 microseconds is pure sorcery. Unless you really meant to
say "msec" ... ;-)
Mike B.
He is talking usec, You-Seconds; they are totally subjective, like 1-
mississippi, 2-mississippi, 3-mississippi, etc., or at the speed of Your
sub-articulation. ;-D & LOL!
... and while we're doing a bit of friendly trolling, please observe:

10 INPUT "PRIME NUMBER UPPER SEARCH LIMIT: ";L: DIM F%(L)
20 FOR N = 2 TO SQR (L): IF F%(N) THEN 40
30 FOR K = N * N TO L STEP N:F%(K) = 1: NEXT
40 NEXT :K = 0: FOR N = 2 TO L: IF F%(N) = 0 THEN K = K + 1: REM PRINT N,
50 NEXT : PRINT "TOTAL OF "K

My version is about 4000 times slower, but because it takes its time it was
able to find the 1900th prime that John's speed demon completely missed!

Mike B.
James Davis
2018-03-19 16:45:02 UTC
Permalink
Post by barrym95838
Post by James Davis
Post by barrym95838
Post by John Brooks
I wrote an optimized Apple IIGS program to calculate all the prime numbers
up to 16834. Finding primes via the SoE method is a common CPU benchmark
and I wanted to see how fast the GS could compute primes.
The fastest previous versions for the 6502/65c02/65802/65816 was 140 usec
@ 1 MHz.
I know you're pretty darn good, but getting a 1 MHz anything to find 1899
anythings in ~59 microseconds is pure sorcery. Unless you really meant to
say "msec" ... ;-)
Mike B.
He is talking usec, You-Seconds; they are totally subjective, like 1-
mississippi, 2-mississippi, 3-mississippi, etc., or at the speed of Your
sub-articulation. ;-D & LOL!
10 INPUT "PRIME NUMBER UPPER SEARCH LIMIT: ";L: DIM F%(L)
20 FOR N = 2 TO SQR (L): IF F%(N) THEN 40
30 FOR K = N * N TO L STEP N:F%(K) = 1: NEXT
40 NEXT :K = 0: FOR N = 2 TO L: IF F%(N) = 0 THEN K = K + 1: REM PRINT N,
50 NEXT : PRINT "TOTAL OF "K
My version is about 4000 times slower, but because it takes its time it was
able to find the 1900th prime that John's speed demon completely missed!
Mike B.
Back in the day, after reading an article by WOZ about generating prime numbers using the Sweet16 interpreter, I wrote an Applesoft program to do it and print them out. Or, maybe WOZ wrote it and I just copied it. I don't rightly remember. It used the "&" vector to hook into the WOZ's Sweet16 interpreter routine, IIRC. It generated a list of 6,140 primes from 3 to 60,923 before I stopped it. It ran for at least 1/2 hour or maybe more. I do not know if it was using this algorithm or not, though.
barrym95838
2018-03-19 21:38:48 UTC
Permalink
Post by James Davis
Back in the day, after reading an article by WOZ about generating prime
numbers using the Sweet16 interpreter, I wrote an Applesoft program to do
it and print them out. Or, maybe WOZ wrote it and I just copied it. I
don't rightly remember. It used the "&" vector to hook into the WOZ's
Sweet16 interpreter routine, IIRC. It generated a list of 6,140 primes
from 3 to 60,923 before I stopped it. It ran for at least 1/2 hour or
maybe more. I do not know if it was using this algorithm or not, though.
AIUI, it's difficult for Sweet16 and Applesoft to get along with each
other for very long, since they both like to stomp on $0000..$001f. Or
maybe it's not as bad as I imagine ... I was never brave enough to try.

Mike B.
Gregory Schroeder
2018-03-21 22:56:33 UTC
Permalink
Post by barrym95838
Post by James Davis
Post by barrym95838
Post by John Brooks
I wrote an optimized Apple IIGS program to calculate all the prime numbers
up to 16834. Finding primes via the SoE method is a common CPU benchmark
and I wanted to see how fast the GS could compute primes.
The fastest previous versions for the 6502/65c02/65802/65816 was 140 usec
@ 1 MHz.
I know you're pretty darn good, but getting a 1 MHz anything to find 1899
anythings in ~59 microseconds is pure sorcery. Unless you really meant to
say "msec" ... ;-)
Mike B.
He is talking usec, You-Seconds; they are totally subjective, like 1-
mississippi, 2-mississippi, 3-mississippi, etc., or at the speed of Your
sub-articulation. ;-D & LOL!
10 INPUT "PRIME NUMBER UPPER SEARCH LIMIT: ";L: DIM F%(L)
20 FOR N = 2 TO SQR (L): IF F%(N) THEN 40
30 FOR K = N * N TO L STEP N:F%(K) = 1: NEXT
40 NEXT :K = 0: FOR N = 2 TO L: IF F%(N) = 0 THEN K = K + 1: REM PRINT N,
50 NEXT : PRINT "TOTAL OF "K
My version is about 4000 times slower, but because it takes its time it was
able to find the 1900th prime that John's speed demon completely missed!
Mike B.
I use this program in GSoft and the upper limit over 16377 gives me an overflow error.
barrym95838
2018-03-22 05:17:12 UTC
Permalink
Post by Gregory Schroeder
I use this program in GSoft and the upper limit over 16377 gives me an overflow error.
Hmmm ... I'm running on AppleWin with DOS 3.3 ... not really sure what
"GSoft" is ...

]POKE33,33:LIST:POKE33,40

10 INPUT "PRIME NUMBER UPPER SEARCH LIMIT: ";L: DIM F%(L)
20 FOR N = 2 TO SQR (L): IF F%(N) THEN 40
30 FOR K = N * N TO L STEP N:F%(K) = 1: NEXT
40 NEXT :K = 0: FOR N = 2 TO L: IF F%(N) = 0 THEN K = K + 1: REM PRINT N,
50 NEXT : PRINT "TOTAL OF "K

]RUN
PRIME NUMBER UPPER SEARCH LIMIT: 18077

?OUT OF MEMORY ERROR IN 30
]RUN
PRIME NUMBER UPPER SEARCH LIMIT: 18076
TOTAL OF 2071

]?FRE(0)
2

Mike B.
John Brooks
2018-03-22 11:36:19 UTC
Permalink
Post by barrym95838
Post by Gregory Schroeder
I use this program in GSoft and the upper limit over 16377 gives me an overflow error.
Hmmm ... I'm running on AppleWin with DOS 3.3 ... not really sure what
"GSoft" is ...
]POKE33,33:LIST:POKE33,40
10 INPUT "PRIME NUMBER UPPER SEARCH LIMIT: ";L: DIM F%(L)
20 FOR N = 2 TO SQR (L): IF F%(N) THEN 40
30 FOR K = N * N TO L STEP N:F%(K) = 1: NEXT
40 NEXT :K = 0: FOR N = 2 TO L: IF F%(N) = 0 THEN K = K + 1: REM PRINT N,
50 NEXT : PRINT "TOTAL OF "K
]RUN
PRIME NUMBER UPPER SEARCH LIMIT: 18077
?OUT OF MEMORY ERROR IN 30
]RUN
PRIME NUMBER UPPER SEARCH LIMIT: 18076
TOTAL OF 2071
]?FRE(0)
2
Mike B.
Hi Mike.

Some ideas for Applesoft SoE:

1) Try replacing the F% array with direct peek/pokes of memory = 1 byte per search number. This cuts memory use in half compared to 16-bit F% words.

2) My SoE code omits all even numbers from the search array, doing that in basic would cut memory use in half again.

3) Try Beagle Basic which has very fast integer math and can use GS or Ramworks memory for larger arrays.

Fun stuff!
-JB
@JBrooksBSI
I am Rob
2018-03-22 22:59:05 UTC
Permalink
Post by barrym95838
Post by Gregory Schroeder
I use this program in GSoft and the upper limit over 16377 gives me an overflow error.
Hmmm ... I'm running on AppleWin with DOS 3.3 ... not really sure what
"GSoft" is ...
]POKE33,33:LIST:POKE33,40
10 INPUT "PRIME NUMBER UPPER SEARCH LIMIT: ";L: DIM F%(L)
20 FOR N = 2 TO SQR (L): IF F%(N) THEN 40
30 FOR K = N * N TO L STEP N:F%(K) = 1: NEXT
40 NEXT :K = 0: FOR N = 2 TO L: IF F%(N) = 0 THEN K = K + 1: REM PRINT N,
50 NEXT : PRINT "TOTAL OF "K
]RUN
PRIME NUMBER UPPER SEARCH LIMIT: 18077
?OUT OF MEMORY ERROR IN 30
]RUN
PRIME NUMBER UPPER SEARCH LIMIT: 18076
TOTAL OF 2071
]?FRE(0)
2
Mike B.
This routine skips all even numbers and should run a little faster

10 INPUT "PRIME NUMBER UPPER SEARCH LIMIT: ";L: DIM F%(L / 2)
20 FOR N = 3 TO SQR (L) STEP 2:K% = N / 2: IF F%(K%) THEN NEXT : GOTO 50
30 FOR J = N * N TO L STEP N:K% = J / 2: IF J / 2 = K% THEN NEXT : NEXT : GOTO 50
40 F%(K%) = 1: NEXT : NEXT
50 K = 0: FOR N = 3 TO L / 2: IF F%(N) = 0 THEN K = K + 1
60 NEXT : PRINT "TOTAL OF "K + 1: PRINT "2 ";
65 FOR I = 1 TO 256: IF F%(I) = 1 THEN NEXT : END
70 PRINT (F%(I) = 0) * I * 2 + 1" ";: NEXT
I am Rob
2018-03-30 21:02:32 UTC
Permalink
Here are some Basic comparisons
All tests were done with Sweet16 at 1 Mhz

* Integer Basic - 154 secs
* time matches with this website
* http://www.txbobsc.com/aal/1981/aal8110.html

10 S=8191: DIM F(8191): N=0
20 FOR I=0 TO S: F(I)=1: NEXT I
30 FOR I=0 TO S: IF F(I)=0 THEN 80
40 P=I+I+3 : IF P>130 THEN 70 : K=I+P
50 IF K<S THEN F(K)=0: K=K+P: IF K<S THEN 50
70 N=N+1: REM PRINT P;" ";
80 NEXT I
90 PRINT: PRINT N;" PRIMES": END


* AS - without calculating even numbers
* using all FP variables - 128 secs
* max limit is 14443 using FP array DIM F(14443/2)
* using integer variables - 134 secs
* max limit is 18077 when using integer array DIM F%(18077/2)

10 L = 8191: DIM F%(L / 2)
20 FOR N = 3 TO SQR (L) STEP 2:K% = N / 2: IF F%(K%) THEN NEXT : GOTO 42
30 FOR J = N * N TO L STEP N:K% = J / 2: IF J / 2 = K% THEN NEXT : NEXT : GOTO 42
40 F%(K%) = 1: NEXT : NEXT
42 PRINT CHR$ (7)
50 PRINT "2 ";: FOR N = 1 TO 500: IF F%(N) = 0 THEN K = K + 1: PRINT N * 2 + 1" ";
60 NEXT : PRINT: PRINT "Total Primes: "K + 1


* Applesoft - 63 secs for Sweet16, 74 secs for Applewin
* max FP array is 7221 so unable to use FP to find all primes

10 L = 8191: DIM F%(L):A = 1
20 FOR N = 2 TO SQR (L): IF F%(N) THEN NEXT : GOTO 42
30 FOR K = N * N TO L STEP N:F%(K) = A: NEXT
40 NEXT
42 PRINT CHR$ (7):L = 500
45 K = 0: FOR N = 2 TO L: IF F%(N) = 0 THEN K = K + A: PRINT N" ";
50 NEXT : PRINT : PRINT "Total Primes: "K


Result: Sweet16 has a slightly faster 1 Mhz than Applewin
Now to test on a real machine to see which one is closer to authentic

Sweet16 will be compared with a real IIGS and Applewin to a real IIe. Will post back here with some results.
John Brooks
2018-03-31 03:42:24 UTC
Permalink
Post by I am Rob
Here are some Basic comparisons
All tests were done with Sweet16 at 1 Mhz
* Integer Basic - 154 secs
* time matches with this website
* http://www.txbobsc.com/aal/1981/aal8110.html
10 S=8191: DIM F(8191): N=0
20 FOR I=0 TO S: F(I)=1: NEXT I
30 FOR I=0 TO S: IF F(I)=0 THEN 80
40 P=I+I+3 : IF P>130 THEN 70 : K=I+P
50 IF K<S THEN F(K)=0: K=K+P: IF K<S THEN 50
70 N=N+1: REM PRINT P;" ";
80 NEXT I
90 PRINT: PRINT N;" PRIMES": END
* AS - without calculating even numbers
* using all FP variables - 128 secs
* max limit is 14443 using FP array DIM F(14443/2)
* using integer variables - 134 secs
* max limit is 18077 when using integer array DIM F%(18077/2)
10 L = 8191: DIM F%(L / 2)
20 FOR N = 3 TO SQR (L) STEP 2:K% = N / 2: IF F%(K%) THEN NEXT : GOTO 42
30 FOR J = N * N TO L STEP N:K% = J / 2: IF J / 2 = K% THEN NEXT : NEXT : GOTO 42
40 F%(K%) = 1: NEXT : NEXT
42 PRINT CHR$ (7)
50 PRINT "2 ";: FOR N = 1 TO 500: IF F%(N) = 0 THEN K = K + 1: PRINT N * 2 + 1" ";
60 NEXT : PRINT: PRINT "Total Primes: "K + 1
* Applesoft - 63 secs for Sweet16, 74 secs for Applewin
* max FP array is 7221 so unable to use FP to find all primes
10 L = 8191: DIM F%(L):A = 1
20 FOR N = 2 TO SQR (L): IF F%(N) THEN NEXT : GOTO 42
30 FOR K = N * N TO L STEP N:F%(K) = A: NEXT
40 NEXT
42 PRINT CHR$ (7):L = 500
45 K = 0: FOR N = 2 TO L: IF F%(N) = 0 THEN K = K + A: PRINT N" ";
50 NEXT : PRINT : PRINT "Total Primes: "K
Result: Sweet16 has a slightly faster 1 Mhz than Applewin
Now to test on a real machine to see which one is closer to authentic
Sweet16 will be compared with a real IIGS and Applewin to a real IIe. Will post back here with some results.
Last week I did a quick compiled Applesoft (Beagle Basic) version that took about 9 seconds at 1 MHz IIRC. I'll try to clean it up this weekend and post it too.

-JB
@JBrooksBSI
John Brooks
2018-04-01 08:05:04 UTC
Permalink
This Applesoft version of sieve takes 19.8 seconds @ 1 MHz, and 464 bytes:

5 F = 6 * 4096:D = F + 2
10 FOR I = 1 TO 15: READ W
15 FOR J = 1 TO 7: POKE D,W: POKE D + 105,W
20 IF W > 127 THEN W = W - 128
25 W = W + W:D = D + 1: NEXT J,I
30 FOR I = 1 TO 3: READ D,L,H
35 POKE D,L: POKE D + 1,H: NEXT I
40 FOR I = 1 TO 38: POKE 171,0: POKE 172,0: CALL - 6700: NEXT
45 FOR I = 0 TO 3: POKE F + I,128: NEXT
50 SQ = 9 * 9 / 2 + F:S = 9 / 2 + F
55 FOR N = 11 TO 127 STEP 2
60 SQ = SQ - 2 + N + N:S = S + 1
65 IF PEEK (S) < 128 THEN NEXT : GOTO 75
70 FOR D = SQ TO 32767 STEP N: POKE D,0: NEXT : NEXT
75 PRINT CHR$ (7)"1) 2":C = 2: FOR I = 1 TO 8191
80 IF PEEK (F + I) > 127 THEN PRINT C") "I + I + 1:C = C + 1
85 NEXT
90 DATA 26,166,44,146,150,40,138,180,36,154,50,44,152,182,12
95 DATA 0,210,0,1,2,96,113,212,96
save sieve.bas

This Beagle Basic version of sieve takes 2.6 seconds @ 1 MHz, and 412 bytes compiled:

5 FLAGS% = 6 * 4096:DST% = FLAGS% + 2
10 FOR I = 1 TO 15: READ WHEEL%
15 FOR J = 1 TO 7: POKE DST%,WHEEL%: POKE DST% + 105,WHEEL%
20 IF WHEEL% > 127 THEN WHEEL% = WHEEL% - 128
25 WHEEL% = WHEEL% + WHEEL%:DST% = DST% + 1: NEXT J,I
30 FOR I = 1 TO 3: READ DST%,VL%,VH%
35 POKE DST%,VL%: POKE DST% + 1,VH%: NEXT I
40 FOR I = 1 TO 38: POKE 171,0: POKE 172,3: CALL - 6700: NEXT
45 FOR I = 0 TO 3: POKE FLAGS% + I,128: NEXT
50 SQ% = 9 * 9 / 2 + FLAGS%:SRC% = 9 / 2 + FLAGS%
55 FOR NUM% = 11 TO 127 STEP 2
60 SQ% = SQ% - 2 + NUM% + NUM%:SRC% = SRC% + 1
65 IF PEEK (SRC%) < 128 THEN NEXT : GOTO 75
70 FOR DST% = SQ% TO 32767 STEP NUM%: POKE DST%,0: NEXT : NEXT
75 PRINT CHR$ (7)"1) 2":C = 2: FOR I = 1 TO 8191
80 IF PEEK (FLAGS% + I) > 127 THEN PRINT C") "I + I + 1:C = C + 1
85 NEXT
90 DATA 26,166,44,146,150,40,138,180,36,154,50,44,152,182,12
95 DATA 768,210,0,769,2,96,113,212,96
save sieve

NOTES:
5 Wheel2 (odd) flags are stored at $6000-$7FFF. wheel105*2*38 is 8190 bytes (+2=8192)
10 Init 2x wheel105 via 15 DATA bytes
15 Each DATA byte inits 7 flags (7 * 15 = 105). Store 2x wheels
20 Strip high bit
25 Add to itself to shift left one bit
30 Set up a string descriptor at $300 and set string dest ptr to FLAGS%+105+105
35 Poke a 16-bit word into memory
40 Init the flags array via 38x 210-byte string copies using descriptor at $300
45 Mark first 4x flags as primes (2,3,5,7)
50 Init SQ = previous NUM squared. SRC = ptr to previous NUM
55 Check odd numbers 11 to 127 for primes
60 Update NUM squared and the ptr to the FLAGS num to check
65 If FLAGS entry for this odd NUM is not prime, then try the next
70 Found a prime. Zero flags from SQ to the end
75 Beep to show calc is finished. Then print 2 as the only even prime
80 For each prime, print count & prime
85 Loop through all flag bytes
90 Wheel105 constants: 3*5*7 bits stored 7 bits per byte
95 String descriptor at $300=len:210,ptr:$6002. String dest ptr @ ZP $71 = $60D4

Note: The string descriptor is at $00-$02 in the Applesoft sieve.

To compile the Beagle Basic version:
1) Run Compiler.System to launch the Beagle Basic interpreter
2) "-compiler" to load the basic compiler
3) "compile sieve,sieve.com" to compile applesoft into Beagle Basic tokenized format
4) "-compiler.system" to reload the interpreter, freeing memory used by the compiler
5) "-sieve.com" to run the compiled sieve via the Beagle Basic interpreter

Have fun,
-JB
@JBrooksBSI
barrym95838
2018-04-01 20:05:00 UTC
Permalink
Post by John Brooks
5 F = 6 * 4096:D = F + 2
10 FOR I = 1 TO 15: READ W
15 FOR J = 1 TO 7: POKE D,W: POKE D + 105,W
20 IF W > 127 THEN W = W - 128
25 W = W + W:D = D + 1: NEXT J,I
30 FOR I = 1 TO 3: READ D,L,H
35 POKE D,L: POKE D + 1,H: NEXT I
40 FOR I = 1 TO 38: POKE 171,0: POKE 172,0: CALL - 6700: NEXT
45 FOR I = 0 TO 3: POKE F + I,128: NEXT
50 SQ = 9 * 9 / 2 + F:S = 9 / 2 + F
55 FOR N = 11 TO 127 STEP 2
60 SQ = SQ - 2 + N + N:S = S + 1
65 IF PEEK (S) < 128 THEN NEXT : GOTO 75
70 FOR D = SQ TO 32767 STEP N: POKE D,0: NEXT : NEXT
75 PRINT CHR$ (7)"1) 2":C = 2: FOR I = 1 TO 8191
80 IF PEEK (F + I) > 127 THEN PRINT C") "I + I + 1:C = C + 1
85 NEXT
90 DATA 26,166,44,146,150,40,138,180,36,154,50,44,152,182,12
95 DATA 0,210,0,1,2,96,113,212,96
save sieve.bas
I'd like to move the sieve buffer into a hi-res range so I could view
a "live animation" of the routine's progress, but it looks to be a
bit more work than I can afford at the moment ... the "honey-do" list
is looming large, and must take priority!

P.S. I know that the "wheel methods" are great for improving performance,
but they seem to be a bit "shady" IMO. Heck, what's to stop a person
from "wheeling" just about everything, and actually "sieving" next to
nothing? The next step after that is 1,900 PRINT statements, right?

Mike B.
Antoine Vignau
2018-04-01 21:28:55 UTC
Permalink
Post by barrym95838
Post by John Brooks
5 F = 6 * 4096:D = F + 2
10 FOR I = 1 TO 15: READ W
15 FOR J = 1 TO 7: POKE D,W: POKE D + 105,W
20 IF W > 127 THEN W = W - 128
25 W = W + W:D = D + 1: NEXT J,I
30 FOR I = 1 TO 3: READ D,L,H
35 POKE D,L: POKE D + 1,H: NEXT I
40 FOR I = 1 TO 38: POKE 171,0: POKE 172,0: CALL - 6700: NEXT
45 FOR I = 0 TO 3: POKE F + I,128: NEXT
50 SQ = 9 * 9 / 2 + F:S = 9 / 2 + F
55 FOR N = 11 TO 127 STEP 2
60 SQ = SQ - 2 + N + N:S = S + 1
65 IF PEEK (S) < 128 THEN NEXT : GOTO 75
70 FOR D = SQ TO 32767 STEP N: POKE D,0: NEXT : NEXT
75 PRINT CHR$ (7)"1) 2":C = 2: FOR I = 1 TO 8191
80 IF PEEK (F + I) > 127 THEN PRINT C") "I + I + 1:C = C + 1
85 NEXT
90 DATA 26,166,44,146,150,40,138,180,36,154,50,44,152,182,12
95 DATA 0,210,0,1,2,96,113,212,96
save sieve.bas
I'd like to move the sieve buffer into a hi-res range so I could view
a "live animation" of the routine's progress, but it looks to be a
bit more work than I can afford at the moment ... the "honey-do" list
is looming large, and must take priority!
P.S. I know that the "wheel methods" are great for improving performance,
but they seem to be a bit "shady" IMO. Heck, what's to stop a person
from "wheeling" just about everything, and actually "sieving" next to
nothing? The next step after that is 1,900 PRINT statements, right?
Mike B.
2 HGR
5 F = 2 * 4096 : D = F + 2
...should do the trick,
av
barrym95838
2018-04-01 22:20:28 UTC
Permalink
Post by Antoine Vignau
2 HGR
5 F = 2 * 4096 : D = F + 2
...should do the trick,
av
Yeah, I thought the same, until I saw that "32767" in line 70.
That "CALL - 6700" in line 40 may have mysterious side-effects also.

Mike B.
Antoine Vignau
2018-04-01 22:34:11 UTC
Permalink
Post by barrym95838
Post by Antoine Vignau
2 HGR
5 F = 2 * 4096 : D = F + 2
...should do the trick,
av
Yeah, I thought the same, until I saw that "32767" in line 70.
That "CALL - 6700" in line 40 may have mysterious side-effects also.
Mike B.
The two lines show poor results, that's right.

-6700 = $E5D4 which is a ROM routine that copies strings, it is called MOVINS. On entry, you have the source pointer at $AB..$AC (poke 171...), and on exit, the next string pointed at $71..$72 (113..114)

av
barrym95838
2018-04-01 22:26:46 UTC
Permalink
Post by John Brooks
...
40 FOR I = 1 TO 38: POKE 171,0: POKE 172,0: CALL - 6700: NEXT
...
...
...
John, could you explain why you're poking 172,0 for Applesoft and
172,3 for Beagle BASIC? Typo?

Mike B.
John Brooks
2018-04-01 23:15:22 UTC
Permalink
Post by barrym95838
Post by John Brooks
5 F = 6 * 4096:D = F + 2
10 FOR I = 1 TO 15: READ W
15 FOR J = 1 TO 7: POKE D,W: POKE D + 105,W
20 IF W > 127 THEN W = W - 128
25 W = W + W:D = D + 1: NEXT J,I
30 FOR I = 1 TO 3: READ D,L,H
35 POKE D,L: POKE D + 1,H: NEXT I
40 FOR I = 1 TO 38: POKE 171,0: POKE 172,0: CALL - 6700: NEXT
45 FOR I = 0 TO 3: POKE F + I,128: NEXT
50 SQ = 9 * 9 / 2 + F:S = 9 / 2 + F
55 FOR N = 11 TO 127 STEP 2
60 SQ = SQ - 2 + N + N:S = S + 1
65 IF PEEK (S) < 128 THEN NEXT : GOTO 75
70 FOR D = SQ TO 32767 STEP N: POKE D,0: NEXT : NEXT
75 PRINT CHR$ (7)"1) 2":C = 2: FOR I = 1 TO 8191
80 IF PEEK (F + I) > 127 THEN PRINT C") "I + I + 1:C = C + 1
85 NEXT
90 DATA 26,166,44,146,150,40,138,180,36,154,50,44,152,182,12
95 DATA 0,210,0,1,2,96,113,212,96
save sieve.bas
I'd like to move the sieve buffer into a hi-res range so I could view
a "live animation" of the routine's progress, but it looks to be a
bit more work than I can afford at the moment ... the "honey-do" list
is looming large, and must take priority!
P.S. I know that the "wheel methods" are great for improving performance,
but they seem to be a bit "shady" IMO. Heck, what's to stop a person
from "wheeling" just about everything, and actually "sieving" next to
nothing? The next step after that is 1,900 PRINT statements, right?
Mike B.
.
Post by barrym95838
I'd like to move the sieve buffer into a hi-res range so I could view
a "live animation" of the routine's progress
Good idea. Here is an Applesoft version that stores the 8192 flag bytes on hires page 2 $4000-$5FFF. Press a key after the beep to switch to text mode and print out the 1900 primes.

5 F = 4 * 4096:D = F + 2: HGR2
10 FOR I = 1 TO 15: READ W
15 FOR J = 1 TO 7: POKE D,W: POKE D + 105,W
20 IF W > 127 THEN W = W - 128
25 W = W + W:D = D + 1: NEXT J,I
30 FOR I = 1 TO 3: READ D,L,H
35 POKE D,L: POKE D + 1,H: NEXT I
40 FOR I = 1 TO 38: POKE 171,0: POKE 172,0: CALL - 6700: NEXT
45 FOR I = 0 TO 3: POKE F + I,128: NEXT
50 SQ = 9 * 9 / 2 + F:S = 9 / 2 + F
55 FOR N = 11 TO 127 STEP 2
60 SQ = SQ - 2 + N + N:S = S + 1
65 IF PEEK (S) < 128 THEN NEXT : GOTO 75
70 FOR D = SQ TO 24576 STEP N: POKE D,0: NEXT : NEXT
75 PRINT CHR$ (7)"1) 2":C = 2: GET A$: TEXT : FOR I = 1 TO 8191
80 IF PEEK (F + I) > 127 THEN PRINT C") "I + I + 1:C = C + 1
85 NEXT
90 DATA 26,166,44,146,150,40,138,180,36,154,50,44,152,182,12
95 DATA 0,210,0,1,2,64,113,212,64
Post by barrym95838
P.S. I know that the "wheel methods" are great for improving performance,
but they seem to be a bit "shady" IMO. Heck, what's to stop a person
from "wheeling" just about everything, and actually "sieving" next to
nothing?
Yes, the ultimate speed up is to just precompute all the primes. In practice, all the 'fast SoE' routines use wheel optimizations, particularly wheel2 to halve the memory size.

It's less of a win to use wheel optimizations above 105 as 3*5*7*11 is wheel1155 and 3*5*7*11*13 is wheel15015, so it becomes prohibitively large to store the wheel data.
Post by barrym95838
John, could you explain why you're poking 172,0 for Applesoft and 172,3 for Beagle BASIC?
Sure. The Applesoft interpreter zeroes the string descriptor ZP ptr at $AB/$AC, so the descriptor must be stored at $00-$02. But the Beagle Basic interpreter uses ZP $00, so the string descriptor is moved to $300 for BB.

-JB
@JBrooksBSI
qkumba
2018-04-02 00:34:20 UTC
Permalink
I have no opportunity to try at the moment, but at a glance, it looks like the flags init code in the assembler version could be preset in the code, since the original code at $208x is never executed from that address.
John Brooks
2018-04-02 03:33:51 UTC
Permalink
Post by qkumba
I have no opportunity to try at the moment, but at a glance, it looks like the flags init code in the assembler version could be preset in the code, since the original code at $208x is never executed from that address.
The entire sieve routine runs from zero page. The 7x STA abs,Y at $207E-$2096 are self-modified at $2058 (high byte) and $2066 (low byte). Those 7x stores init the 8K flags array at $6000-$7FFF.

The self-mod addresses are updated every ~128x INY (low adr mod only if no page cross) using a single lazy-update-check every 7x flags.

BTW: The disasm post above incorrectly duplicated lines $209E and $20C3. Ignore the 2nd line.

-JB
jbr...@blueshiftinc.com
2023-11-25 19:09:59 UTC
Permalink
Here are two new versions which also display the prime numbers.

This one prints primes less than 2048. All primes will fit on one 80x24 screen:
Sieve65.2K
8100:A2 9B BD 79 81 95 39 CA D0 F8 A0 00 A2 0E A9 76
8110:20 6F 00 A9 B2 20 77 81 A9 A0 20 77 81 A0 00 A9
8120:03 84 02 C8 F8 18 D0 30 AA A5 02 69 00 85 02 8A
8130:C8 D0 25 EE 5A 81 10 20 D8 60 D8 A6 02 F0 04 8A
8140:20 65 81 A5 01 20 65 81 A9 A0 20 77 81 A5 01 F8
8150:18 69 02 B0 D3 C8 F0 DB BE 00 7C 10 F4 85 01 30
8160:D9 E8 30 10 60 85 00 4A 4A 4A 4A 20 72 81 A5 00
8170:29 0F F0 ED 09 B0 AA 4C ED FD C2 2C A6 4A 4C A6
8180:88 6C 22 CA 64 A4 CA 68 86 85 71 85 75 85 79 85
8190:7D 85 81 85 85 85 89 86 70 E8 86 74 E8 86 78 E8
81A0:86 7C E8 86 80 E8 86 84 E8 86 88 A2 0E B5 3A 99
81B0:00 7C 0A 99 01 7C 0A 99 02 7C 0A 99 03 7C 0A 99
81C0:04 7C 0A 99 05 7C 0A 99 06 7C 98 18 69 07 A8 CA
81D0:10 DB 0A 90 D6 4A A8 A5 70 49 80 AA 30 B9 A5 71
81E0:69 01 10 A5 A0 0B A2 7C A9 3C 85 CD E6 AF 2C 04
81F0:7C 10 15 86 C7 84 C0 86 BE 18 85 BD 8C 00 FF 69
8200:00 90 F7 E8 10 F1 A2 00 C8 98 C8 0A 69 00 90 DA
8210:E8 10 D7 60

This version prints primes less than $F000 (61,440), with one prime per line.
Sieve65.66K
8100:A2 9B BD 85 81 95 39 CA D0 F8 A0 00 A2 0E A9 76
8110:20 6F 00 A9 B2 20 83 81 A9 8D 20 83 81 A0 00 A9
8120:03 84 02 84 03 C8 F8 18 D0 3A AA A5 02 69 00 85
8130:02 90 03 E6 03 18 8A C8 D0 2A EE 66 81 10 25 D8
8140:60 D8 A6 03 F0 04 8A 20 80 81 A5 02 20 71 81 A5
8150:01 20 71 81 A9 8D 20 83 81 A5 01 F8 18 69 02 B0
8160:C9 C8 F0 D6 BE 00 08 10 F4 85 01 30 D4 E8 30 10
8170:60 85 00 4A 4A 4A 4A 20 7E 81 A5 00 29 0F F0 ED
8180:09 B0 AA 4C ED FD C2 2C A6 4A 4C A6 88 6C 22 CA
8190:64 A4 CA 68 86 85 71 85 75 85 79 85 7D 85 81 85
81A0:85 85 89 86 70 E8 86 74 E8 86 78 E8 86 7C E8 86
81B0:80 E8 86 84 E8 86 88 A2 0E B5 3A 99 00 08 0A 99
81C0:01 08 0A 99 02 08 0A 99 03 08 0A 99 04 08 0A 99
81D0:05 08 0A 99 06 08 98 18 69 07 A8 CA 10 DB 0A 90
81E0:D6 4A A8 A5 70 49 80 AA 30 B9 A5 71 69 01 10 A5
81F0:A0 0B A2 08 A9 3C 85 CD E6 AF 2C 04 08 10 15 86
8200:C7 84 C0 86 BE 18 85 BD 8C 00 FF 69 00 90 F7 E8
8210:10 F1 A2 00 C8 98 C8 0A 69 00 90 DA E8 10 D7 60

Below is the Merlin-16 source listing.

Enjoy,
-JB

*-------------------------------
* Sieve of Eratosthenes for Apple II
* Merlin-16 v3.5.1 assembler
* 11/25/2023 by John Brooks
*-------------------------------
* Primes less than 2048 in 276 code bytes
* 17,165 cycle prime calc (8110:4C to time)
* 75,812 cycle prime w/print (8177:60 to time)
*-------------------------------
* Primes less than 66,140 in 288 code bytes
* 913,120 cycle prime calc (8110:4C to time)
* 2,118,067 cycle prime w/print (8183:60 to time)
*-------------------------------
lst on ; Merlin assembler: generate assembly listing

org $8100 ; Code execution starts at $8100

Sieve2048 equ 1 ; 0 = 66K primes, 1 = 2K primes

ZpCodeOrg equ $3a ; zero page address where Prime calc code runs

*.................
do Sieve2048
PrimeRange equ 2048 ; check integers less than 2048 for primes
OddRange equ PrimeRange/2 ; check 1024 odd integers
Flags equ $8000-OddRange ; 1024 byte-per-odd flags array ends at $8000

BcdTmp equ $00 ; holds BCD low digit while printing high digit
NumAsBcd equ $01 ; 4-digit BCD of odd-integers during print

SpaceChar equ " " ; display a space between primes
*.................
else
PrimeRange equ $f000 ; check integers less than 66,140 for primes
OddRange equ PrimeRange/2 ; check $7800 odd integers
Flags equ $8000-OddRange ; byte-per-odd flags array at $800-$8000

BcdTmp equ $00 ; holds BCD low digit while printing high digit
NumAsBcd equ $01 ; 5-digit BCD of odd-integers during print

SpaceChar equ $8D ; display one prime per line
fin
*.................

mx %11 ; Merlin: 8-bit mem/acc, 8-bit xy regs
Sieve
ldx #SieveEnd-SieveZP+1 ; Num bytes to copy to ZP. +1 for X=0 exit
CopyToZp
lda RelocToZP-1,x
sta ZpCodeOrg-1,x
dex
bne CopyToZp

ldy #0 ; y: Flags index == 0
ldx #15-1 ; x: wheel constants index
lda #%01110110 ; initial primes = .,3,5,7,.,11,13,.
jsr SetFlag1 ; find all primes
*-------------------------------
DispPrimes
lda #"2" ; display the single even prime: 2
jsr Cout
lda #SpaceChar
jsr Cout

ldy #$00
lda #$03 ; first BCD number checked is == $0003
*.................
do Sieve2048
sty NumAsBcd+1
*.................
else
sty NumAsBcd+1
sty NumAsBcd+2
fin
*.................
iny ; y: start checking number 3, Flags index (3/2 == 1)
sed ; enable 6502 BCD mode
clc ; c=0 assumed in loop
bne DispChk ; always

DispNextBcdH
tax ; save BcdL
lda NumAsBcd+1 ; BcdH++
adc #0
sta NumAsBcd+1
*.................
do Sieve2048
txa ; restore BcdL
*.................
else
bcc DispBcd5Ok
inc NumAsBcd+2
clc
DispBcd5Ok
txa ; restore BcdL
fin
*.................
iny ; check PtrL++
bne DispChk

DispNextPtrH
inc DispChk+2 ; check PtrH++
bpl DispChk
DispExit
cld ; disable BCD mode
rts

DispBcd
cld ; disable BCD mode during Cout print
*.................
do Sieve2048
ldx NumAsBcd+1 ; set X<$80 to skip printing leading zero digits beq DispSkip00
txa ; non-zero in top two BCD digits, print them
*.................
else
ldx NumAsBcd+2 ; set X<$80 to skip printing leading zero digits
beq DispNoBcd5
txa
jsr DispDigit
DispNoBcd5
lda NumAsBcd+1
fin
*.................
jsr DispByte
DispSkip00
lda NumAsBcd ; print low two BCD digits
jsr DispByte
lda #SpaceChar ; print space
jsr Cout

lda NumAsBcd ; acc: BcdL
sed ; 6502 BCD mode enabled
clc ; loop assumes c=0

DispNext
adc #2 ; check BcdL += 2
bcs DispNextBcdH
iny ; check PtrL++
beq DispNextPtrH

DispChk ldx Flags,y ; self-mod PtrH
bpl DispNext ; branch if not prime
sta NumAsBcd ; save acc:BCD for printing
bmi DispBcd ; always

*-------------------------------
DispZero
inx ; x: > 128 if a non-zero digit has printed
bmi DispDigit
rts ; skip leading zeroes
*-------------------------------
DispByte
sta BcdTmp ; save BCD low digit
lsr ; shift BCD high to low
lsr
lsr
lsr
jsr ChkZero
lda BcdTmp ; get low digit
and #$0f
ChkZero
beq DispZero
DispDigit
ora #"0" ; make ascii 0-9
tax ; disable zero skipping for the rest of the number

Cout jmp $fded ; Apple II ROM character output routine
*-------------------------------

RelocToZP
org ZpCodeOrg
SieveZP

* wheel of primes for odd integers less than 2*3*5*7 (210 integers)
* stored as 15 * 7 bits (105 bits for the odd integers < 2*3*5*7)
Wheel210
db %11000010 ; 197,199,...,...,...,...,209,0
db %00101100 ; ...,...,187,...,191,193,...,0
db %10100110 ; 169,...,173,...,...,179,181,0
db %01001010 ; ...,157,...,...,163,...,167,0
db %01001100 ; ...,143,...,...,149,151,...,0
db %10100110 ; 127,...,131,...,...,137,139,0
db %10001000 ; 113,...,...,...,121,...,...,0
db %01101100 ; ...,101,103,...,107,109,...,0
db %00100010 ; ...,..., 89,...,...,..., 97,0
db %11001010 ; 71, 73,...,..., 79,..., 83,0
db %01100100 ; ..., 59, 61,...,..., 67,...,0
db %10100100 ; 43,..., 47,...,..., 53,...,0
db %11001010 ; 29, 31,...,..., 37,..., 41,0
db %01101000 ; ..., 17, 19,..., 23,...,...,0
db %10000110 ; 1,...,...,...,..., 11, 13,0

*-------------------------------

SetFlagPtrH ; self-mod writes to the Flags array (PtrH)
sta SetFlag1+2
sta SetFlag2+2
sta SetFlag3+2
sta SetFlag4+2
sta SetFlag5+2
sta SetFlag6+2
sta SetFlag7+2
SetFlagPtrL ; self-mod writes to the Flags array (PtrL)
stx SetFlag1+1
inx
stx SetFlag2+1
inx
stx SetFlag3+1
inx
stx SetFlag4+1
inx
stx SetFlag5+1
inx
stx SetFlag6+1
inx
stx SetFlag7+1

*-------------------------------

DoWheel210
ldx #15-1 ; load 15 7-bit wheel constants
DoWheelByte
lda Wheel210,x ; acc: 7 bits of wheel constants
; set bit 7 of Flags: 1=check for prime, 0=not prime
SetFlag1 sta Flags,y
asl
SetFlag2 sta Flags+1,y
asl
SetFlag3 sta Flags+2,y
asl
SetFlag4 sta Flags+3,y
asl
SetFlag5 sta Flags+4,y
asl
SetFlag6 sta Flags+5,y
asl
SetFlag7 sta Flags+6,y

tya ; y: Flags index += 7
clc
adc #7
tay
dex ; x: next wheel constant
bpl DoWheelByte ; loop for 15 wheel bytes
asl
bcc DoWheel210 ; loop while y:FlagsIndex < 128

lsr ; y: Flags index &= $7F to avoid y overflow
tay
lda SetFlag1+1 ; Flags PtrL += $80 to avoid y overflow
eor #$80
tax
bmi SetFlagPtrL ; Update 7 Flags PtrL

lda SetFlag1+2 ; PtrH++
adc #1 ; C=0 from lsr above
bpl SetFlagPtrH ; Update 7 Flags PtrL & PtrH

*-------------------------------
* Check Flags starting at number 11. Wheel210 has excluded multiples of 3,5,7
* Start excluding Flags at Prime squared, 11*11
ldy #11 ; y: Prime check = 11
ldx #>11*11/2+Flags ; xa: Prime squared = 11^2. Div2 for only-odd
lda #<11*11/2+Flags

ChkPrime
sta ModSq+1 ; save acc
inc ModChkPtr+1 ; ++FlagsPtr
ModChkPtr bit 11/2+Flags-1 ;acc: OddPrimeFlag
bpl ChkNext ; branch if not prime, ie < 128

*-------------------------------
* Exclude multiples of the found prime
* Start at prime^2 (ptr in xa)

stx ModExcOk+1 ; save x:Flags PtrH
sty ModExcInc+1 ; set stride to exclude Flags
SetExcPtrH
stx ModExcPtr+2 ; set PtrH
clc ; assumes c=0 in loop
ExcLoop
sta ModExcPtr+1 ; set PtrL
ModExcPtr sty $ff00 ; exclude Flags entry via bit7=0 (y always < 128)
ModExcInc adc #0 ; step to next Flags entry to exclude
bcc ExcLoop ; exclude all flags in the page
inx ; PtrH++
bpl SetExcPtrH ; set PtrH
ModExcOk ldx #0 ; restore x: prime^2 PtrH

*-------------------------------

ChkNext
iny ; y: prime chk += 1
tya
iny ; y: prime chk += 1
asl ; incrementally update prime^2
ModSq adc #0 ; add to old xa:prime^2 ptr
bcc ChkPrime
inx ; PtrH++
bpl ChkPrime
rts ; All primes found when prime^2 ptr >= $8000
SieveEnd

lst off ; Merlin: disable listing the entire symbol table
qkumba
2023-11-25 22:04:38 UTC
Permalink
ldx #SieveEnd-SieveZP+1 ; Num bytes to copy to ZP. +1 for X=0 exit
CopyToZp
lda RelocToZP-1,x
sta ZpCodeOrg-1,x
dex
bne CopyToZp

ldy #0 ; y: Flags index == 0

->

ldy #SieveEnd-SieveZP+1 ; Num bytes to copy to ZP. +1 for X=0 exit
CopyToZp
ldx RelocToZP-1,y
stx ZpCodeOrg-1,y
dey
bne CopyToZp

; ldy #0 ; y: Flags index == 0

John Brooks
2018-03-19 17:45:38 UTC
Permalink
Post by barrym95838
Post by John Brooks
I wrote an optimized Apple IIGS program to calculate all the prime numbers
up to 16834. Finding primes via the SoE method is a common CPU benchmark
and I wanted to see how fast the GS could compute primes.
I know you're pretty darn good, but getting a 1 MHz anything to find 1899
anythings in ~59 microseconds is pure sorcery. Unless you really meant to
say "msec" ... ;-)
Mike B.
Good catch! Yes, the reported sieve times are in ms milliseconds. Got stuck in cycle-counting usec mode...
Post by barrym95838
My version is about 4000 times slower, but because it takes its time it was
able to find the 1900th prime that John's speed demon completely missed!

Optimized SoE calculators usually report zero-based prime counts. The first prime, 2, is typically implied as it is the only prime that is even. The calculation logic considers only odd numbers.

The following basic program will print out the prime numbers calculated by either the fast or small SoE routines:

10 HOME
20 PRINT "0) 2"
30 CTR = 1
40 FOR I = 1 TO 8191
50 IF PEEK (24576 + I) THEN NEXT : END
60 PRINT CTR") "I + I + 1:CTR = CTR + 1
70 NEXT

The GS listing of the small SoE routine had a misalignment at $2051. Here is a correctly aligned listing:

00/2035: 3B TSC
00/2036: 85 44 STA 44
00/2038: A2 FF 7F LDX #7FFF
00/203B: 9A TXS
00/203C: A9 AA 0A LDA #0AAA
00/203F: E2 14 SEP #14
00/2041: DA PHX
00/2042: 0B PHD
00/2043: 3A DEC
00/2044: D0 FB BNE 2041 {-05}
00/2046: 0B PHD
00/2047: 68 PLA
00/2048: A9 02 60 LDA #6002
00/204B: 1B TCS
00/204C: A9 0C 60 LDA #600C
00/204F: 85 48 STA 48
00/2051: A0 04 LDY #04
00/2053: C2 10 REP #10
00/2055: C8 INY
00/2056: 84 46 STY 46
00/2058: BA TSX
00/2059: 1B TCS
00/205A: 08 PHP
00/205B: 65 46 ADC 46
00/205D: 10 FA BPL 2059 {-06}
00/205F: 9A TXS
00/2060: 88 DEY
00/2061: C8 INY
00/2062: C8 INY
00/2063: 98 TYA
00/2064: 0A ASL
00/2065: 65 48 ADC 48
00/2067: 30 07 BMI 2070 {+07}
00/2069: 85 48 STA 48
00/206B: AB PLB
00/206C: D0 F3 BNE 2061 {-0D}
00/206E: 80 E5 BRA 2055 {-1B}
00/2070: A5 44 LDA 44
00/2072: 1B TCS
00/2073: 4B PHK
00/2074: AB PLB
00/2075: 58 CLI
00/2076: 60 RTS

-JB
@JBrooksBSI
Chris Rupnik
2018-03-21 23:29:16 UTC
Permalink
Post by John Brooks
I wrote an optimized Apple IIGS program to calculate all the prime numbers up to 16834. Finding primes via the SoE method is a common CPU benchmark and I wanted to see how fast the GS could compute primes.
The program is 245 bytes long, consisting of the sieve routine at 192 bytes, and a 53 byte test program which runs the sieve 100 times and prints out the number of primes found: 1899.
00/2000:A2 64 18 FB C2 30 86 40 20 35 20 C6 40 D0 F9 38
00/2010:FB 20 3A FF 18 FB C2 30 78 BA 9B A9 00 60 1B 7B
00/2020:AB D0 01 1A BA 10 F9 BB 9A 4B AB AA EB 58 FB 20
00/2030:24 ED 4C 8E FD 0B 3B 8D EC 20 A9 FF 7F 1B A2 4E
00/2040:00 86 42 7B 1A AA EB A8 7B 3A 5A 8B 48 48 0B 5A
00/2050:DA DA 5A DA 5A 5A DA DA 5A 48 0B 48 DA 0B 48 DA
00/2060:DA 5A 48 5A 5A DA 0B 5A 48 5A 48 DA DA 5A DA DA
00/2070:5A DA 5A 48 DA 0B 5A 48 0B 48 DA 0B 5A DA 48 C6
00/2080:42 D0 C7 7A 0B 0B E2 14 A9 00 20 5B A9 05 60 1B
00/2090:A9 3C 60 A0 0A 85 E2 C8 84 AE 84 B3 84 B8 84 BD
00/20A0:84 C2 84 C7 84 CC 84 D1 C2 10 BA 1B 08 69 00 00
00/20B0:1B 08 69 00 00 1B 08 69 00 00 1B 08 69 00 00 1B
00/20C0:08 69 00 00 1B 08 69 00 00 1B 08 69 00 00 1B 08
00/20D0:69 00 00 10 D6 9A E2 10 C8 10 04 C8 C8 30 0C 98
00/20E0:0A 69 00 00 85 E2 AB D0 F2 80 AC A9 00 00 1B 2B
00/20F0:4B AB C2 34 60
The sieve routine at $2035 is called in native 65816 mode with 16-bit AXY, B=0, D=0.
Enjoy.
-JB
@JBrooksBSI
00/2000: A2 64 LDX #64
00/2002: 18 CLC
00/2003: FB XCE
00/2004: C2 30 REP #30
00/2006: 86 40 STX 40
00/2008: 20 35 20 JSR 2035
00/200B: C6 40 DEC 40
00/200D: D0 F9 BNE 2008 {-07}
00/200F: 38 SEC
00/2010: FB XCE
00/2011: 20 3A FF JSR FF3A
00/2014: 18 CLC
00/2015: FB XCE
00/2016: C2 30 REP #30
00/2018: 78 SEI
00/2019: BA TSX
00/201A: 9B TXY
00/201B: A9 00 60 LDA #6000
00/201E: 1B TCS
00/201F: 7B TDC
00/2020: AB PLB
00/2021: D0 01 BNE 2024 {+01}
00/2023: 1A INC
00/2024: BA TSX
00/2025: 10 F9 BPL 2020 {-07}
00/2027: BB TYX
00/2028: 9A TXS
00/2029: 4B PHK
00/202A: AB PLB
00/202B: AA TAX
00/202C: EB XBA
00/202D: 58 CLI
00/202E: FB XCE
00/202F: 20 24 ED JSR ED24
00/2032: 4C 8E FD JMP FD8E
00/2035: 0B PHD
00/2036: 3B TSC
00/2037: 8D EC 20 STA 20EC
00/203A: A9 FF 7F LDA #7FFF
00/203D: 1B TCS
00/203E: A2 4E 00 LDX #004E
00/2041: 86 42 STX 42
00/2043: 7B TDC
00/2044: 1A INC
00/2045: AA TAX
00/2046: EB XBA
00/2047: A8 TAY
00/2048: 7B TDC
00/2049: 3A DEC
00/204A: 5A PHY
00/204B: 8B PHB
00/204C: 48 PHA
00/204D: 48 PHA
00/204E: 0B PHD
00/204F: 5A PHY
00/2050: DA PHX
00/2051: DA PHX
00/2052: 5A PHY
00/2053: DA PHX
00/2054: 5A PHY
00/2055: 5A PHY
00/2056: DA PHX
00/2057: DA PHX
00/2058: 5A PHY
00/2059: 48 PHA
00/205A: 0B PHD
00/205B: 48 PHA
00/205C: DA PHX
00/205D: 0B PHD
00/205E: 48 PHA
00/205F: DA PHX
00/2060: DA PHX
00/2061: 5A PHY
00/2062: 48 PHA
00/2063: 5A PHY
00/2064: 5A PHY
00/2065: DA PHX
00/2066: 0B PHD
00/2067: 5A PHY
00/2068: 48 PHA
00/2069: 5A PHY
00/206A: 48 PHA
00/206B: DA PHX
00/206C: DA PHX
00/206D: 5A PHY
00/206E: DA PHX
00/206F: DA PHX
00/2070: 5A PHY
00/2071: DA PHX
00/2072: 5A PHY
00/2073: 48 PHA
00/2074: DA PHX
00/2075: 0B PHD
00/2076: 5A PHY
00/2077: 48 PHA
00/2078: 0B PHD
00/2079: 48 PHA
00/207A: DA PHX
00/207B: 0B PHD
00/207C: 5A PHY
00/207D: DA PHX
00/207E: 48 PHA
00/207F: C6 42 DEC 42
00/2081: D0 C7 BNE 204A {-39}
00/2083: 7A PLY
00/2084: 0B PHD
00/2085: 0B PHD
00/2086: E2 14 SEP #14
00/2088: A9 00 20 LDA #2000
00/208B: 5B TCD
00/208C: A9 05 60 LDA #6005
00/208F: 1B TCS
00/2090: A9 3C 60 LDA #603C
00/2093: A0 0A LDY #0A
00/2095: 85 E2 STA E2
00/2097: C8 INY
00/2098: 84 AE STY AE
00/209A: 84 B3 STY B3
00/209C: 84 B8 STY B8
00/209E: 84 BD STY BD
00/20A0: 84 C2 STY C2
00/20A2: 84 C7 STY C7
00/20A4: 84 CC STY CC
00/20A6: 84 D1 STY D1
00/20A8: C2 10 REP #10
00/20AA: BA TSX
00/20AB: 1B TCS
00/20AC: 08 PHP
00/20AD: 69 00 00 ADC #0000
00/20B0: 1B TCS
00/20B1: 08 PHP
00/20B2: 69 00 00 ADC #0000
00/20B5: 1B TCS
00/20B6: 08 PHP
00/20B7: 69 00 00 ADC #0000
00/20BA: 1B TCS
00/20BB: 08 PHP
00/20BC: 69 00 00 ADC #0000
00/20BF: 1B TCS
00/20C0: 08 PHP
00/20C1: 69 00 00 ADC #0000
00/20C4: 1B TCS
00/20C5: 08 PHP
00/20C6: 69 00 00 ADC #0000
00/20C9: 1B TCS
00/20CA: 08 PHP
00/20CB: 69 00 00 ADC #0000
00/20CE: 1B TCS
00/20CF: 08 PHP
00/20D0: 69 00 00 ADC #0000
00/20D3: 10 D6 BPL 20AB {-2A}
00/20D5: 9A TXS
00/20D6: E2 10 SEP #10
00/20D8: C8 INY
00/20D9: 10 04 BPL 20DF {+04}
00/20DB: C8 INY
00/20DC: C8 INY
00/20DD: 30 0C BMI 20EB {+0C}
00/20DF: 98 TYA
00/20E0: 0A ASL
00/20E1: 69 00 00 ADC #0000
00/20E4: 85 E2 STA E2
00/20E6: AB PLB
00/20E7: D0 F2 BNE 20DB {-0E}
00/20E9: 80 AC BRA 2097 {-54}
00/20EB: A9 00 00 LDA #0000
00/20EE: 1B TCS
00/20EF: 2B PLD
00/20F0: 4B PHK
00/20F1: AB PLB
00/20F2: C2 34 REP #34
00/20F4: 60 RTS
Hi John,
How are you calculating the time it takes to compute the result? I entered this program on my accelerated IIGS and whether i use the "FAST" or "Normal" Speed setting, from the time of 2000G command to execute it to the time to the putput of '1899' seems the same.

Am I doing something wrong? I was hoping to get a different result.

Chris
John Brooks
2018-03-22 11:21:17 UTC
Permalink
Post by Chris Rupnik
Hi John,
How are you calculating the time it takes to compute the result? I entered this program on my accelerated IIGS and whether i use the "FAST" or "Normal" Speed setting, from the time of 2000G command to execute it to the time to the putput of '1899' seems the same.
Am I doing something wrong? I was hoping to get a different result.
Chris
Hi Chris. I am timing by increasing the loop ctr to 16 bits and measuring with a stopwatch.

For 1000 loops of the sieve calc, use this patch:
1FFF:18 FB C2 30 A2 E8 03
1FFFG

For 10,000 loops:
1FFF:18 FB C2 30 A2 10 27
1FFFG

-JB
@JBrooksBSI
Chris Rupnik
2018-03-22 18:48:43 UTC
Permalink
Post by John Brooks
Post by Chris Rupnik
Hi John,
How are you calculating the time it takes to compute the result? I entered this program on my accelerated IIGS and whether i use the "FAST" or "Normal" Speed setting, from the time of 2000G command to execute it to the time to the putput of '1899' seems the same.
Am I doing something wrong? I was hoping to get a different result.
Chris
Hi Chris. I am timing by increasing the loop ctr to 16 bits and measuring with a stopwatch.
1FFF:18 FB C2 30 A2 E8 03
1FFFG
1FFF:18 FB C2 30 A2 10 27
1FFFG
-JB
@JBrooksBSI
On my 11GS - the 1000 loop runs in 13 seconds, either in FAST or NORMAL.
That is not what i was expecting. What do you get for the 1000 loop total time?
John Brooks
2018-03-23 04:00:48 UTC
Permalink
Post by Chris Rupnik
Post by John Brooks
Post by Chris Rupnik
Hi John,
How are you calculating the time it takes to compute the result? I entered this program on my accelerated IIGS and whether i use the "FAST" or "Normal" Speed setting, from the time of 2000G command to execute it to the time to the putput of '1899' seems the same.
Am I doing something wrong? I was hoping to get a different result.
Chris
Hi Chris. I am timing by increasing the loop ctr to 16 bits and measuring with a stopwatch.
1FFF:18 FB C2 30 A2 E8 03
1FFFG
1FFF:18 FB C2 30 A2 10 27
1FFFG
-JB
@JBrooksBSI
On my 11GS - the 1000 loop runs in 13 seconds, either in FAST or NORMAL.
That is not what i was expecting. What do you get for the 1000 loop total time?
I'm at GDC this week, but I'll run it on my 9 MHz ZipGS this weekend and post numbers.

IIRC, normal speed (1.0 MHz) was 24 seconds for 1000 iterations, and fast speed (2.8 MHz) was 60 seconds for 1000 iterations.

-JB
@JBrooksBSI
Michael J. Mahon
2018-03-24 03:12:34 UTC
Permalink
Post by John Brooks
Post by Chris Rupnik
Post by John Brooks
Post by Chris Rupnik
Hi John,
How are you calculating the time it takes to compute the result? I
entered this program on my accelerated IIGS and whether i use the
"FAST" or "Normal" Speed setting, from the time of 2000G command to
execute it to the time to the putput of '1899' seems the same.
Am I doing something wrong? I was hoping to get a different result.
Chris
Hi Chris. I am timing by increasing the loop ctr to 16 bits and
measuring with a stopwatch.
1FFF:18 FB C2 30 A2 E8 03
1FFFG
1FFF:18 FB C2 30 A2 10 27
1FFFG
-JB
@JBrooksBSI
On my 11GS - the 1000 loop runs in 13 seconds, either in FAST or NORMAL.
That is not what i was expecting. What do you get for the 1000 loop total time?
I'm at GDC this week, but I'll run it on my 9 MHz ZipGS this weekend and post numbers.
IIRC, normal speed (1.0 MHz) was 24 seconds for 1000 iterations, and fast
speed (2.8 MHz) was 60 seconds for 1000 iterations.
-JB
@JBrooksBSI
...or the other way around!
--
-michael - NadaNet 3.1 and AppleCrate II: http://michaeljmahon.com
Chris Rupnik
2018-03-24 19:39:35 UTC
Permalink
Post by John Brooks
Post by Chris Rupnik
Post by John Brooks
Post by Chris Rupnik
Hi John,
How are you calculating the time it takes to compute the result? I entered this program on my accelerated IIGS and whether i use the "FAST" or "Normal" Speed setting, from the time of 2000G command to execute it to the time to the putput of '1899' seems the same.
Am I doing something wrong? I was hoping to get a different result.
Chris
Hi Chris. I am timing by increasing the loop ctr to 16 bits and measuring with a stopwatch.
1FFF:18 FB C2 30 A2 E8 03
1FFFG
1FFF:18 FB C2 30 A2 10 27
1FFFG
-JB
@JBrooksBSI
On my 11GS - the 1000 loop runs in 13 seconds, either in FAST or NORMAL.
That is not what i was expecting. What do you get for the 1000 loop total time?
I'm at GDC this week, but I'll run it on my 9 MHz ZipGS this weekend and post numbers.
IIRC, normal speed (1.0 MHz) was 24 seconds for 1000 iterations, and fast speed (2.8 MHz) was 60 seconds for 1000 iterations.
-JB
@JBrooksBSI
On my accelerated GS - i got 13 seconds for 1000 iterations.
On my regular GS - i would get 24 seconds in FAST mode - and 59 seconds in NORMAL mode.
Tom Porter
2018-03-22 09:28:20 UTC
Permalink
This is what I use to list a program instead of
]POKE33,33:LIST:POKE33,40

...
EXEC FILE (LIST)- works great in APPLEWIN's Printer File

PR#3
PR#1
POKE 33,66
LIST
PR#3

As for making faster primes, I'd just slow it down!
Brian Patrie
2018-03-24 06:15:33 UTC
Permalink
Post by Tom Porter
This is what I use to list a program instead of
]POKE33,33:LIST:POKE33,40
....
EXEC FILE (LIST)- works great in APPLEWIN's Printer File
PR#3
PR#1
POKE 33,66
LIST
PR#3
As for making faster primes, I'd just slow it down!
32 or 64 for width is more reliable. If conditions are just right, an
unwanted cr can slip in with 33 (and probably 66). Harmless if all you
need is more convenient right-arrow editing; but a problem when writing
to a file, or printer.


FWIW, here's my (kinda ridiculously fancy) LLIST exec file:

:L=PEEK(32):W=PEEK(33):POKE32,0:POKE33,32:PR#1:CALL-998:LIST
:CALL-998:CALL-958:CALL-756+588*(W<41):POKE33,W:POKE32,L:CALL976

(all on one line). The CALL 976 is the reconnect (Pro)DOS call at $3D0,
since i'm not using the (Pro)DOS PR#1. -998 is cursor up. The
-756+588*(W<41) dance conditionally calls RDKEY, to absorb an extra cr
that happens in 80 column mode. (Yeah, a lot of fuss just to keep the
screen relatively pretty; but it was entertaining to write.) :)
d***@yahoo.com
2018-03-30 15:41:47 UTC
Permalink
Post by John Brooks
I wrote an optimized Apple IIGS program to calculate all the prime numbers up to 16834. Finding primes via the SoE method is a common CPU benchmark and I wanted to see how fast the GS could compute primes.
The program is 245 bytes long, consisting of the sieve routine at 192 bytes, and a 53 byte test program which runs the sieve 100 times and prints out the number of primes found: 1899.
John,

Can you provide a pointer to the previous benchmark program? Am I correct in reading that time as 140 ms @ 1MHz?

It's an interesting topic, the Sieve of Eratosthenes. I found Bob Sander-Cederlof's first take on it here: http://www.txbobsc.com/aal/1981/aal8110.html

Follow-ups here: http://www.txbobsc.com/aal/1982/aal8202.html#a3

And here: http://www.txbobsc.com/aal/1982/aal8211.html#a4

A comment on the original article: http://www.txbobsc.com/aal/1983/aal8303.html#a6

Comments about the 68000 processor benchmark: http://www.txbobsc.com/aal/1984/aal8407.html#a4

An update to the 6502 routine based on the 68000 algorithm: http://www.txbobsc.com/aal/1984/aal8407.html#a5

A 65802 routine is here: http://www.txbobsc.com/aal/1985/aal8509.html#a1

Feedback about the 68000 routine and informed speculation on memory handicaps: http://www.txbobsc.com/aal/1985/aal8510.html#a4

I think that was all the references I could find in Assembly Lines, and it appears as though your previous fastest benchmark is based on Bob's 65802 results. Is that correct?

I can't profess to understanding much of it, but it's short enough that I might, with some effort, eventually figure it out! ;^)

Thanks for the post. I led to a fascinating excursion down memory lane. And Ulam's Spiral is pretty interesting too!

Dave Rogers
John Brooks
2018-03-30 16:57:35 UTC
Permalink
Post by d***@yahoo.com
Post by John Brooks
I wrote an optimized Apple IIGS program to calculate all the prime numbers up to 16834. Finding primes via the SoE method is a common CPU benchmark and I wanted to see how fast the GS could compute primes.
The program is 245 bytes long, consisting of the sieve routine at 192 bytes, and a 53 byte test program which runs the sieve 100 times and prints out the number of primes found: 1899.
John,
It's an interesting topic, the Sieve of Eratosthenes. I found Bob Sander-Cederlof's first take on it here: http://www.txbobsc.com/aal/1981/aal8110.html
Follow-ups here: http://www.txbobsc.com/aal/1982/aal8202.html#a3
And here: http://www.txbobsc.com/aal/1982/aal8211.html#a4
A comment on the original article: http://www.txbobsc.com/aal/1983/aal8303.html#a6
Comments about the 68000 processor benchmark: http://www.txbobsc.com/aal/1984/aal8407.html#a4
An update to the 6502 routine based on the 68000 algorithm: http://www.txbobsc.com/aal/1984/aal8407.html#a5
A 65802 routine is here: http://www.txbobsc.com/aal/1985/aal8509.html#a1
Feedback about the 68000 routine and informed speculation on memory handicaps: http://www.txbobsc.com/aal/1985/aal8510.html#a4
I think that was all the references I could find in Assembly Lines, and it appears as though your previous fastest benchmark is based on Bob's 65802 results. Is that correct?
I can't profess to understanding much of it, but it's short enough that I might, with some effort, eventually figure it out! ;^)
Thanks for the post. I led to a fascinating excursion down memory lane. And Ulam's Spiral is pretty interesting too!
Dave Rogers
.
Yes, the fastest previous version I could find was the Assembly Lines 65802 version by Bob S-C at 140ms. It uses a wheel2 for both the flags memory (storing only odd bytes) and init routines.

While traveling last week I made a fast 6502 sieve, and slightly smaller/faster versions for the 65816.

Below is the 6502 version which runs in 145ms per sieve and is 174 bytes long (228 bytes with test code). The flags data is stored wheel2 (odd only) at $6000-$7FFF with primes == bit7 set.

2000:A2 FA 86 06 20 36 20 C6 06 D0 F9 20 3A FF A9 00
2010:AA 18 A0 60 8C 19 20 2C 00 7F 10 05 E8 D0 02 69
2020:01 EE 18 20 D0 F1 C8 10 EB 20 DA FD 8A 20 DA FD
2030:20 8E FD 6C FC FF A2 9C BD 48 20 95 39 CA D0 F8
2040:A0 00 A2 0E A9 F6 4C 6F 00 C2 2C A6 4A 4C A6 88
2050:6C 22 CA 64 A4 CA 68 86 85 71 85 75 85 79 85 7D
2060:85 81 85 85 85 89 86 70 E8 86 74 E8 86 78 E8 86
2070:7C E8 86 80 E8 86 84 E8 86 88 A2 0E B5 3A 99 00
2080:60 0A 99 01 60 0A 99 02 60 0A 99 03 60 0A 99 04
2090:60 0A 99 05 60 0A 99 06 60 98 18 69 07 A8 CA 10
20A0:DB 98 10 D6 29 7F A8 A5 70 49 80 AA 30 B8 A5 71
20B0:69 01 10 A4 A2 60 A9 3C A0 0B 85 CD E6 B0 2C 04
20C0:60 10 15 86 C8 84 C1 86 BF 18 85 BE 8E 00 FF 69
20D0:00 90 F7 E8 10 F1 A2 00 C8 98 0A 69 00 90 01 E8
20E0:C8 10 D7 60

It should run on all Apple II & compatibles. I've tested it on the integer Apple II too (if you save it as a system file & launch from P8 2.4 Bitsy Bye).

It prints the prime count in hex: $076C == 1900 primes, and ends via simulated reset to set up zero page and drop to the monitor on integer Apple II machines.

For Applesoft systems, this basic program will print out all 1900 primes:

10 HOME
20 PRINT "1) 2"
30 CTR = 2
40 FOR I = 1 TO 8191
50 IF PEEK (24576 + I) < 128 THEN NEXT : END
60 PRINT CTR") "I + I + 1:CTR = CTR + 1
70 NEXT
Post by d***@yahoo.com
I can't profess to understanding much of it, but it's short enough that I might, with some effort, eventually figure it out! ;^)
I'm happy to explain how it works.

My changes are primarily to optimize the inner loops of the sieve calc - the flag clear routine, and to initialize the flags array as wheel105 (3*5*7) instead of wheel2.

A nonintuitive part of the code is incrementally calculating squares. Adding Ctr*8 to the previously calculated square gives the next square:

1*1 = 1
3*3 = 9 (1*8+1)
5*5 = 25 (2*8+9)
7*7 = 49 (3*8+25)

In my case, I increment the ctr by 2 via INY, INY, then double again to get 4x via TYA, ASL. Since the flags array is wheel2, storing only odd flags, I can skip the last double to 8x.

One a prime is found, all flags less then Prime*Prime have already been cleared by smaller primes, so skipping to Prime squared is a big speed up.

Below is the 6502 disasm.

Enjoy.
-JB
@JBrooksBSI

-- Test code --
2000- A2 64 LDX #$64
2002- 86 06 STX $06
2004- 20 36 20 JSR $2036
2007- C6 06 DEC $06
2009- D0 F9 BNE $2004
200B- 20 3A FF JSR $FF3A
200E- A9 00 LDA #$00
2010- AA TAX
2011- 18 CLC
2012- A0 60 LDY #$60
2014- 8C 19 20 STY $2019
2017- 2C 00 60 BIT $6000
201A- 10 05 BPL $2021
201C- E8 INX
201D- D0 02 BNE $2021
201F- 69 01 ADC #$01
2021- EE 18 20 INC $2018
2024- D0 F1 BNE $2017
2026- C8 INY
2027- 10 EB BPL $2014
2027- 10 EB BPL $2014
2029- 20 DA FD JSR $FDDA
202C- 8A TXA
202D- 20 DA FD JSR $FDDA
2030- 20 8E FD JSR $FD8E
2033- 6C FC FF JMP ($FFFC)

-- 6502 Sieve code --
2036- A2 9C LDX #$9C
2038- BD 48 20 LDA $2048,X
203B- 95 39 STA $39,X
203D- CA DEX
203E- D0 F8 BNE $2038
2040- A0 00 LDY #$00
2042- A2 0E LDX #$0E
2044- A9 F6 LDA #$F6
2046- 4C 6F 00 JMP $006F
-- Wheel 105 (3*5*7) constants --
2049- C2 2C A6 4A 4C A6 88
2050- 6C 22 CA 64 A4 CA 68 86
-- Flags init --
2058- 85 71 STA $71
205A- 85 75 STA $75
205C- 85 79 STA $79
205E- 85 7D STA $7D
2060- 85 81 STA $81
2062- 85 85 STA $85
2064- 85 89 STA $89
2066- 86 70 STX $70
2068- E8 INX
2069- 86 74 STX $74
206B- E8 INX
206C- 86 78 STX $78
206E- E8 INX
206F- 86 7C STX $7C
2071- E8 INX
2072- 86 80 STX $80
2074- E8 INX
2075- 86 84 STX $84
2077- E8 INX
2078- 86 88 STX $88
2078- 86 88 STX $88
207A- A2 0E LDX #$0E
207C- B5 3A LDA $3A,X
207E- 99 00 60 STA $6000,Y
2081- 0A ASL
2082- 99 01 60 STA $6001,Y
2085- 0A ASL
2086- 99 02 60 STA $6002,Y
2089- 0A ASL
208A- 99 03 60 STA $6003,Y
208D- 0A ASL
208E- 99 04 60 STA $6004,Y
2091- 0A ASL
2092- 99 05 60 STA $6005,Y
2095- 0A ASL
2096- 99 06 60 STA $6006,Y
2099- 98 TYA
209A- 18 CLC
209B- 69 07 ADC #$07
209D- A8 TAY
209E- CA DEX
209E- CA DEX
209F- 10 DB BPL $207C
20A1- 98 TYA
20A2- 10 D6 BPL $207A
20A4- 29 7F AND #$7F
20A6- A8 TAY
20A7- A5 70 LDA $70
20A9- 49 80 EOR #$80
20AB- AA TAX
20AC- 30 B8 BMI $2066
20AE- A5 71 LDA $71
20B0- 69 01 ADC #$01
20B2- 10 A4 BPL $2058
20B4- A2 60 LDX #$60
20B6- A9 3C LDA #$3C
20B8- A0 0B LDY #$0B
-- Sieve loop --
20BA- 85 CD STA $CD
20BC- E6 B0 INC $B0
20BE- 2C 04 60 BIT $6004
20C1- 10 15 BPL $20D8
20C3- 86 C8 STX $C8
20C3- 86 C8 STX $C8
20C5- 84 C1 STY $C1
20C7- 86 BF STX $BF
20C9- 18 CLC
-- Inner loop --
20CA- 85 BE STA $BE
20CC- 8E 00 FF STX $FF00
20CF- 69 00 ADC #$00
20D1- 90 F7 BCC $20CA
20D3- E8 INX
20D4- 10 F1 BPL $20C7
20D6- A2 00 LDX #$00
20D8- C8 INY
20D9- 98 TYA
20DA- 0A ASL
20DB- 69 00 ADC #$00
20DD- 90 01 BCC $20E0
20DF- E8 INX
20E0- C8 INY
20E1- 10 D7 BPL $20BA
20E3- 60 RTS
Loading...