Discussion:
ZipChip and C800.CFFF and one more thing.
Jorge
2017-09-07 11:26:49 UTC
Hi,

Question #1: It does not cache that range, am I right?

#2: Say you have a zipchip @8MHz running this loop:

loop:
LDA $C040    ; STROBE
JMP loop

That's 4+3=7 cycles, I think. The zipchip is running at 8 MHz, but on the scope I see one 500 ns strobe pulse every two 1 MHz cycles (500 kHz). I was expecting one pulse every cycle (1 MHz), given that it can run that code in less than one 1000 ns cycle. I guess it goes like this:

loop
run code until you hit a r/w that must go to main memory
wait until the next 1MHz cycle begins
do the sync r/w
goto loop

So although it executes the JMP and the LDA in a single cycle, the next cycle is the real load, and it does not execute any more code until that is done; then the loop repeats.

All the accelerators do it like that or is there a better one?

Thanks,
--
Jorge.
Ralf Kiefer
2017-09-07 12:07:53 UTC
Post by Jorge
That's 4+3=7 cycles, I think. The zipchip is running at 8 MHz, but on the
scope I see one 500ns strobe pulse every two 1MHz cycles (500 kHz).
That's ok. The fast CPU can do 8 cycles within 1000nsec in fast memory.
But your loop has one cycle to access the motherboard which needs
500nsec. So the fast cpu can do just 5 cycles within these 1000nsec.
Post by Jorge
All the accelerators do it like that or is there a better one?
My accelerator board [1] uses a buffer when writing to the motherboard.
If there are more than 12 cycles between two write cycles to the
motherboard my fast cpu runs at full speed, otherwise the cpu must wait.

- Ralf

[1] Schaetzle&Bsteh DC65 @12.5MHz
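
For scale, here is a small Python sketch of the arithmetic behind the 12-cycle figure. It assumes only the 12.5 MHz clock and the 12-cycle threshold quoted above; the function names are made up for illustration and say nothing about how the DC65 write buffer is actually built.

```python
def cycle_ns(clock_mhz):
    """Length of one CPU cycle in nanoseconds."""
    return 1000.0 / clock_mhz

def buffer_window_ns(threshold_cycles=12, clock_mhz=12.5):
    """Time covered by the quoted 12-cycle spacing at 12.5 MHz."""
    return threshold_cycles * cycle_ns(clock_mhz)

def cpu_must_wait(gap_cycles, threshold_cycles=12):
    """True when two motherboard writes are too close together (per the post:
    only 'more than 12 cycles' between writes lets the CPU run at full speed)."""
    return gap_cycles <= threshold_cycles

print(buffer_window_ns())                   # nanoseconds
print(cpu_must_wait(4), cpu_must_wait(20))  # back-to-back vs. well spaced
```

Twelve cycles at 12.5 MHz come to 960 ns, roughly one 1 MHz motherboard cycle, which would fit the idea that one buffered write has to drain before the next can be accepted.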
Michael J. Mahon
2017-09-07 18:50:26 UTC
Post by Ralf Kiefer
Post by Jorge
That's 4+3=7 cycles, I think. The zipchip is running at 8 MHz, but on the
scope I see one 500ns strobe pulse every two 1MHz cycles (500 kHz).
That's ok. The fast CPU can do 8 cycles within 1000nsec in fast memory.
But your loop has one cycle to access the motherboard which needs
500nsec. So the fast cpu can do just 5 cycles within these 1000nsec.
Post by Jorge
All the accelerators do it like that or is there a better one?
My accelerator board [1] uses a buffer when writing to the motherboard.
If there are more than 12 cycles between two write cycles to the
motherboard my fast cpu runs at full speed, otherwise the cpu must wait.
- Ralf
The Zip Chip also does write buffering, but every accelerator has to wait
for an I/O read.
--
-michael - NadaNet 3.1 and AppleCrate II: http://michaeljmahon.com
Jorge
2017-09-10 16:43:13 UTC
Post by Michael J. Mahon
The Zip Chip also does write buffering, but every accelerator has to wait
for an I/O read.
So... if I just swap the LDA for an STA, then it will pulse the strobe at 1 MHz?
--
Jorge
Jorge
2017-09-10 21:34:10 UTC
Post by Michael J. Mahon
The Zip Chip also does write buffering, but every accelerator has to wait
for an I/O read.
Nope... no luck, it seems STA doesn't do the trick, I'm trying with this:

CC00:
STA $C040
STA $C040
JMP $2000

2000:
STA $C040
STA $C040
JMP $CC00

On the scope I get this: https://imgur.com/a/akVH2, which shows the code running uncached at $CC00 and still only 500 kHz when cached at $2000.

I use $cc00 because that's RAM if you've got a Videx Videoterm @slot #3 (and enable it via a read or write to $c300 first).
--
Jorge
Jorge
2017-09-10 16:49:02 UTC
Post by Ralf Kiefer
Post by Jorge
That's 4+3=7 cycles, I think. The zipchip is running at 8 MHz, but on the
scope I see one 500ns strobe pulse every two 1MHz cycles (500 kHz).
That's ok. The fast CPU can do 8 cycles within 1000nsec in fast memory.
But your loop has one cycle to access the motherboard which needs
500nsec. So the fast cpu can do just 5 cycles within these 1000nsec.
Post by Jorge
All the accelerators do it like that or is there a better one?
My accelerator board [1] uses a buffer when writing to the motherboard.
If there are more than 12 cycles between two write cycles to the
motherboard my fast cpu runs at full speed, otherwise the cpu must wait.
- Ralf
Thanks Ralf, so I'll try writing instead of reading to see if it makes a difference.
--
Jorge.
geoff body
2017-09-07 12:52:00 UTC
Jorge,
I think you need to consider how a normal cycle and the setup times for a read or write access occur.
Have a look at the below link.
http://laughtonelectronics.com/Arcana/Visualizing%2065xx%20Timing/Visualizing%2065xx%20CPU%20Timing.html

Applying this to your example of load and jump.
Once the code is cached, the first 3 memory accesses for the load come from cache, then the next is an external read, so the ZIP chip has to wait for the start of the next external CPU clock cycle so that the address setup time requirements of the external access can be met. The data is read at the end of the external CPU clock cycle. Finally the jump has 3 memory accesses from cache.
This means the loop is 6 cache accesses and a wait for next external clock and normal read using the external CPU clock.
The accelerator has to decide which addresses can be cached and which need to be accessed using the external CPU clock.
You only see the access for 500ns due to the address decoding only occurring for the second half of the external CPU clock.

Regards
Geoff B
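
The sequence described above (cached cycles, wait for the next external clock, one full external cycle) can be put into numbers with a short Python sketch. It is only a model of that description, assuming a nominal 1000 ns motherboard cycle, 125 ns fast cycles, and that each external access ends exactly on a motherboard cycle boundary; the function names are hypothetical.

```python
FAST_NS = 125    # one 8 MHz cycle
SLOW_NS = 1000   # one nominal 1 MHz motherboard cycle

def loop_period_ns(cached_cycles, fast_ns=FAST_NS, slow_ns=SLOW_NS):
    """Period of a loop with `cached_cycles` fast accesses and one external access.

    Time starts at a motherboard cycle boundary (end of the previous
    external access). The fast accesses run, the chip waits for the next
    boundary, then the external access occupies one whole slow cycle.
    """
    run_ns = cached_cycles * fast_ns
    next_boundary = -(-run_ns // slow_ns) * slow_ns  # round up to a boundary
    return next_boundary + slow_ns

def strobe_khz(cached_cycles):
    """Strobe frequency in kHz for one external access per loop."""
    return 1_000_000 / loop_period_ns(cached_cycles)

# LDA $C040 / JMP loop: 3 + 3 cached accesses, 1 external read
print(loop_period_ns(6), strobe_khz(6))
```

With 6 cached accesses the model gives a 2000 ns period, i.e. one strobe every two motherboard cycles (500 kHz), which matches the scope reading. Under the same assumptions a lone STA $C040 (3 cached fetches plus one external write, no write buffering) also comes out at 2000 ns.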
Jorge
2017-09-10 16:49:55 UTC
Post by geoff body
Jorge,
I think you need to consider how a normal cycle and the setup times for a read or write access occur.
Have a look at the below link.
http://laughtonelectronics.com/Arcana/Visualizing%2065xx%20Timing/Visualizing%2065xx%20CPU%20Timing.html
Applying this to your example of load and jump.
Once the code is cached, the first 3 memory accesses for the load come from cache, then the next is an external read, so the ZIP chip has to wait for the start of the next external CPU clock cycle so that the address setup time requirements of the external access can be met. The data is read at the end of the external CPU clock cycle. Finally the jump has 3 memory accesses from cache.
This means the loop is 6 cache accesses and a wait for next external clock and normal read using the external CPU clock.
The accelerator has to decide which addresses can be cached and which need to be accessed using the external CPU clock.
You only see the access for 500ns due to the address decoding only occurring for the second half of the external CPU clock.
Regards
Geoff B
Thanks Geoff, that page is very, very interesting reading!
--
Jorge.
Jorge
2017-09-10 16:53:37 UTC
Post by Jorge
Hi,
Question #1: It does not cache that range, am I right?
WRT to question #1: this zipchip does not seem to cache C800.CFFF (shared ROM space) nor CN00..CNFF (slot #N ROM space), is that so?
--
Jorge
Michael J. Mahon
2017-09-10 21:12:35 UTC
Post by Jorge
Post by Jorge
Hi,
Question #1: It does not cache that range, am I right?
WRT to question #1: this zipchip does not seem to cache C800.CFFF (shared
ROM space) nor CN00..CNFF (slot #N ROM space), is that so?
It will cache it if it is set to *fast*.
--
-michael - NadaNet 3.1 and AppleCrate II: http://michaeljmahon.com
Jorge
2017-09-11 15:17:43 UTC
Post by Michael J. Mahon
Post by Jorge
WRT to question #1: this zipchip does not seem to cache C800.CFFF (shared
ROM space) nor CN00..CNFF (slot #N ROM space), is that so?
It will cache it if it is set to *fast*.
Are you 100% sure? I can't make it work!
--
Jorge
Michael J. Mahon
2017-09-11 16:09:23 UTC
Post by Jorge
Post by Michael J. Mahon
Post by Jorge
WRT to question #1: this zipchip does not seem to cache C800.CFFF (shared
ROM space) nor CN00..CNFF (slot #N ROM space), is that so?
It will cache it if it is set to *fast*.
Are you 100% sure? I can't make it work!
No--and David's response convinces me that the $Cxxx region cannot be
safely cached, because there is no way to be sure that the card developer
didn't use bank switching within the space.

Bank switching within the $C800..$CFFF range by individual cards is not
unusual, and, since it is peculiar to the card, cannot be tracked by
accelerators.

It is even possible to bank switch within the $Csxx range, though I've not
actually encountered that.
--
-michael - NadaNet 3.1 and AppleCrate II: http://michaeljmahon.com
Ralf Kiefer
2017-09-11 21:07:32 UTC
Post by Michael J. Mahon
It is even possible to bank switch within the $Csxx range, though I've not
actually encountered that.
That's it. I.e. the Videx Videoterm uses 2kB of video RAM which is
bankswitched within the $C800 space.

- Ralf
Jorge
2017-09-12 07:43:12 UTC
Post by Ralf Kiefer
Post by Michael J. Mahon
It is even possible to bank switch within the $Csxx range, though I've not
actually encountered that.
That's it. I.e. the Videx Videoterm uses 2kB of video RAM which is
bankswitched within the $C800 space.
In 512 byte chunks... which begs the question of what is the meaning of setting a slot to "FAST", then ?
--
Jorge
Ralf Kiefer
2017-09-12 09:31:14 UTC
Post by Jorge
Post by Ralf Kiefer
Post by Michael J. Mahon
It is even possible to bank switch within the $Csxx range, though I've not
actually encountered that.
That's it. I.e. the Videx Videoterm uses 2kB of video RAM which is
bankswitched within the $C800 space.
In 512 byte chunks... which begs the question of what is the meaning of
setting a slot to "FAST", then ?

The cache controller of the accelerator board can obviously emulate the
Apple hardware, which means it handles the standard soft switches of an
Apple II or IIe to clear the caches when bank switching takes place, e.g.
switching $D000 from bank 1 to 2 or to ROM. But the accelerator board
doesn't have the knowledge to handle a Videx Videoterm.

When I wrote my own BIOS [1] of the UCSD P-System to optimize the code
for my accelerator board I located the (new) driver code of every card
into the RAM of the language card. This Videoterm driver is incredibly
fast :-) E.g. scrolling one line up is done within 0.1msec, clear screen
~2msec or nearly 500 clearscreens per sec.

BTW if you use an accelerator board which caches everything available on
the Apple mainboard, use the IIe internal 80-column solution. The driver
code is in the (hopefully) cached ROM and the video RAM is within the
(cached) mainboard RAM. My accelerator board runs scrolling one line up
in ~4msec [2] and clear screen in ~2msec too.

- Ralf

[1] in 1987 and 1988.
[2] With tricky coding and large code: ~2msec
Jorge
2017-09-12 10:22:36 UTC
Post by Ralf Kiefer
Post by Jorge
Post by Ralf Kiefer
Post by Michael J. Mahon
It is even possible to bank switch within the $Csxx range, though I've not
actually encountered that.
That's it. I.e. the Videx Videoterm uses 2kB of video RAM which is
bankswitched within the $C800 space.
In 512 byte chunks... which begs the question of what is the meaning of
setting a slot to "FAST", then ?
The cache controller of the accelerator board can obviously emulate the
Apple hardware, which means it handles the standard soft switches of an
Apple II or IIe to clear the caches when bank switching takes place, e.g.
switching $D000 from bank 1 to 2 or to ROM. But the accelerator board
doesn't have the knowledge to handle a Videx Videoterm.
When I wrote my own BIOS [1] of the UCSD P-System to optimize the code
for my accelerator board I located the (new) driver code of every card
into the RAM of the language card. This Videoterm driver is incredibly
fast :-) E.g. scrolling one line up is done within 0.1msec, clear screen
~2msec or nearly 500 clearscreens per sec.
BTW if you use an accelerator board which caches everything available on
the Apple mainboard, use the IIe internal 80-column solution. The driver
code is in the (hopefully) cached ROM and the video RAM is within the
(cached) mainboard RAM. My accelerator board runs scrolling one line up
in ~4msec [2] and clear screen in ~2msec too.
- Ralf
[1] in 1987 and 1988.
[2] With tricky coding and large code: ~2msec
Cool! Awesome, amazing engineerding feats! :-) Those things make one feel good (when they finally work as expected), don't they? I know that feeling, and the one when it doesn't work, too...

So the meaning of setting slot #N to "FAST" is "DO NOT slow down (switch to 1 MHz mode) for 32µs (?) after accesses to $C080..C08F+$N0"?
--
Jorge.
Jorge
2017-09-12 10:37:08 UTC
To recap:

$Cn00..CnFF is never cached.
$C800..CFFF is never cached.
$C080..C0FF+$n0 will trigger a slow down sequence (32µs?) unless the slot is set to "FAST".

?
--
Jorge.
Jorge
2017-09-12 10:39:08 UTC
Post by Jorge
$Cn00..CnFF is never cached.
$C800..CFFF is never cached.
$C080..C0FF+$n0 will trigger a slow down sequence (32µs?) unless the slot is set to "FAST".
Ooops, typo:

$C080..C08F+$n0
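
The recap can be written down as a small address decoder. This is only a Python sketch of the three rules as stated in the recap (with the corrected $C080..C08F+$n0 range); it is not taken from any Zip Chip documentation, the function names are invented, and treating slot 0 like any other slot is an assumption.

```python
def classify(addr):
    """Return (region, slot) for an address in the Apple II $C000..$CFFF page."""
    if not 0xC000 <= addr <= 0xCFFF:
        return ("outside $Cxxx", None)
    if addr <= 0xC07F:
        return ("motherboard I/O", None)
    if addr <= 0xC0FF:
        return ("slot I/O", (addr - 0xC080) >> 4)   # $C080+$n0 .. $C08F+$n0
    if addr <= 0xC7FF:
        return ("slot ROM", (addr >> 8) & 0x0F)     # $Cn00 .. $CnFF
    return ("expansion window", None)               # $C800 .. $CFFF

def never_cached(addr):
    """Per the recap: slot ROM and the $C800..$CFFF window are never cached."""
    return classify(addr)[0] in ("slot ROM", "expansion window")

def triggers_slowdown(addr, fast_slots):
    """Per the recap: slot I/O starts a slow period unless the slot is FAST."""
    region, slot = classify(addr)
    return region == "slot I/O" and slot not in fast_slots

print(classify(0xC0A0), triggers_slowdown(0xC0A0, fast_slots={3}))
print(classify(0xCC00), never_cached(0xCC00))
```

For example, $C0A0 decodes to slot #2 I/O and $C0B0 to slot #3 I/O, while $C040 and $C058 are motherboard locations that belong to no slot.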
Ralf Kiefer
2017-09-12 11:06:29 UTC
Ditto :-)

Which model do you use? II+ or //e?
Which cards are in the slots?
Which system software do you use?

There is a practical solution :-)

- Ralf
Jorge
2017-09-12 16:16:37 UTC
Post by Ralf Kiefer
Which model do you use? II+ or //e?
II+
Post by Ralf Kiefer
Which cards are in the slots?
Videx in #3 + Disk II in #6
Post by Ralf Kiefer
Which system software do you use?
DOS 3.3
Post by Ralf Kiefer
There is a practical solution :-)
:-?
--
Jorge.
Ralf Kiefer
2017-09-12 17:12:45 UTC
Post by Jorge
Post by Ralf Kiefer
Which cards are in the slots?
Videx in #3 + Disk II in #6
No Language Card in slot #0?
Post by Jorge
DOS 3.3
Post by Ralf Kiefer
There is a practical solution :-)
:-?
My idea: you need about $0200 bytes of RAM in the cachable range [1].
You change the driver's code in $C3xx (new EPROM with your special code).
When initializing, the new driver copies the most "valuable" parts to the
fast RAM area.

- Ralf

[1] I'm not familiar with DOS and RAM management: what happens with the
range between $0400 and $07FF when using the 80column card?
Jorge
2017-09-12 17:49:12 UTC
Post by Ralf Kiefer
Post by Jorge
Post by Ralf Kiefer
Which cards are in the slots?
Videx in #3 + Disk II in #6
No Language Card in slot #0?
Nope :-) but I can put one in if I must; I have a pile of them.
Post by Ralf Kiefer
Post by Jorge
DOS 3.3
Post by Ralf Kiefer
There is a practical solution :-)
:-?
My idea: you need about $0200 bytes of RAM in the cachable range [1].
You change the driver's code in $C3xx (new EPROM with your special code).
When initializing, the new driver copies the most "valuable" parts to the
fast RAM area.
Yep, sounds good, $3xx would be better for me in this case I think (no?).
Post by Ralf Kiefer
[1] I'm not familiar with DOS and RAM management: what happens with the
range between $0400 and $07FF when using the 80column card?
Nothing, it stays the same, but there are the screen holes, remember (which the Videx uses)!
--
Jorge.
Jorge
2017-09-12 18:40:02 UTC
Post by Ralf Kiefer
My idea: you need about $0200 bytes of RAM in the cachable range [1].
You change the driver's code in $C3xx (new EPROM with your special code).
When initializing, the new driver copies the most "valuable" parts to the
fast RAM area.
Ralf,

There's going to be a problem because the driver has to access c080+n0. OK, no, there's not going to be any problem if the slot is set to fast... Let's hope that accesses to cc00..cdff won't trigger any slowdowns too. Should I check that too? I hope not!
--
Jorge.
Michael J. Mahon
2017-09-12 21:24:51 UTC
Post by Jorge
Post by Ralf Kiefer
My idea: you need about $0200 bytes of RAM in the cachable range [1].
You change the driver's code in $C3xx (new EPROM with your special code).
When initializing, the new driver copies the most "valuable" parts to the
fast RAM area.
Ralf,
There's going to be a problem because the driver has to access c080+n0.
OK, no, there's not going to be any problem if the slot is set to fast...
Let's hope that accesses to cc00..cdff won't trigger any slowdowns too.
Should I check that too? I hope not!
They must cause a slowdown, since they must change memory on the Videx
card, accessible only via the Apple bus.

The best you can achieve, as Ralf did, is one slow cycle per byte of card
memory accessed.
--
-michael - NadaNet 3.1 and AppleCrate II: http://michaeljmahon.com
Jorge
2017-09-12 23:05:33 UTC
Post by Michael J. Mahon
Post by Jorge
Let's hope that accesses to cc00..cdff won't trigger any slowdowns too.
Should I check that too? I hope not!
They must cause a slowdown, since they must change memory on the Videx
card, accessible only via the Apple bus.
The best you can achieve, as Ralf did, is one slow cycle per byte of card
memory accessed.
Thanks. Yes, I get that. By slowdowns I mean 54 millisecond slowdowns. To scroll up it's got first to clear to end of line the (new, soon-to-be) last line and that's a loop of 80 chars in which I believe it spends most of the time when scrolling.
--
Jorge.
Ralf Kiefer
2017-09-13 00:01:05 UTC
Post by Jorge
By slowdowns I mean 54 millisecond slowdowns. To scroll up it's got first
to clear to end of line the (new, soon-to-be) last line and that's a
loop of 80 chars in which I believe it spends most of the time when
scrolling.
In Message-ID: <2cb0a57a-6978-4411-8eee-***@googlegroups.com>
you gave us the timing diagram of that code:

| 2000:
| STA $C0A0
|
| loop:
| STA $C059
|
| LDX #$50
| delay1:
| DEX
| BNE delay1
| STA $C058
|
| LDX #$50
| delay2:
| DEX
| BNE delay2
| JMP loop

The access to $C0A0 (Slot #2) slows down the CPU but the access to
$C058/9 (game port) does not. I guess that slot #2 was configured to
"slow" which triggers the 54msec timer.

Try the same test but instead of sta $C0A0 let's do a "sta $CC00".
That's the start of Video Ram of the Videoterm. Try this when slot #3 is
set to "slow" and is set to "fast". And try this again after adding more
instructions:
$2000:
lda $CFFF
lda $C300
sta $CC00
loop:
...

And a third pair of tests:
$2000:
lda $C0B0
lda $CFFF
lda $C300
sta $CC00
loop:
...

BTW the commented driver code is available in the manual of the
Videoterm, see "Apple II Documentation Project".

- Ralf
Jorge
2017-09-14 14:21:15 UTC
Post by Ralf Kiefer
Post by Jorge
By slowdowns I mean 54 millisecond slowdowns. To scroll up it's got first
to clear to end of line the (new, soon-to-be) last line and that's a
loop of 80 chars in which I believe it spends most of the time when
scrolling.
| STA c0a0
|
| STA c059
|
| LDX #$50
| DEX
| BNE delay1
| STA c058
|
| LDX #$50
| DEX
| BNE delay2
| JMP loop
The access to $C0A0 (Slot #2) slows down the CPU but the access to
$C058/9 (game port) does not. I guess that slot #2 was configured to
"slow" which triggers the 54msec timer.
Try the same test but instead of sta $C0A0 let's do a "sta $CC00".
That's the start of Video Ram of the Videoterm. Try this when slot #3 is
set to "slow" and is set to "fast". And try this again after adding more
lda $CFFF
lda $C300
sta $CC00
...
lda $C0B0
lda $CFFF
lda $C300
sta $CC00
...
BTW the commented driver code is available in the manual of the
Videoterm, see "Apple II Documentation Project".
- Ralf
Ralf, there's no need, I was a bit dense when I wrote that, I had checked that already in a previous post where it jumped from cc00 to 2000 in a loop... it does not trigger any 54ms synch slowdowns.

But thanks anyway!
--
Jorge.
David Empson
2017-09-12 11:18:47 UTC
Post by Jorge
Post by Ralf Kiefer
Post by Michael J. Mahon
It is even possible to bank switch within the $Csxx range, though I've not
actually encountered that.
That's it. I.e. the Videx Videoterm uses 2kB of video RAM which is
bankswitched within the $C800 space.
In 512 byte chunks... which begs the question of what is the meaning of
setting a slot to "FAST", then ?
The FAST setting in the Zip Chip (and Zip GS) means "don't slow down to
1 MHz for a prolonged period after accessing I/O locations in this
slot".

If a slot is set to SLOW then it is expected that it has timing critical
code in its driver, and all subsequent cycles are synced with the
motherboard clock for several milliseconds after any location belonging
to the slot is accessed, effectively disabling the accelerator for a
period.

I don't have register documentation for the 8-bit Zip Chip, but I do for
the ZipGS, which is similar in concept. After a SLOW slot I/O location
is accessed, the ZipGS operates in sync with the motherboard clock for
52 to 54 ms.

Every read or write access to an I/O location must sync with the
motherboard clock. Given your test results of adjacent STA $C040
producing a 500 kHz strobe output, it looks like the Zip Chip needs to
sync with the motherboard clock for an entire I/O write cycle, then can
do three 8 MHz cycles for the code fetch from its internal cache
overlapping with the next motherboard cycle.

I see no evidence of I/O write buffering (I never tested that back in
the 1990s when I was last actively using my Apple IIgs).
--
David Empson
***@actrix.gen.nz
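
To get a feel for what one SLOW period costs, here is a back-of-the-envelope Python sketch. It assumes the 52 to 54 ms figure quoted above for the ZipGS, a nominal 1 MHz motherboard clock and an 8 MHz accelerator; the function name is hypothetical and the real chip timing may differ.

```python
def slow_window_cycles(window_ms=54.0, slow_mhz=1.0, fast_mhz=8.0):
    """Return (cycles executed while synced, cycles lost versus full speed)."""
    window_us = window_ms * 1000.0
    slow_cycles = round(window_us * slow_mhz)   # what actually runs
    fast_cycles = round(window_us * fast_mhz)   # what could have run
    return slow_cycles, fast_cycles - slow_cycles

print(slow_window_cycles(54.0))
print(slow_window_cycles(52.0))
```

Under these assumptions a single access to a SLOW slot costs on the order of 54,000 motherboard cycles at 1 MHz, during which an 8 MHz part could otherwise have run several hundred thousand fast cycles.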
Jorge
2017-09-12 17:40:35 UTC
Post by David Empson
Post by Jorge
Post by Ralf Kiefer
Post by Michael J. Mahon
It is even possible to bank switch within the $Csxx range, though I've not
actually encountered that.
That's it. I.e. the Videx Videoterm uses 2kB of video RAM which is
bankswitched within the $C800 space.
In 512 byte chunks... which begs the question of what is the meaning of
setting a slot to "FAST", then ?
The FAST setting in the Zip Chip (and Zip GS) means "don't slow down to
1 MHz for a prolonged period after accessing I/O locations in this
slot".
If a slot is set to SLOW then it is expected that it has timing critical
code in its driver, and all subsequent cycles are synced with the
motherboard clock for several milliseconds after any location belonging
to the slot is accessed, effectively disabling the accelerator for a
period.
I don't have register documentation for the 8-bit Zip Chip, but I do for
the ZipGS, which is similar in concept. After a SLOW slot I/O location
is accessed, the ZipGS operates in sync with the motherboard clock for
52 to 54 ms.
I have checked that and you're right:

https://imgur.com/a/xZcaW

2000G and it gives... 54 ms !!

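2000:
STA $C0A0      ; slot #2 I/O access

loop:
STA $C059      ; annunciator 0 on

LDX #$50
delay1:
DEX
BNE delay1     ; loop on DEX only; the label must not sit on the LDX
STA $C058      ; annunciator 0 off

LDX #$50
delay2:
DEX
BNE delay2
JMP loop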
Post by David Empson
Every read or write access to an I/O location must sync with the
motherboard clock. Given your test results of adjacent STA $C040
producing a 500 kHz strobe output, it looks like the Zip Chip needs to
sync with the motherboard clock for an entire I/O write cycle, then can
do three 8 MHz cycles for the code fetch from its internal cache
overlapping with the next motherboard cycle.
I see no evidence of I/O write buffering (I never tested that back in
the 1990s when I was last actively using my Apple IIgs).
Ahhh, yes, that too, it's a pity.
--
Jorge.
Ralf Kiefer
2017-09-13 23:06:01 UTC
Post by Jorge
2000G and it gives... 54 ms !!
I found the background for this value :-) In the disk ][ (or IWM) code
there is a software controlled delay of 36.6msec. That's the time for
head settling after stepping.

There is another rule: the spindle motor should(!) be on for 150msec
before starting the seek operation. So the ZipChip waits just for about
66msec instead of 150msec. Obviously no problem.

Source:
Norman Leung - Software Control of the Disk II or IWM Controller.pdf
on Asimov

- Ralf
Jorge
2017-09-14 14:16:17 UTC
Post by Ralf Kiefer
Post by Jorge
2000G and it gives... 54 ms !!
I found the background for this value :-) In the disk ][ (or IWM) code
there is a software controlled delay of 36.6msec. That's the time for
head settling after stepping.
There is another rule: the spindle motor should(!) be on for 150msec
before starting the seek operation. So the ZipChip waits just for about
66msec instead of 150msec. Obviously no problem.
Norman Leung - Software Control of the Disk II or IWM Controller.pdf
on Asimov
Interesting reading... :-) The SA400 OEM Manual (Google is your friend) says track-to-track time 40 ms, settling time 10 ms.

Cheers,
--
Jorge.
TomCh
2017-09-17 14:29:37 UTC
Post by Michael J. Mahon
...
It is even possible to bank switch within the $Csxx range, though I've not
actually encountered that.
--
-michael - NadaNet 3.1 and AppleCrate II: http://michaeljmahon.com
Off topic, but in reply to Michael, the Apple Mousecard uses bank switching within the $Csxx range (selected by writing to the card's DEVICE SELECT' address range at $C080+s*$10).

Tom
Michael J. Mahon
2017-09-17 15:10:28 UTC
Post by TomCh
Post by Michael J. Mahon
...
It is even possible to bank switch within the $Csxx range, though I've not
actually encountered that.
--
-michael - NadaNet 3.1 and AppleCrate II: http://michaeljmahon.com
Off topic, but in reply to Michael, the Apple Mousecard uses bank
switching within the $Csxx range (selected by writing to the card's
DEVICE SELECT' address range at $C080+s*$10).
Tom
Good to know, Tom--thanks!
--
-michael - NadaNet 3.1 and AppleCrate II: http://michaeljmahon.com
David Empson
2017-09-10 21:15:58 UTC
Post by Jorge
Post by Jorge
Hi,
Question #1: It does not cache that range, am I right?
WRT to question #1: this zipchip does not seem to cache C800.CFFF (shared
ROM space) nor CN00..CNFF (slot #N ROM space), is that so?
It can't, because the accelerator doesn't know how the hardware is
implemented in those regions, except for well known cases like the Apple
//c or internal ROM in the //e.

Some I/O cards have bank-switched ROM, RAM or I/O in one of those
spaces. For the IOSTROBE area ($C800-$CFFF), some cards only partly
decode the $CFFF address for disabling the latch, so the accelerator
can't know what happens when $CFFE is accessed, for example.
--
David Empson
***@actrix.gen.nz
Jorge
2017-09-10 21:38:40 UTC
Post by David Empson
Post by Jorge
WRT to question #1: this zipchip does not seem to cache C800.CFFF (shared
ROM space) nor CN00..CNFF (slot #N ROM space), is that so?
It can't, because the accelerator doesn't know how the hardware is
implemented in those regions, except for well known cases like the Apple
//c or internal ROM in the //e.
Some I/O cards have bank-switched ROM, RAM or I/O in one of those
spaces. For the IOSTROBE area ($C800-$CFFF), some cards only partly
decode the $CFFF address for disabling the latch, so the accelerator
can't know what happens when $CFFE is accessed, for example.
You're right David, I think, I've tried c300..c3ff and c800.cfff and there's no way, or so it seems. And yes, the slot was set to fast (Michael).

It's a pity, because I love the Videx but the zipchip won't make them scroll any faster... :-(
--
Jorge
Ralf Kiefer
2017-09-10 21:49:04 UTC
Post by Jorge
It's a pity, because I love the Videx but the zipchip won't make them
scroll any faster... :-(

The Videx Videoterm runs the 6845. With this chip you need not move the
whole video memory by several bytes when scrolling, as the Apple IIe
hardware must. Change the registers which define the upper left corner,
AFAIR R12 and R13. That's all.

Copy the driver code to accelerated RAM. That's the solution :-)

- Ralf
Jorge
2017-09-10 22:18:13 UTC
Post by Ralf Kiefer
Post by Jorge
It's a pity, because I love the Videx but the zipchip won't make them
scroll any faster... :-(
The Videx Videoterm runs the 6845. With this chip you need not move the
whole video memory by several bytes when scrolling, as the Apple IIe
hardware must. Change the registers which define the upper left corner,
AFAIR R12 and R13. That's all.
Hi Ralf,

Yes, it's got hardware scrolling, but every time it scrolls up it first has to fill/clear the (new) last line (the last 80 chars) with spaces before adjusting the 6845 scan start register!
Post by Ralf Kiefer
Copy the driver code to accelerated RAM. That's the solution :-)
Yes, yes, I know... :-) But I was hoping there was another way !

Thanks
--
Jorge.
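
The two steps discussed here (clear the new last line, then move the 6845 start address) can be sketched in Python. This is a model only, built on assumptions: an 80x24 screen, the 2 KB of video RAM wrapping at 2048, and the 512-byte window at $CC00..$CDFF mentioned in the thread. How the card selects which 512-byte chunk is visible is not modelled, and all names are hypothetical.

```python
COLS, ROWS = 80, 24
VRAM_SIZE = 2048        # 2 KB of Videoterm video RAM
WINDOW_BASE = 0xCC00    # 512-byte window $CC00..$CDFF
WINDOW_SIZE = 512

def scroll_up(start):
    """One-line scroll: return (new_start, r12, r13, bottom_line_offset).

    r12/r13 are the 6845 start-address registers (high 6 bits, low 8 bits).
    bottom_line_offset is where the new last line begins, i.e. the 80
    characters that must be filled with spaces.
    """
    new_start = (start + COLS) % VRAM_SIZE
    bottom = (new_start + (ROWS - 1) * COLS) % VRAM_SIZE
    return new_start, (new_start >> 8) & 0x3F, new_start & 0xFF, bottom

def window_location(offset):
    """Map a video RAM offset to (512-byte chunk number, address in the window)."""
    return offset // WINDOW_SIZE, WINDOW_BASE + offset % WINDOW_SIZE

new_start, r12, r13, bottom = scroll_up(0)
chunk, addr = window_location(bottom)
print(new_start, r12, r13, bottom, chunk, hex(addr))
```

Only the 80 stores into the window have to go over the Apple bus; the register update is two more I/O writes, so the rest of such a routine is what would benefit from running out of cached RAM.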
Steven Hirsch
2017-09-10 21:55:32 UTC
Post by Jorge
You're right David, I think, I've tried c300..c3ff and c800.cfff and
there's no way, or so it seems. And yes, the slot was set to fast
(Michael).
It's a pity, because I love the Videx but the zipchip won't make them
scroll any faster... :-(
If the critical part of the scroll loop is small enough you can rewrite the
ROM code to place a copy in the stack page and execute from there. I have a
mod to the RamFAST ROM that used this trick in polled I/O mode to leverage the
ZipChip. On entry to the firmware block read routine, it would copy a small
loop to the deepest part of the stack page and JSR to it. It kills a few
cycles every time to copy the loop, but getting 8x throughput for the
subsequent 512-byte transfer more than makes up for it. I suppose it's
theoretically possible for there to be insufficient stack space, but I never
ran into any real life scenario that cut things so close. The loop was
perhaps 16 bytes or so.
Jorge
2017-09-10 22:28:11 UTC
Post by Steven Hirsch
If the critical part of the scroll loop is small enough you can rewrite the
ROM code to place a copy in the stack page and execute from there. I have a
mod to the RamFAST ROM that used this trick in polled I/O mode to leverage the
ZipChip. On entry to the firmware block read routine, it would copy a small
loop to the deepest part of the stack page and JSR to it. It kills a few
cycles every time to copy the loop, but getting 8x throughput for the
subsequent 512-byte transfer more than makes up for it. I suppose it's
theoretically possible for there to be insufficient stack space, but I never
ran into any real life scenario that cut things so close. The loop was
perhaps 16 bytes or so.
That's cool!

I had almost forgotten the funny things we had to do back in the day in these vintage/primitive computers... :-)

Thanks!
--
Jorge.
Jorge
2017-09-12 10:53:43 UTC
Now that I think about it, I could have known that the Videx was not running fast because ctrl-g sounded the same in 80 columns no matter what, unlike in 40 columns.

Silly me: I burned an EPROM to put in the Videx with the loops to trigger $c040 so I could see it on the scope, and checked both from c300 and c800. I could have tested c800.cfff from the Videx RAM at cc00.