Discussion:
cc65 / ca65: how to interface C and assembly?
Linards Ticmanis
2005-12-06 16:02:29 UTC
Hello,

please excuse the cross-posting, but I thought that maybe somebody from
these three groups might be able to help me along a bit.

I'm not new to the C language or to 6502 assembly language, but I'm new
to cc65. I want to write a program in C using cc65, but I need to
integrate a small piece of cycle-counted assembly code to do low-level
disk writes with the correct timing; the Apple hardware is very picky in
this regard, so it's impossible to use compiled code.

I'd like best to use inline assembly ("asm") statements inside a C frame
function, but the code is not allowed to cross a page boundary, since
this would destroy the timing of branch instructions; and it seems that
".align" is not allowed in asm statements. Is there any way to tell the
compiler that you want a piece of inline assembly to be page-aligned?
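For readers wondering why a page crossing matters here: on the NMOS 6502, a relative branch takes 2 cycles when not taken and 3 when taken. It takes 4 when taken to a page different from that of the instruction following the branch. The small host-side C function below (a hypothetical helper, not part of cc65) encodes that rule, so you can check by hand what happens when a loop moves across a page boundary.

```c
#include <stdint.h>

/* Cycles used by a 6502 relative branch (BPL, BNE, ...).
   branch_addr: address of the 2-byte branch instruction
   offset:      signed displacement operand
   taken:       nonzero if the branch condition holds */
unsigned branch_cycles(uint16_t branch_addr, int8_t offset, int taken)
{
    uint16_t next = (uint16_t)(branch_addr + 2);   /* PC after the operand */
    uint16_t target = (uint16_t)(next + offset);   /* branch destination */

    if (!taken)
        return 2;
    /* +1 for taking the branch, +1 more if the target's page differs
       from the page of the next instruction */
    return ((next & 0xFF00u) == (target & 0xFF00u)) ? 3u : 4u;
}
```

The last test case is the subtle one. A branch in the last two bytes of a page pays the penalty even when its target is on the same page as the branch itself.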

If I can't use inline assembly, I'd need to know how to interface
assembly and C code. Especially, how do I access function parameters of
different data types from assembly language, how do I return values, and
how can I use temporary storage in the zero page without interfering
with whatever the compiled code does on the zero page. The cc65
documentation is largely silent on these matters, and while I've looked
at a few pieces of assembly code produced by the compiler, I found it
rather convoluted and not that easy to understand.

I understand that there's a software-implemented stack, using "sp" for a
stack pointer / base pointer, local variables and parameters are both
stored on this stack, and addressed with "(sp),y" type instructions, and
that for a number of stack manipulation operations, there are a couple
of standardized "jsr"s to library code; but exactly how the different
data types are stored remains unclear to me, also how to use zero page
in a compatible way.

If anybody could help with hints or with just some sample code that I
could look at, I'd be grateful for replies in these groups. Given that
the resulting discussion will probably be of some interest to the people
in these three groups, I've not set a "Followup-To:"; if you're
absolutely against cross posting, feel free to reply in just one group.

If there are any web forums or mailing lists where this question is more
likely to get useful answers, I'd also be grateful for a link.

Many thanks in advance!
--
Linards Ticmanis
Andy McFadden
2005-12-06 16:33:58 UTC
Post by Linards Ticmanis
I'd like best to use inline assembly ("asm") statements inside a C frame
function, but the code is not allowed to cross a page boundary, since
this would destroy the timing of branch instructions; and it seems that
".align" is not allowed in asm statements. Is there any way to tell the
compiler that you want a piece of inline assembly to be page-aligned?
If I can't use inline assembly, I'd need to know how to interface
assembly and C code. Especially, how do I access function parameters of
different data types from assembly language, how do I return values, and
how can I use temporary storage in the zero page without interfering
with whatever the compiled code does on the zero page. The cc65
documentation is largely silent on these matters, and while I've looked
at a few pieces of assembly code produced by the compiler, I found it
rather convoluted and not that easy to understand.
If you're worried about where your code lives, relocate your code
somewhere else during program initialization, and jump to it. This
is essentially the approach used for assembly code embedded in Applesoft
programs. Write some inline assembly that calls the function at the
fixed address.
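Below is a minimal sketch of the relocation idea. It assumes the routine is position-independent (no absolute JMP/JSR into itself) and fits in one 256-byte page. The helper names and the simulated 64K memory array are hypothetical. On the real machine the copy would be a short loop run once at startup, and the C side would then call the returned address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round an address up to the next 256-byte page boundary. */
uint16_t page_align_up(uint16_t addr)
{
    return (uint16_t)((addr + 0xFFu) & 0xFF00u);
}

/* Nonzero if [start, start + len) lies entirely within one page. */
int fits_in_one_page(uint16_t start, uint16_t len)
{
    assert(len >= 1 && len <= 256);
    return (start & 0xFF00u) == ((uint16_t)(start + len - 1) & 0xFF00u);
}

/* Copy a routine image into simulated memory at the first page boundary
   at or above 'base' and return the address to call. */
uint16_t relocate(uint8_t *mem, uint16_t base, const uint8_t *code, uint16_t len)
{
    uint16_t dest = page_align_up(base);
    assert(fits_in_one_page(dest, len));
    memcpy(&mem[dest], code, len);
    return dest;
}
```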

The way you carefully avoid interfering with zero page is to save and
restore the zero-page values on entry and exit. You won't be using
interrupts since your code is timing-critical, so there's no race
condition.
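Here is the save/restore idea sketched in host C, with an array standing in for page zero (all names hypothetical). On entry, save the bytes the routine will borrow; on exit, restore them. On the Apple this would be a few LDA zp / STA buffer pairs inside the assembly routine itself, run with interrupts off as noted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

uint8_t zp[256];   /* simulated zero page, $00-$FF */

/* On entry: copy the zero-page bytes the routine will use to a buffer. */
void zp_save(uint8_t *buf, unsigned first, unsigned count)
{
    assert(first + count <= 256);
    memcpy(buf, &zp[first], count);
}

/* On exit: put the caller's bytes back. */
void zp_restore(const uint8_t *buf, unsigned first, unsigned count)
{
    assert(first + count <= 256);
    memcpy(&zp[first], buf, count);
}
```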
--
Send mail to ***@fadden.com (Andy McFadden) - http://www.fadden.com/
CD-Recordable FAQ - http://www.cdrfaq.org/
CiderPress Apple II archive utility for Windows - http://www.faddensoft.com/
Fight Internet Spam - http://spam.abuse.net/spam/ & http://spamcop.net/
MagerValp
2005-12-06 17:20:47 UTC
LT> I'd like best to use inline assembly ("asm") statements inside a C
LT> frame function, but the code is not allowed to cross a page boundary,

Unfortunately, this rules out inline __asm__ statements.

LT> If I can't use inline assembly, I'd need to know how to interface
LT> assembly and C code.

Sure thing, it's simple enough. The software stack is implemented with
a zp pointer called sp. It starts at the top, and moves down as items
are pushed on the stack. You can .importzp sp and access it directly,
but it's easier to use the built-in library calls for manipulating the
stack. When your code is called, the arguments have been pushed onto
the stack left to right. If you declare your function __fastcall__, the
last (rightmost) argument will be in A/X instead of on the stack,
which saves a few cycles. Your code is responsible for pulling all the
arguments off the stack, so if you have e.g. 4 bytes of arguments, sp
should be increased by 4 before your code exits (only by 2 if you use
__fastcall__ and the last argument is a 2-byte value). The return value
should be in A (lo) and X (hi). X must be set even if you just return a
single byte, as cc65 doesn't clear X for you if the value is cast e.g.
from a char to an int.
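To make the A/X rule concrete, here is a hypothetical host-side model (not cc65 code) of how the caller reassembles a 16-bit result from A (low byte) and X (high byte). It shows why a byte-returning routine should clear X: per the post above, cc65 does not do it for you, so a stale X leaks into the high byte.

```c
#include <stdint.h>

/* Registers as left behind by an assembly routine (model only). */
struct regs {
    uint8_t a;
    uint8_t x;
};

/* How the caller sees a 16-bit return value: A = low, X = high. */
uint16_t ax_value(struct regs r)
{
    return (uint16_t)(r.a | (r.x << 8));
}

/* A byte-returning routine done right: value in A, X cleared. */
struct regs return_byte(uint8_t value)
{
    struct regs r;
    r.x = 0;        /* ldx #0 */
    r.a = value;    /* lda / pla */
    return r;
}
```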

Some of the available library routines are:

popa     pull one byte off the stack (into A), increase sp by one
popax    pull two bytes off the stack (into A/X), increase sp by two
pusha    decrease sp by one, push one byte (A) onto the stack
pushax   decrease sp by two, push two bytes (A/X) onto the stack
incsp1   increase sp by one
incsp2   increase sp by two (and so on)
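The following is a host-side C model of the routines listed above, written only to show the byte layout. It is not the cc65 runtime, although the function names mirror it. RAM is an array, sp is a 16-bit index that grows downward, and 16-bit values are stored low byte first (A at sp, X at sp+1). The starting value of sp is arbitrary.

```c
#include <stdint.h>

static uint8_t mem[65536];     /* stands in for RAM */
static uint16_t sp = 0xBF00;   /* arbitrary stack top for this model */

/* pusha: decrease sp by one, store A */
void pusha(uint8_t a)
{
    sp -= 1;
    mem[sp] = a;
}

/* pushax: decrease sp by two, store A (low) at sp and X (high) at sp+1 */
void pushax(uint8_t a, uint8_t x)
{
    sp -= 2;
    mem[sp] = a;
    mem[sp + 1] = x;
}

/* popa: fetch one byte into A, increase sp by one */
uint8_t popa(void)
{
    uint8_t a = mem[sp];
    sp += 1;
    return a;
}

/* popax: fetch A (low) and X (high), increase sp by two */
uint16_t popax(void)
{
    uint16_t ax = (uint16_t)(mem[sp] | (mem[sp + 1] << 8));
    sp += 2;
    return ax;
}

/* incsp1, incsp2, ...: drop n bytes without reading them */
void incsp(unsigned n)
{
    sp = (uint16_t)(sp + n);
}
```

For the copyshort example that follows, the caller side of copyshort(10, src, dst) amounts to pusha(10) and then pushax for src, with dst left in A/X. The routine pops src and then len, leaving sp where it started.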

Here's a silly example that copies 1-128 bytes from src to dst, and
returns the length:

-- main.c --

#include "copyshort.h"

int main (void)
{
    copyshort (10, (void *) 0x1000, (void *) 0x2000);
    return 0;
}


-- copyshort.h --

/* copy 128 bytes or less from src to dst, returns number of bytes copied */
extern unsigned char __fastcall__ copyshort(unsigned char len, void *src, void *dst);


-- copyshort.s --

.export _copyshort      ; prepend labels with _ so C can see them

.import popa, popax
.importzp ptr1, ptr2

src = ptr1              ; borrow runtime zp pointers
dst = ptr2              ; tmp1 and tmp2 are also available


.code

_copyshort:
        sta dst         ; __fastcall__ so last arg is
        stx dst + 1     ; in A/X

        jsr popax       ; pull src off stack
        sta src
        stx src + 1

        jsr popa        ; pull len off stack
        pha             ; save len for return value
        tay
        dey
:       lda (src),y
        sta (dst),y
        dey
        bpl :-

        ldx #0          ; return len in A/X
        pla
        rts
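As a cross-check of the loop above, here is a plain C model of it (hypothetical, for host testing only). Y starts at len-1, and BPL keeps looping while bit 7 of Y is clear after DEY. That is why the routine is documented for 1-128 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Model of copyshort: copies bytes len-1 down to 0, returns len. */
uint8_t copyshort_model(uint8_t len, const uint8_t *src, uint8_t *dst)
{
    uint8_t y;

    assert(len >= 1 && len <= 128);   /* documented range of the routine */
    y = (uint8_t)(len - 1);            /* tay / dey */
    do {
        dst[y] = src[y];               /* lda (src),y / sta (dst),y */
        y--;                           /* dey (wraps 0 -> $FF) */
    } while (!(y & 0x80));             /* bpl :- */
    return len;                        /* ldx #0 / pla */
}
```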


LT> If there are any web forums or mailing lists where this question is
LT> more likely to get useful answers, I'd also be grateful for a link.

Check out the cc65 mailing list:

http://www.cc65.org/#List
--
___ . . . . . + . . o
_|___|_ + . + . + . Per Olofsson, arkadspelare
o-o . . . o + ***@cling.gu.se
- + + . http://www.cling.gu.se/~cl3polof/
U. v. Bassewitz
2005-12-06 22:02:39 UTC
Post by Linards Ticmanis
I'd like best to use inline assembly ("asm") statements inside a C frame
function, but the code is not allowed to cross a page boundary, since
this would destroy the timing of branch instructions; and it seems that
".align" is not allowed in asm statements.
The builtin inline assembler is not a full featured assembler, therefore
control commands (like .align) aren't accepted. A second reason is that
the inline assembler code is processed by the optimizer, and handling
control commands in this stage would be too complex.
Post by Linards Ticmanis
Is there any way to tell the
compiler that you want a piece of inline assembly to be page-aligned?
Alignment is done by the linker. If you want to embed your inline
assembler code in a C function, you can place this C function in its own
segment using something like

#pragma codeseg (push, "FASTCODE")

void fastcode (void)
{
    /* Your code here */
}

#pragma codeseg (pop)

Using your own linker configuration, you can then place the FASTCODE
segment wherever you like. See the linker docs for more information on
how to do this. The FAQ also has a paragraph that talks about code
placement:

http://www.cc65.org/faq.php#ORG
Post by Linards Ticmanis
I understand that there's a software-implemented stack, using "sp" for a
stack pointer / base pointer, local variables and parameters are both
stored on this stack, and addressed with "(sp),y" type instructions, and
that for a number of stack manipulation operations, there are a couple
of standardized "jsr"s to library code; but exactly how the different
data types are stored remains unclear to me, also how to use zero page
in a compatible way.
MagerValp has already given a good example. I can add a few things here:

The low 16 bits of a 32 bit value are returned in A/X as with 16 bit
values. The high 16 bits are returned in the zero page location named
"sreg". If you're using inline asm, you don't have to care about the
declaration, the compiler will do that for you. If you're writing
assembler modules, either use one of

.import sreg:zp
.importzp sreg

or include the file "zeropage.inc", which will give you access to all zero
page locations the compiler knows about.
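Here is a small host-side model of where a 32-bit return value ends up: A holds bits 0-7, X holds bits 8-15, and the two-byte zero page location sreg holds the high 16 bits. The model assumes sreg is stored low byte first (sreg = bits 16-23, sreg+1 = bits 24-31), which matches the 6502's little-endian order. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Where each byte of a 32-bit return value lives. */
struct ret32 {
    uint8_t a;       /* bits 0-7             */
    uint8_t x;       /* bits 8-15            */
    uint8_t sreg0;   /* bits 16-23 (sreg)    */
    uint8_t sreg1;   /* bits 24-31 (sreg+1)  */
};

/* What an assembly routine must load before RTS. */
struct ret32 split32(uint32_t v)
{
    struct ret32 r;
    r.a = (uint8_t)v;
    r.x = (uint8_t)(v >> 8);
    r.sreg0 = (uint8_t)(v >> 16);
    r.sreg1 = (uint8_t)(v >> 24);
    return r;
}

/* What the C caller reassembles afterwards. */
uint32_t join32(struct ret32 r)
{
    return (uint32_t)r.a | ((uint32_t)r.x << 8) |
           ((uint32_t)r.sreg0 << 16) | ((uint32_t)r.sreg1 << 24);
}
```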

The following zero page variables can be used freely by assembler
functions:

ptr1, ptr2, ptr3, ptr4
tmp1, tmp2, tmp3, tmp4

But beware, other functions will also use these locations, so if you're
calling a subroutine, they may get clobbered.

regbank

is a 6 byte zero page area used for register variables. Its
disadvantage is that any caller expects its current contents to survive
the function call, so you need to save it somewhere before using it and
restore it before returning. The advantage is that you can expect other
functions to behave the same way.

For more examples on how to interface with assembler, just have a look at
the library sources, which can be found in the source archive.

Regards


Uz
--
Ullrich von Bassewitz ***@spamtrap.musoftware.de
22:38:08 up 62 days, 6:25, 11 users, load average: 0.00, 0.02, 0.03
Linards Ticmanis
2005-12-06 23:18:51 UTC
Thanks to all who answered for the help!
Post by U. v. Bassewitz
The builtin inline assembler is not a full featured assembler, therefore
control commands (like .align) aren't accepted. A second reason is that
the inline assembler code is processed by the optimizer, and handling
control commands in this stage would be too complex.
Being processed by the optimizer pretty much rules out using inline
assembly I fear, since the Apple disk controller wants to receive a byte
exactly every 32 cycles; one more or less and it won't work. So any kind
of optimization or reorganization would destroy the functionality.
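One way to keep a hand-timed write loop honest is to tally the cycles per pass against the 32-cycle budget. The sketch below is hypothetical. The instruction list in the test is an illustrative loop body, not the actual write routine, and its cycle counts assume NMOS 6502 timing with no page crossings and a taken branch. For example, a 15-cycle body leaves 17 cycles to fill.

```c
#include <stddef.h>

/* One instruction of a hand-timed loop and its cycle cost. */
struct insn {
    const char *text;
    unsigned cycles;
};

/* Cycles for one pass through the loop body. */
unsigned loop_cycles(const struct insn *body, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += body[i].cycles;
    return total;
}

/* Cycles left to fill to meet the budget; negative means too slow. */
int budget_slack(unsigned total, unsigned budget)
{
    return (int)budget - (int)total;
}
```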

So I guess I'll use a separate assembly file and .align my code that
way. There's no need to call any functions from the write code, it's
just a simple loop that dumps memory locations' contents to the write
register. And now that I know how parameter passing works, it should be
easy enough.

One more thing, on the Apple many hardware functions are activated by
accessing certain addresses. It's only the access that matters, the data
read or written is irrelevant. But when I use peekpoke.h and say
something like "PEEK(0xC030);" without using the result for anything, it
gets optimized away. I've settled on "*((char *)0xC030) = __AX__;",
which gives a "pure" store, storing whatever is currently in A; but this
is problematic for some addresses, since the 6502 accesses an address
twice during a write. Some of these "softswitches" are toggle switches,
so you need to read them, which produces only one access; writing them
toggles them back and forth instead.

Is there any way to get a "pure" load, i.e. make the compiler emit just
an LDA (or a BIT, or whatever) but to ignore the result? I mean without
resorting to inline assembly. I tried to do it with "volatile" but that
doesn't seem to have any effect.

(No need to answer this rather peculiar request, unless you feel like
answering. I know I can just store the result to some temporary variable
and forget about it, it's more or less a question on whether it's possible.)
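In standard C, the usual way to ask for a pure load is to read through a volatile-qualified lvalue and discard the result. A conforming compiler must then perform the access. The macro and function below are a generic sketch with hypothetical names. As this thread notes, cc65 at the time did not seem to honor volatile here, so check the generated assembly before relying on it.

```c
/* Read a byte only for the side effect of the read. */
#define TOUCH(addr) ((void)*(const volatile unsigned char *)(addr))

/* The same idea as an inline function taking a pointer. */
static inline void touch(const volatile unsigned char *p)
{
    (void)*p;   /* the load is kept; the value is discarded */
}

/* On the Apple II this would be e.g. TOUCH(0xC030) to click the speaker. */
```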
--
Linards Ticmanis
Shawn Jefferson
2005-12-07 05:53:53 UTC
Post by Linards Ticmanis
One more thing, on the Apple many hardware functions are activated by
accessing certain addresses. It's only the access that matters, the data
read or written is irrelevant. But when I use peekpoke.h and say
something like "PEEK(0xC030);" without using the result for anything, it
gets optimized away. I've settled on "*((char *)0xC030) = __AX__;",
which gives a "pure" store, storing whatever is currently in A; but this
is problematic for some addresses, since the 6502 accesses an address
twice during a write. Some of these "softswitches" are toggle switches,
so you need to read them, which produces only one access; writing them
toggles them back and forth instead.
Is there any way to get a "pure" load, i.e. make the compiler emit just
an LDA (or a BIT, or whatever) but to ignore the result? I mean without
resorting to inline assembly. I tried to do it with "volatile" but that
doesn't seem to have any effect.
A volatile type would be great. I have a program that suffers from a
similar problem, in that I store a value to a memory location to check for
the presence of hardware. If I check for that value in that location right
after the assignment, the optimizer optimizes it away. I got around it by
putting the check far enough down in the code from the assignment that the
optimizer doesn't remember it.
--
Shawn Jefferson
(fix reply to for email)
Michael J. Mahon
2005-12-07 08:36:11 UTC
Post by Shawn Jefferson
Post by Linards Ticmanis
One more thing, on the Apple many hardware functions are activated by
accessing certain addresses. It's only the access that matters, the data
read or written is irrelevant. But when I use peekpoke.h and say
something like "PEEK(0xC030);" without using the result for anything, it
gets optimized away. I've settled on "*((char *)0xC030) = __AX__;",
which gives a "pure" store, storing whatever is currently in A; but this
is problematic for some addresses, since the 6502 accesses an address
twice during a write. Some of these "softswitches" are toggle switches,
so you need to read them, which produces only one access; writing them
toggles them back and forth instead.
Is there any way to get a "pure" load, i.e. make the compiler emit just
an LDA (or a BIT, or whatever) but to ignore the result? I mean without
resorting to inline assembly. I tried to do it with "volatile" but that
doesn't seem to have any effect.
A volatile type would be great. I have a program that suffers from a
similar problem, in that I store a value to a memory location to check for
the presence of hardware. If I check for that value in that location right
after the assignment, the optimizer optimizes it away. I got around it by
putting the check far enough down in the code from the assignment that the
optimizer doesn't remember it.
A volatile type is a necessity in any optimized language used to
access addresses when the access has side effects. As noted, the
Apple (and indeed, any 6502 machine) is loaded with such side effects.

One must also make some "policy" decisions about indirect accesses
(through pointers). Without explicit information to the contrary,
it is best to assume that a pointer *cannot* point to a volatile,
since to do otherwise would eliminate almost all optimizations.

-michael

Music synthesis for 8-bit Apple II's!
Home page: http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it is seriously underused."
Lyrical Nanoha
2005-12-07 09:56:48 UTC
Post by Michael J. Mahon
A volatile type is a necessity in any optimized language used to
access addresses when the access has side effects. As noted, the
Apple (and indeed, any 6502 machine) is loaded with such side effects.
One must also make some "policy" decisions about indirect accesses
(through pointers). Without explicit information to the contrary,
it is best to assume that a pointer *cannot* point to a volatile,
since to do otherwise would eliminate almost all optimizations.
I for one don't like my code computer-optimized because I lose control
over it.

-uso.
Michael J. Mahon
2005-12-07 23:52:34 UTC
Post by Lyrical Nanoha
Post by Michael J. Mahon
A volatile type is a necessity in any optimized language used to
access addresses when the access has side effects. As noted, the
Apple (and indeed, any 6502 machine) is loaded with such side effects.
One must also make some "policy" decisions about indirect accesses
(through pointers). Without explicit information to the contrary,
it is best to assume that a pointer *cannot* point to a volatile,
since to do otherwise would eliminate almost all optimizations.
I for one don't like my code computer-optimized because I lose control
over it.
In high-level languages, optimization is often necessary to allow
more efficient execution of natural expressions of algorithms.

For example, a loop counter may be kept in a register, even
though it is expressed as a variable (typically in memory).

Of course, if you like the compiler's unoptimized code, then you
can turn optimization off.

-michael
Michael J. Mahon
2005-12-07 08:31:33 UTC
Post by Linards Ticmanis
Thanks to all who answered for the help!
Post by U. v. Bassewitz
The builtin inline assembler is not a full featured assembler, therefore
control commands (like .align) aren't accepted. A second reason is that
the inline assembler code is processed by the optimizer, and handling
control commands in this stage would be too complex.
Being processed by the optimizer pretty much rules out using inline
assembly I fear, since the Apple disk controller wants to receive a byte
exactly every 32 cycles; one more or less and it won't work. So any kind
of optimization or reorganization would destroy the functionality.
I hope that the optimizer knows enough to leave assembly code
unchanged. I presumed (oops) that UVB meant only that it "passed
through" the optimizer, not that the optimizer attempted to do
anything with it... ;-(
Post by Linards Ticmanis
So I guess I'll use a separate assembly file and .align my code that
way. There's no need to call any functions from the write code, it's
just a simple loop that dumps memory locations' contents to the write
register. And now that I know how parameter passing works, it should be
easy enough.
A good fallback position.
Post by Linards Ticmanis
One more thing, on the Apple many hardware functions are activated by
accessing certain addresses. It's only the access that matters, the data
read or written is irrelevant. But when I use peekpoke.h and say
something like "PEEK(0xC030);" without using the result for anything, it
gets optimized away. I've settled on "*((char *)0xC030) = __AX__;",
which gives a "pure" store, storing whatever is currently in A; but this
is problematic for some addresses, since the 6502 accesses an address
twice during a write. Some of these "softswitches" are toggle switches,
so you need to read them, which produces only one access; writing them
toggles them back and forth instead.
Is there any way to get a "pure" load, i.e. make the compiler emit just
an LDA (or a BIT, or whatever) but to ignore the result? I mean without
resorting to inline assembly. I tried to do it with "volatile" but that
doesn't seem to have any effect.
It is not true that all stores cause a read access. I don't know
how this myth continues to propagate. (Perhaps it's because of
the "double access" note about the Applesoft POKE function, which
occurs because that store is *always* indexed.)

Only indexed and indirect indexed stores cause a read access prior
to the write access (because of a data path conflict in the chip
which requires another cycle).
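Here is a sketch of where the extra read lands for NMOS 6502 STA abs,X (abs,Y behaves the same way). The CPU first reads from an address built from the base's high byte and the indexed low byte, then writes the real target. Without a page crossing, both accesses hit the target, which is what matters for softswitches. For example, $C08F,X with X = $60 (slot 6) reads and then writes $C0EF. The struct and function are hypothetical, and this describes the NMOS part only; the 65C02 behaves differently.

```c
#include <stdint.h>

/* Addresses touched by NMOS 6502 STA abs,X (or abs,Y), in order. */
struct sta_indexed {
    uint16_t dummy_read;   /* high byte not yet corrected for a carry */
    uint16_t write;        /* the real effective address */
};

struct sta_indexed sta_abs_indexed(uint16_t base, uint8_t index)
{
    struct sta_indexed s;
    s.dummy_read = (uint16_t)((base & 0xFF00u) | ((base + index) & 0x00FFu));
    s.write = (uint16_t)(base + index);
    return s;
}
```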

-michael
Linards Ticmanis
2005-12-07 11:15:30 UTC
Post by Michael J. Mahon
I hope that the optimizer knows enough to leave assembly code
unchanged. I presumed (oops) that UVB meant only that it "passed
through" the optimizer, not that the optimizer attempted to do
anything with it... ;-(
Probably you are right.
Post by Michael J. Mahon
It is not true that all stores cause a read access. I don't know
how this myth continues to propagate. (Perhaps it's because of
the "double access" note about the Applesoft POKE function, which
occurs because that store is *always* indexed.)
Only indexed and indirect indexed stores cause a read access prior
to the write access (because of a data path conflict in the chip
which requires another cycle).
Thanks. I was indeed thinking that every store instruction causes two
accesses. If not, the solution I gave above seems to be ideal, provided
you use an absolute address, not some expression that has to be
calculated at runtime (which will compile into a Y-indexed indirect store).
--
Linards Ticmanis
sicklittlemonkey
2005-12-07 16:08:23 UTC
Post by Linards Ticmanis
the solution I gave above seems to be ideal, provided
you use an absolute address, not some expression that has to be
calculated at runtime (which will compile into a Y-indexed indirect store)
Note that for the switch to write mode in your disk write code:
(B83D) LDA #$FF ;(a) = sync byte.
STA Q7H,X ;Write 1 sync byte.
that STA abs,X (with no page crossing) is pretty much essential.
It syncs with the Woz Machine. You can't replace it with a STA abs.

This is covered in obscene detail in Sather's Understanding the Apple
II.

Just to expand on the double write issue, various (NMOS) 6502 ops
perform from 1 to 4 reads and writes to the given address.
Read-modify-write ops always perform two writes.
Read-modify-write,X ops perform two reads, two writes.

65C02 ops perform 1 to 3 reads and writes to the address,
and the read-modify-write ops only perform one write.
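The NMOS counts above can be written down as a small lookup table. This is a hypothetical host-side summary of a few NMOS 6502 cases without page crossing, following the post. The 65C02 differs as described.

```c
#include <assert.h>

/* Accesses to the operand address itself. */
struct access {
    unsigned reads;
    unsigned writes;
};

enum op_kind { LOAD_ABS, STORE_ABS, STORE_ABS_X, RMW_ABS, RMW_ABS_X };

struct access nmos_accesses(enum op_kind k)
{
    switch (k) {
    case LOAD_ABS:    return (struct access){1, 0};  /* lda abs                   */
    case STORE_ABS:   return (struct access){0, 1};  /* sta abs                   */
    case STORE_ABS_X: return (struct access){1, 1};  /* dummy read, then write    */
    case RMW_ABS:     return (struct access){1, 2};  /* read, write old, write new */
    case RMW_ABS_X:   return (struct access){2, 2};  /* dummy read, read, 2 writes */
    }
    assert(0 && "unknown op kind");
    return (struct access){0, 0};
}
```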

Ah, the Apple II. The more you look, the more there is.
It's like Natural Philosophy, or something. ;-)

Cheers,
Nick.
Michael J. Mahon
2005-12-07 23:53:42 UTC
Post by Linards Ticmanis
Post by Michael J. Mahon
I hope that the optimizer knows enough to leave assembly code
unchanged. I presumed (oops) that UVB meant only that it "passed
through" the optimizer, not that the optimizer attempted to do
anything with it... ;-(
Probably you are right.
Post by Michael J. Mahon
It is not true that all stores cause a read access. I don't know
how this myth continues to propagate. (Perhaps it's because of
the "double access" note about the Applesoft POKE function, which
occurs because that store is *always* indexed.)
Only indexed and indirect indexed stores cause a read access prior
to the write access (because of a data path conflict in the chip
which requires another cycle).
Thanks. I was indeed thinking that every store instruction causes two
accesses. If not, the solution I gave above seems to be ideal, provided
you use an absolute address, not some expression that has to be
calculated at runtime (which will compile into a Y-indexed indirect store).
...which is exactly the problem with slot-relative addresses.

-michael
Linards Ticmanis
2005-12-08 03:18:37 UTC
Post by Michael J. Mahon
Post by Linards Ticmanis
Thanks. I was indeed thinking that every store instruction causes two
accesses. If not, the solution I gave above seems to be ideal,
provided you use an absolute address, not some expression that has to
be calculated at runtime (which will compile into a Y-indexed
indirect store).
...which is exactly the problem with slot-relative addresses.
It is. But then, the only real "toggle" switches, where the same address
switches both ways, are the speaker and cassette outputs, I think. Or
are there any more?

(I removed the non-apple groups from this one)
--
Linards Ticmanis
Michael J. Mahon
2005-12-08 04:34:42 UTC
Post by Linards Ticmanis
Post by Michael J. Mahon
Post by Linards Ticmanis
Thanks. I was indeed thinking that every store instruction causes two
accesses. If not, the solution I gave above seems to be ideal,
provided you use an absolute address, not some expression that has to
be calculated at runtime (which will compile into a Y-indexed
indirect store).
...which is exactly the problem with slot-relative addresses.
It is. But then, the only real "toggle" switches, where the same address
switches both ways, are the speaker and cassette outputs, I think. Or
are there any more?
(I removed the non-apple groups from this one)
(Good idea.)

There are lots of slot card addresses that have side effects that
could be messed up by multiple accesses. Reading status registers
comes to mind as a popular one.
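To illustrate, take a hypothetical slot-card register whose "data ready" bit (bit 7, say) is cleared by the act of reading it. One stray extra read, whether from an indexed store's dummy cycle or from compiled code re-reading the location, consumes the event before the real read sees it. The model below is host-side C with invented names.

```c
#include <stdint.h>

/* Hypothetical card with a read-to-clear status register. */
struct card {
    uint8_t status;   /* bit 7 = data ready */
    unsigned reads;   /* how many times the register was read */
};

/* Reading returns the status and acknowledges (clears) bit 7. */
uint8_t read_status(struct card *c)
{
    uint8_t v = c->status;
    c->reads++;
    c->status &= (uint8_t)~0x80u;
    return v;
}
```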

-michael
Oliver Schmidt
2005-12-11 11:39:03 UTC
Hi,
Post by Linards Ticmanis
Post by U. v. Bassewitz
A second reason is that
the inline assembler code is processed by the optimizer, ...
Being processed by the optimizer pretty much rules out using inline
assembly I fear, ...
Not really. cc65 supports a pragma to control optimization with
function-level granularity. Something like this works nicely and is
used successfully, e.g., in the Contiki sources.

#pragma optimize(push, off)
void foo(void)
{
asm("...");
asm("...");
}
#pragma optimize(pop)

Best, Oliver
Linards Ticmanis
2005-12-11 13:38:39 UTC
Post by Oliver Schmidt
Post by Linards Ticmanis
Being processed by the optimizer pretty much rules out using inline
assembly I fear, ...
Not really. cc65 supports a pragma to control optimization on a
function level granularity. Something like this works nicely and is
used successfully, e.g., in the Contiki sources.
#pragma optimize(push, off)
void foo(void)
{
asm("...");
asm("...");
}
#pragma optimize(pop)
Thanks. I've already settled on a separate assembly file, though.
Everything works just great. Thanks to all the people who helped.
--
Linards Ticmanis