How to Create a Number of Parallel Processes
For warriors that contain a multi-process paper or a vector-launched
imp, it is necessary to create a number of processes running in
parallel. The usual method is to use a combination of spl 1 and
mov -1,0 instructions. To generate an exact number of parallel
processes, convert the required number to binary (3 -> 11), subtract
one (-> 10), then use a spl 1 for every one and a mov -1,0 for every
zero, reading from the most significant bit.
E.g. 5 decimal = 101 binary, take away one = 100 binary. This
becomes the following code:

    spl 1           ;
    mov -1,0        ; Generate 5 processes
    mov -1,0        ;
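
The conversion rule above can be sketched as a small Python helper. This is an illustration only: the names spl_mov_sequence and process_count are hypothetical and not part of any Core War tool. The count rule assumes that spl 1 doubles the number of processes, while mov -1,0 turns c processes into 2c - 1 (the first process copies the preceding spl 1 over the mov, and the remaining c - 1 processes then execute that spl).

```python
def spl_mov_sequence(n):
    """Return the spl 1 / mov -1,0 lines that yield n parallel processes."""
    if n < 2:
        return []  # a single process needs no splitting code
    bits = bin(n - 1)[2:]  # binary digits of n - 1, most significant first
    return ["spl 1" if bit == "1" else "mov -1,0" for bit in bits]


def process_count(sequence):
    """Count the processes produced by a sequence built as above.

    Assumed rule: spl 1 maps c to 2c, mov -1,0 maps c to 2c - 1.
    """
    count = 1
    for line in sequence:
        count = 2 * count if line == "spl 1" else 2 * count - 1
    return count


if __name__ == "__main__":
    for line in spl_mov_sequence(5):
        print(line)
    print(process_count(spl_mov_sequence(5)))
```

For 5 processes the helper returns spl 1 followed by two mov -1,0 lines, the same code as in the example above.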
However, it is possible to do this slightly faster, and perhaps
even gain an extra b-field or two into the bargain, which can
be used for storing data or for decrementing locations in core.
The snippets below produce 3, 5, 7, 9 and 11 processes. If required,
a number of spl 1,<nnn instructions can be added to the end of a
snippet; each one doubles the number of processes.

 N   OLD CODE        NEW CODE        COMMENTS

 3   spl 1, <xxx     spl 2, <xxx     One cycle faster, and one
     mov -1, 0       spl 1, <yyy     extra b-field.

 5   spl 1, <xxx     spl 2, <xxx     Two cycles faster, and
     mov -1, 0       spl 2, <yyy     two extra b-fields.
     mov -1, 0       spl 1, <zzz

 7   spl 1, <xxx     spl 1, <xxx     One cycle faster, with no
     spl 1, <yyy     spl 1, }0       extra b-fields, but won't
     mov -1, 0       spl 1, <yyy     work under ICWS'88 and is
                                     self-modifying.

 9   spl 1, <xxx     spl 2, <xxx     Three cycles faster, plus
     mov -1, 0       spl 2, <yyy     two extra b-fields, but
     mov -1, 0       spl 1, }0       won't work under ICWS'88
     mov -1, 0       spl 1, <zzz     and is self-modifying.

11   spl 1, <xxx     spl 1, <xxx     Two cycles faster, plus
     mov -1, 0       spl 1, }0       one extra b-field, but
     spl 1, <xxx     spl 2, <yyy     won't work under ICWS'88
     mov -1, 0       spl 1, <zzz     and is self-modifying.

While you will be lucky if this method gains your paper a
fraction of a point on the hill, it is still clearly better.
Of course, the code for 2^n processes remains identical, and
cannot be improved.
There is also another snippet that saves an instruction, because the
following single instruction generates 3 processes:

    spl 0, }0

By adding further lines you can get the following numbers of parallel
processes:

 N   OLD CODE        ALT 1 CODE      ALT 2 CODE

 3   spl 1, <xxx     spl 2, <xxx     spl 0, }0
     mov -1, 0       spl 1, <yyy

 5   spl 1, <xxx     spl 2, <xxx     spl 0, }0
     mov -1, 0       spl 2, <yyy     mov asd, 0
     mov -1, 0       spl 1, <zzz     (asd spl 1)

 6   spl 1, <xxx     -               spl 0, }0
     mov -1, 0                       spl 1, <xxx
     spl 1, <yyy

 7   spl 1, <xxx     spl 1, <xxx     -
     spl 1, <yyy     spl 1, }0
     mov -1, 0       spl 1, <yyy

 9   spl 1, <xxx     spl 2, <xxx     spl 0, }0
     mov -1, 0       spl 2, <yyy     mov -1, 0
     mov -1, 0       spl 1, }0       spl 1, <xxx
     mov -1, 0       spl 1, <zzz

11   spl 1, <xxx     spl 1, <xxx     spl 0, }0
     mov -1, 0       spl 1, }0       mov 1, 0
     spl 1, <xxx     spl 2, <yyy     spl 1, <xxx
     mov -1, 0       spl 1, <zzz

Keep in mind that with these snippets some processes get a cycle
ahead of the others. For example, the following code

    spl 2
    spl 1

will get the processes out of sync if it is started with more than
one process.
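
The effect can be checked with a small queue simulation in Python. This is a simplified sketch, not a MARS: the name execution_trace is hypothetical, b-fields are ignored, every line after the two spl instructions is treated as a plain instruction, and it assumes the ICWS'94 queue order in which a spl queues the next instruction first and the split target second.

```python
from collections import deque


def execution_trace(program, start_processes, cycles):
    """Return the line index executed on each cycle.

    program[i] is k for a 'spl k' on line i. Lines past the end of
    program are treated as plain, non-splitting instructions.
    """
    queue = deque([0] * start_processes)
    trace = []
    for _ in range(cycles):
        pc = queue.popleft()
        trace.append(pc)
        queue.append(pc + 1)  # the executing process continues first
        offset = program[pc] if pc < len(program) else None
        if offset is not None:
            queue.append(pc + offset)  # then the new process is queued
    return trace


SPL_2_SPL_1 = [2, 1]  # line 0: spl 2, line 1: spl 1

if __name__ == "__main__":
    print(execution_trace(SPL_2_SPL_1, 1, 5))  # [0, 1, 2, 2, 2]
    print(execution_trace(SPL_2_SPL_1, 2, 9))  # [0, 0, 1, 2, 1, 2, 2, 2, 3]
```

Started with one process, the three resulting processes execute line 2 on consecutive cycles. Started with two, one process already executes line 2 on cycle 4, before the second spl 1 has run, and reaches line 3 on cycle 9 while two processes are still waiting at line 2.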
Incidentally, getting them out of sync isn't always bad. One
could try to take advantage of it.

    spl 1
    spl 1
    spl 2

The code above could, for example, boot one piece of code of
length 4 and another piece of code of length 8 with no
wasted cycles and minimal boot code length.
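
How often each line after such a block is executed can be counted with a short Python sketch. It is an illustration under simplifying assumptions: the name execution_counts is hypothetical, only the spl offsets are modelled, and the two plain lines after the spl block stand in for whatever copy instructions the boot code would use.

```python
from collections import deque


def execution_counts(program, start_processes=1):
    """Count how many times each line of program is executed.

    program[i] is k for a 'spl k' on line i, or None for a plain
    instruction. Processes are dropped once they run past the end.
    """
    queue = deque([0] * start_processes)
    counts = [0] * len(program)
    while queue:
        pc = queue.popleft()
        if pc >= len(program):
            continue  # this process has left the modelled block
        counts[pc] += 1
        queue.append(pc + 1)
        offset = program[pc]
        if offset is not None:
            queue.append(pc + offset)
    return counts


if __name__ == "__main__":
    # spl 1, spl 1, spl 2, then two plain instructions
    print(execution_counts([1, 1, 2, None, None]))  # [1, 2, 4, 4, 8]
```

The fourth line runs 4 times and the fifth 8 times, which is one way to read the claim above: a single instruction executed 4 times and another executed 8 times, with no cycle spent on anything but splitting.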
For larger numbers of parallel processes, one could use a kind
of vector launch, as the following example shows:

;normal 17 parallel process loader
;4 wasted cycles
        spl 1
        mov -1, 0
        mov -1, 0
        mov -1, 0
        mov -1, 0

;no cycle waste
;every cycle used for splitting
;but 4 instructions longer
s0      spl 2
s1      spl s3
s2      spl @v1, }0
s3      spl *v1, }0
s4      spl 1
s5      ;code goes here
v1      dat s4, s3
        dat s4, s4
        dat s4, 0
        dat s5, 0

The process queue during the run in this case looks like this:
; 0
; 1 2
; 2 2 3
; 2 3 3 3
; 3 3 3 3 4
; 3 3 3 4 4 4
; 3 3 4 4 4 4 4
; 3 4 4 4 4 4 4 4
; 4 4 4 4 4 4 4 4 5
; 4 4 4 4 4 4 4 5 5 5
; 4 4 4 4 4 4 5 5 5 5 5
; 4 4 4 4 4 5 5 5 5 5 5 5
; 4 4 4 4 5 5 5 5 5 5 5 5 5
; 4 4 4 5 5 5 5 5 5 5 5 5 5 5
; 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5
; 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
; 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
It's hard to say which of these two scores better. Just try both
and see which one suits your warrior better.
You might also consider booting part of the spl block together
with the main body, as is done, for example, in Digitalis 2002.
This can be important when you need to boot away from your bulky
quickscanning part as fast as possible.