general development





WMML4k.1
Mon Jun 25, 2007

Another braindump for you programmers out there who are trying to crunch together a small binary (as I am at the moment). If you spot something stupid, do not hesitate to send a mail to seb@phresnel.org!

/****************************************************************************\

  What Monsieur Mach learned while trying to hack a 4k, Issue 1
 ---------------------------------------------------------------
  (C)2007 Sebastian Mach
  http://phresnel.org | seb@phresnel.org

 * using constants which have dedicated support in the fpu (e.g. 0.0, 1.0) can
   save 8 or more bytes per use
   e.g.: - imagine a function g() with return value [-1..+1]
         - you want to map it to [0..1]
         - instead of g()*0.5+0.5, you should definitely use (g()+1.0)*0.5
 * if a lot of calls to the same function exist, and if their return-val
   is not changed in code-structure, then it may be better to do all
   the calls in a loop and save the results in an array
   (see e.g. defRandomXForms(), where a lot of calls to a random number
   generator have been pushed into a lookup-table generator)
 * changing lengths of arrays can cause size-explosions of >1k.
   do this carefully, both when increasing and decreasing.
   sizes of 2^n are less harmful in most cases.
 * SDL eats around 1k for simple screen buffer alloc and surface-locking
 * SDL eats around 0.1k for SDL_WM_SetCaption()
 * test if loop unrolling is beneficial or not. a loop like
      "for(u=0; u<3; u++ ) c[u] = 0;"
   might be beneficial to be unrolled. a loop like
      "for(u=0; u<3; u++ ) c[u] = pow(x,4)/sqrt(y) * 3.14159;"
   not.
 * step by step: a) compile and check size; b) tweak; c) a);
 
****************************************************************************/
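
The range-remapping tip works because both forms are algebraically identical; only the constants involved differ. On x87, for example, 0.0 and 1.0 can be loaded with the dedicated instructions `fldz` and `fld1`. Below is a minimal sketch that merely checks the equivalence. Python is used purely for illustration (the byte savings only show up in compiled FPU code), and `g` is a hypothetical stand-in for any function returning values in [-1..+1].

```python
import math

def g(t):
    # Hypothetical stand-in for any function with return value in [-1..+1].
    return math.sin(t)

def remap_mul_then_add(v):
    # Form from the note that should be avoided: v*0.5 + 0.5
    return v * 0.5 + 0.5

def remap_add_then_mul(v):
    # Suggested form: (v + 1.0)*0.5, where 1.0 has dedicated FPU support.
    return (v + 1.0) * 0.5

samples = [g(i * 0.1) for i in range(100)]
max_diff = max(abs(remap_mul_then_add(v) - remap_add_then_mul(v)) for v in samples)
```

Whether the second form really ends up smaller depends on compiler and flags, so the "compile and check size" loop from the list still applies.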





q/kd: update
Sun Jun 10, 2007

Today I did a quick-and-dirty transplant of my old Preetham-style sky into my mini ray tracer obir (a dead-end project; the only reason it exists is to test the q/kd-tree).

I also hacked some simple fog, based on the ray's direction into the sky, plus some ambient lighting, based on the shadow ray at each intersection.

Pardon me for yet another rushed edit, but I am really in a hurry. I will give you more information in the coming days or weeks. For now, just note that my previous 4-8 frames per second at 512x512 resolution (with OpenGL-like shading, that is, only primary rays plus dot-product lighting, without shooting shadow rays) went down to 1-4 frames per second on my old AthlonXP 1800+. But the cool thing is that I can still increase the size of the heightmap a lot without losing much performance: as far as my 512 MB of RAM let me go, I render 8193x8193 maps at 1-3 frames per second, yielding a total of approximately 130 million triangles per frame, or a maximum of 390 million per second.
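
The triangle figures follow from the grid size. A small sketch of the arithmetic, assuming (this is not stated in the post) that every cell of the heightmap grid is split into two triangles; `heightmap_triangles` is a name made up for this example:

```python
def heightmap_triangles(side):
    # A side x side grid of height samples has (side - 1)^2 cells.
    # Assumption: each cell is split into two triangles.
    cells = (side - 1) ** 2
    return 2 * cells

triangles = heightmap_triangles(8193)
print(triangles)      # 134217728, i.e. roughly 130 million
print(triangles * 3)  # 402653184 per second at 3 frames per second
```

With the rounded 130 million, three frames per second give the 390 million quoted above.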

I would be very grateful if you could download the Linux binaries and send me the stderr dump (of course I will credit you by name in upcoming write-ups ;) ), including a system description (amount of RAM plus processor model).

laterz o/

edit:
Grabbed from the newly opened discussions on ompf (visuals thread and demos thread), here are some more bits of information:


  • quick and dirty drag'n'drop of my old Preetham-Style-Sky into obir
  • simple ambient lighting (I use the sky-light received from some direction (which is calculated by negating the two horizontal components of the shadow-ray) + the sun-light, multiplied by 0.2 if the intersection lies in a shadow)
  • hackish fog: attenuation is based on the simple formula f = 1.0 / (1+att*distance). The fog color is again taken from the skylight, where I simply used the direction of the ray peeking into the sky, with the vertical component minimized to zero (Preetham doesn't like downwards directions)
  • most screens were taken off a 1025² 24bit heightmap (while my q/kd-tree only has 16bit accuracy); gimp really choked on 8193² maps on my 512MB RAM machine (it took me half an hour to create some plasma magic + gauss filter, and then the resulting map was ugly); just note that the framerate suffers only a minimal loss on such huge maps (I can still render 0.5-1.5 fps@512x512 screen-res)
  • I disabled MLRTA for the demo since, without empty occluders, the speed gain is between -10% and +50% (where the 50% is reached when the screen is only partially covered by the landscape)
  • mem-usage is (I think) very low (this was actually the major motivation for the q/kd-tree): you can fill the acceleration structure with 8193²-heightmaps in approx. 400 MB RAM, that is around 130M triangles
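
The fog from the list above can be sketched as follows. Only the attenuation formula is taken from the post. The linear blend, the choice of y as the vertical axis, and the reading of "minimized to zero" as "clamped so it never points downwards" are assumptions made for this illustration, as are all function names.

```python
def fog_factor(att, distance):
    # Attenuation from the post: f = 1.0 / (1 + att*distance).
    return 1.0 / (1.0 + att * distance)

def fog_direction(ray_dir):
    # Direction used to look up the fog color in the sky model.
    # Assumption: y is the vertical axis and downward values are clamped to 0.
    x, y, z = ray_dir
    return (x, max(y, 0.0), z)

def apply_fog(surface_rgb, fog_rgb, att, distance):
    # Assumption: plain linear blend; f = 1 keeps the surface color,
    # f -> 0 (far away) fades towards the fog color.
    f = fog_factor(att, distance)
    return tuple(f * s + (1.0 - f) * c for s, c in zip(surface_rgb, fog_rgb))
```

For example, with `att = 1.0` a hit at distance 1.0 gets `f = 0.5`, so surface and fog color are mixed half and half.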


downloads:

  • obir qkd demo 1 (x86SSE) (obirdemo1_x86SSE_linux.tar.gz): gzipped tarball containing a self-running demonstration of the q/kd-tree. This is the linux version for x86-processors with SSE instruction set (e.g. AMD (R) AthlonXP or newer)



q/kd-tree
Fri May 18, 2007

While I still don't have any more technical details on the q/kd-tree to share, here is proof that I am still working on it. The images show visualizations of the freshly implemented MLRTA. They also show that my traversal and Entry-Point (EP) search are definitely far from perfect.

My benchmark currently yields up to 3 frames per second on average (I made it a hard benchmark). For the common case where the horizon is nearly (well...) horizontal and runs through the middle of the screen, I get 8-10 fps (at a screen resolution of 512x512).

As for the images:

  • red: the tiles, also showing adaptive tile splitting
  • green: the more greenish, the deeper the corresponding EP in the q/kd-tree
  • black: ignored regions during rendering (fully outside the frustum)

I am also using a modified frustum culling algorithm with up to three additional splitting planes for the case where the ray bundle is coherent (one additional plane for each coherent axis).

Sorry for rushing through this update, I am damn busy with that stuff at the moment (took a day of holiday for it ;) ). Now, here come the images; maybe I will edit this later when I have some new results. See you!

edit: summary of today: I need Empty Occluders. Without them the whole thing isn't significantly faster than with no MLRTA at all.
The ToDo-List:

  • Find good way to create empty occluders (I already have something in mind inspired by SAH)
  • Packet-Tracing





xicc update
Fri May 11, 2007

Still having fun and making progress:

  • Implicit namespaces (using the fine braces '{' and '}' )
  • Check for double declarations (see last line)
  • A fine heap with relative base+x style addressing!


smach@debian-smach:~/Projects/braindump/flextest3$ ./test_run

 1   {
 2           {
 3           }
 4           float alpha;
 5           alpha = 0;
 6           float bravo;
 7           {
 8                   float alpha;
 9           }
10           bravo = alpha;
11   }
12 
13 
14   {
15           float golf;
16           float golf;
17   }

;-------------------------------------------
;compiled with xicc 0.1.0
;(http://xitrace.org, (C)2007 Sebastian Mach)
;-------------------------------------------
push #heapPtr                   ; stack->1
push #heapPtr                   ; stack->2
push #heapPtr                   ; stack->3
pop #heapPtr                    ; stack->2
; variable `alpha` defined, ofs heap ptr is now 4
push 0.000000                   ; stack->3
movtos alpha                    ; stack->3
pop                             ; stack->2
; variable `bravo` defined, ofs heap ptr is now 8
push #heapPtr                   ; stack->3
; variable `alpha` defined, ofs heap ptr is now 4
pop #heapPtr                    ; stack->2
push alpha                      ; stack->3
movtos bravo                    ; stack->3
pop                             ; stack->2
pop #heapPtr                    ; stack->1
push #heapPtr                   ; stack->2
; variable `golf` defined, ofs heap ptr is now 4
fatal error (16): variable `golf` already declared in line 15
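
The scoping and the double-declaration check seen in the dump above can be sketched with a stack of symbol tables. This is not xicc's actual implementation (which is not shown here), only one common way to get the same behaviour; the class and method names are made up for this example.

```python
class DuplicateDeclaration(Exception):
    pass

class ScopeStack:
    def __init__(self):
        self.scopes = [{}]            # innermost scope is the last entry

    def enter(self):                  # called on '{'
        self.scopes.append({})

    def leave(self):                  # called on '}'
        self.scopes.pop()

    def declare(self, name, line):
        current = self.scopes[-1]
        if name in current:           # only the innermost scope is checked
            raise DuplicateDeclaration(
                "variable `%s` already declared in line %d" % (name, current[name]))
        current[name] = line

    def lookup(self, name):
        # Search from the innermost scope outwards; returns the declaring line.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise KeyError(name)

scopes = ScopeStack()
scopes.enter()
scopes.declare("alpha", 4)
scopes.declare("bravo", 6)
scopes.enter()
scopes.declare("alpha", 8)            # allowed: hides the outer alpha
inner = scopes.lookup("alpha")        # resolves to the declaration in line 8
scopes.leave()
outer = scopes.lookup("alpha")        # resolves to the declaration in line 4

try:
    scopes.declare("bravo", 9)        # same scope as line 6: rejected
    message = None
except DuplicateDeclaration as err:
    message = str(err)
```

Because `declare` only inspects the innermost table, redeclaring a name in a nested block is fine, while a second declaration in the same block is reported together with the line of the first one.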


edit: quick notes to myself...

  • [done (see the dump below)] implicit namespaces/scopes should not push/pop the base ptr; that should only happen at function begin/end
  • the typechecks!

If you find an error in the assembly, any hint is appreciated! (see contact page)


smach@debian-smach:~/Projects/braindump/xicc$ ./test_run


{
        float papa;
        float oscar;
        {
                float papa;
                papa = oscar + papa;
        }
        papa = oscar + papa;
}
;-------------------------------------------
;compiled with xicc 0.1.0
;(http://xitrace.org, (C)2007 Sebastian Mach)
;-------------------------------------------
; alloc global frame...
push #base              ; stack->1
push #heapPtr           ; stack->2
add #base, #heapPtr
; ...done
push #base              ; stack->3
push #heapPtr           ; stack->4
add #base, #heapPtr
add #heapPtr,4          ; get 4 bytes from heap for `papa`, heapptr->4
add #heapPtr,4          ; get 4 bytes from heap for `oscar`, heapptr->8
add #heapPtr,4          ; get 4 bytes from heap for `papa`, heapptr->12
push [#base+4]          ; for `oscar`! / stack->5
push [#base+8]          ; for `papa`! / stack->6
addp                    ; stack->5
movtos [#base+8]        ; for `papa`! / stack->5
pop                     ; stack->4
push [#base+4]          ; for `oscar`! / stack->5
push [#base+0]          ; for `papa`! / stack->6
addp                    ; stack->5
movtos [#base+0]        ; for `papa`! / stack->5
pop                     ; stack->4
pop #heapPtr            ; stack->3
pop #base               ; stack->2
; dealloc global frame...
pop #heapPtr            ; stack->1
pop #base               ; stack->0
; ...done
;-------------------------------------------

One more dump, with global variables interacting with local ones. Notice how nicely the scoping works (that is, in the function body the global xray is hidden by the local declaration of another xray).


float uniform;
float victor;
float xray;
{
        float xray;
        uniform = victor = xray*uniform;
}

;-------------------------------------------
;compiled with xicc 0.1.0
;(http://xitrace.org, (C)2007 Sebastian Mach)
;-------------------------------------------
add #heapPtr,4          ; get 4 bytes from heap for `uniform`, heapptr->4
add #heapPtr,4          ; get 4 bytes from heap for `victor`, heapptr->8
add #heapPtr,4          ; get 4 bytes from heap for `xray`, heapptr->12
push #base              ; stack->1
push #heapPtr           ; stack->2
add #base, #heapPtr
add #heapPtr,4          ; get 4 bytes from heap for `xray`, heapptr->4
push [#base+0]          ; for `xray`! / stack->3
push [0]                ; for `uniform`! / stack->4
mulp                    ; stack->3
movtos [4]              ; for `victor`! / stack->3
movtos [0]              ; for `uniform`! / stack->3
pop                     ; stack->2
pop #heapPtr            ; stack->1
pop #base               ; stack->0
;-------------------------------------------
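
The operands in the dump above (`[0]`, `[4]` for globals, `[#base+0]` for the local) suggest a simple bump allocation with 4 bytes per float. Here is a much simplified sketch of how such operands could be produced. The `Frame` class and `resolve` function are hypothetical and ignore the push/pop of `#base` and `#heapPtr`.

```python
FLOAT_SIZE = 4  # bytes per float, as in the dumps

class Frame:
    def __init__(self, relative):
        self.relative = relative  # True: [#base+ofs] operands, False: [ofs]
        self.size = 0
        self.offsets = {}

    def alloc(self, name):
        # Hand out the next free offset and bump the frame size.
        self.offsets[name] = self.size
        self.size += FLOAT_SIZE

    def operand(self, name):
        ofs = self.offsets[name]
        return "[#base+%d]" % ofs if self.relative else "[%d]" % ofs

def resolve(name, local_frame, global_frame):
    # Local declarations hide global ones of the same name.
    if name in local_frame.offsets:
        return local_frame.operand(name)
    return global_frame.operand(name)

globals_frame = Frame(relative=False)
for name in ("uniform", "victor", "xray"):
    globals_frame.alloc(name)

locals_frame = Frame(relative=True)
locals_frame.alloc("xray")
```

With this setup, `resolve("xray", locals_frame, globals_frame)` gives `[#base+0]`, while `uniform` and `victor` resolve to `[0]` and `[4]`, matching the operands in the dump.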






gnu again: bison and flex
Sun May 06, 2007

Writing a compiler is fun! You can currently find me in the abyss of flex and bison (free implementations of lex and yacc, respectively).

Those tools provide a fast and robust way of generating a lexical analyzer and a parser/compiler for your very own programming language (or some other neat stuff). And as xitrace [;)]definitely needs[/;)] its own scripting language for scene modelling, I need to descend into the above-mentioned abyss of compiler construction; that is, I am programming a programming language.

At this point, thanks again to Mister Ludwig for pointing me to yacc in a recent talk!

Enough babble, have a look at a first expression parser below, which generates some sort of assembly (my own, of course).


smach@debian-smach:~/Desktop/flextest3$ cat program && cat program | ./a.out

bravo = 2;
charlie = bravo + 2;
alpha = 1/delta + zulu;
charlie = 5.5 + (alpha = (bravo=5/2)) * 2;

++++++++++++++++++++++++++++++++++++++++
push 2.000000                   ; stack->1
movtos bravo                    ; stack->1
pop                             ; stack->0
                                ; next expression:)
push bravo                      ; stack->1
push 2.000000                   ; stack->2
addp                            ; stack->1
movtos charlie                  ; stack->1
pop                             ; stack->0
                                ; next expression:)
push 1.000000                   ; stack->1
push delta                      ; stack->2
divp                            ; stack->1
push zulu                       ; stack->2
addp                            ; stack->1
movtos alpha                    ; stack->1
pop                             ; stack->0
                                ; next expression:)
push 5.500000                   ; stack->1
push 5.000000                   ; stack->2
push 2.000000                   ; stack->3
divp                            ; stack->2
movtos bravo                    ; stack->2
movtos alpha                    ; stack->2
push 2.000000                   ; stack->3
mulp                            ; stack->2
addp                            ; stack->1
movtos charlie                  ; stack->1
pop                             ; stack->0
                                ; next expression:)
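
To make the generated code easier to follow, here is a tiny interpreter for it. The instruction semantics are inferred from the stack-depth comments in the dump and are assumptions: `addp`/`mulp`/`divp` pop two values and push the result, `movtos` stores the top of the stack without popping it, and `pop` discards the top. The function name `run` is made up.

```python
def run(program, variables=None):
    variables = dict(variables or {})
    stack = []
    binary = {
        "addp": lambda a, b: a + b,
        "mulp": lambda a, b: a * b,
        "divp": lambda a, b: a / b,
    }
    for line in program.splitlines():
        code = line.split(";", 1)[0].strip()  # strip comments and whitespace
        if not code:
            continue
        parts = code.split()
        op = parts[0]
        if op == "push":
            try:
                stack.append(float(parts[1]))        # numeric literal
            except ValueError:
                stack.append(variables[parts[1]])    # variable name
        elif op in binary:
            b = stack.pop()
            a = stack.pop()
            stack.append(binary[op](a, b))
        elif op == "movtos":
            variables[parts[1]] = stack[-1]          # store, keep on stack
        elif op == "pop":
            stack.pop()
        else:
            raise ValueError("unknown instruction: " + op)
    return variables, stack

# Code emitted above for: charlie = 5.5 + (alpha = (bravo=5/2)) * 2;
program = """
push 5.500000                   ; stack->1
push 5.000000                   ; stack->2
push 2.000000                   ; stack->3
divp                            ; stack->2
movtos bravo                    ; stack->2
movtos alpha                    ; stack->2
push 2.000000                   ; stack->3
mulp                            ; stack->2
addp                            ; stack->1
movtos charlie                  ; stack->1
pop                             ; stack->0
"""
result, leftover = run(program)
```

Under these assumptions the run ends with an empty stack and `bravo = alpha = 2.5`, `charlie = 10.5`, which is what the source expression should evaluate to.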






Introducing Mr. Lindenmayer!
Thu Nov 16, 2006

Aristid Lindenmayer came up with the basic ideas behind L-Systems. It's late, too late, so I will just point you to Wikipedia for some basic knowledge about them.

To get really interesting results it is essential to program parametric L-Systems. I've finally managed it (I've read dozens of pages on that topic over the last few days; have a look at 'Algorithmic Botany at the University of Calgary' in my bookmarks section).

It took about 150 lines of source code for the equation interpreter (with only rudimentary syntax checks), and a further 200 lines to write the actual L-System interpreter. At this point, respect to the coders of C compilers! (Private note to myself: work out comments and write code documentation!)

Here's a small example output of my interpreter, which calculates a short factorial sequence (no screenshot, just a copy+paste):


L-System Interpreter (C)2006 Sebastian Mach alias 'greenhybrid'
===============================================================

I now will solve the following L-System:

  axiom: A(1,1,1)
  p0: A(x,y,I) --> A( x*y, y+1, 1/(x*y) )

derivating...

A(1,1,1)
A(1.000000,2.000000,1.000000)
A(2.000000,3.000000,0.500000)
A(6.000000,4.000000,0.166667)
A(24.000000,5.000000,0.041667)
A(120.000000,6.000000,0.008333)
A(720.000000,7.000000,0.001389)
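
The derivation above can be reproduced with a few lines of Python. This is only a sketch for a system consisting of a single module `A` with one production; the real interpreter parses the equations from text, which is skipped here, and the helper names `derive` and `p0` are chosen for this example.

```python
def derive(axiom, production, steps):
    # axiom: parameter tuple of the single module A(...).
    # Returns the axiom followed by one parameter tuple per derivation step.
    history = [axiom]
    current = axiom
    for _ in range(steps):
        current = production(*current)
        history.append(current)
    return history

def p0(x, y, i):
    # p0: A(x,y,I) --> A( x*y, y+1, 1/(x*y) )
    return (x * y, y + 1, 1.0 / (x * y))

history = derive((1.0, 1.0, 1.0), p0, 6)
for x, y, i in history:
    print("A(%f,%f,%f)" % (x, y, i))
```

The first parameter runs through 1!, 2!, 3!, ... while the third one holds its reciprocal.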

See you!





Mood Music of Colors
Tue Oct 31, 2006

Dear friends,
The fractal section at phresnel.org has lived off its images since the very first picture, so no time-wasting words now: here's a visual preview of my upcoming fractal renderer. Enjoy!




