Matrix3D thread about flash.Memory

Orlando Leite
Hello,

I started studying haXe after learning about fast memory (flash.Memory). To get better performance, I'm changing the Floats in a Matrix3D class to use this kind of memory.

I created a 'position generator' and changed all the Floats to Ints (offsets into the memory vector). To read a value I call getFloat( myVar ), and to write one I call setFloat( myVar, value ).

That's how it should work, I think, but it doesn't.

I've attached the original code, 'Matrix3D', which works, and the (proposed) new code, '_Matrix3D', which doesn't work at all.

I'd like to know what I'm missing, and whether I'll get better performance from this class.



Thanks!


--
-----------------------------------------------------------------------
Orlando Leite
Flash Developer

--
haXe - an open source web programming language
http://haxe.org

Attachments:
_Matrix3D.hx (52K)
FastFloat.hx (302 bytes)
Matrix3D.hx (32K)

Re: Matrix3D thread about flash.Memory

Iain Surgey
FastFloat.get() presumably returns a "pointer" to a new float in your memory area. Remember that floats are 4 bytes, so you need to increment by 4 instead of 1 in the get() method; otherwise several floats will read from and write to the same place.
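Iain's point about the increment can be seen outside Flash. This is a minimal sketch, assuming we model the selected memory area as a Python bytearray and use struct in place of setFloat/getFloat (4-byte little-endian single-precision floats). `PointerAllocator` is a hypothetical stand-in for Orlando's FastFloat "position generator", whose real code isn't shown in the thread.

```python
import struct

class PointerAllocator:
    """Hypothetical stand-in for FastFloat.get(): hands out byte offsets."""

    def __init__(self, step):
        self.step = step        # bytes reserved per value
        self.next_offset = 0    # next free byte offset

    def get(self):
        offset = self.next_offset
        self.next_offset += self.step
        return offset

def set_float(mem, offset, value):
    # Like Memory.setFloat: store a 4-byte little-endian single-precision float.
    struct.pack_into("<f", mem, offset, value)

def get_float(mem, offset):
    # Like Memory.getFloat: read those 4 bytes back as a float.
    return struct.unpack_from("<f", mem, offset)[0]

def store_and_read(step, values):
    mem = bytearray(1024)
    alloc = PointerAllocator(step)
    ptrs = [alloc.get() for _ in values]
    for p, v in zip(ptrs, values):
        set_float(mem, p, v)
    return [get_float(mem, p) for p in ptrs]

# Stepping by 1 makes consecutive floats overlap; stepping by 4 keeps them apart.
print(store_and_read(1, [1.0, 2.0, 3.0]))
print(store_and_read(4, [1.0, 2.0, 3.0]))  # [1.0, 2.0, 3.0]
```

With a step of 1, each write clobbers three bytes of the previous value, so earlier values come back corrupted.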

Re: Matrix3D thread about flash.Memory

Nicolas Cannasse
In reply to this post by Orlando Leite
Orlando Leite wrote:
> Hello,
>
> I'm studying haXe after learning about fast memory. To get better
> performance, I'm changing the Floats in a Matrix3D class to use this
> kind of memory.
>
> I created a 'position generator' and changed all the Floats to Ints
> (offsets into the memory vector). To read a value I call getFloat( myVar ),
> and to write one I call setFloat( myVar, value ).

First, it's better to use doubles than floats.
Then, your indexes need to be multiples of 8 (since a double takes 8
bytes in memory).
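Applied to a 4x4 matrix, Nicolas's rule means each element sits at a base offset plus 8 times its index. A minimal sketch, assuming row-major order and modeling flash.Memory with a Python bytearray and struct's little-endian `d` (double) format; `elem_offset`, `store_matrix` and `load_matrix` are hypothetical helpers, not part of any haXe API.

```python
import struct

DOUBLE_SIZE = 8  # a double takes 8 bytes, so offsets step by 8

def elem_offset(base, row, col):
    # Row-major 4x4 layout: element (row, col) is at base + 8 * (row * 4 + col).
    return base + DOUBLE_SIZE * (row * 4 + col)

def store_matrix(mem, base, m):
    for r in range(4):
        for c in range(4):
            struct.pack_into("<d", mem, elem_offset(base, r, c), m[r][c])

def load_matrix(mem, base):
    return [[struct.unpack_from("<d", mem, elem_offset(base, r, c))[0]
             for c in range(4)] for r in range(4)]

mem = bytearray(1024)
identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
store_matrix(mem, 128, identity)   # one matrix occupies 16 * 8 = 128 bytes
print(load_matrix(mem, 128) == identity)  # True
print(elem_offset(128, 3, 3))             # 248
```

Matrix3D may order its 16 values differently; any order works as long as every element gets its own 8-byte slot.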

Finally, using flash.Memory is faster than using arrays, but should be
slower than using class fields.

Nicolas

Re: Matrix3D thread about flash.Memory

Orlando Leite
Thanks for the answers, really!

But...

I changed the pointer increment to 4, and it still didn't work. It's as if this memory vector has a size limit.

Anyway, Nicolas, if a class field is faster than this, then I should give up on it. But one more question: is there a way to put the data in low-level memory, do all the processing there, and take the result afterwards? I think that could give better performance, but I don't know much about this level of programming.


Thanks!

Re: Matrix3D thread about flash.Memory

Iain Surgey
In reply to this post by Nicolas Cannasse

> First, it's better to use doubles than floats.
> Then, your indexes need to be multiples of 8 (since a double takes 8 bytes in memory).


I was wondering about this, actually. I presume there is a performance benefit to using doubles because Flash natively uses double precision? My app is fairly large and uses get/setFloat for a lot of things (luckily most of it is done through API access), but is the performance hit from converting to/from double precision on every read/write big enough to make changing everything worthwhile?

Iain.

Re: Matrix3D thread about flash.Memory

Nicolas Cannasse
Iain Surgey wrote:
> I was wondering about this actually.. I presume there is a performance
> benefit to using doubles due to flash natively using double precision?
> My app is fairly large and uses get/setFloat for a lot of things
> (luckily most of it is done through API access), but does the performance
> hit from converting to/from double precision on every read/write make it
> worthwhile changing everything?

It always depends on your app, but I guess yes: minimizing the
float/double conversions should be faster, especially since your app is
more likely to be limited by CPU than by memory bandwidth.
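The conversion Nicolas refers to happens on every single-precision access: the VM works in doubles, so each write narrows to 32 bits and each read widens back. This sketch does not measure speed; it only shows that round trip, modeling Memory.setFloat/getFloat and setDouble/getDouble with Python's struct (`f` is 4-byte single, `d` is 8-byte double). The helper names are hypothetical.

```python
import struct

def through_single(x):
    # What a setFloat/getFloat pair does to a value: narrow to 32 bits, widen back.
    return struct.unpack("<f", struct.pack("<f", x))[0]

def through_double(x):
    # setDouble/getDouble keeps the 64-bit value the VM already works with.
    return struct.unpack("<d", struct.pack("<d", x))[0]

x = 0.1
print(through_double(x) == x)  # True
print(through_single(x) == x)  # False: 0.1 is not exactly representable in 32 bits
print(through_single(x))       # 0.10000000149011612
```

Besides the cost of the conversion itself, single precision also rounds values that don't fit in 24 bits of mantissa.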

Nicolas

Re: Matrix3D thread about flash.Memory

Iain Surgey
In reply to this post by Orlando Leite
You need to create a ByteArray instance, set its length to at least 1024 bytes, then call Memory.select() on it so the API reads and writes that ByteArray. I found out the hard way that switching between two different ByteArrays is expensive, so don't do that.
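Iain's setup steps can be sketched as a toy model. This assumes a Python bytearray in place of the ByteArray and a hypothetical `DomainMemory` class in place of flash.Memory; the 1024-byte minimum mirrors Iain's description (and, as far as I know, Flash's ApplicationDomain.MIN_DOMAIN_MEMORY_LENGTH), which is likely the "size limit" Orlando ran into.

```python
import struct

MIN_DOMAIN_MEMORY_LENGTH = 1024  # minimum buffer size, per Iain's description

class DomainMemory:
    """Toy model of flash.Memory: one selected buffer, accessed by byte offset."""

    def __init__(self):
        self.buf = None

    def select(self, buf):
        # Reject buffers below the minimum, as the Flash Player does.
        if len(buf) < MIN_DOMAIN_MEMORY_LENGTH:
            raise ValueError("buffer must be at least 1024 bytes")
        self.buf = buf

    def set_double(self, offset, value):
        struct.pack_into("<d", self.buf, offset, value)

    def get_double(self, offset):
        return struct.unpack_from("<d", self.buf, offset)[0]

memory = DomainMemory()
data = bytearray(1024)   # analogous to a ByteArray with length = 1024
memory.select(data)      # select once and keep using it
memory.set_double(16, 3.5)
print(memory.get_double(16))  # 3.5
```

Selecting a different buffer is cheap in this model; in Flash, per Iain, it is not, so select one buffer up front and keep it.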

The Memory API is simply for reading and writing values in an area of memory. It can speed things up a great deal, but don't use it across the board for everything.

Re: Matrix3D thread about flash.Memory

Orlando Leite
Now it works.

The performance wasn't as good as I had expected.

If there were a way to do the processing without calling get and set all the time, I think it could perform better.

Thanks, all. I'm thinking about ways to improve the performance; if I find something, I'll let you know.

Re: Matrix3D thread about flash.Memory

Chris Hecker
In reply to this post by Nicolas Cannasse

I'd time it both ways. The x87 FPU can convert on load and store for free most
of the time (assuming the Flash VM isn't totally busted like Java), and
these days memory usage is often a bigger deal than CPU.
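The memory side of Chris's trade-off is simple arithmetic: single precision halves the footprint. A small sketch, assuming 4x4 matrices stored back to back with no padding; the count of 10,000 matrices is an arbitrary example, not a figure from the thread.

```python
import struct

FLOAT_BYTES = struct.calcsize("<f")   # 4
DOUBLE_BYTES = struct.calcsize("<d")  # 8

def matrices_bytes(count, elem_bytes):
    # 4x4 matrices stored back to back: 16 elements each.
    return count * 16 * elem_bytes

n = 10_000
print(matrices_bytes(n, FLOAT_BYTES))   # 640000
print(matrices_bytes(n, DOUBLE_BYTES))  # 1280000
```

Whether halving that footprint outweighs the conversion cost is exactly what timing both versions would show.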

Post your results!

Chris

Re: Matrix3D thread about flash.Memory

Michael Baczynski-2
In reply to this post by Orlando Leite
Hi Orlando,

have you looked at the mem package in hx3ds? It contains some wrapper
classes that simulate arrays using the Memory API, e.g.:

MemoryManager.allocate(1024); // fetch 1024 KB
var doubleArray:DoubleMemory = MemoryManager.getDoubleMemory(100); // create an array capable of storing 100 doubles

Reading and writing is then done with:
doubleArray.get(index) or doubleArray.set(index, value);

I did tons of tests with the Memory class, and as Nicolas said, it's
slower than field access, so building a matrix with the Memory class
makes no sense here. But it strongly depends on the CPU type: on a
fancy i7, Memory API access is only 10% slower than field access.
Don't ask why :-)

From my experience, the best use for the Memory class is as a
temporary buffer for subsequent reading/writing. A perfect example is
the FP10 drawing API, where you accumulate commands in a Vector instead
of repeatedly calling graphics.lineTo(..), moveTo().. etc. Now replace
the Vector with a Memory: push the data into it, and at the end write the
data back to the Vector. This seems like more work, but the speed
difference is HUGE.
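The buffering pattern Michael describes can be sketched as follows. This is a model, not his code: a Python bytearray stands in for the selected Memory, the command codes 1 and 2 are assumed to match GraphicsPathCommand.MOVE_TO and LINE_TO from the FP10 drawing API, and `PathBuffer` is hypothetical. It reproduces none of the reported speedup; it only shows writing into a flat buffer and copying out once.

```python
import struct

MOVE_TO = 1  # assumed to match GraphicsPathCommand.MOVE_TO
LINE_TO = 2  # assumed to match GraphicsPathCommand.LINE_TO

class PathBuffer:
    """Accumulates drawing commands in flat byte buffers (the 'Memory' role)."""

    def __init__(self, max_commands=256):
        self.cmds = bytearray(4 * max_commands)   # one 4-byte int per command
        self.data = bytearray(16 * max_commands)  # two 8-byte doubles per command
        self.n = 0

    def _push(self, cmd, x, y):
        struct.pack_into("<i", self.cmds, 4 * self.n, cmd)
        struct.pack_into("<dd", self.data, 16 * self.n, x, y)
        self.n += 1

    def move_to(self, x, y):
        self._push(MOVE_TO, x, y)

    def line_to(self, x, y):
        self._push(LINE_TO, x, y)

    def flush(self):
        # Copy everything out in one pass, as you would into the Vectors for drawPath().
        commands = list(struct.unpack_from("<%di" % self.n, self.cmds, 0))
        data = list(struct.unpack_from("<%dd" % (2 * self.n), self.data, 0))
        return commands, data

path = PathBuffer()
path.move_to(0, 0)
path.line_to(10, 0)
path.line_to(10, 10)
commands, data = path.flush()
print(commands)  # [1, 2, 2]
print(data)      # [0.0, 0.0, 10.0, 0.0, 10.0, 10.0]
```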
The same technique can be applied to BitmapData.setVector(). I've ported
Joa's particle animation
(http://blog.joa-ebert.com/2009/04/03/massive-amounts-of-3d-particles-without-alchemy-and-pixelbender/)
to haXe, and it runs 40% faster with the Memory class, an 8 fps
increase that lets me simulate 1 million particles at 26 fps (again,
this depends on the CPU).
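For the BitmapData.setVector() case, the same idea means plotting pixels into a flat 32-bit buffer and copying it out once per frame. A minimal sketch, assuming integer screen coordinates that are already projected, ARGB pixels at offset 4 * (y * width + x), and a Python bytearray in place of the selected Memory; `plot_particles` is hypothetical, and Joa's projection math is left out.

```python
import struct

def plot_particles(width, height, particles, color=0xFFFFFFFF):
    """Write pixels into a flat ARGB buffer, then copy it out in one pass."""
    buf = bytearray(4 * width * height)  # 4 bytes per 32-bit pixel
    for x, y in particles:
        if 0 <= x < width and 0 <= y < height:  # skip off-screen particles
            struct.pack_into("<I", buf, 4 * (y * width + x), color)
    # One bulk copy, analogous to filling the Vector.<uint> for setVector().
    return list(struct.unpack_from("<%dI" % (width * height), buf, 0))

pixels = plot_particles(4, 3, [(0, 0), (3, 2), (9, 9)])
print(len(pixels))      # 12
print(hex(pixels[0]))   # 0xffffffff
print(hex(pixels[11]))  # 0xffffffff
print(pixels.count(0))  # 10
```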

best,
michael

Re: Matrix3D thread about flash.Memory

Iain Surgey


> I did tons of tests with the Memory class, and as Nicolas said, it's slower than field access, so building a matrix with the Memory class makes no sense here. But it strongly depends on the CPU type: on a fancy i7, Memory API access is only 10% slower than field access. Don't ask why :-)


Hi Michael,

May I ask what you use to benchmark in Flash? I find the Flash Timer class horribly inaccurate; especially when the rest of the player is under heavy load, it tends to fire late. Do you just run the code many times and take the mean for semi-accurate results?

Just now I've been experimenting with flash.net.LocalConnection to set up a kind of pseudo-threaded timer running in another SWF, but it's throwing status errors.

Also, if you've already done some tests on the Memory API, perhaps you know whether there's any significant performance difference between Double and Float. I wrote everything with Float, and now I'm wondering if it's worth the hassle of changing to Double for performance (it's not trivial, as I have many hand-calculated offsets that would break).

Thanks,
Iain.

Re: Matrix3D thread about flash.Memory

Robert Sköld
In reply to this post by Michael Baczynski-2
Sounds real cool, do you have an example of that online?

On May 14, 2009, at 01:36, Michael Baczynski wrote:
> The same technique can be applied to BitmapData.setVector(). I've
> ported Joa's particle animation (http://blog.joa-ebert.com/2009/04/03/massive-amounts-of-3d-particles-without-alchemy-and-pixelbender/)
> to haXe, and it runs 40% faster with the Memory class, an 8 fps
> increase that lets me simulate 1 million particles at 26 fps (again,
> this depends on the CPU).

Re: Matrix3D thread about flash.Memory

Tony Polinelli
http://lab.polygonal.de/2009/03/14/a-little-alchemy-in-hx3ds/

Tony Polinelli
http://www.touchmypixel.com


2009/5/14 Robert Sköld <[hidden email]>
> Sounds real cool, do you have an example of that online?

Re: Matrix3D thread about flash.Memory

Nicolas Cannasse
In reply to this post by Michael Baczynski-2
Michael Baczynski wrote:
> The same technique can be applied to BitmapData.setVector(). I've ported
> Joa's particle animation to haXe, and it runs 40% faster with the Memory
> class, an 8 fps increase that lets me simulate 1 million particles at
> 26 fps (again, this depends on the CPU).

I would like you to blog about this and show the results :)

Best,
Nicolas

Re: Matrix3D thread about flash.Memory

Armén
In reply to this post by Iain Surgey
> On Thu, May 14, 2009 at 02:41, Iain Surgey <[hidden email]> wrote:
> ...
> May I ask what you are using to benchmark in flash? I find the flash Timer
> class to be horribly inaccurate, especially when the rest of the player is
> under heavy load it tends to delay firing. Do you just do it a bunch of
> times and take the mean average for semi-accurate results?
> ...

I, for one, would never think of using the Timer class; just the
notion of it gives me chills.

What I do, and what works pretty damn well for most of my simple
profiling needs, is something like this:

var t1 = flash.Lib.getTimer(); // SWF (Player?) running time, in ms

do_heavy_computation(); // preferably not long enough to bring up the "Abort script" dialog...

trace("Computation took " + (flash.Lib.getTimer() - t1) + " ms");

No need for asynchronous Timer firing or anything. Pretty dependable,
as far as I know.

Re: Matrix3D thread about flash.Memory

Iain Surgey
In reply to this post by Iain Surgey


2009/5/14 Michael Baczynski <[hidden email]>
> You need to crank up the number of iterations. I usually run the code inside an inner loop with many iterations and an outer loop that takes the minimum value, which produces more predictable results than the average:
>
> var min = 0x7FFFFFFF; // initialize with the largest Int (1 << 31 overflows to a negative value)
> for (i in 0...100) // take the minimum over 100 samples
> {
>   var t = flash.Lib.getTimer();
>   for (r in 0...1000000)
>   {
>       // do stuff
>   }
>   var dt = flash.Lib.getTimer() - t;
>   if (dt < min) min = dt;
> }
> trace("result: " + min + "ms.");

I'm still not convinced by Flash's ability to time anything accurately under heavy load (since it's single-threaded), and surely getTimer() uses the same basic mechanism as the Timer class... but your method of doing many iterations and taking the minimum sample seems to give decent enough results.
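The inner-loop/outer-minimum method Michael describes carries over to any language. A minimal sketch in Python, with time.perf_counter standing in for flash.Lib.getTimer(); `min_time_ms` is a hypothetical helper, and the sentinel starts at infinity so the first sample always replaces it. Taking the minimum rather than the mean works because interference from the rest of the system only ever adds time.

```python
import time

def min_time_ms(work, inner=100_000, outer=20):
    """Run `work` `inner` times per sample; keep the fastest of `outer` samples."""
    best = float("inf")  # start above any possible sample
    for _ in range(outer):
        t0 = time.perf_counter()
        for _ in range(inner):
            work()
        best = min(best, (time.perf_counter() - t0) * 1000.0)
    return best

result = min_time_ms(lambda: 3.0 * 4.0, inner=10_000, outer=5)
print("result: %.3f ms" % result)
```

The printed figure depends on the machine, which is also why Iain and his friend saw opposite float/double results.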

> Yes, there is a significant difference. I just ran some tests, and multiplying two doubles is twice as fast as multiplying two floats.
> --
> http://www.polygonal.de

Multiplication wasn't the concern... I thought that once something had been read from the Memory API, everything was dealing with doubles anyway. My concern was the conversion that takes place when loading single-precision values from memory into Flash's double-precision floats.

Strangely, my PC turns out to be faster at reading singles, converting them and writing them back than at dealing with doubles directly. Perhaps this is because less memory is accessed on each read, though I really have no idea. Yet my friend's PC running the same SWF gives the opposite (expected) result, with doubles being faster. I'm just going to leave things as they are for now, but thanks for your benchmarking tips.

Re: Matrix3D thread about flash.Memory

jlm@justinfront.net
Flash timers are more reliable once the Flash Player has stabilized, so wait a while before running timing code: player initialization takes a very variable amount of time and will skew your results. Also, if you use a Timer over a long period, you can compare against the system time to correct for lost time.
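Correcting for lost time means scheduling against the clock instead of trusting that every tick arrived on time. A minimal sketch, assuming you read a millisecond clock (getTimer() in Flash) at each tick and could then adjust the next delay; `next_delay_ms` is a hypothetical helper and the numbers are illustrative only.

```python
def next_delay_ms(start_ms, now_ms, interval_ms):
    """Measure ticks against the clock instead of counting them.

    Returns (ticks_due, delay): how many ticks should have fired since
    start_ms, and how long to wait so the next tick lands back on schedule.
    """
    elapsed = now_ms - start_ms
    ticks_due = elapsed // interval_ms
    delay = interval_ms - (elapsed % interval_ms)
    return ticks_due, delay

# A 100 ms timer started at t=0 whose tick actually arrives late, at t=1030:
print(next_delay_ms(0, 1030, 100))  # (10, 70)
```

Shortening the next wait to 70 ms pulls the timer back onto its original schedule instead of letting the 30 ms delay accumulate.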
