About the performance of a haXe SWC in AS3

Joe Dohn
Hi!

I imported a SWC made in haXe into an AS3 project, and I have an equivalent pure haXe project. Both exist only to test performance.

They measure how long a haXe pixel-processing function takes to complete.
My concern is that the function is faster when compiled and run in the haXe project, where two calls in a row take 7ms. The same two calls take 10-11ms when the function is run from the SWC in the AS3 project.

The only difference, besides the SWC itself, is that both function calls are inlined in haXe and not in AS3. But surely 2 function calls can't take 3-4ms.

So where does this loss in speed come from? To link the SWC I put its path in FlashDevelop's "SWC Libraries", located in the "compiler options" tab of the project properties.
I also set "Optimize bytecode" to true in the same tab.

Is this loss unavoidable? What should I do to get the best performance out of a haXe SWC in AS3?

Thanks!

--
haXe - an open source web programming language
http://haxe.org

Re: About the performance of a haXe SWC in AS3

Tarwin Stroh-Spijer
What are you using to compile the SWF? Flex, Flash etc?

Maybe it's recompiling and "optimising" in a not so optimal way?


--


Tarwin Stroh-Spijer
_______________________

Touch My Pixel
http://www.touchmypixel.com/
phone: +61 3 8060 5321
_______________________


Re: About the performance of a haXe SWC in AS3

Joe Dohn
I'm using the Flex Compiler Shell. Results are the same with "Optimize bytecode" set to false.

According to FlashDevelop's output, here are my compile options:
mxmlc -load-config+=obj\FastGraphicsConfig.xml -debug=true -incremental=true -accessible=true -benchmark=false -static-link-runtime-shared-libraries=true -locale fr_FR -o obj\FastGraphics634416844882881161

The project was created in Flash Builder 4 and imported straight into FlashDevelop.

My haXe compile.hxml file contains:
-cp src
-swf9 bin/haXe.swf
-swf-header 800:600:50:FFFFFF
-swf-lib obj\haXeResources.swf
--flash-strict
-swf-version 10
-debug
-main Main

With 4 calls to that pixel-processing function, the haXe project takes 14-15ms and the AS3 project with the haXe SWC takes 22-23ms. :/

Also, the SWC was created using this compile.hxml file:
-cp src
-swf9 bin/FastMemory.swc
-swf-header 800:600:50:FFFFFF
--flash-strict
-swf-version 10
-main FastMemory


Re: About the performance of a haXe SWC in AS3

Joe Dohn
In reply to this post by Tarwin Stroh-Spijer
I tried exporting my haXe test project itself to a SWC, with all graphics files embedded in it, and called it from the AS3 project with just haxe.init(this).

I got full performance.

So I have no idea what the culprit is. Is it the fact that I instantiate the graphics files outside of the SWC? Here's what I do from AS3 (simplified code):


import flash.display.Bitmap;
import flash.utils.getTimer;

/* TREE and CAR are classes defined using the Flex Embed tag; they represent PNG images */
var bmpTree:Bitmap = new TREE();
var bmpCar:Bitmap = new CAR();

/* Just a wrapper class containing x/y/height/width and an offset pointing to the BitmapData in the fast-memory ByteArray */
var tree:BitmapMemory = FastMemory.createBitmapMemory(bmpTree.bitmapData);
var car:BitmapMemory = FastMemory.createBitmapMemory(bmpCar.bitmapData);

var startTime:int, endTime:int;
startTime = getTimer();
FastMemory.doSomePixelProcessing(tree, car);
endTime = getTimer();


Why would doSomePixelProcessing be that much slower if called from AS3? Everything in FastMemory is haXe compiled already.


Re: About the performance of a haXe SWC in AS3

Nicolas Cannasse
In reply to this post by Joe Dohn
On 22/05/2011 18:22, Joe Dohn wrote:
> I'm using the Flex Compiler Shell. Results are the same with "Optimize bytecode" set to false.
>
> According to Flash Develop's output here are my compile options:
> mxmlc -load-config+=obj\FastGraphicsConfig.xml -debug=true -incremental=true -accessible=true -benchmark=false -static-link-runtime-shared-libraries=true -locale fr_FR -o obj\FastGraphics634416844882881161

Try turning off debug mode: it adds extra opcodes for register
tracking that slow things down.

Nicolas


Re: About the performance of a haXe SWC in AS3

Joe Dohn
Hi,

Yeah I tried this as well but with no luck. See my other mail to Tarwin Stroh-Spijer ( http://lists.motion-twin.com/pipermail/haxe/2011-May/043991.html ), I'm hoping it describes the issues a little better. :)


Re: About the performance of a haXe SWC in AS3

Matthew Spencer-2
Have you tried the HaXe version without inlining to see if it gives the same results as the slower AS3/SWC combo?

Do you copy out the data from tree/car into locals at the beginning of FastMemory.doSomePixelProcessing(tree,car), or do you access the data directly from your tree/car class?

You can use http://www.docsultant.com/nemo440/ to inspect the bytecode itself for both the HaXe and AS3 versions.


Re: About the performance of a haXe SWC in AS3

Joe Dohn
I just tried the compiler option --no-inline out of curiosity and d'oh! inlining gives me a 3x performance boost. Interestingly enough, adding --no-opt changes nothing, so I guess the "only" benefits I'm currently getting from haXe super compiler are inlining and fast mem.

But the SWC is still inlined properly after the AS3 project is compiled; I used Sothink SWF Decompiler to check that yesterday. Nemo seems to agree, though it doesn't always want to display all classes (a bug, I guess).

tree and car (BitmapMemory type, haXe compiled) only contain data, no method, no getter/setter, only public properties. I access them directly and don't copy their properties in temp variables. FastMemory.createBitmapMemory copies the input BitmapData into fast mem once and for all.


I don't know what's wrong. doSomePixelProcessing can't be inlined since it's called from AS3, but all inner function calls are. The dreaded black hole inside the Flash Player must have munched those 8ms into nothingness.


Re: About the performance of a haXe SWC in AS3

Matthew Spencer-2

> I just tried the compiler option --no-inline out of curiosity and d'oh! inlining gives me a 3x performance boost. Interestingly enough, adding --no-opt changes nothing, so I guess the "only" benefits I'm currently getting from haXe super compiler are inlining and fast mem.

I'm not saying to disable inlining completely, only to remove the "inline" keyword from the front of doSomePixelProcessing();
There is no reason to inline that specific call in the first place. All you're doing is bloating the instruction cache and making the code larger.

If you want the safest function you can get that will work between HaXe and AS3, either unpack to temporary variables inside the function, or refactor the function to accept primitives (int/float). The latter is the best option, but not as pretty. Primitive types not residing on an object correlate directly to a local register.

> The dreaded black hole inside the Flash Player must have munched those 8ms into nothingness.

That does not happen. Those 8ms are going to your code. My bet is that extra work is being done on getproperty lookups for your car and tree objects when no inlining is happening.
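
To illustrate the "unpack to temporary variables" suggestion, here is a minimal haXe sketch. It is not the thread's actual code: the BitmapMemory fields (offset, width, height) and the sumPixels function are hypothetical stand-ins, and a plain Array<Int> stands in for the fast-memory ByteArray so the sketch can run on any target. Whether this changes anything in practice depends on what the Flash JIT does with the property lookups.

```haxe
// Hypothetical stand-in for the BitmapMemory wrapper described in this thread.
class BitmapMemory {
    public var offset:Int;
    public var width:Int;
    public var height:Int;

    public function new(offset:Int, width:Int, height:Int) {
        this.offset = offset;
        this.width = width;
        this.height = height;
    }
}

class UnpackDemo {
    // Reads each field once into a local, so the loop body only touches
    // local variables and never looks up a property on bmp.
    public static function sumPixels(memory:Array<Int>, bmp:BitmapMemory):Int {
        var start = bmp.offset;
        var count = bmp.width * bmp.height;
        var total = 0;
        for (i in 0...count) {
            total += memory[start + i];
        }
        return total;
    }

    static function main() {
        var memory = [for (i in 0...16) i];  // fake pixel store: 0, 1, ..., 15
        var bmp = new BitmapMemory(4, 2, 3); // 6 "pixels" starting at index 4
        trace(sumPixels(memory, bmp));       // 4 + 5 + 6 + 7 + 8 + 9 = 39
    }
}
```

The alternative Matthew mentions, passing primitives, would amount to changing the signature to something like sumPixels(memory, offset, width, height) so that no object is dereferenced inside the function at all.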







Re: About the performance of a haXe SWC in AS3

Joe Dohn
I tried what you told me, I unpacked BitmapMemory values inside the doSomePixelProcessing function. It was easier and cleaner to tweak this way than your other suggestion.

It was a good test idea but didn't change a thing. Then I removed inline since, as you noted, inlining big functions that aren't called too often (4 times here) is said to be useless. I first removed inline from the haXe project which has full performance.

To my surprise, it instantly lost its 8ms magic and came on par with my slower AS3 project.
So apparently, inlining those 4 calls IS useful. I have no idea how 4 function calls can take that much time; there must be something else in there. My good pal Black Hole Munchman most likely, he's the one to blame when things disappear mysteriously.
The fact is that when I remove inline from doSomePixelProcessing, I lose 8ms. Add it back and Mr. Munchman pukes my precious time back up, intact.


Unrelated:
It may be worth noting that the slow version of my test reliably gives 22-23ms, except sometimes. I'm testing in Firefox 4, and if I kill the plugin-container process, clear the cache and reopen the test SWF, I can get either 18-19, 20-21 or 22-23ms. Within a session the results are stable; I have to kill the process and clear the cache to move from one range to another (e.g. from 18-19ms to 22-23ms).
The fast version of my test always shows 14-15ms no matter how hard I kill plugin-container. Mysterious...


Re: About the performance of a haXe SWC in AS3

Matthew Spencer-2
> So apparently, inlining those 4 calls IS useful. I have no idea how 4 function calls can take that much time; there must be something else in there.

It's not the function calls consuming the time; you're right that there is something else in there. Without actually working on the code, or seeing a printout of the bytecode -> asm conversion, I can't provide any more help on this issue.

Best of luck.



Re: About the performance of a haXe SWC in AS3

Cauê W.
The only thing that comes to mind is that there's a huge performance loss if you're using the memory API with a SWF loaded into another ApplicationDomain. I don't know if that applies to your case; most probably not, since you should be binding to the SWC at compile time.
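
For readers who do load a haXe SWF at runtime rather than linking a SWC at compile time, a minimal sketch of loading it into the caller's own ApplicationDomain might look like the following. The file name FastMemory.swf is hypothetical, the code only compiles for the Flash target, and whether it avoids the slowdown depends on the observation above holding in your setup.

```haxe
import flash.display.Loader;
import flash.net.URLRequest;
import flash.system.ApplicationDomain;
import flash.system.LoaderContext;

class LoadSameDomain {
    static function main() {
        var loader = new Loader();
        // Passing currentDomain asks the player to put the loaded SWF's
        // definitions in the caller's ApplicationDomain instead of a child domain.
        var context = new LoaderContext(false, ApplicationDomain.currentDomain);
        // "FastMemory.swf" is a hypothetical file name used for illustration only.
        loader.load(new URLRequest("FastMemory.swf"), context);
        flash.Lib.current.addChild(loader);
    }
}
```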


Re: About the performance of a haXe SWC in AS3

Matthew Spencer-2

> The only thing that comes to mind is that there's a huge performance loss if you're using the memory API with a SWF loaded into another ApplicationDomain. I don't know if that applies to your case; most probably not, since you should be binding to the SWC at compile time.

That's a good thought, it does take about 6ms to select.


Re: About the performance of a haXe SWC in AS3

Joe Dohn
After at least a total time of far too long spent on this, I've finally found the culprit.

It's one of those times where you sit down with a "uh." face, wondering how such a meaningless thing thinks it's permitted to make you deploy that much effort.

So without further suspense, the culprit was.... the timer. D'uh.

The timer in AS3 doesn't give the same numbers as the timer in haXe. I was using respectively flash.utils.getTimer and flash.Lib.getTimer.

That was the ONLY reason for this 8ms loss, and probably the reason why my haXe test gave constant timing while my AS3 test varied after a plugin-container killing rampage.

I'm also guessing the haXe timer is thus better than AS3's. However, it's also more optimistic, so be aware of this when you time haXe SWCs that are destined for AS3 use.

What an interesting and horrible waste of time that was :D


I wonder what internal difference in haXe would explain this.


Re: About the performance of a haXe SWC in AS3

Tony Polinelli
fun to watch play out tho ;P






--
Tony Polinelli
http://touchmypixel.com


Re: About the performance of a haXe SWC in AS3

Matthew Spencer-2
Glad you found the culprit. Was not expecting that at all.

> It was a good test idea but didn't change a thing. Then I removed inline since, as you noted, inlining big functions that aren't called too often (4 times here) is said to be useless. I first removed inline from the haXe project which has full performance.
>
> To my surprise, it instantly lost its 8ms magic and came on par with my slower AS3 project.

But what about this then?
 


Re: About the performance of a haXe SWC in AS3

Nicolas Cannasse
In reply to this post by Joe Dohn
On 24/05/2011 01:03, Joe Dohn wrote:

> After at least a total time of far too long spent on this, I've finally
> found the culprit.
>
> It's one of those times where you sit down with a "uh." face, wondering
> how such a meaningless thing thinks it's permitted to make you deploy
> that much effort.
>
> So without further suspense, the culprit was.... the timer. D'uh.
>
> The timer in AS3 doesn't give the same numbers as the timer in haXe. I
> was using respectively flash.utils.getTimer and flash.Lib.getTimer.

That seems very strange, see the actual implementation of
flash.Lib.getTimer :

        public inline static function getTimer() : Int {
                return untyped __global__["flash.utils.getTimer"]();
        }

This should give you the exact same result / bytecode as calling
flash.utils.getTimer()

Best,
Nicolas


Re: About the performance of a haXe SWC in AS3

Matthew Spencer-2
I don't think it's the timer implementation. The fundamental difference is that the inline keyword is changing something. Even if the bytecode is the same, the stack size is not, which will cause different machine code to be generated anyway. Not to mention, Flash is funky when it comes to machine code generation. A few times I've had 20%+ differences in execution speed from reordering a few instructions. Then again, I'm usually abusing bytecode in ways it was not meant for.

> This should give you the exact same result / bytecode as calling flash.utils.getTimer()

Looks like it is. He got the same results as AS3 after removing the inline call. He also got the higher performance when he called his app as a whole from AS3. These results are consistent with his HaXe tests.

> Joe Dohn: I can get either 18-19, 20-21 or 22-23ms.

Forgot to mention, a single call on a function this "fast" is not anywhere close to a reliable test (in Flash). Call it [n] times and divide your final time by [n]. There are a lot of factors in Flash that will often make tests (especially the first) chug. One of them is the irritating GC.


Re: About the performance of a haXe SWC in AS3

Joe Dohn
SO!

After posting my last mail, I noticed that something was amiss and more tests were required. Yeah, like I don't have other things to do and bother everyone with.
But it is crucial that I am able to time this part of my code properly, so there's no choice but to persist.

Indeed as you noted Matthew, if the timer is responsible, what about inline? Removing inlining from the full haXe version did make me lose 8ms. Here is another fact, concerning my AS3 test this time:

var startTime:int, endTime:int;
startTime = getTimer();

// Runs 4 "doSomePixelProcessing" the same way I did previously, except from haXe
var haXeTime:String = haXeSWC.superPixelProcessor(buffer, car, tree, house, idiot);
endTime = getTimer();
trace(haXeTime); // 14-15ms, aka full performance
trace(endTime - startTime); // 23ms


While the timer may not be responsible, something is occurring. (The timer DOES suck, though! I got some really confusing results when trying to check whether it was responsible or not. I decided these confusing things were unrelated to my actual problem, but my confidence in the Flash timer is lower than ever.)

The results are the same whether superPixelProcessor is inlined or not.
However, doSomePixelProcessing is still inlined, even in AS3, since the 4 calls inside superPixelProcessor are haXe compiled. When I remove inline from doSomePixelProcessing, though, the haXe timer shows about 24ms and AS3 says 25-26ms.

There's some weird stuff occurring in there. superPixelProcessor's code is as follows:

public function superPixelProcessor(bitmap1:BitmapMemory, bitmap2:BitmapMemory, bitmap3:BitmapMemory, bitmap4:BitmapMemory, bitmap5:BitmapMemory):String
{
    var startTime:Int, endTime:Int;
    startTime = Lib.getTimer();
    doSomePixelProcessing(bitmap1, bitmap2);
    doSomePixelProcessing(bitmap1, bitmap3);
    doSomePixelProcessing(bitmap1, bitmap4);
    doSomePixelProcessing(bitmap1, bitmap5);
    endTime = Lib.getTimer();

    return "haXe time: " + Std.string(endTime - startTime) + "ms";
}


If you guys want I'll try to setup a simplified yet still relevant test scenario and send the source. But I'm thinking this has more to do with the mix of AS3, haXe and inline than with what I'm doing with fast memory pixel processing.
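
One way to narrow down where the extra time measured from the AS3 side goes is to have the haXe function return its raw start and end timestamps instead of a formatted string, so the caller can split its own interval into "before", "inside" and "after". Below is a minimal haXe sketch of that idea, not the code from this thread: TimedSpan, timedWork and the dummy workload are hypothetical, and haxe.Timer.stamp() is used only so the sketch runs on any target (on Flash, both sides would read getTimer instead).

```haxe
import haxe.Timer;

// Hypothetical result type: raw timestamps in seconds, as returned by Timer.stamp().
typedef TimedSpan = {
    var start:Float;
    var end:Float;
}

class NestedTiming {
    // Stand-in for superPixelProcessor: does some work and reports
    // when that work started and ended.
    static function timedWork(iterations:Int):TimedSpan {
        var start = Timer.stamp();
        var acc = 0;
        for (i in 0...iterations) {
            acc = (acc + i) & 0xFFFF; // dummy workload
        }
        var end = Timer.stamp();
        return { start: start, end: end };
    }

    static function main() {
        var outerStart = Timer.stamp();
        var inner = timedWork(1000000);
        var outerEnd = Timer.stamp();

        // Both sides read the same clock, so the outer interval
        // is exactly the sum of these three parts.
        trace("before inner: " + (inner.start - outerStart) * 1000 + "ms");
        trace("inner:        " + (inner.end - inner.start) * 1000 + "ms");
        trace("after inner:  " + (outerEnd - inner.end) * 1000 + "ms");
    }
}
```

If almost all of the difference lands in "before" or "after", the cost sits in the call boundary or in work around it rather than in the pixel processing itself.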


Re: About the performance of a haXe SWC in AS3

Matthew Spencer-2
Looks like this is not going to be debuggable at AS3 or bytecode level.

Run both tests with a mm.cfg:
TraceOutputBuffered = 1
AS3Verbose = 1

Compare said assembly.
Look for suspects around:
startTime = Lib.getTimer ();

Is your main function where you call superPixelProcessor pretty clean? Or does it do a bunch of other stuff too?

This is starting to remind me of an issue I had, which I fixed by converting all registers except 0 (this) to local static properties. Ironically, registers, local static properties and memory calls have the same performance hit (memory calls do have slightly more overhead unless using a value already on the stack, though :/).

> If you guys want I'll try to setup a simplified yet still relevant test scenario and send the source. But I'm thinking this has more to do with the mix of AS3, haXe and inline than with what I'm doing with fast memory pixel processing.

Dunno how easy it would be to rig a simple example; it doesn't seem to be very reproducible behavior. I'd be interested to see one though.
In my opinion, we can just keep this at the "AVM2" level. Both HaXe and AS3 are having the same issue, because in the end it's the same code (assuming not inlined). The very small inaccuracies could be GC, timer inaccuracies, or subtle differences in function calls vs SWC function calls.

var time:Int = Lib.getTimer();
var dt:Int = time + (10) * 1000; // end time: (desiredDurationSeconds) * 1000 ms from now
var t:Int = 10;                  // tests per timer call (set very high for fast code, low for slower code)
var nt:Int = 0;                  // total number of tests run so far
while (Lib.getTimer() < dt)
{
    for (i in 0...t)
    {
        // Code to test here
    }
    nt += t;
}
time = Lib.getTimer() - time;
var timeE:Float = cast(time, Float) / cast(nt, Float);
trace("Test took: " + timeE + "ms/ea");
 
Use the timing method above, it will give significantly more accurate results.

Lastly, I'm sure you already do this, but make sure you only have a single instance of Flash open, and that it's the one you're testing. You may also want to give the projector players a try, both debug and release, to see if the issue persists.

