How important is standardized Unicode support?

Gareth Smith
Hi,

From what I can see, Haxe does not treat strings in a standard way
across platforms.

Are there any plans for a standard cross-platform string, perhaps
defined as lists of Unicode code points? Is this important to anyone?

Gareth

--
haXe - an open source web programming language
http://haxe.org

RE: How important is standardized Unicode support?

luca deltodesco
I actually have a class I built to do just that, at least for the Flash and Neko platforms (I have not extended it to any others).

http://pastebin.com/iq6xBja6

You'll have to excuse the fact that this is not quite haXe code: it uses my preprocessor to shorten some parts of the UTF-8 and UTF-16 encode/decode methods that couldn't otherwise be made as short. With a bit of modification it could be made into plain haXe (see lines 259, 301, 329).

It's used like this:

using Unicode;

var ustr = "some string with unicode character".fromString(); // convert the string to an array of Unicode code points
var original = ustr.string(); // and back again

var bstr = ustr.encode(_utf8); // encode the code point array into an array of UTF-8 bytes (including BOM)
ustr = bstr.decode(); // decode back to a code point array, using the BOM to determine the encoding

bstr = ustr.encode(_utf16(true), false); // encode into big-endian UTF-16 (without BOM)
ustr = bstr.decode(_utf16(true)); // the encoding must be specified here: decode() won't try to guess it and defaults to UTF-8

var point = 'a'.wchar(); // shorthand for 'a'.fromString()[0], i.e. convert the character 'a' into its Unicode code point value

It would be nice to have lazy methods too, perhaps through iterators, to do lazy encoding/decoding etc.
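
A minimal sketch of that lazy-iterator idea, assuming well-formed UTF-8 input (no BOM) held as an Array<Int> of byte values. Utf8PointIterator is a hypothetical name and is not part of the Unicode class linked above; it only shows how decoding can be deferred until each code point is requested:

```haxe
// Hypothetical sketch: NOT part of the Unicode class linked above.
// Lazily decodes an array of UTF-8 bytes, one code point per next() call.
class Utf8PointIterator {
    var bytes : Array<Int>;
    var pos : Int;

    public function new( bytes : Array<Int> ) {
        this.bytes = bytes;
        this.pos = 0;
    }

    public function hasNext() : Bool {
        return pos < bytes.length;
    }

    // Assumes well-formed UTF-8 without a BOM: continuation bytes,
    // overlong forms and surrogates are not validated.
    public function next() : Int {
        var b = bytes[pos++];
        if( b < 0x80 ) return b; // 1 byte: 0xxxxxxx
        // number of continuation bytes announced by the lead byte
        var extra = if( b < 0xE0 ) 1 else if( b < 0xF0 ) 2 else 3;
        var point = b & (0x3F >> extra); // payload bits of the lead byte
        for( i in 0...extra )
            point = (point << 6) | (bytes[pos++] & 0x3F);
        return point;
    }
}

class Main {
    static function main() {
        // "a", U+00E9 and U+20AC encoded as UTF-8
        var bytes = [0x61, 0xC3, 0xA9, 0xE2, 0x82, 0xAC];
        for( p in new Utf8PointIterator(bytes) )
            trace(p);
    }
}
```

Because the class exposes hasNext() and next(), it can be used directly in a for loop; the example traces the code points 97, 233 and 8364 one at a time, without building the whole array first.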

> Date: Sun, 7 Nov 2010 16:09:23 +0000
> From: [hidden email]
> To: [hidden email]
> Subject: [haXe] How important is standardized Unicode support?
>
> Hi,
>
> From what I can see, Haxe does not treat strings in a standard way
> across platforms.
>
> Are there any plans for a standard cross-platform string, perhaps
> defined as lists of Unicode code points? Is this important to anyone?
>
> Gareth

RE: How important is standardized Unicode support?

Daniel Uranga
I'm interested in the modifications needed to make that class (http://pastebin.com/iq6xBja6) plain Haxe.
Which is the best way to handle UTF-8/UTF-16 strings in Haxe? I'm targeting Flash 10.

Memory leak in haxe.xml.Fast

Antoine Gersant
Hello haXe people!

While profiling some code of mine, I think I stumbled upon a memory leak caused by the node method of the Fast class. Other methods may have a similar problem; I didn't check them.

I attached a tiny LeakTest class to this email to show how I reached this conclusion. Creating just one instance of this class causes a huge memory leak of Xml objects that are never garbage collected. If you run the swf and profile it with Flash Builder 4, you'll notice that the number of Xml instances skyrockets without limit.

I hope this is not a misunderstanding of mine; please tell me if I'm doing something wrong.

Thanks for reading!


Antoine Gersant


Attachments: LeakTest.hx (460 bytes), Main.hx (183 bytes), LeakTest.swf (16K)

Re: Memory leak in haxe.xml.Fast

Nicolas Cannasse
On 26/04/2011 23:14, Antoine Gersant wrote:

> Hello HaXe people !
>
> While profiling some code of mine, I think I stumbled upon a memory leak
> caused by the *Fast *class' method *node*. Maybe other methods have
> similar problem, I didn't check them.
>
> I attached a tiny *LeakTest *class to this email to show everyone how I
> reached this conclusion. Creating only one instance of this class will
> cause a huge memory leak of *Xml *objects never being garbage collected.
> If you run the swf and profile it with Flash Builder 4, you'll notice
> the number of *Xml *instances is sky rocketting with no limit.

There's a leak in Xml.parse on Flash9 that was fixed in haXe 2.07; make
sure that you're using it instead of 2.06.

Nicolas


Re: Memory leak in haxe.xml.Fast

Antoine Gersant
Thanks for your reply.
I am using haXe 2.07 (I double checked that) and I compile with
-swf-version 10 but I'm still leaking on that simple test.

On 27/04/2011 09:36, Nicolas Cannasse wrote:

> There's a leak in Xml.parse on Flash9 that was fixed in haXe 2.07,
> make sure that you're using it instead of 2.06
>
> Nicolas



Re: Memory leak in haxe.xml.Fast

Nicolas Cannasse
On 27/04/2011 20:17, Antoine Gersant wrote:
> Thanks for your reply.
> I am using haXe 2.07 (I double checked that) and I compile with
> -swf-version 10 but I'm still leaking on that simple test.

Also check that it's not just normal GC behavior. If the memory stabilizes
after a long run, then it's perfectly normal.

Nicolas


Re: Memory leak in haxe.xml.Fast

Antoine Gersant
I'm quite sure it's not normal GC behavior. I let my small test (the one
attached to the original mail) run for about 13 minutes, and here is what I got.

Screenshots:
http://i.imgur.com/ZEfsy.jpg (this one is the most explicit to me)
http://i.imgur.com/clmMm.jpg (you can see how each garbage collector
cycle leaves a few bytes in the wild).




On 27/04/2011 21:39, Nicolas Cannasse wrote:

> Check also that it's not a normal GC behavior. If the memory stabilize
> after a long run, then it's perfectly normal.
>
> Nicolas



Re: Memory leak in haxe.xml.Fast

Nicolas Cannasse
On 27/04/2011 22:23, Antoine Gersant wrote:
> I'm quite sure it's not normal GC behavior. I let my small test (the one
> enclosed with the OP mail) for about 13 minutes and here is what I got.
>
> Screenshots :
> http://i.imgur.com/ZEfsy.jpg (this one is the most explicit to me)
> http://i.imgur.com/clmMm.jpg (you can see how the garbage collector
> cycle leave a few bytes in the wild each time).

This seems perfectly normal: the GC only collects the Xml instances
created by Fast node accesses once in a while, which produces this
sawtooth-like memory graph. A leak would mean memory that grows without bound.

Nicolas


Re: Memory leak in haxe.xml.Fast

Cauê W.
Are there any circular references in Xml?
I've been studying the AVM2, and it seems to work primarily by reference counting. Objects involved in circular references then stay in memory much longer than the others, which makes them seem to be leaking.

It sure shows how fucked up the gc on avm2 really is.

2011/4/28 Nicolas Cannasse <[hidden email]>

> This seems perfectly normal : the GC will only collect Xml instances
> created by xml Fast nodes accesses once in a while, thus making this
> scissor-like memory graph. A leak would mean an infinite growing memory.
>
> Nicolas


Re: Memory leak in haxe.xml.Fast

Antoine Gersant
I don't know whether the references are circular, and I don't know how to check this in Flash Builder. What's the simplest way?
It doesn't change my conclusion that something is wrong, though.

On 29/04/2011 02:55, Cauê Waneck wrote:

> are there any circular references on Xml ?
> I've been studying about the AVM2, and it seems that it works primarily by reference counting. Circular references then stay in memory much more than non-circular references, thus making them to seem to be leaking.
>
> It sure shows how fucked up the gc on avm2 really is.


Re: Memory leak in haxe.xml.Fast

Dave Raggett
These days, with memory more abundant than when garbage collectors were first developed, reference counting works nicely together with a periodic sweep to catch cyclic structures. One approach is to add an object to a sweep list if its ref count is non-zero after decrementing and the compiler can't be sure that the data structure doesn't contain cycles. This keeps the sweep size small, as most data structures are trees. Rules of thumb are used to determine when to trigger a sweep (a mark phase followed by a collect phase). I don't know whether AVM2 uses this technique, but I wouldn't be surprised.
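
To make the scheme concrete, here is a toy model of it in Haxe. It is only a sketch under loud assumptions: Obj and ToyHeap are hypothetical names, every object whose count stays non-zero after a decrement becomes a suspect (there is no compiler analysis), the mark phase walks everything reachable from the roots, and nothing here describes what AVM2 actually does.

```haxe
// Toy model only: this does not describe AVM2's real collector.
class Obj {
    public var name : String;
    public var refCount : Int;
    public var refs : Array<Obj>; // outgoing references
    public var marked : Bool;
    public var freed : Bool;

    public function new( name : String ) {
        this.name = name;
        refCount = 0;
        refs = [];
        marked = false;
        freed = false;
    }
}

class ToyHeap {
    var roots : Array<Obj>;
    var suspects : Array<Obj>; // the "sweep list"

    public function new() {
        roots = [];
        suspects = [];
    }

    public function root( o : Obj ) : Void { roots.push(o); o.refCount++; }
    public function unroot( o : Obj ) : Void { roots.remove(o); decRef(o); }
    public function link( from : Obj, to : Obj ) : Void { from.refs.push(to); to.refCount++; }

    function decRef( o : Obj ) : Void {
        o.refCount--;
        if( o.refCount == 0 ) {
            // plain reference counting frees tree-shaped data immediately
            o.freed = true;
            for( c in o.refs ) decRef(c);
            o.refs = [];
        } else {
            // still referenced, possibly only by a cycle: check it at the next sweep
            suspects.push(o);
        }
    }

    // Mark phase from the roots, then a collect phase limited to the suspects.
    public function sweep() : Void {
        for( r in roots ) setMark(r, true);
        for( s in suspects ) collect(s);
        suspects = [];
        for( r in roots ) setMark(r, false);
    }

    function setMark( o : Obj, value : Bool ) : Void {
        if( o.marked == value ) return;
        o.marked = value;
        for( c in o.refs ) setMark(c, value);
    }

    function collect( o : Obj ) : Void {
        if( o.marked || o.freed ) return;
        o.freed = true;
        for( c in o.refs ) {
            if( c.marked ) c.refCount--; // a live object loses one referrer
            else collect(c);
        }
        o.refs = [];
    }
}

class Main {
    static function main() {
        var heap = new ToyHeap();
        var a = new Obj("a");
        var b = new Obj("b");
        heap.root(a);
        heap.link(a, b);
        heap.link(b, a);  // a <-> b is a cycle
        heap.unroot(a);   // a's count drops to 1, not 0: a becomes a suspect
        trace(a.freed);   // false: reference counting alone cannot free the cycle
        heap.sweep();
        trace(a.freed && b.freed); // true: the sweep caught it
    }
}
```

Between unroot() and sweep() the cycle is garbage that is still in memory, which is the kind of delay that can make a profiler graph look like a leak.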

On 29/04/11 17:43, Antoine Gersant wrote:

> I don't know if the references are circular and I don't know how to check this in Flash Builder. What's the most simple way ?
> It doesn't change my conclusion that something is wrong though.




-- 
 Dave Raggett [hidden email] http://www.w3.org/People/Raggett


Re: Memory leak in haxe.xml.Fast

Antoine Gersant
Since things seem to have cooled down about this leak, I decided to take
another stab at it.
I modified my test class to make it smaller and more useful for tracking
down the leak. It also contains my comments about what is going on. Hopefully
someone will be able to tell me what's wrong with NodeAccess.resolve().

The LeakTest class can now create any given number of leaked XML objects.
I tried leaking just one and analyzing a memory snapshot. Here it is
(with comments): http://i.imgur.com/72NHX.jpg

Thanks for reading


Antoine Gersant


Attachments: LeakTest.hx (1021 bytes), Main.hx (183 bytes)

Re: Memory leak in haxe.xml.Fast

Antoine Gersant
Sorry for spamming about this peculiar issue, but I dug deeper and the leak actually comes from the Xml class.
It had already been reported here: http://code.google.com/p/haxe/source/detail?r=3468
The fix Nicolas provided is: http://code.google.com/p/haxe/source/diff?spec=svn3468&r=3468&format=side&path=/trunk/std/flash9/_std/Xml.hx

However, as far as I can understand, there is still no way to clear the "leaked" (OK, let's call them unwanted now) Xml objects other than untyped myXml._map = null;
Am I correct?



Re: Memory leak in haxe.xml.Fast

Nicolas Cannasse
On 01/05/2011 21:17, Antoine Gersant wrote:

> Sorry for spamming about this peculiar issue but I did dig deeper and
> the leak actually comes from the XML class.
> It had already been reported here :
> http://code.google.com/p/haxe/source/detail?r=3468
> The fix Nicolas provided is :
> http://code.google.com/p/haxe/source/diff?spec=svn3468&r=3468&format=side&path=/trunk/std/flash9/_std/Xml.hx
>
> However, as far as I can understand, there is still no way to clear the
> "leaked" (ok, let's call them unwanted now) Xml other than *untyped
> myXml._map = null;*
> Am I correct ?

Check whether your /std/flash9/_std/Xml.hx uses a 'static var _map'.
If it does, then you need to upgrade to haXe 2.07, which includes the fix.

Nicolas


Re: Memory leak in haxe.xml.Fast

Antoine Gersant
I am up to date and I don't have the 'static var _map'.
I still wish there were a clearMap() method available on Xml instances,
because I need to process XML data (thus filling the map with tons of
Xml objects) without deleting it afterward. Am I condemned to use the untyped
trick mentioned earlier?


On 02/05/2011 03:56, Nicolas Cannasse wrote:

> Check if your /std/flash9/_std/Xml.hx uses a 'static var _map'.
> If it is, then you need to upgrade to haXe 2.07 which includes the fix.
>
> Nicolas



Re: Memory leak in haxe.xml.Fast

Nicolas Cannasse
On 02/05/2011 18:48, Antoine Gersant wrote:
> I am up to date and I don't have the 'static var _map'.
> I still wish there was a clearMap() method available on Xml instances
> because I need to process XML data (thus filling the map with tons of
> XMLs) without deleting it afterward. Am I condemned to use the untyped
> trick mentioned earlier ?

Two possibilities:

a) Check that the map length does not grow: if you're always accessing
the same element, there is no reason for it to grow, since it should
retain a reference to the existing wrapper, in order to allow ==
comparisons between wrappers of the same element.

b) The map should maybe be created using a weak Dictionary; try
modifying the way it's created.
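
To illustrate point a), here is a rough, target-independent sketch of such a wrapper cache. Everything in it is an assumption made for illustration: NativeNode, Wrapper and WrapperCache are hypothetical names, and it uses haxe.ds.ObjectMap from later Haxe releases (Haxe 3 and up) rather than the flash.utils.Dictionary discussed in this thread, so it is not the code in std/flash9/_std/Xml.hx.

```haxe
import haxe.ds.ObjectMap;

// Stand-in for the platform's native XML node (hypothetical).
class NativeNode {
    public var name : String;
    public function new( name : String ) { this.name = name; }
}

// Stand-in for the Haxe-side wrapper object (hypothetical).
class Wrapper {
    public var node : NativeNode;
    public function new( node : NativeNode ) { this.node = node; }
}

class WrapperCache {
    var map : ObjectMap<NativeNode, Wrapper>;

    public function new() { map = new ObjectMap(); }

    // Returns the same wrapper for the same native node,
    // so that == between two wraps of one node is true.
    public function wrap( n : NativeNode ) : Wrapper {
        var w = map.get(n);
        if( w == null ) {
            w = new Wrapper(n);
            map.set(n, w);
        }
        return w;
    }

    // Number of wrappers currently kept alive by the cache.
    public function size() : Int {
        var count = 0;
        for( k in map.keys() ) count++;
        return count;
    }

    // What a clearMap()-style method could do: drop every cached wrapper.
    public function clear() : Void {
        map = new ObjectMap();
    }
}

class Main {
    static function main() {
        var cache = new WrapperCache();
        var a = new NativeNode("a");
        var b = new NativeNode("b");
        trace(cache.wrap(a) == cache.wrap(a)); // true: same node, same wrapper
        cache.wrap(b);
        trace(cache.size()); // 2: one entry per distinct node accessed
        cache.clear();
        trace(cache.size()); // 0
    }
}
```

Wrapping the same node repeatedly never adds entries, while every distinct node accessed adds one that lives as long as the map does; a cache like this only shrinks if entries are dropped explicitly or the keys are held weakly.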

Let me know of any results you get.

Hope that helps,

Nicolas


Re: Memory leak in haxe.xml.Fast

Antoine Gersant
Thanks for the helpful answer.

I'm accessing many different elements in the XMLs, which is why my map
keeps growing so much.
Hopefully the weak keys will solve all my problems. Pardon my ignorance,
but do I have to recompile the haxe compiler for changes in
/std/flash9/_std to take effect?

I'll let you know if it works but I'm quite confident about it =)


On 03/05/2011 11:33, Nicolas Cannasse wrote:

> Two possibilities :
>
> a) check that the map length does not grow : if you're always
> accessing the same element, there is no reason that it grows since it
> should actually retain a reference to it, in order to allow ==
> comparisons between the same wrapped elements.
>
> b) the map should maybe be created using a weak Dictionary, try to
> modify the way it's created.
>
> Report me any result you have.
>
> Hope that helps,
>
> Nicolas



Re: Memory leak in haxe.xml.Fast

Nicolas Cannasse
On 03/05/2011 20:02, Antoine Gersant wrote:
> Thanks for the helpful answer.
>
> I'm accessing many different elements in the XMLs so that's why my maps
> keeps growing a lot.
> Hopefully the weak key will solve all my problems. Pardon my ignorance,
> but do I have to recompile the haxe compiler for changes in
> /stdflash9/_std to take effect ?

Not at all, simply recompile your project ;)

Nicolas


Re: Memory leak in haxe.xml.Fast

Antoine Gersant
Looks like it didn't work :<

I modified line #183 in std/flash9/_std to:
map = new flash.utils.Dictionary(true);

Then I recompiled my project and took precise measurements of its boot phase. The boot phase consists of loading one XML file and running verifications on it using haxe.xml.Check (which populates our _map dictionary quite well).
The verification code doesn't leave any references behind. It's basically:
try {
    Check.checkNode(mySettingsXml, myComplexRule);
} catch( e : Dynamic ) {
    // do something; this doesn't happen in these tests because my file is good
}


With or without weak keys, this leaves 81 XML objects in memory. If I clear the map (using the untyped trick) after the checks, the count drops back to 1 (which was the expected result).

I have no idea why this didn't work, though.

Antoine Gersant

On 03/05/2011 21:00, Nicolas Cannasse wrote:

> Not at all, simply recompile your project ;)
>
> Nicolas


