Bug 13153 - Both web interface and squeezebox "hang" when one browses folders containing non-ascii chars
Status: CLOSED FIXED
Product: Logitech Media Server
Classification: Unclassified
Component: Localization
Version: 7.4.0
Hardware: PC Debian Linux
Importance: P2 major with 1 vote
Target Milestone: 7.6.0
Assigned To: Alan Young
Keywords: charset_issues
Reported: 2009-07-30 18:53 UTC by Nick Orlov
Modified: 2011-05-06 14:14 UTC
Category: Bug


Attachments
screenshot - see how location is being displayed (115.70 KB, image/jpeg)
2009-07-30 19:08 UTC, Nick Orlov
corresponding frame source (gzipped) (4.05 KB, application/x-tar)
2009-07-30 19:08 UTC, Nick Orlov
error message in server.log (267 bytes, application/octet-stream)
2009-08-02 14:00 UTC, Nick Orlov
mp3 file with russian filename (43.72 KB, audio/mpeg)
2009-08-05 16:04 UTC, Chris Owens
empty directory with koi8 chars created on my box (10.00 KB, application/x-tar)
2009-08-08 10:54 UTC, Nick Orlov
better patch for a hang (344 bytes, application/x-tar)
2009-09-21 19:09 UTC, Nick Orlov
this chunks fixes garbage in 'location' field for me (310 bytes, application/x-tar)
2009-09-21 19:12 UTC, Nick Orlov
Oops, original patch had been inverted (337 bytes, application/x-tar)
2009-09-22 17:21 UTC, Nick Orlov

Description Nick Orlov 2009-07-30 18:53:09 UTC
This has been happening for at least 3 months (maybe 6). "Hang" is in quotes because one can revive the interface by clicking "home" (for example) in the web interface, or by pressing "now playing" on the remote in the case of the squeezebox. Quite an annoying bug.

I'm using Debian/Sid + squeezeserver/7.4, updated on a daily basis.

Any extra info is available on request, and I'm more than willing to test any patches.
Comment 1 Nick Orlov 2009-07-30 19:01:38 UTC
P.S. This is probably important:

xxx@yyyy:~$ locale     
LANG=ru_RU.KOI8-R
LC_CTYPE="ru_RU.KOI8-R"
LC_NUMERIC="ru_RU.KOI8-R"
LC_TIME="ru_RU.KOI8-R"
LC_COLLATE="ru_RU.KOI8-R"
LC_MONETARY="ru_RU.KOI8-R"
LC_MESSAGES=C
LC_PAPER="ru_RU.KOI8-R"
LC_NAME="ru_RU.KOI8-R"
LC_ADDRESS="ru_RU.KOI8-R"
LC_TELEPHONE="ru_RU.KOI8-R"
LC_MEASUREMENT="ru_RU.KOI8-R"
LC_IDENTIFICATION="ru_RU.KOI8-R"
LC_ALL=

=====

Version: 7.4 - r27933 @ Thu Jul 30 04:00:06 PDT 2009
Hostname: yyyy
Server IP Address: 1.2.3.4
Server HTTP Port Number: 9000
Operating system: Debian - EN - koi8-r
Platform Architecture: i686-linux
Perl Version: 5.10.0 - i486-linux-gnu-thread-multi
Comment 2 Nick Orlov 2009-07-30 19:08:10 UTC
Created attachment 5544
screenshot - see how location is being displayed
Comment 3 Nick Orlov 2009-07-30 19:08:41 UTC
Created attachment 5545
corresponding frame source (gzipped)
Comment 4 Nick Orlov 2009-07-30 19:09:32 UTC
Also, the displayed location is totally screwed up (probably related); see the attachments.
Comment 5 James Richardson 2009-08-01 11:19:57 UTC
Michael: any idea what's happening here?
Comment 6 Michael Herger 2009-08-01 23:54:42 UTC
QA - can you reproduce? What about server.log (no frame debug needed)?
Comment 7 Nick Orlov 2009-08-02 14:00:08 UTC
Created attachment 5554
error message in server.log

This is the only message that appears in server.log when one tries to access such a folder from a browser. Please let me know if you need traces with the log level increased.
Comment 8 Chris Owens 2009-08-05 15:24:51 UTC
I can't reproduce it because the scanner crashes.  And I can't zip up the example file I made because winzip crashes!  I'll look and see if there's already a bug for the scanner crash.
Comment 9 Chris Owens 2009-08-05 15:46:07 UTC
Hmm it now seems to be working okay after some messing about.  I'll upload a sample file for reference.
Comment 10 Chris Owens 2009-08-05 16:04:30 UTC
Created attachment 5573
mp3 file with russian filename

To make a directory name I just copied and pasted the Cyrillic character string from the filename.  I don't speak Russian, so I hope this random Russian string I found on the web isn't anything offensive!
Comment 11 Nick Orlov 2009-08-06 08:43:25 UTC
Chris, it says "letter to the mother"; I do not think you should worry about anything :)

If all you need is dir/file samples, I've got tons of them.
Comment 12 Chris Owens 2009-08-07 09:47:15 UTC
Nick, since I can't reproduce the bug, could you verify that the sample I created also demonstrates the bug for you?  If it does, then I need to figure out what's different between your system and mine.  If it doesn't, then we need to look at what's different between your files and mine!
Comment 13 Nick Orlov 2009-08-08 10:53:36 UTC
Chris,

Yes, I can reproduce it with your sample with a 100% hit rate.
Just in case, I'm using:

Version: 7.4 - r27977 @ Sat Aug 1 04:00:31 PDT 2009
Hostname: xxxx
Server IP Address: yyy.yyy.yyy.yyy
Server HTTP Port Number: 9000
Operating system: Debian - EN - koi8-r
Platform Architecture: i686-linux
Perl Version: 5.10.0 - i486-linux-gnu-thread-multi
MySQL Version: 5.0.84-1
Total Players Recognized: 2

What could be different: are you using a UTF-8 or a KOI8 system-wide locale?

Also, I have the following in /etc/default/squeezecenter:

# locale settings
if [ -r /etc/default/locale ]; then
        . /etc/default/locale
        export LANG
fi

More: I can reproduce this even by creating an empty directory with non-ASCII chars in its name. After that, the content of the 'parent' directory cannot be displayed. I'll upload a tar file with such a dir.

P.S. I'm not sure if it is clear from the short bug description, but it is a "Home > Music Folder" view that is affected.
Comment 14 Nick Orlov 2009-08-08 10:54:52 UTC
Created attachment 5588
empty directory with koi8 chars created on my box
Comment 15 Nick Orlov 2009-08-08 10:58:42 UTC
I guess this is totally irrelevant, but I'm not using any "fancy" mount options for my data partition:

/dev/sda2 on /data type reiserfs (ro,nosuid,nodev,noatime)
Comment 16 Chris Owens 2009-08-10 11:22:08 UTC
Hm I should try KOI8
Comment 17 Chris Owens 2009-08-19 18:13:54 UTC
Ross, I had been following up on this but I've gotten busy again.  You have a good VMware setup with some Linux images.  Do you have a Russian-language image running this KOI8 encoding?  Could you set one up?  Thanks!
Comment 18 Ross Levine 2009-08-28 17:05:30 UTC
So far unable to reproduce when switching FC10 to Russian, will try Ubuntu next week.
Comment 19 Chris Owens 2009-08-31 15:18:29 UTC
Ross have you found anything out about this KOI8 encoding?  Is it something that needs to be configured at install time?
Comment 20 Nick Orlov 2009-08-31 17:19:22 UTC
> Ross have you found anything out about this KOI8 encoding?  Is it something
> that needs to be configured at install time?

Not really. It can be enabled at any time. Check 'dpkg-reconfigure locales'.

P.S. Since Linux filesystems treat filenames as raw byte sequences (as opposed to the Mac, for example, where all filenames are stored in Unicode), you would want to make sure the locale is set before any 'localized' files are created.

P.P.S. KOI8-R was historically a 'standard' Russian encoding used on Linux. I think nowadays Unicode is the default out of the box (in fresh installs), which is probably why you never ran into this.

P.P.P.S. Would it help if I provided 'guest' access to my box?
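
The P.S. above about Linux filenames being raw byte sequences can be sketched as follows. This is a Python illustration rather than anything from the server code; the byte string and the `interpret` helper are made up for the example, and the only assumption is that the same four bytes get read under different charsets.

```python
# Hypothetical example: four bytes as they might sit in a directory entry.
# The filesystem stores only the bytes, not the charset they were typed in.
RAW_NAME = b"\xcd\xc1\xcd\xc1"


def interpret(raw: bytes, charset: str) -> str:
    """Decode raw filename bytes under a given charset.

    errors="replace" substitutes U+FFFD for bytes that are invalid in
    that charset instead of raising an exception.
    """
    return raw.decode(charset, errors="replace")


as_koi8 = interpret(RAW_NAME, "koi8_r")     # the reading its creator intended
as_latin1 = interpret(RAW_NAME, "latin-1")  # same bytes, Western European reading
as_utf8 = interpret(RAW_NAME, "utf-8")      # same bytes, but not valid UTF-8

# Only the KOI8-R reading survives a round trip back to the stored bytes
# *and* yields Cyrillic text.
assert as_koi8.encode("koi8_r") == RAW_NAME
assert as_koi8 != as_latin1
```

Only the KOI8-R reading gives a Russian word here, which is why a name created under one locale can look like garbage, or fail to decode at all, under another.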
Comment 21 Ross Levine 2009-09-08 17:32:26 UTC
Lowering severity due to the reduced likelihood of KOI8 systems, as explained in comment #20. I'm not able to reproduce this on Ubuntu or Fedora after switching them to Russian.
Comment 22 Nick Orlov 2009-09-10 20:04:23 UTC
OK, I've lost hope of having it fixed in a reasonable time and decided to dive into the code myself.

So here is what's happening:

Slim::Music::Info::fileName() at the end does

  return Slim::Utils::Unicode::utf8decode_locale($j);

which calls Encode::decode() and converts the file name from KOI8 to Perl's internal string form

at the same time

Slim::Music::Info::sortFilename()

does

   Slim::Utils::Unicode::utf8encode_locale( 
      fileName($_)
   )

which seems reasonable, but utf8encode_locale ends up calling utf8on(), which tries to convert from UTF-8 to Perl's internal representation _again_, and that's exactly where it fails.

For now I've just commented out the utf8on call in utf8encode_locale, which fixed the issue for me.
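
To make the double-decode failure described above concrete, here is a small Python analogy. It is not the Slim::Utils::Unicode code, and the helper names are hypothetical; it only assumes what this comment describes, namely that the name is decoded from the locale charset once and that a later step tries to decode from UTF-8 again.

```python
# Hypothetical analogy in Python; none of these names exist in the server.
LOCALE_CHARSET = "koi8_r"
RAW_NAME = b"\xcd\xc1\xcd\xc1"  # a Cyrillic name stored as KOI8-R bytes


def decode_locale(raw: bytes) -> str:
    """Step 1: decode filesystem bytes once, using the locale charset."""
    return raw.decode(LOCALE_CHARSET)


def encode_utf8(text: str) -> bytes:
    """Step 2 done right: encode the already-decoded text to UTF-8."""
    return text.encode("utf-8")


def redundant_utf8_decode(raw: bytes):
    """Step 2 done wrong: treat data that was never UTF-8 as UTF-8.

    Returns None when the bytes are not valid UTF-8; this stands in for
    the failure described in the comment above.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


name = decode_locale(RAW_NAME)
utf8_bytes = encode_utf8(name)

# Decoding once and encoding once round-trips cleanly.
assert utf8_bytes.decode("utf-8") == name
# A second UTF-8 decode of the locale bytes has nothing valid to read.
assert redundant_utf8_decode(RAW_NAME) is None
```

The point of the sketch is only that decoding should happen exactly once, at the boundary where the bytes enter the program; any later step that assumes UTF-8 input has to be handed real UTF-8.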
Comment 23 Nick Orlov 2009-09-10 20:32:44 UTC
Apparently the fix is incomplete: when scrolling through directories containing non-ASCII chars using the IR remote on an SB3, nothing is displayed (the SB3 display is completely black). But at least it does not hang.
Comment 24 Nick Orlov 2009-09-21 19:09:11 UTC
Created attachment 5883
better patch for a hang

After playing with it for a while, I came up with the following patch, which for one thing solves the 'hang' problem for me completely, and for another looks like a safe hack (a hack because one should not try to encode into UTF-8 twice). With this patch in place I have no problem browsing folders using either the web UI or the SB3 itself. Please note that I've changed UTF-8 -> utf8, which seems reasonable (utf8 is less restrictive; why fail for no good reason?).
Comment 25 Nick Orlov 2009-09-21 19:12:11 UTC
Created attachment 5884
this chunks fixes garbage in 'location' field for me

I'm not so sure about this one. It definitely works for me, but there was probably a reason to use utf8decode_guess in the first place. Perhaps one should look at autodetection of the KOI8-R encoding instead.
Comment 26 Nick Orlov 2009-09-21 19:14:45 UTC
I've just posted 2 patches that certainly change things for the better for me. Can someone knowledgeable review them? It would be nice if they were committed so that I do not have to apply them every time I upgrade squeezeboxserver.
Comment 27 Nick Orlov 2009-09-21 19:17:03 UTC
P.S. I know why one could have trouble reproducing the original problem: /etc/init.d/squeezeboxserver forces the utf8 charset, which does not work for me for obvious reasons, so I have commented it out.
Comment 28 Chris Owens 2009-09-22 10:28:09 UTC
Hi Nick,

We're currently working to get 7.4 ready for release in a very short time.  We're not ignoring you!  :) I'll get someone to review your patch probably next week.
Comment 29 Nick Orlov 2009-09-22 17:21:17 UTC
Created attachment 5894
Oops, original patch had been inverted

Chris, ok :) I hope you are men of your word :)
I've just noticed that the first patch I posted is inverted. Here is the right one (just in case).
Comment 30 Michael Herger 2009-10-15 23:15:37 UTC
Nick - would you mind giving the latest 7.4.1 build a try? I've identified and fixed quite a few issues with browsing Cyrillic and other non-Western files/folders.
Comment 31 Nick Orlov 2009-10-16 20:19:21 UTC
As of "Version: 7.5.0 - r28873 @ Fri Oct 16 02:00:29 PDT 2009" it still does not work for me (both my patches are still required).

Did you try to reproduce it using a non-UTF-8 locale?
Comment 32 Michael Herger 2009-10-16 23:31:20 UTC
Oops, I didn't read this bug carefully enough. My fixes are mostly Windows only.

Did you ever try adding --charset=utf8 to the slimserver.pl startup parameters?
Comment 33 Nick Orlov 2009-10-20 10:36:53 UTC
Forcing the utf8 charset with an 8-bit system locale makes it much worse; see bug #9236.
Comment 34 Chris Owens 2009-10-21 09:47:35 UTC
Moving these bugs to P4 to make room for moving P1.5 bugs to P2
Comment 35 Nick Orlov 2009-10-21 16:46:14 UTC
Any chance of reviewing the patches I've sent and possibly closing the bug instead of lowering its priority?
Comment 36 Andy Grundman 2009-10-21 16:54:41 UTC
I looked at your patch.  Our Unicode handling is a mess and it is hard to say what impact a change such as yours would have on other aspects of the system. :(  I think we need to revisit how we handle all Unicode input and output and use the simplest methods possible (such as utf8::decode and utf8::encode) and get rid of much of the Unicode module.  Of course, much of the mess is a result of trying to be compatible with absolutely everything out there, including horrible Windows encodings, broken encodings, etc.  So being more strict and clean is probably not possible in the real world.
Comment 37 Nick Orlov 2009-10-21 21:35:55 UTC
Agreed on most points, but I do believe that the first chunk (aka "better patch for a hang") should be safe, and it fixes the most severe aspects of the problem.

P.S. Over time I've learned to code "just to make things work" as opposed to "for the sake of the art of it". I think the proposed solution is a "good enough" compromise to deal with the issue at hand.
Comment 38 Pat Ransil 2009-10-23 05:09:34 UTC
Administrative move of 7.5 bugs. All P2, P3, P4 being downgraded one level. Will then split P1s.
Comment 39 Chris Owens 2010-02-11 15:55:24 UTC
This bug needs to be fixed at some point, and hopefully we can use some of your patch, Nick.

I have made sure our test suite has a test for the symptom you originally reported, as well.
Comment 40 Nick Orlov 2010-02-18 18:40:48 UTC
Both patches I posted are still good: I apply them every time I upgrade to the next nightly. I hope you will eventually accept them.
Comment 41 Alan Young 2010-12-01 07:03:24 UTC
Nick, are you still using a non-UTF-8 locale/encoding for your filesystem?

Is there some particular reason for using KOI8 instead of UTF-8?
Comment 42 Nick Orlov 2010-12-02 18:06:03 UTC
(In reply to comment #41)
> Nick, are you still using a non-UTF-8 locale/encoding for your filesystem?

Yes, I am.

> Is there some particular reason for using KOI8 instead of UTF-8?

Besides historical reasons (KOI8 had been the de facto standard Russian Linux locale long before UTF-8 got any traction), it's faster and more stable (less of an issue nowadays since more and more software supports Unicode, but still).
Comment 43 Alan Young 2010-12-03 07:01:23 UTC
Ok, I think that non-UTF-8 has to be becoming pretty rare with Linux but I guess it should still work. I doubt that it will become part of our official support matrix.

Are you interested in trying 7.6 with the recent changes? I'd be interested in your feedback if you are.
Comment 44 Nick Orlov 2010-12-08 18:00:53 UTC
(In reply to comment #43)
> Are you interested in trying 7.6 with the recent changes? I'd be interested in
> your feedback if you are.

Well, there is good news and bad news. The good news: both issues I complained about are solved in 7.6. The bad news: as of now (r31605), 7.6 has too many regressions (for example, Artist sorting is totally screwed up for names containing non-ASCII chars; let me know if you need a screenshot or something). So I'm back to 7.5.2 + my patches.
Comment 45 Alan Young 2010-12-10 00:23:04 UTC
Nick, thanks for this.

I'm not exactly sure but I suspect that the sorting issue may be related to your use of a non-UTF-8 locale. Sorting is using the following method:

    $COLLATION{perllocale} = sub { use locale; $_[0] cmp $_[1] };

where the strings from the DB that are being collated here will be UTF-8 encoded.

This collation is dependent upon LC_COLLATE (env var) being set appropriately. 

Taking those two points together, maybe try setting LC_COLLATE="ru_RU" instead of LC_COLLATE="ru_RU.KOI8-R".

I would be very interested to hear your results.
Comment 46 Alan Young 2010-12-10 00:56:14 UTC
After some more experiments, I think what you might need is LC_COLLATE="ru_RU.utf8"

[awy@oz ~]$ LC_COLLATE="ru_RU.KOI8-R" perl t.pl t
abcd
efgh
äbcd
åbcd
æbcd
øbcd
[awy@oz ~]$ LC_COLLATE="ru_RU.utf8" perl t.pl t
abcd
åbcd
äbcd
æbcd
efgh
øbcd
[awy@oz ~]$ LC_COLLATE=no_NO.utf8 perl t.pl t
abcd
efgh
æbcd
äbcd
øbcd
åbcd
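
The three runs above differ only in LC_COLLATE. As a rough Python sketch of the same effect (t.pl itself is not shown in this report, so this is not a port of it): sorting the six sample strings by their UTF-8 bytes reproduces the order of the ru_RU.KOI8-R run, while a locale-aware sort key can reorder the accented names. The helper names are hypothetical, and which locales are installed depends on the machine, so the locale-aware helper returns None when a locale is unavailable.

```python
import locale

# The six sample strings from the runs above, in arbitrary order.
NAMES = ["øbcd", "efgh", "åbcd", "abcd", "æbcd", "äbcd"]


def sort_by_bytes(names):
    """Sort by raw UTF-8 bytes, ignoring any language-specific rules."""
    return sorted(names, key=lambda s: s.encode("utf-8"))


def sort_by_locale(names, locale_name):
    """Sort with the collation rules of locale_name.

    Returns None if the locale is not installed on this machine.
    The previous LC_COLLATE setting is restored afterwards.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        return sorted(names, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


byte_order = sort_by_bytes(NAMES)
russian_order = sort_by_locale(NAMES, "ru_RU.utf8")    # may be None
norwegian_order = sort_by_locale(NAMES, "no_NO.utf8")  # may be None
```

Byte order puts every accented name after "efgh", as in the first run; whether the other two lists match the second and third runs depends on the locale data installed on the machine.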
Comment 47 SVN Bot 2010-12-10 02:48:21 UTC
 == Auto-comment from SVN commit #31609 to the slim repo by ayoung ==
 == http://svn.slimdevices.com/slim?view=revision&revision=31609 ==

bug 16683: Non-ASCII characters in file and directory names 
Fixed Bug 15805 - Artists sorting incorrectly on 7.5-embedded SQLite
Fixed Bug 14800 - Sorting not done correctly with Swedish characters
Bug 13153 - Both web interface and squeezebox "hang" when one browses folders containing non-ascii chars
Set LC_COLLATE so that SQLite database sorting using perlcollate works for different languages.
Comment 48 Paul Chandler 2011-05-06 14:14:26 UTC
7.6.0 32390 - Used the Cyrillic mp3 and folder; was able to browse and play.

Set language to Russian (on Touch); still works.