Bugzilla – Bug 13600
Non-western characters incorrectly sorted
Last modified: 2011-05-08 00:42:38 UTC
For the last couple of versions (I'm on a 7.4.28243 build right now, but this worked last month), Greek artists have been grouped between the western letters I and J. A few weeks ago they were listed after the letter Z, which is more logical. There have been no changes to either the artist list or my configuration.
Michael: does this sound like a dupe of (or related to) another bug, or is it new behavior?
Themis - what build were you using before you saw this happen? Would you mind uploading one of those files so we can test? James - surely related to _some_ other bug report, but I doubt it's the one you linked. Character set handling is one big issue...
Oh well... hard to say which build it was. WHS logs show that the last builds downloaded were 28228, 28235, 28243 and the current 28244. If you could provide me with a link to the 28228 build (of 7.4) for WHS, I could test to see whether the problem existed there. What is certain is that I didn't have the problem last month (July). I applied the latest build on Aug 15th (28243 at that time, I think), and then this problem appeared. At first I thought non-western characters were not scanned at all (that had been the case somewhere back in 6.something), but then I discovered that the artists were simply between letters I and J...
Created attachment 5670 [details] Greek characters
Themis: Please try with this version and let me know if you still see the issue: http://downloads.slimdevices.com/nightly/?ver=7.4 I tested with version 7.4 - r28469 but was unable to replicate the error as stated in this bug.
I retried, still the same problem. I tried installing the .exe instead of the .msi on my WHS: still the same. I even installed 7.4 r28494 on my laptop (which never had SC installed) and copied 3 FLACs into a directory: Alice in Chains (1 song), the file that I sent you (1 song), and Ravel - Dafnis & Chloe (1 song). The Greek artist is listed between Alice in Chains and Ravel, although it should be (and used to be) at the end of the Latin-character list (after letter Z), that is, after Ravel.
Thanks for all the extra testing, assigning back to Michael for comment
Assigning to Andy...
*** Bug 13810 has been marked as a duplicate of this bug. ***
*** Bug 13296 has been marked as a duplicate of this bug. ***
== Auto-comment from SVN commit #28507 to the slim repo by andy == == https://svn.slimdevices.com/slim?view=revision&revision=28507 == Fixed bug 13600, change sort and search columns to be BLOB instead of TEXT so UTF-8 data can be stored correctly in them
It sorts differently, but is it correct? É(dith Piaf) and È(tienne de Crécy) are sorted after Z. I would expect them to be sorted under E.
What language do you have selected?
Dutch
OK, let me look at this some more. Collation used for Dutch is utf8_general_ci which from what I can tell should sort É next to E: http://www.collation-charts.org/mysql60/mysql604.utf8_general_ci.european.html
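For readers following along, the behaviour expected from an accent-insensitive, case-insensitive collation such as utf8_general_ci can be approximated outside the database. The sketch below is a rough Python stand-in, not MySQL's actual implementation: it builds a sort key by decomposing each name, dropping combining marks, and folding case. The helper name `fold_key` and the artist list are made up for this illustration.

```python
import unicodedata


def fold_key(name: str) -> str:
    """Hypothetical sort key: decompose, drop combining marks, fold case."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Illustrative artist list using names mentioned in this bug.
artists = ["Ravel", "Eric Clapton", "Édith Piaf", "Edgar Meyer", "Étienne de Crécy"]

# With the folded key, "É" compares like "E", so the accented names
# are interleaved with the other E's instead of trailing after Z.
sorted_artists = sorted(artists, key=fold_key)
print(sorted_artists)
```

Under this approximation Édith Piaf lands between Edgar Meyer and Eric Clapton, which is the order the reporter expects for Dutch.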
Ok, works fine now for me since 28547 (there were various other problems in versions 28507 till 28538). Thank you for having fixed this. Regards, Themis
Great news! Thanks for the feedback.
Wait, is this really fixed? According to Dennis it may not be.
No, I think it's still not fixed for accented characters.
Reopening...
Previously, weren't characters with accents 'normalized' to some extent into their plain character equivalents when stored in the sort and search columns? I don't see that now in the database. That would explain why é and e are no longer sorted as the same. See bug 13811. This was also the case for the search columns, whose behavior has changed since 7.3. For instance, searching for 'torme' used to find 'Mel Tormé', but now it does not. This was brought up and discussed recently in the beta forum before the change to BLOB columns, so I'm guessing it might be a change made to the SQLite branch that should not have been merged into the trunk. http://forums.slimdevices.com/showthread.php?t=67321
Hmm, as far as I could tell nothing was ever normalized (I think you really mean transliterated); it was just stored in a TEXT column, which completely garbled the data. I don't think searching that way ever worked, at least for me. I recall trying to search for "Budi" (looking for Büdi) and it didn't work. I need to take a closer look at this.
I just fired up the 7.3.4 server and you're right about the storage. The characters were not transliterated. One difference that I do see, though, is that in the 7.3 database those accented characters aren't capitalized in the sort column, while they are in the 7.4 database - TORMé MEL vs. TORMÉ MEL. If storing the sort text as a BLOB, I'm not sure how you could get É and E to sort the same or even adjacent to one another. From http://dev.mysql.com/doc/refman/5.0/en/blob.html : "BLOB columns are treated as binary strings (byte strings). TEXT columns are treated as nonbinary strings (character strings). BLOB columns have no character set, and sorting and comparison are based on the numeric values of the bytes in column values. TEXT columns have a character set, and values are sorted and compared based on the collation of the character set."
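The quoted MySQL documentation explains the "É after Z" symptom directly: a byte-wise comparison of UTF-8 data puts every non-ASCII lead byte after all ASCII letters. A minimal Python sketch of that comparison, using an illustrative artist list (the database itself is not involved here):

```python
# Illustrative artist list using names mentioned in this bug.
artists = ["Ravel", "Édith Piaf", "Edgar Meyer", "Eric Clapton", "Alice in Chains"]

# A BLOB-style comparison looks only at the raw bytes of each value,
# so sorting by the UTF-8 encoding mimics it.
byte_sorted = sorted(artists, key=lambda name: name.encode("utf-8"))
print(byte_sorted)

# "É" is U+00C9, encoded in UTF-8 as the two bytes 0xC3 0x89.
# Both are larger than 0x5A ("Z"), hence the position after every ASCII name.
print("É".encode("utf-8").hex(), hex(ord("Z")))
```

With byte ordering, Édith Piaf ends up last, after Ravel, which matches what was reported after the switch to BLOB columns.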
Yeah you're right, I hope I didn't screw this up even more by changing the column types. I did a bit of research the other day before I changed this about why the characters looked wrong in the database, and there may be a bug in the version of the server we are using that's causing it. :(
== Auto-comment from SVN commit #28582 to the slim repo by andy == == https://svn.slimdevices.com/slim?view=revision&revision=28582 == Fixed bug 13600, revert previous fix, the real problem was SET NAMES UTF8 had been accidentally removed from the on-connect SQL statements. I tested this with ?\195?\136tienne and was able to search for it using Etienne because MySQL performs the removal of diacritics automatically. It also appears to sort correctly now. You will need to do a full wipe and rescan because I have removed the schema_11 file.
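The effect of a missing SET NAMES UTF8 can be approximated outside MySQL. If the client sends UTF-8 bytes while the connection treats them as a single-byte character set, each accented character turns into two unrelated characters, which would explain data that looks garbled in the database. The sketch below assumes the fallback character set behaves like cp1252; that is an assumption for illustration, since the thread does not say what the connection defaulted to. It also shows that the `\195\136` in the commit message, if read as decimal byte values, is the UTF-8 encoding of a single accented letter.

```python
name = "Étienne"
wire_bytes = name.encode("utf-8")  # what a UTF-8 client puts on the wire

# Assumption: the receiving side interprets those bytes as cp1252.
misread = wire_bytes.decode("cp1252")
print(misread)  # two characters now stand where "É" used to be

# When both sides agree on UTF-8, the round trip is lossless.
roundtrip = wire_bytes.decode("utf-8")
print(roundtrip == name)

# Reading "\195\136" as decimal byte values and decoding them as UTF-8.
escaped = bytes([195, 136]).decode("utf-8")
print(escaped)
```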
Well, Édith Piaf and Étienne de Crécy are still sorted after Z... 7.4.0 - r28603 @ Tue Sep 22
Created attachment 5896 [details] View wrong sort order in Squeezecenter
Created attachment 5897 [details] Change character in alphabet
You did a full wipe and rescan?
Yes, I did a full wipe & rescan. This is scheduled daily and I triggered it manually to be sure.
Please attach your scanner log to the bug
Created attachment 5912 [details] scanner.log As requested: scanner.log
Created attachment 5917 [details] part 1 of 2 part scanner.log With debugging enabled.
Created attachment 5918 [details] part 2 of 2 part scanner.log With debugging enabled.
Andy: do the attached logs help?
I don't think a scanner log will help with this one. I thought I saw correct sort when I was testing. QA can you reproduce?
@Andy Can another log help?
What we need for this is a set of 2 test files that should sort one way but sort the opposite way. Can someone provide that?
ftp://squeezeboxserver.kicks-ass.net/%C9dith%20Piaf/ Don't use IE, or disable FTP passive mode in IE.
OK, but those are all from 1 artist right? So not the best set for testing sorting I think?
And another one... ftp://squeezeboxserver.kicks-ass.net/Etienne%20De%20Crecy/
ftp://squeezeboxserver.kicks-ass.net/Music
OK I really don't think you should be posting that here. :) Can someone please just attach 2 test files to this bug that don't sort properly? They can be very short, all we need are the tags.
Not a very wise thing to do, indeed. Can you tell me a simple way to 'shorten' some files?
I downloaded a couple of the Édith Piaf tracks and scanned my test library with the latest SbS 7.4. On my system the name sorts correctly among the E's: Edgar Meyer, Édith Piaf, Eric Clapton.
Yeah I also saw the correct sort when I tested...
So what's happening on my system?
As I said before, I've scheduled a daily wipe & rescan. I also triggered several manual wipe & rescan actions. It didn't solve my sort problem. Now I've completely uninstalled Squeezebox Server, deleted "Program Files" & "ProgramData" files. It looks like sorting does work as expected now.
Created attachment 5941 [details] wrong sort order It gets weirder and weirder. "E" disappeared and "É" has been added after "Z" (7.4.0 - r28660). The language is set to "dutch". I believe it sorted correctly with the language set to "english". I will try that today.
If I set the interface language to 'english' it sorts as expected. If the interface language has been set to "dutch" the sort order is wrong.
I upgraded from 7.3.4 to 7.4.1 - r28693 yesterday and in the artist list ÅÄÖåäö are not sorted correctly (last), but with their "clean" equivalents (AOao). Also the sorting of those characters in browse music folder is broken. ÅÄÖ are here sorted first for some reason. Sorting was correct with 7.3.4. I am on Swedish Win XP with SC (or is it SS now?) and the SBC set to Swedish.
(In reply to comment #51)
> I upgraded from 7.3.4 to 7.4.1 - r28693 yesterday and in the artist list
> ÅÄÖåäö are not sorted correctly (last), but with their "clean"
> equivalents (AOao).
> Also the sorting of those characters in browse music folder is broken. ÅÄÖ
> are here sorted first for some reason.
> Sorting was correct with 7.3.4.
> I am on Swedish Win XP with SC (or is it SS now?) and the SBC set to Swedish.
You're experiencing the behaviour I want, I'm experiencing the behavior you want...
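The two reports want opposite results because the expected order depends on the language: Swedish treats Å, Ä and Ö as separate letters that follow Z, while Dutch files accented letters with their base letter. The toy sketch below uses hand-written rules, not the server's collation tables, to show how the same list comes out differently under each convention. The helper names and the artist names are invented for this example.

```python
import unicodedata

# Toy Swedish alphabet: å, ä, ö are distinct letters ranked after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def swedish_key(name: str) -> list:
    """Hypothetical Swedish-style key; unknown characters rank last."""
    folded = unicodedata.normalize("NFC", name).casefold()
    return [SWEDISH_RANK.get(ch, len(SWEDISH_ALPHABET)) for ch in folded]


def base_letter_key(name: str) -> str:
    """Hypothetical key that files accented letters with their base letter."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


# Invented single-word names, chosen only to exercise the two orderings.
artists = ["Östen", "Zorn", "Olsson", "Åberg", "Adams"]

swedish_order = sorted(artists, key=swedish_key)
base_letter_order = sorted(artists, key=base_letter_key)
print(swedish_order)
print(base_letter_order)
```

In the Swedish-style order Åberg and Östen come after Zorn; in the base-letter order they sit next to Adams and Olsson. Neither is wrong in itself, so the sort has to follow the selected language.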
Please remove this from the 'fixed' list in the 7.4.0 release notes. It isn't.
*** Bug 14555 has been marked as a duplicate of this bug. ***
I could not reproduce this in 7.5.0 - r29192
(In reply to comment #51)
> I upgraded from 7.3.4 to 7.4.1 - r28693 yesterday and in the artist list
> ÅÄÖåäö are not sorted correctly (last), but with their "clean"
> equivalents (AOao).
>
> Also the sorting of those characters in browse music folder is broken. ÅÄÖ
> are here sorted first for some reason.
>
> Sorting was correct with 7.3.4.
>
> I am on Swedish Win XP with SC (or is it SS now?) and the SBC set to Swedish.
With 7.4.2 r29220 a change of language in SBS between Swedish and English and back seems to have finally cured this issue, compare https://bugs-archive.lyrion.org/show_bug.cgi?id=10114#c8.
(In reply to comment #56)
> (In reply to comment #51)
> > I upgraded from 7.3.4 to 7.4.1 - r28693 yesterday and in the artist list
> > ÅÄÖåäö are not sorted correctly (last), but with their "clean"
> > equivalents (AOao).
> >
> > Also the sorting of those characters in browse music folder is broken. ÅÄÖ
> > are here sorted first for some reason.
> >
> > Sorting was correct with 7.3.4.
> >
> > I am on Swedish Win XP with SC (or is it SS now?) and the SBC set to Swedish.
>
> With 7.4.2 r29220 a change of language in SBS between Swedish and English and
> back seems to have finally cured this issue, compare
> https://bugs-archive.lyrion.org/show_bug.cgi?id=10114#c8.
Correction: The reported issue when browsing the music folder still persists.
The sorting order is wrong again. 7.5.0 - r30028
Created attachment 6493 [details] É after Z
If you're running embedded with SQLite, then yes it's a known issue that I'm working on.
No, I'm not running the SQLite version, I'm running the MySQL version. The problem was fixed, but it's back now.
Still happening on 7.5.0 - r30326 (É after Z)
7.4.x milestone is in the past
I don't see the É after Z problem in 7.5.0 - r30373, AFTER completely uninstalling/installing Squeezebox Server.
It's back again, but with different behavior on the controller (7.6.0 r8716) and the web interface (7.6.0 - r30667). On the controller, artists starting with É are sorted after A but before B. On the web interface, artists starting with É (and also artists starting with E) are sorted after Z.
I've removed all 'international' characters from the FILEname, not the tag. Sorting is now as expected. (But SBS should handle international characters correctly)
Please disregard my last comment. The problem still exists, even when all filenames don't have extended/special/international characters.
Also present in 7.6.0 - r31215
Filename sorting is different to artist, album or title sorting. For filename sorting please see bug 14906. The original version of this bug was fixed in 7.3.3. The new behaviour is covered by bug 14800.
Hmm, I may have got the bit about this one ever having been fixed wrong, but I would still like to use bug 14906 and bug 14800 to track these issues further.
It's back again in the 7.6 trunk...
Created attachment 7261 [details] á after Z
Dennis - please open a new bug which clearly states that this is about the page bar in the web UI only. This bug started as something different. Thanks!
Created new bug report: https://bugs-archive.lyrion.org/show_bug.cgi?id=17205