Bugzilla – Bug 16956
Searches do not match characters with diacritics (SQLite)
Last modified: 2011-05-06 10:51:43 UTC
For example, searching for 'nina' no longer matches 'niña'. It still works in 7.6 when MySQL is used as the database.
Yep, I do welcome the changes that make SBS distinguish between Björk and Bjork, and between Jose Gonsales and José González. But the limited search options are annoying. The controller and the IR interface do not offer ö or é, among other characters. The web UI gives you whatever characters are on your keyboard, so searching there is not entirely impossible, but on the players it currently is. iPeng or SqueezePad can use whatever keys your iPad provides.
are you saying you want searches to respect diacritics?
(In reply to comment #2)
> are you saying you want searches to respect diacritics?
No! At the moment it DOES, which means you cannot search for Björk by typing Bjork on a controller or directly on a Squeezebox 3. All names should be displayed exactly as tagged (WYSIWYG) *except when searching*, because the player UI and many computer keyboards do not have every special character readily available. Also, if you are unfamiliar with a language, you might not find what you are looking for unless you know which accent (' ~ ^ `) to use. So search should not respect diacritics: searching for Bjork should turn up Björk.
unfortunately, it seems andy doesn't want to discuss it any further, but in the other bug many issues that seem legitimate to me, such as the ones jim raised (for example, the difference in behavior between sqlite and mysql), were left unanswered. a point jim made in that bug reflects what i consider to be the "american view", which is that when displaying, we likely would not want our björks differentiated. i argued for an option, so people could have it either way, but for some reason phil and others insist it be their way or the highway; only their way is valid, i guess (rubbish! haha). i still don't know if case differences will be differentiated. i don't think such questions should be called "whining." it's irritating, actually.
The other bug is closed. The search behavior had nothing to do with the changes made. That's why this bug has been opened.
i am not saying the changes in the other bug caused this bug. and btw, i'm impressed you were able to diagnose this bug as being caused by (or existing in) sqlite as opposed to mysql. still, it seems unwise to fix that bug before fixing this one. and i think you raised the same point i did, which was that what was (most of the time) cosmetic will now be functional.
OK, when will those of us who use SQLite get search functionality back? Has JJ tested and confirmed that it works in MySQL?
I just did some testing by rolling back 7.6 trunk to a couple of earlier revs: r31415 07-Oct-2010 r31143 30-Jul-2010 Both versions exhibit the same behavior, so it looks like the problem is inherent in SQLite, not a result of the recent ICU work. Results from doing a few web searches suggest that ICU can be used to overcome the problem. CCing Alan.
forgive my ignorance, but what is ICU?
ICU (International Components for Unicode): http://site.icu-project.org/
Referring to https://bugs-archive.lyrion.org/show_bug.cgi?id=10049, I'd suggest that the most sensible (if technically challenging) solution would be to respect diacritics if they have been specified in the search, so that a search for "nät" would return "Långa Nätter" but not "Nat King Cole", while a search for "nat" would return both "Långa Nätter" and "Nat King Cole". In other words, a diacritic search character can only match the same diacritic, but a non-diacritic search character can match any diacritic variant of the same character. There are a large number of users (thinking US and UK here as two good examples) whose keyboards don't have any diacritics and yet could well own music by Beyoncé, Björk or Motörhead (to cite a few examples), but who won't find it practical to cut and paste accents or to start Character Map (or the equivalent on non-Windows OSes) to be able to search for artists in their libraries. And even users in Europe who have keyboards with the diacritics relevant to their own languages may own a copy of David Gilmour's "Live in Gdańsk" but not have the relevant accent available on their keyboard if they are in western Europe. There is also the problem that the Logitech devices themselves don't offer access to the diacritics. This is a major issue for anyone with anything other than straight A-Z tags in their libraries.
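The asymmetric matching rule proposed above can be sketched roughly as follows. This is a minimal Python illustration, not code from the server; the function names are hypothetical, and "folding" here relies only on Unicode canonical decomposition, so letters without one (such as ø or ß) are assumed to stay unfolded.

```python
import unicodedata


def fold(ch: str) -> str:
    """Strip combining marks from one character: 'ä' -> 'a', 'ń' -> 'n'.

    Characters without a canonical decomposition (e.g. 'ø', 'ß') come back unchanged.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def chars_match(q: str, t: str) -> bool:
    """Case-insensitive, asymmetric comparison of one query char with one text char."""
    q, t = q.lower(), t.lower()
    if fold(q) == q:
        # Plain query character: matches itself or any accented variant of it.
        return fold(t) == q
    # Accented query character: only the identical accented character matches.
    return t == q


def diacritic_aware_contains(query: str, text: str) -> bool:
    """True if `query` occurs in `text` under the proposed asymmetric rule."""
    # Normalize to precomposed form so 'ä' is one character on both sides.
    query = unicodedata.normalize("NFC", query)
    text = unicodedata.normalize("NFC", text)
    n = len(query)
    return any(
        all(chars_match(query[i], text[start + i]) for i in range(n))
        for start in range(len(text) - n + 1)
    )
```

With these definitions, "nät" matches "Långa Nätter" but not "Nat King Cole", while "nat" matches both, and "gdansk" finds "Live in Gdańsk". One trade-off of this design: because it compares character by character, it is harder to push down into a plain SQL LIKE on a precomputed column than a symmetric "transliterate both sides" approach.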
Like most people, I am used to the search behavior of the Google search engine, Gmail search, Windows 7 Explorer search, etc. I guess anybody expecting sensible results would say the same. When I search for "liege", I want to find items containing "Liège".
== Auto-comment from SVN commit #32352 to the slim repo by agrundman == == http://svn.slimdevices.com/slim?view=revision&revision=32352 == Fixed bug 16956, transliterate titlesearch/namesearch column values and search inputs
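The idea in the commit message (transliterate both the stored titlesearch/namesearch values and the search input) can be illustrated with a small sketch. It uses Python's standard sqlite3 module rather than the server's own code; the table layout, the search_titles() helper, and the NFKD-based transliterate() stand-in are all assumptions for illustration, and the stand-in only handles characters that decompose into a base letter plus combining marks.

```python
import sqlite3
import unicodedata


def transliterate(s: str) -> str:
    """Stand-in transliteration: decompose, drop combining marks, upper-case."""
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).upper()


def escape_like(s: str) -> str:
    """Escape LIKE wildcards so user input is matched literally."""
    return s.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


conn = sqlite3.connect(":memory:")
# Hypothetical schema: the display value plus a precomputed search column.
conn.execute("CREATE TABLE tracks (title TEXT, titlesearch TEXT)")
for title in ["Niña", "Liège", "Nina Simone Live"]:
    # The search column is filled at scan time with the same function applied to queries.
    conn.execute("INSERT INTO tracks VALUES (?, ?)", (title, transliterate(title)))


def search_titles(term: str) -> list[str]:
    """Return display titles whose transliterated form contains the transliterated term."""
    pattern = "%" + escape_like(transliterate(term)) + "%"
    rows = conn.execute(
        "SELECT title FROM tracks WHERE titlesearch LIKE ? ESCAPE '\\'", (pattern,)
    )
    return [title for (title,) in rows]
```

Here search_titles("nina") returns both "Niña" and "Nina Simone Live", and search_titles("liege") returns "Liège". Because both sides go through the same function, the LIKE only ever compares folded text, which side-steps the fact that SQLite's built-in LIKE is case-insensitive only for ASCII characters. It also explains why a full rescan is needed after such a change: the stored search column must be recomputed.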
This fix works for album and track titles, but not artist names. I did a full clear/rescan before testing.
I agree with Jim. Trying with svn 32374: "Faure" doesn't find "Fauré", "Dvorak" doesn't find "Dvořák".
Things are a bit more screwed up than I realized. I'm looking at the database SEARCH columns in the albums, tracks and contributors tables.
In contributors:
- the string is capitalized
- punctuation is removed
- diacritics are not transliterated
In albums and tracks:
- the string is not capitalized
- punctuation is not removed
- diacritics are transliterated
There's really no reason to capitalize the SEARCH column (or the SORT column, for that matter). One other odd thing I noticed is that the album title "100° and Rising" becomes the following in TITLESEARCH: "100deg and Rising". If a user expects non-alphanumeric characters to be removed, that strange handling of the degree character will make that title difficult to search for. In 7.5 trunk I see that the degree character is left intact, which is also wrong.
== Auto-comment from SVN commit #32377 to the slim repo by agrundman == == http://svn.slimdevices.com/slim?view=revision&revision=32377 == Bug 16956, fix contributor namesearch value to transliterate properly
Please do a complete wipe and rescan and let me know if that fixes the problem. Yes, you will get 'deg' because all Unicode chars are being transliterated. This is an edge case that I'm not that worried about, and would be difficult to solve.
That fixes the diacritics problem. Did the changes to fix this introduce the other problem, where punctuation and special characters are no longer removed, or is that intentional?
The search value is now the same as sort but with transliteration. Before, search was always the same as sort. Nothing else should have changed.
Closing this bug. I'll let someone else find the other.
Andy, are trailing spaces and case differences now respected/differentiated?
The search column is all-caps. Spaces aren't changed from what's in the title column.
Hmmm... search still can't find the little-known artist R.E.M. (or "REM") on my r32379/XP/SQLite setup.
The search string 'r.e.m.' finds "R.E.M.", while 'rem' does not (and never has). However, as I pointed out above in comment 16, NAMESEARCH no longer has punctuation removed, so searching for 'r e m', which previously worked, no longer does. Andy's assertion that nothing else has changed is incorrect. Where or how it changed I have no idea. The diacritics problem, though, is fixed.
(In reply to comment #25)
> The search string 'r.e.m.' finds "R.E.M.", while 'rem' does not (and never has). However, as I pointed out above in comment 16, NAMESEARCH no longer has punctuation removed, so searching for 'r e m', which previously worked, no longer does.
> Andy's assertion that nothing else has changed is incorrect. Where or how it changed I have no idea. The diacritics problem, though, is fixed.
Really... well, actually "r e m" is the ONLY way of finding the ARTIST R.E.M., which is just plain wrong however anyone wants to look at it. "R.E.M." finds the ALBUM "Best of R.E.M". This is silly. For SEARCHING, all diacritics should be transliterated, all punctuation should be removed, all leading/trailing spaces should be removed, all multiple embedded spaces should be changed to a single space, and finally all searching should be case-insensitive... and of course searching for artists/albums/songs should be consistent! THIS IS ALL PART OF THE SAME PROBLEM (namely that searching is broken).
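The normalization rules listed above could look something like this if a single function were applied to artist, album and track values and to the query alike. This is a hedged Python sketch, not the server's implementation; normalize_for_search() and matches() are hypothetical names, "punctuation" is assumed to mean Unicode categories P* and S* (so "°" is dropped too), and case-insensitivity is done with casefold().

```python
import unicodedata


def normalize_for_search(s: str) -> str:
    """Apply the proposed SEARCH rules to a title, name, or query string."""
    # 1. Transliterate diacritics: decompose, then drop combining marks ('é' -> 'e').
    s = unicodedata.normalize("NFKD", s)
    s = "".join(c for c in s if not unicodedata.combining(c))
    # 2. Remove punctuation and symbols (Unicode categories P* and S*), e.g. '.', "'", '°'.
    s = "".join(c for c in s if not unicodedata.category(c).startswith(("P", "S")))
    # 3. Trim leading/trailing whitespace and collapse internal runs to a single space.
    s = " ".join(s.split())
    # 4. Case-insensitive comparison via case folding.
    return s.casefold()


def matches(query: str, value: str) -> bool:
    """Substring match with identical normalization on both sides."""
    return normalize_for_search(query) in normalize_for_search(value)
```

With this, "R.E.M." normalizes to "rem", so 'rem' finds both the artist "R.E.M." and the album "Best of R.E.M", and "Dvořák" becomes "dvorak". The design choice here is to delete punctuation rather than replace it with a space: that is what lets 'rem' match, at the cost that 'r e m' no longer does.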
It still does not work for me: "Jose" does not find José González (squeezeboxserver-7.6.0-0.1.32379.noarch.rpm). Do I need to rescan the whole library? :-/
Yes, of course you need to rescan the whole library...
(In reply to comment #12)
> Like most people, I am used to search behavior I can have with google search engine, gmail search, Windows 7 explorer search, etc ...
>
> I guess anybody expecting sensible results would say the same.
>
> When I search for "liege", I want to find items containing "Liège"
yes, and that's what we're doing
(In reply to comment #29)
> (In reply to comment #12)
> ...
> yes, and that's what we're doing
Why are you replying to one-year-old comments in a resolved bug?