Bug 16956 - Searches do not match characters with diacritics (SQLite)
Status: CLOSED FIXED
Product: Logitech Media Server
Classification: Unclassified
Component: Database
Version: 7.6.0
Hardware: PC Windows Server 2003
Importance: P1 critical with 14 votes
Target Milestone: 7.6.0
Assigned To: Andy Grundman
Depends on:
Blocks: 10049
Reported: 2011-02-11 15:43 UTC by Jim McAtee
Modified: 2011-05-06 10:51 UTC

Description Jim McAtee 2011-02-11 15:43:10 UTC
For example, searching for 'nina' no longer matches 'niña'.  Still works in 7.6 when running MySQL as the database.
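A minimal sketch of the reported symptom, assuming the search boils down to a substring `LIKE` against a stock SQLite build without the ICU extension. The table name, sample row and `count_matches` helper are made up for illustration; the server's real query is not shown in this report.

```python
import sqlite3

# In-memory database with a single, hypothetical contributor row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contributors (name TEXT)")
conn.execute("INSERT INTO contributors VALUES ('La Niña')")


def count_matches(term):
    """Count rows whose name contains `term`, using SQLite's built-in LIKE."""
    row = conn.execute(
        "SELECT COUNT(*) FROM contributors WHERE name LIKE ?",
        ("%" + term + "%",),
    ).fetchone()
    return row[0]


print(count_matches("nina"))  # plain 'n' is not treated as equal to 'ñ'
print(count_matches("niña"))  # exact diacritic still matches
```

The built-in `LIKE` folds case for ASCII letters only and treats 'n' and 'ñ' as different characters, which is consistent with the behavior described above.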
Comment 1 Mikael Nyberg 2011-02-11 22:01:42 UTC
Yep, I do welcome the changes that make SBS distinguish between Björk and Bjork, or Jose Gonsales and José González.

But the limited search options are annoying.

The Controller and IR interfaces do not have ö or é, among other things.

The web UI has whatever characters you can find on the keyboard, so searching there is not entirely impossible, which it is on the players right now.

iPeng or SqueezePad can use whatever keys you have on your iPad.
Comment 2 Mike Walsh 2011-02-11 22:33:13 UTC
are you saying you want searches to respect diacritics?
Comment 3 Mikael Nyberg 2011-02-12 01:12:39 UTC
(In reply to comment #2)
> are you saying you want searches to respect diacritics?

No !

For the moment it DOES, meaning that you cannot search for Björk by typing Bjork on a Controller or Squeezebox 3 directly.

All names should be WYSIWYG *except when searching*, because the player UI and different computers' keyboards may not have all special characters readily available.

Also, if you are unfamiliar with a language, you might still find what you are searching for without knowing what kind of ' ~ ^ ` to use.

So search should not respect diacritics etc., so that searching for Bjork turns up Björk.
Comment 4 Mike Walsh 2011-02-12 02:23:03 UTC
unfortunately, it seems andy doesn't want to discuss it any further, but in the other bug many issues which appear legit to me, such as the ones jim raises, are left unanswered, such as the difference in behavior between sqlite and mysql.

a point jim made in that bug reflects what i consider to be the "american view" which is that when displaying, we likely would not want our bjorks differentiated.

i argued for an option, so people could have it either way, but for some reason, phil and others insist it be their way or the highway, only their way is valid i guess, (rubbish! haha).  i still don't know if case differences will be differentiated.  i don't think such questions should be called "whining."  it's irritating actually.
Comment 5 Jim McAtee 2011-02-12 02:56:16 UTC
The other bug is closed. The search behavior had nothing to do with the changes made. That's why this bug has been opened.
Comment 6 Mike Walsh 2011-02-12 03:04:03 UTC
i am not saying the changes in the other bug caused this bug.  and btw, i'm impressed you were able to diagnose this bug, as being caused by (or existing in) sqlite as opposed to mysql.

still, it seems unwise to fix that bug before fixing this one.  and i think you raised the same point i did, which was that what was (most of the time) cosmetic, will now be functional.
Comment 7 Mikael Nyberg 2011-03-10 10:47:08 UTC
OK, when can those of us who use SQLite have search functionality back?

JJ has tested and confirmed that it works in MySQL?
Comment 8 Jim McAtee 2011-03-10 14:57:18 UTC
I just did some testing by rolling back 7.6 trunk to a couple of earlier revs:

r31415 07-Oct-2010
r31143 30-Jul-2010

Both versions exhibit the same behavior, so it looks like the problem is inherent in SQLite, not a result of the recent ICU work. Results from doing a few web searches suggest that ICU can be used to overcome the problem. CCing Alan.
Comment 9 Mike Walsh 2011-03-10 15:29:10 UTC
forgive my ignorance, but what is ICU?
Comment 10 Andy Grundman 2011-03-10 15:31:49 UTC
http://site.icu-project.org/
Comment 11 p_lemonde 2011-04-04 10:57:35 UTC
Referring to https://bugs-archive.lyrion.org/show_bug.cgi?id=10049, I'd suggest that the most sensible (if technically challenging) solution would be to respect diacritics only when they have been specified in the search. A search for "nät" would return "Långa Nätter" but not "Nat King Cole", while a search for "nat" would return both "Långa Nätter" and "Nat King Cole". In other words, a search character with a diacritic can only match the same diacritic, but a search character without one can match any diacritic variant of the same character.
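The matching rule proposed here could be sketched roughly as follows. This is only an illustration in Python using the standard `unicodedata` module, not anything taken from the server code; the helper names `strip_marks`, `char_matches` and `search` are made up for the example.

```python
import unicodedata


def strip_marks(text):
    """Remove combining marks, e.g. 'ä' -> 'a'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def char_matches(query_char, title_char):
    """Accented query characters match exactly; plain ones match any variant."""
    if strip_marks(query_char) != query_char:
        return query_char == title_char
    return query_char == strip_marks(title_char)


def search(query, title):
    """Case-insensitive substring search using the rule above."""
    q = unicodedata.normalize("NFC", query).casefold()
    t = unicodedata.normalize("NFC", title).casefold()
    n = len(q)
    return any(
        all(char_matches(a, b) for a, b in zip(q, t[i:i + n]))
        for i in range(len(t) - n + 1)
    )


print(search("nät", "Långa Nätter"))   # True
print(search("nät", "Nat King Cole"))  # False
print(search("nat", "Långa Nätter"))   # True
print(search("nat", "Nat King Cole"))  # True
```

A per-character comparison like this cannot be expressed as a simple `LIKE` on a pre-transliterated column, which is part of why the suggestion is technically challenging.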

A large number of users (the US and UK are two good examples) have keyboards without any diacritics and yet could well own music by Beyoncé, Björk or Motörhead (to cite a few examples). They won't find it practical to cut and paste accents or to start Character Map (or the equivalent on non-Windows OSes) just to search for artists in their libraries.

And even users in Europe that have keyboards with diacritics on them relevant to their own languages may own a copy of David Gilmour's "Live in Gdańsk" but not have the relevant accent available on their keyboard if they are in western Europe.

There is also the problem that the Logitech devices themselves don't offer access to the diacritics.

This is a major issue for anyone with anything other than straight A-Z tags in their libraries.
Comment 12 Nicolas 2011-04-06 08:30:17 UTC
Like most people, I am used to the search behavior I get with the Google search engine, Gmail search, Windows 7 Explorer search, etc.

I guess anybody expecting sensible results would say the same.

When I search for "liege", I want to find items containing "Liège"
Comment 13 SVN Bot 2011-04-26 08:00:18 UTC
 == Auto-comment from SVN commit #32352 to the slim repo by agrundman ==
 == http://svn.slimdevices.com/slim?view=revision&revision=32352 ==

Fixed bug 16956, transliterate titlesearch/namesearch column values and search inputs
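The idea in this commit message, transliterating both the stored search column and the search input, might look roughly like the sketch below. It is not the actual server code: the schema, the `transliterate` helper (which only strips combining marks via the Python standard library) and the sample titles are assumptions for illustration. Only the `titlesearch` column name and the all-caps search values come from this report.

```python
import sqlite3
import unicodedata


def transliterate(text):
    """Approximate transliteration: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (title TEXT, titlesearch TEXT)")

# At scan time, store a transliterated, upper-cased copy next to the title.
for title in ["Liège", "Live in Gdańsk", "Nat King Cole"]:
    conn.execute(
        "INSERT INTO tracks VALUES (?, ?)",
        (title, transliterate(title).upper()),
    )


def find(term):
    """At search time, apply the same transformation to the user's input."""
    pattern = "%" + transliterate(term).upper() + "%"
    rows = conn.execute(
        "SELECT title FROM tracks WHERE titlesearch LIKE ? ORDER BY title",
        (pattern,),
    )
    return [row[0] for row in rows]


print(find("liege"))   # ['Liège']
print(find("Gdańsk"))  # ['Live in Gdańsk']
```

Because the stored values are computed at scan time, existing libraries only pick up the new behavior after a rescan.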
Comment 14 Jim McAtee 2011-04-26 12:50:05 UTC
This fix works for album and track titles, but not artist names. I did a full clear/rescan before testing.
Comment 15 Gordon Harris 2011-04-28 17:43:58 UTC
I agree with Jim.  Trying with svn 32374: "Faure" doesn't find "Fauré", "Dvorak" doesn't find "Dvořák".
Comment 16 Jim McAtee 2011-04-29 02:40:46 UTC
Things are a bit more screwed up than I realized. I'm looking at the database SEARCH columns in the albums, tracks and contributors tables.

In contributors:
 - the string is capitalized
 - punctuation is removed
 - diacritics are not transliterated

In albums and tracks:
 - the string is not capitalized
 - punctuation is not removed
 - diacritics are transliterated

There's really no reason to capitalize the SEARCH column (or the SORT column, for that matter).

One other odd thing I noticed is that the album title

100° and Rising

becomes the following in TITLESEARCH:

100deg and Rising

If a user expects non-alphanumeric characters to be removed, that strange handling of the degree character will make that title difficult to search for. In 7.5 trunk I see that the degree character is left intact, which is also wrong.
Comment 17 SVN Bot 2011-04-29 10:59:36 UTC
 == Auto-comment from SVN commit #32377 to the slim repo by agrundman ==
 == http://svn.slimdevices.com/slim?view=revision&revision=32377 ==

Bug 16956, fix contributor namesearch value to transliterate properly
Comment 18 Andy Grundman 2011-04-29 11:00:41 UTC
Please do a complete wipe and rescan and let me know if that fixes the problem.

Yes, you will get 'deg' because all Unicode chars are being transliterated. This is an edge case that I'm not that worried about, and would be difficult to solve.
Comment 19 Jim McAtee 2011-04-29 12:39:29 UTC
That fixes the diacritics problem.

Did the changes to fix this introduce the other problem, where punctuation and special characters are no longer removed, or is that intentional?
Comment 20 Andy Grundman 2011-04-29 12:48:28 UTC
The search value is now the same as sort but with transliteration. Before, search was always the same as sort. Nothing else should have changed.
Comment 21 Jim McAtee 2011-04-29 13:19:18 UTC
Closing this bug. I'll let someone else find the other.
Comment 22 Mike Walsh 2011-04-29 20:43:14 UTC
Andy,

are trailing spaces and case differences now respected/differentiated?
Comment 23 Andy Grundman 2011-04-30 10:27:13 UTC
The search column is all-caps. Spaces aren't changed from what's in the title column.
Comment 24 Phil Leigh 2011-04-30 15:05:08 UTC
Hmmm... search still can't find the little known Artist R.E.M. (or "REM") on my r32379/XP/SQLite setup
Comment 25 Jim McAtee 2011-04-30 15:24:14 UTC
The search string 'r.e.m.' finds "R.E.M.", while 'rem' does not (and never has). However, as I pointed out above in comment 16, NAMESEARCH no longer has punctuation removed, so searching for 'r e m', which previously worked, no longer does.

Andy's assertion that nothing else has changed is incorrect. Where or how it changed I have no idea. The diacritics problem, though, is fixed.
Comment 26 Phil Leigh 2011-04-30 15:39:59 UTC
(In reply to comment #25)
> The search string 'r.e.m.' finds "R.E.M.", while 'rem' does not (and never
> has). However, as I pointed out above in comment 16, NAMESEARCH no longer has
> punctuation removed, so searching for 'r e m', which previously worked, no
> longer does.
> Andy's assertion that nothing else has changed is incorrect. Where or how it
> changed I have no idea. The diacritics problem, though, is fixed.

Really...

Well, actually "r e m" is the ONLY way of finding the ARTIST R.E.M., which is just plain wrong however anyone wants to look at it.

R.E.M. finds the ALBUM "Best of R.E.M"

This is silly. For SEARCHING, all diacritics should be transliterated, all punctuation should be removed, all trailing/leading spaces removed, all multiple embedded spaces changed to a single space, and finally all searching should be case-insensitive... and of course searching for artists/albums/songs should be consistent!
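The normalization steps listed above could be combined into one function applied identically to artist, album and track values and to the search input. The following is only a sketch under stated assumptions: `search_key` is a hypothetical name, punctuation is deleted outright rather than replaced by a space, and transliteration is approximated by stripping combining marks.

```python
import re
import unicodedata


def search_key(text):
    """Build a normalized key for case- and diacritic-insensitive search."""
    # 1. Transliterate diacritics by dropping combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    # 2. Remove punctuation and symbols, keeping letters, digits and spaces.
    no_punct = "".join(c for c in no_marks if c.isalnum() or c.isspace())
    # 3. Collapse runs of whitespace and trim both ends.
    collapsed = re.sub(r"\s+", " ", no_punct).strip()
    # 4. Fold case so comparisons are case-insensitive.
    return collapsed.casefold()


print(search_key("  R.E.M. "))        # rem
print(search_key("Dvořák"))           # dvorak
print(search_key("100° and Rising"))  # 100 and rising
```

Deleting punctuation makes "R.E.M." searchable as "rem" but no longer as "r e m"; replacing punctuation with a space would give the opposite result, so that choice would need to be made deliberately.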

THIS IS ALL PART OF THE SAME PROBLEM (namely that searching is broken)
Comment 27 Mikael Nyberg 2011-05-01 04:52:49 UTC
It still does not work for me..

"Jose" does not find   José González

/squeezeboxserver-7.6.0-0.1.32379.noarch.rpm

Do I need to rescan the whole library :-/ ?
Comment 28 Andy Grundman 2011-05-01 09:16:12 UTC
Yes of course you need to rescan the whole library...
Comment 29 Paul Chandler 2011-05-03 14:55:41 UTC
(In reply to comment #12)
> Like most people, I am used to search behavior I can have with google search
> engine, gmail search, Windows 7 explorer search, etc ...
> 
> I guess anybody expecting sensible results would say the same.
> 
> When I search for "liege", I want to find items containing "Liège"

yes----and thats what we're doing
Comment 30 Jim McAtee 2011-05-03 15:01:19 UTC
(In reply to comment #29)
> (In reply to comment #12)
> ...
> yes----and thats what we're doing

Why are you replying to one year old comments in a resolved bug?