Bug 13811 - Search doesn't ignore umlauts anymore
Status: CLOSED FIXED
Product: Logitech Media Server
Classification: Unclassified
Component: Web Interface
Version: 7.4.0
Hardware: PC SuSE Linux
Importance: P2 normal with 2 votes
Target Milestone: 7.4.0
Assigned To: Spies Steven
Depends on: 13600
Blocks:
Reported: 2009-09-02 01:31 UTC by Kari Lempiainen
Modified: 2009-10-05 14:34 UTC

See Also:
Category: Bug


Description Kari Lempiainen 2009-09-02 01:31:29 UTC
In the previous version of the server, umlauts were dropped when performing a search. Now when I search for "rückertlieder" I have to enter ü; previously the search worked with "ruckertlieder". This is quite annoying with names containing e, è and é, if you don't remember the correct accent...

Kari

Version: 7.4 - r28396 @ Tue Sep 1 04:02:17 PDT 2009
Hostname: trane
Server IP Address: 192.168.1.3
Server HTTP Port Number: 9000
Operating system: SuSE - EN - utf8
Platform Architecture: x86_64-linux
Perl Version: 5.8.8 - x86_64-linux-thread-multi
MySQL Version: 5.0.26
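
For illustration, the accent-insensitive matching described in this report can be sketched as follows. This is a minimal Python sketch, not the server's actual implementation; the function names `fold_accents` and `matches` are hypothetical, and it assumes that decomposing characters and dropping combining marks is an acceptable definition of "ignoring umlauts".

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Return text with combining marks removed and case folded.

    NFKD decomposition splits a character such as "ü" into "u" plus a
    combining diaeresis; the combining mark is then dropped.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def matches(query: str, title: str) -> bool:
    """Hypothetical helper: substring match on the folded forms."""
    return fold_accents(query) in fold_accents(title)


print(fold_accents("Rückertlieder"))
print(matches("ruckertlieder", "Rückertlieder"))
print(matches("torme", "Mel Tormé"))
```

Folding both the query and the stored value means a user can still type the umlaut and get the same result.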
Comment 1 Moonbase 2009-09-02 02:39:08 UTC
I agree stripping umlauts (and accented characters) can come in handy when searching. For some European countries, at least. Alas, imagine stripping to some "minor character" in, say, Mandarin, Hangeul, Russian... As you can see, "reducing" characters in a world with so many languages is not easy.

An additional thing to consider is that people living in country A might have set their interface language to country A's language but still have songs in the languages of countries B, C and D. (And want to search for them.)

Things like the controller would probably only "wheel" through the characters of the selected language (mine wheels only through the English characters, although I live in Germany, and thus "äöüÄÖÜß" are missing).

I propose using a more general "wildcard" character instead, like "?" (for one unknown character) and "*" (for multiple unknown characters). This is what we have always used with filesystems.

So, in your case, one could look for Mahler's "R?ckert-Lieder" instead. Or maybe even "r*ert*lieder" if one didn't know if it was written with "ck" or only "k" and one didn't know if it was written with or without a hyphen ("*" standing for zero to n unknown characters in this case).

This type of searching should of course be the same in the Web UI and in all software and hardware player interfaces. And it would be easy to implement, too, since Perl supports regular expressions and both MySQL and SQLite understand wildcard searches.
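
As a rough illustration of the wildcard proposal above, "?" and "*" can be translated into the "_" and "%" wildcards of an SQL LIKE pattern. The sketch below uses Python with an in-memory SQLite database; the `tracks` table, the `title` column and the function names are hypothetical and are not taken from the server's schema.

```python
import sqlite3


def glob_to_like(pattern: str, escape: str = "\\") -> str:
    """Translate "?" and "*" into LIKE wildcards, escaping literal % and _."""
    out = []
    for ch in pattern:
        if ch in ("%", "_", escape):
            out.append(escape + ch)  # keep these characters literal
        elif ch == "*":
            out.append("%")  # zero or more characters
        elif ch == "?":
            out.append("_")  # exactly one character
        else:
            out.append(ch)
    return "".join(out)


def search(conn: sqlite3.Connection, pattern: str) -> list:
    """Return titles matching the user's wildcard pattern."""
    rows = conn.execute(
        "SELECT title FROM tracks WHERE title LIKE ? ESCAPE '\\' ORDER BY title",
        (glob_to_like(pattern),),
    )
    return [row[0] for row in rows]


# Hypothetical sample data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (title TEXT)")
conn.executemany(
    "INSERT INTO tracks VALUES (?)",
    [("Rückert-Lieder",), ("Rückertlieder",), ("Kindertotenlieder",)],
)
print(search(conn, "R?ckert-Lieder"))
print(search(conn, "r*ert*lieder"))
```

Passing the translated pattern as a bound parameter keeps user input out of the SQL text itself.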

For ease of use, I would also propose that devices that use a scrollable ("wheelable") character set use the uppercase set of characters of the selected interface language, plus numbers and punctuation symbols, followed by the lowercase set of the selected interface language, probably followed by some "agreed-upon" base character set (e.g., US-ASCII).

Thus, Russian, for example, would present Russian characters and numbers in the Controller's interface, followed by the Latin A-Z.

This would make searching in the user's language easier (assuming he has most of his titles in the local language), plus allow searching for "foreign" ASCII-labelled titles.

Using wildcards like "?" and "*" in ANY language set would make finding things with foreign characters much easier (like, say, French accents).
Comment 2 Moonbase 2009-09-02 02:54:02 UTC
Thinking about it, I'd probably scrap the lowercase characters altogether for selection-based devices (wheel interface, touch screen, virtual keyboard).

The search itself will usually be performed case-insensitively anyway, so why clutter the interface with lots of characters that aren't actually needed?
Comment 3 Kari Lempiainen 2009-09-02 02:58:59 UTC
Good suggestions.  

As I said, the basic accent dropping used to work. The last build in which I definitely saw it working from the web UI was 26229.
Comment 4 Michael Herger 2009-09-02 08:13:13 UTC
*** Bug 13809 has been marked as a duplicate of this bug. ***
Comment 5 Mikael Nyberg 2009-09-02 09:04:04 UTC
Ignoring umlauts will be just fine for me, but I can still use them, right?
Ignoring accents is very important, unless you want to spend half an hour googling for the correct spelling. Those of us who don't speak any Latin languages have no clue.
But we still have Latin music on our drives.

We cannot expect people who aren't computer geeks to understand wildcard search, especially the "?" variant.
But I think this should be included as well, for those who can use it.
That would make everybody happy.
Comment 6 Jim McAtee 2009-09-20 00:52:41 UTC
I did some digging this evening and found the following:

- This works in 7.3.4

- It worked in 7.4 Trunk up until the merge (<= r27972)

- It's broken after the merge, with search columns of datatype TEXT

- This does not work with search columns of datatype BLOB


Assuming TEXT columns:

Something bad is happening to the accented characters when they're being stored in the database.  When I view the namesort field for contributor Mel Tormé in my database client (SQLyog), prior to the merge I see

TORMé MEL

After the merge I see

TORMÉ MEL

If I change the columns to BLOB, these characters look correct, but 'E' does not match 'é' in a binary column.
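
The garbled value above has the shape of UTF-8 bytes being read back as a single-byte encoding, although that diagnosis is an assumption and is not confirmed anywhere in this report. A minimal Python sketch of that effect, and of why a byte-wise (BLOB-style) comparison cannot match 'E' against 'é', might look like this; the function names are hypothetical.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode the bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo the misreading, assuming it happened exactly once."""
    return garbled.encode("latin-1").decode("utf-8")


correct = "TORMÉ MEL"
garbled = misread_utf8_as_latin1(correct)

# The single character É (UTF-8 bytes C3 89) has become two characters:
# U+00C3 (Ã) followed by the control character U+0089.
print([hex(ord(ch)) for ch in garbled[4:6]])

# Reversing the wrong decode restores the original string.
print(repair(garbled) == correct)

# A binary comparison looks at raw bytes, so "E" (one byte) and
# "é" (two bytes in UTF-8) can never compare equal.
print("E".encode("utf-8") == "é".encode("utf-8"))
```

Accent-insensitive matching therefore depends on the column being compared as text under a suitable collation, not as raw bytes.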
Comment 7 Kari Lempiainen 2009-09-23 14:20:04 UTC
The change #28582 on bug 13600 seems to have fixed the searching...
Comment 8 Andy Grundman 2009-09-23 14:22:34 UTC
Yes this is fixed now.
Comment 9 James Richardson 2009-10-05 14:34:54 UTC
This bug has been marked as fixed in the 7.4.0 release version of SqueezeBox Server!
    * SqueezeCenter: 28672
    * Squeezebox 2 and 3: 130
    * Transporter: 80
    * Receiver: 65
    * Boom: 50
    * Controller: 7790
    * Radio: 7790  

Please see the Release Notes for all the details: http://wiki.slimdevices.com/index.php/Release_Notes

If you haven't already, please download and install the new version from http://www.logitechsqueezebox.com/support/download-squeezebox-server.html

If you are still experiencing this problem, feel free to reopen the bug with your new comments and we'll have another look.