Bug 10324 – Artist with accented characters in ARTIST tag not showing correctly

Status:	CLOSED FIXED

Product:	Logitech Media Server
Classification:	Unclassified
Component:	Scanner
Version:	7.3.1
Platform:	PC Windows XP

Importance:	P2 normal with 7 votes
Target Milestone:	7.6.0
Assigned To:	Andy Grundman

Reported:	2008-12-14 09:29 UTC by Philip Meyer
Modified:	2011-05-09 10:28 UTC
CC List:	6 users

Description Philip Meyer 2008-12-14 09:29:54 UTC

I have an album with ARTIST=José Gonzaléz.

SC is displaying the artist as "Jose Gonzalez" i.e. without accented characters.

I checked the mp3 file - I only have id3v2.3 tags, and the artist tag definitely has the accents.  The filename has accented characters, and these are displayed in the "Location:" property in the SC song info page.

Other artists with accented chars appear correctly.

I think the issue is that I have had other songs on compilation albums by ARTIST=Jose Gonzalez.  SC notices that this is equivalent to José Gonzaléz, and shares the same artist record, rather than creating two distinct artists.

All of my songs by Jose Gonzalez have now been retagged to consistently be "José Gonzaléz", and a scan for new/changed files has been performed.  But the artist record in SC still retains the original name without accents.  I assume a full wipe+rescan would fix it.

Incidentally, the namesort does have the accents, but name does not.

I think the scanner should merge information into the same artist record (a bit like the first ARTISTSORT is merged into the contributor record).  However, the subtlety here is that if it merges an artist into an existing contributor record, it should replace the artist name if there are accents in the new name.

Comment 1 James Richardson 2008-12-17 12:32:42 UTC

Michael, is this yours to handle?

Comment 2 James Richardson 2008-12-22 10:04:55 UTC

Philip, can you please provide sample files?  One with the accented characters and one without.

Comment 3 Philip Meyer 2008-12-22 15:30:05 UTC

I don't think there's a need for example mp3 files - it's easy to repeat I think:

1. Create a song file with ARTIST=eee
2. Scan
3. Create another song file with ARTIST=ééé
4. Scan new/changed files

You will still have only one artist called eee, with both songs associated to that artist, because SC sees these as the same artist.
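A minimal sketch of the behaviour these steps describe, assuming the scanner keys artist records on a folded form of the name (diacritics and case stripped) and keeps whichever spelling it stored first. The names `fold` and `scan` are hypothetical and are not taken from the actual scanner code; folding case as well as accents is also an assumption.

```python
import unicodedata


def fold(name: str) -> str:
    """Strip diacritics and case so 'ééé' and 'eee' map to the same key."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def scan(artist_tags, artists=None):
    """Hypothetical scanner: the first spelling seen for a folded key is kept."""
    artists = {} if artists is None else artists
    for tag in artist_tags:
        # setdefault() never overwrites, so later spellings are silently dropped.
        artists.setdefault(fold(tag), tag)
    return artists


library = scan(["eee"])            # steps 1-2: first file, full scan
library = scan(["ééé"], library)   # steps 3-4: second file, new/changed scan
print(sorted(library.values()))    # only one artist remains, spelled 'eee'
```

Under these assumptions the displayed name depends purely on scan order: scanning "ééé" before "eee" would leave a single artist called "ééé" instead.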

I think the artist name with accents is the most correct name for the artist, and should take precedence over the plain artist name.

Comment 4 Michael Herger 2008-12-23 00:31:05 UTC

Phil - I sincerely hope you don't expect us to implement linguistical rules as to what spelling to prefer over what other? IMHO this is a data error: wrong tags lead to unexpected result.

Comment 5 Philip Meyer 2008-12-23 02:10:16 UTC

>I sincerely hope you don't expect us to implement linguistical rules as
>to what spelling to prefer over what other?
>
No, I'm sincerely not expecting that!

>IMHO this is a data error: wrong tags lead to unexpected result.
>
I am all for getting tags fixed.  In fact I did - I changed all of my "Jose Gonzalez" track artists to "José Gonzaléz" because some songs did have incorrect tags, but a scan for new/changed files failed to fix the SC library (full rescan needed, but shouldn't be necessary).

To summarise, my point is that the SC scanner treats "Jose Gonzalez" and "José Gonzaléz" as the same artist; the first version of the artist name to be scanned from a song wins, meaning that rescans have non-deterministic outcomes.

I'd be happy if the two artist names were always considered different artists.  Then I could simply spot them in the Browse Artist list and fix tags.

Or, always treat them as the same artist, but have the scanner replace names with no accented characters with names with accented characters if it finds two different versions, rather than arbitrarily taking the first artist name from the first full scan.

Comment 6 James Richardson 2009-01-08 09:26:26 UTC

Dean: Can you chime in on this issue?  I don't know what QA can do about it.

Comment 7 Blackketter Dean 2009-01-08 09:43:06 UTC

Since we're folding accented and non-accented characters together, it seems reasonable to prefer the one with accents.  Should be just a regex away, no?  :)

Comment 8 Michael Herger 2009-01-15 00:57:34 UTC

> Since we're folding accented and non-accented characters together, it seems
> reasonable to prefer the one with accents.  Should be just a regex away, no? 

I'm sorry, no. We'd need a spell checker. You can add accents where they aren't needed. Dvoràk or Dvorák or Dvorak? None of them is correct.

Comment 9 Blackketter Dean 2009-01-15 07:44:36 UTC

I'm not suggesting that we add any diacritics, but if we are folding together some with and some without, we prefer the ones with and not the ones without.

There are edge cases (like your example).  In that case, we'd take one of the ones with the diacritics and not take the one without.

Do you see this as dangerous as well?

(Indeed the ultimate "right" thing to do is to fix your tags...)

Comment 10 Michael Herger 2009-01-15 07:58:17 UTC

> Do you see this as dangerous as well?

Not dangerous, but useless. Adding code, wasting time for a solution which will never be more accurate than now. Really, we can't fix this. Adding this workaround would just provoke new bug reports complaining about its inability to fix the Dvorak case. Let's not even get started.

Comment 11 Philip Meyer 2009-01-15 12:01:17 UTC

*Something* needs to happen, because as it stands, it's broken.

If you read the first few messages again, I reported that after a scan for new/changed files, SC can get in a mess, as it finds an artist name with accents but links the song to an artist without accents.

It sounded to me as if SC is already matching an artist with accents with its plaintext equivalent that already exists, and uses that record.  It stores the accented version in the namesort column, but doesn't change the artist name.  Therefore, as it is already doing that, it could store the new artist name with accents in that record that matched.

I'm all for changing my tags to be consistent, but at the same time if you say you don't want SC to accept the version of the artist name with the accents as the same, you should stop SC from ever matching artist names that aren't exactly the same.

i.e. An alternative to be consistent, would be for *any* differences in artist names to be stored as different artist records.

Comment 12 Michael Herger 2009-01-15 14:30:58 UTC

> you should stop SC from ever matching artist names that aren't
> exactly the same.

I couldn't have said it better! We should never have tried to match misspelled names, as we'll never be able to do it correctly. Let's stop here before it gets even worse. Please.

Comment 13 Philip Meyer 2009-01-16 00:07:33 UTC

Okay, I'd be happy if the scanner did that, rather than attempt to match up.

I could then see artists that are duplicated because of missing accents and correct those tags.

Comment 14 Philip Meyer 2009-01-19 14:23:38 UTC

Care to elaborate on why "won't fix"?

NB. The status is not resolved.  The bug is still there.  Perhaps you didn't understand the problem, or perhaps it's not easily fixable?

I thought Michael had just agreed in #12 that it should always perform exact matches, not equivalents.  José Gonzalez != Jose Gonzaléz.  At the moment it does match equivalents, which means there is a non-deterministic element to the scanner: whichever file is accessed first/last defines what will be displayed.  It is another case of a scan for new/changed files not working, as it produces different results from a full scan.

Comment 15 Michael Herger 2009-01-19 23:55:30 UTC

This means we stick with what we have right now: try to match artists together without trying to understand what the correct spelling is. Please accept that this is so much of a border case and so easy to fix at the source that we won't fix it. Thanks for your understanding.

Comment 16 Mike Walsh 2009-03-31 21:39:20 UTC

just a datapoint:

winamp has decided to ignore diacritics, from their site:

"1. Improved: [ml_local] Media Library searches now ignore diacritics:
For example, a search for "Einsturzende" will now return results for both Einsturzende and Einstürzende;
a search for "Bjork" will return results for both Bjork and Björk, etc."

i don't think its just searches, it seems to be that winamp makes no distinction at all anywhere, in sorts and lists, etc...

its an interesting question...  should SC treat the two bjorks as one, or two distinct and different entries?

i'm not sure i know, perhaps it should be a toggle option, but one thing seems true to me...  IF you are going to tell users to fix any problems in their tags, thats fine, HOWEVER when they do so they should be able to have SC reflect the fixes with just a normal scan for new and changed music, a full clear and rescan shouldn't be required.

Comment 17 Blackketter Dean 2009-04-01 07:25:12 UTC

I agree with Winamp, the current behavior, though imperfect, is acceptable.

Comment 18 Philip Meyer 2009-04-01 13:08:38 UTC

I also agree with WinAmp, and indeed that is what SC does when searching. A search for Gonzalez or Gonzaléz will find songs by either Gonzalez or Gonzaléz. That is very acceptable - it's the correct thing to do.

However, if you have two such versions of the artist in tags, it depends which one was scanned first/last as to what you can expect to see reported against songs.

If you change one of the tags to be the same as the other, the scanner won't detect the change in your tags, so you have to do a full clear and rescan to pick the change up. It was a design goal I believe of SC to avoid the need for full rescans to fix tag problems - a scan for new/changed files should work.

It's hard for a user to find and fix incorrect tags, if the software is not consistent in how it treats tag data. I think Jose Gonzalez and José Gonzaléz should be treated as two different artists. Apparently (although I haven't tested it), SC does detect differences in upper/lower case. So Gonzalez != gonzalez, but Gonzalez == Gonzaléz.

If exact equality was always used when processing tag content, then it's obvious when a song is incorrectly tagged, as two artists will be displayed, and one can be fixed using a tag editor, and would be fixed by doing a scan for new/changed files.

Should use exact equality for scanning tags - adding/updating DB content, and then use equivalent equality (diacritics or case) for searching.
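The split proposed here (exact equality when storing, folded equality when searching) could be sketched as follows. This is an illustration only: `fold`, `add_artist` and `search` are hypothetical names, and a plain set stands in for the artist table.

```python
import unicodedata


def fold(text: str) -> str:
    """Strip diacritics and case; used only on the search path."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def add_artist(artists: set, tag: str) -> None:
    """Store the tag verbatim: any character difference is a new artist."""
    artists.add(tag)


def search(artists: set, query: str) -> list:
    """Match on folded text so accents and case need not be typed."""
    needle = fold(query)
    return sorted(name for name in artists if needle in fold(name))


artists = set()
for tag in ["Jose Gonzalez", "José Gonzaléz", "Mario Schönwälder"]:
    add_artist(artists, tag)

print(search(artists, "gonzalez"))      # both spellings, as two separate artists
print(search(artists, "Schonwalder"))   # found without typing the umlauts
```

Because nothing is merged at scan time, the result no longer depends on which file is read first, and a mistagged track shows up as a visibly separate artist.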

At the moment, it makes the job of detecting bad tags and fixing the library very hard/time consuming, and inconsistent/non-determinable - each time a full rescan is performed, the library content could be different.

Comment 19 Mike Walsh 2009-04-04 00:23:40 UTC

so it sounds like to me you are saying that you don't care which 'version' of "Gonzalez" SC chooses to display if there are multiple versions via case or diacritics or what have you...  your beef, is that SC should update the DB properly with just a "new/changed scan" when changes are made to the tags to make all versions exactly equal.

is that correct?

if so, i fully agree with it.

Comment 20 Philip Meyer 2009-04-04 00:49:58 UTC

I'm saying:

1. Tags should be treated exactly as they are in the source file.  If any characters in the artist name are different, that should be a different artist.  Gonzaléz != Gonzalez != gonzalez != gonzaléz.

2. If I fix an artist tag to make it consistent with other artist tags (eg. correct one case where Gonzalez should have been Gonzaléz) then a scan for new/changed files should pick up that change and correct the database content.

3. Each time I do a full rescan, my music DB should have the same content.  At the moment, I could end up with a single artist called Gonzaléz, Gonzalez, gonzalez or gonzaléz, depending which source file gets scanned first.

Anyway, not much point talking about it here, as the bug has been "resolved".

Comment 21 Mike Walsh 2009-04-04 01:36:36 UTC

i guess i'm still interested in this b/c i agree with you that this bug shouldn't be closed/resolved, i think u said in the forums they didn't understand or missed a point you were making, and i agree, i think they did miss a point you were making.

i think a bug filed under "scanner doesn't update DB properly on new/changed tags" might get more traction. seems similar to how it fails when comp tags are updated, another bug you filed.

i'll expound, you said:

"I'm saying:

1. Tags should be treated exactly as they are in the source file. If any
characters in the artist name are different, that should be a different artist.
Gonzaléz != Gonzalez != gonzalez != gonzaléz."

this seems to contradict what you said earlier, when u agreed with how winamp handles it. its not just searches, winamp will display/sort it only once, (not sure by what criteria/circumstance it picks which one to use). same for case.

it also seems odd to me that you think there should be two separate artists but both should be returned as the result of one search. maybe thats not "abnormal" but how would SC know the two were the same? and if they are the same, why treat them as two separate artists to begin with?

i understand what you are saying about not knowing via SC you have divergent tags, and its further complicated by #2 and #3, but perhaps SC could use a marker of some kind to denote when this is the situation, instead of listing each vagary? seems esp appropriate for if its a case issue.

"
2. If I fix an artist tag to make it consistent with other artist tags (eg.
correct one case where Gonzalez should have been Gonzaléz) then a scan for
new/changed files should pick up that change and correct the database content.
"

complete agreement and very similar to the comp tag bug you filed.

even if they won't reopen this bug, point #2 is worthy of its own new specific bug.

"
3. Each time I do a full rescan, my music DB should have the same content. At
the moment, I could end up with a single artist called Gonzaléz, Gonzalez,
gonzalez or gonzaléz, depending which source file gets scanned first.
"

just curious, is it the one scanned first, or the one scanned last?

anyway, i'm not sure i understand the problem here... i know you want each variation listed separately, but assuming that won't happen, whats the issue here in #3?

"
Anyway, not much point talking about it here, as the bug has been "resolved".
"

i think the various issues just got too conflated, but i feel there is definitely still a bug here with #2, that perhaps should be put into a new bug?

Comment 22 Philip Meyer 2009-04-04 03:46:17 UTC

>i think u said in the forums they didn't understand or missed a point
>you were making, and i agree, i think they did miss a point you were making.
I think because I originally asked if SC could record the artist name that had the most diacritic characters in, rather than the first/last artist tag that is scanned.
That wasn't a particularly intelligent suggestion for fixing the problem; better is to treat every slight change in artist name as a different artist, and leave it up to the user to correct tags.
I think Michael agreed to that, but the point was missed that there was a problem to be fixed ;-)

>it also seems odd to me that you think there should be two separate artists but
>both should be returned as the result of one search.  maybe thats not
>"abnormal" but how would SC know the two were the same?  and if they are the
>same, why treat them as two separate artists to begin with?
>
I'm saying that Gonzalez and Gonzaléz should be two different artists in the database, but if you search for Gonzalez, it should find both of those artists, returning two results, the same as if you searched for "Gonzal", it would find both.  I'm saying that SC should always treat them as two artists.

Some diacritics are hard to type in, certainly from an IR remote!  Search already does this.  I have an artist called "Mario Schönwälder"; if I search for "Schonwalder" it finds that artist. 


>just curious, is it the one scanned first, or the one scanned last?
>
If I knew what order the scanner reads files, then I'd be able to answer the question ;-)

>anyway, i'm not sure i understand the problem here...
>
At the moment, the scanner treats Gonzaléz as the same as Gonzalez, so songs by either ARTIST string are treated as the same artist record.  But, with each full scan, I think it's the artist text string from the first song scanned that is stored in the artist record.  So, when you Browse Artists, you may see the artist reported differently each time a scan is done.  If you play a song that was tagged as by "Gonzaléz", it may be reported as by "Gonzalez".  If there are 100 songs with "Gonzaléz" and one with "Gonzalez", all could be shown in the app as "Gonzalez", and you would have to look at all songs in a tag editor to find the incorrect one and do a full rescan to fix it.  That's easy if all songs are in one folder, but the artist could appear as a guest on another artists album, so it can be hard to find/spot.

Comment 23 Gordon Harris 2009-07-18 12:45:19 UTC

I'm with Phil on this one.  Example: two Renaissance Portuguese composers: Duarte Lôbo and Alonso Lobo.  No problem at present, as the two show up as

Lobo, A
Lôbo, D

But what happens when Alonso's brother Dwayne takes up composing?  How do we distinguish between "Lobo, D" (ie Dwayne) and "Lôbo, D" (Duarte) without having to resort to making an exception and putting the whole first name in the tags?

So, my vote is that a diacritic difference be a "real" difference that ought to result in a separate artist record.

Comment 24 Mike Walsh 2009-07-18 12:57:38 UTC

Gordon,

i don't agree.  while there is no doubt this bug should be reopened and the issues in the thread above should be fixed to some degree, i don't think that diacritics should be respected, by default anyway, to mean separate and different.

frankly, in your example, thats what first names are for.

Comment 25 Gordon Harris 2009-07-18 15:54:00 UTC

Yeah, but Lôbo and Lobo are different NAMES, for gods sake.  Diacritics aren't meaningless.

Comment 26 Chris Owens 2009-07-18 20:18:53 UTC

In my opinion, from keeping track of this bug and its cousins again and again since I joined the company in the Slim Devices days, the only solution to this bug worth implementing is allowing a completely customizable sort order per-user.

This would allow users to adjust the sort order to exactly their preference. For instance, as an American English speaker, it makes good sense to me (and to Mike Walsh, I would guess) that diacritics, accent marks, umlauts and whatever ought to be sorted in with their "underlying" letter, because that's how my brain remembers them. For instance, if I'm searching for "Lôbo" I want to find it immediately after "Lobo" and certainly not after "Lucky" or even "Lzubrodsk".

Depending on the language, however, these marks can have various levels of significance. In many, they do count as separate letters in the alphabet. Take an example character é. It is its own letter in some alphabets (Icelandic, Hungarian), it is an 'e' with an optional(!) mark to prevent homonym confusion in Danish, and it is a mark of a variant pronunciation in some languages (Spanish, Vietnamese).

In addition, people with different levels of fluency in different languages may want the letters that make up that alphabet sorted in ways that make sense to them. A bilingual Russian/English speaker may want the U+0414 character to come before or after the English 'D' but an English speaker may want all the non-latin characters to come after the letters he's familiar with. A Polish speaker may want q, v, and x, which are not part of their alphabet, to similarly come last.

I have no idea how Japanese people, who seem happy to adopt any character set that follows them home in the rain, might want to sort their characters, either.

I believe that this is currently implemented at the database level, so it may not be easy, but personally I feel like that is the solution we should be ultimately working towards.

Comment 27 Mike Walsh 2009-07-18 20:51:51 UTC

Chris,

i def think this bug should be reopened, or if not that, the various conflated issues should be sorted out one by one and made into new bugs.

u raise many important and interesting points re: internationalization issues, esp regarding sorting.  i wonder if there is any universal standard?

in any case, i think the following issues need dealt with:

1. the scanners lack of picking up changes on "new/changed" scans when diacritics are involved

2. a possible preference as to respecting diacritics or not, (ie. is Bjork and Björk the same, or different?)

3. some debate and wiki publishing as to the proper sort order of all possible permutations (which may involve winners and losers, but to have a public benchmark SC uses would be key and necessary, meaning even tho a diacritic could be in two places sortwise in theory, SC always has it "here")

Gordon,

the problem is most people, esp we vapid americans, don't care about or use diacritics, and so SC will get a bunch of grief support wise since the majority of users are thus.

i tend to think the answer is a preference, (oh dreaded preference!), which is off by default, meaning diacritics aren't respected.  if you want them to be, turn it on, and it is so.  i would imagine that in my usage, as an illiterate yank, i would have it off.

Comment 28 Philip Meyer 2009-07-18 23:52:06 UTC

Chris, the problem reported here isn't so much a sorting issue as a display issue.

What is entered in tags should be what you see displayed in the app.  If I enter Bjork as the artist for most songs, and incorrectly enter Björk for another track, I'd expect to see that consistently in the app.

I would not expect one scan to report Bjork for all instances, and another rescan to show Björk for all instances.

I think the majority of people would expect them to be treated as different artists, as there is an understanding that what you put in tags is what you get.  If they don't like it, change the tags to be consistent.

Having an option would make the app really messy and may be hard to achieve, and probably a headache for support.

The best thing is to do what other apps do; which is also a good idea if you want to synchronise metadata with iTunes, MusicIP, etc.

Foobar2000, iTunes and MusicIP treat "José González" and "Jose Gonzalez" as two different artists.  If I search for "Jose", it finds "Jose" and "José".  SqueezeCenter would be best suited if it did the same.

Comment 29 Phil Leigh 2009-07-19 00:13:38 UTC

If I might interject/contribute?
I spent years working on human name matching systems for a major corp that worked in 227 countries and 173 languages. I also spent nearly a full day of my time recently helping 2 forum members who got bitten by the consequences of how SC works at the moment. My advice is:

1) forget about sort order - let the DBMS handle that. It's not the issue here.
2) the string you are searching for and the universe of strings that are matched against must be normalised to ignore both diacritic and case variations. This is how the "average" global human thinks, in the digital age.

3) When scanning and displaying search results, do not conflate any spelling,  diacritic or case variation in the ARTIST tag - So eric clapton and Eric Clapton are 2 (two) artists, as are éric clapton and eric clapton. This removes the non-deterministic nature of the current situation and exposes the true value of the tags to the end user, who then has the choice to fix/alter the tags based on their own knowledge...which the software must NOT try and second guess. This way, Scans will always return predictable results and a single "rogue" tag will not randomly infect ones entire database. The user can elect to fix their tags appropriately. SC can NEVER know what "correct" is, so don't even try and guess. If a user has artist tags of "ABBA", "Abba" and "abba" then show them. We don't know which is correct in any absolute sense. Humans hate non-deterministic behaviour (in this case, adding a new title to the database may or may not change the artist list in a predictable way, depending on what is already in the database!). 

The "average Joe" might be slightly puzzled as to why the "stupid computer" can't tell that "ABBA", "Abba" and "abba" or éric clapton and eric clapton aren't the same - but they will be even more puzzled when an erroneous composer tag of "Frank Zappa?" on a completely unrelated album changes the artists on their 50 Frank Zappa albums to "Frank Zappa?"...(this was a real case). 

5 minutes in any Tag Editor would fix most problems. At the moment it might take ages to find the underlying cause of strange issues with Artist names.

Less is more. Humans prefer simple problems with simple solutions - in this case the Support FAQ will always be "Fix your tags - what you see is what you get".

You are always saying the codebase is too bloated - then take some of this pseudo-intelligence out. 

Sorry,  didn't mean to rant, just trying to help
Phil

Comment 30 Chris Owens 2009-07-19 17:57:14 UTC

I'm always appreciative of attempts to simplify and clarify a complex issue, Phil!

I think I am in agreement with your proposal.

Comment 31 Mike Walsh 2009-07-20 02:32:27 UTC

meaning what?

personally, i don't think what 'the phils' want is what "most people" would want.  it would be to many people, including me, a manufactured nuisance.  why should i be forced to "fix" my tags if other apps work fine as is?

i think the way things are now should be the default, and if deemed appropriate, add an option, dreaded tho it may be (which is silly), to allow people to have diacritics and case even, respected.  i fully support this as AN OPTION, not as one way or the other ONLY.

one can always argue against options with the same boogeyman reasons, but sometimes they are called for.

Comment 32 Philip Meyer 2009-07-20 05:22:46 UTC

Mike, I tried a few other apps, and they all work as I would expect, i.e. artists with diacritic differences in their name are seen as different artists.

As SqueezeCenter (Squeezebox Server) integrates with other services, this in itself can cause issues in the app.  eg. iTunes and MusicIP could have two artists "ABBA" and "ÁBBA", but SC would see it as one artist (non-deterministic which one you would see).

The problem with the way things are now is that it's non-deterministic; if you have songs by artists called "ABBA", "ÁBBA", "abba", "Abba", etc, which version of the artist would you expect to see when you view a list of artists?  At the moment, you would get a single artist, but it could be any one of those names, and even worse, if you fix the tags so they all have "ABBA" as the artist, a scan for new/changed files would not change what is reported in the app.

In this example, it is obviously a typo in tag data that "ÁBBA" should be "ABBA", and the correct course of action would be to fix it (eg. change Tag data and rescan).

It is not even easy to find which song has the bad tag using SC, because SC thinks they are the same artist.

Comment 33 Mike Walsh 2009-07-20 06:21:14 UTC

Phil, i understand the issue, and agree it is an issue, you should have a way to correct it, i couldn't agree more in fact; i just disagree that your fix needs to be * forced * on everyone, ergo my belief it should be "optional."

also, winamp doesn't work like that, and neither does WMP (from what i recall).

btw, while obviously related, i think we all know and agree that the scanner is hopelessly broken picking up changes to tags when only doing a new/changed rescan, so thats more a tangential issue that probably will be fixed separately with Andy's new scanner code, (or so we hope, as it affects art changes, comp tag changes, etc...)

Comment 34 Philip Meyer 2009-07-20 12:07:13 UTC

I don't see why it should be optional.  It's currently broken, and has non-deterministic characteristics.  I mean what would the options be called: "exact equality tag matching" or "Random tag matching - you could get anything".

If it were fixed such that the last version of the artist read from tags were written to the artist record, then at least making a change to an artist would affect the content of the Squeeze Center database following a scan for changes.  At the moment, a change to diacritics in an artist tag will never be picked up until a clear and rescan was performed.
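The "last version wins" alternative described above can be sketched like this, still assuming artist records are keyed on a folded name. `fold` and `rescan_changed` are hypothetical names used only for illustration.

```python
import unicodedata


def fold(text: str) -> str:
    """Strip diacritics and case to build the (assumed) record key."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def rescan_changed(artists: dict, changed_tags: list) -> dict:
    """Overwrite the stored display name with the most recently scanned tag."""
    for tag in changed_tags:
        artists[fold(tag)] = tag   # last writer wins, so a retag is picked up
    return artists


# State after the first full scan, when the files were tagged without accents.
artists = {fold("Jose Gonzalez"): "Jose Gonzalez"}

# The files are retagged and a scan for new/changed files runs.
rescan_changed(artists, ["José Gonzaléz"])
print(artists)   # the single record now carries the accented spelling
```

This keeps the merging behaviour but lets a tag fix take effect without a clear and rescan. A full scan would still depend on file order if the tags disagree.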

Comment 35 Chris Owens 2009-07-20 18:14:14 UTC

The main question I have remaining is: how did we end up where we are?  We clearly store the artist names correctly in the database, because the different ABBA and ÁBBA still have a non-deterministic chance of showing up.

So how and why, then, are the artists 'normalized' in the artist?  This seems to me where the bug lies.

Comment 36 Mike Walsh 2009-07-20 21:30:44 UTC

Chris,

i'm not sure if this is what you are asking, but i think the randomness isn't really random, its based on whatever SC reads last (or first) and since it doesn't delineate, it just uses that one to mean both.  so its the file scanning order the scanner uses, and whichever, (first or last), gets used from the DB.

Phil,

you said:

"I don't see why it should be optional.  It's currently broken,"

thats a matter of opinion depending on what goals someone wants to achieve.

your fix screws people who DON'T CARE to have SC discern between the bjorks!

options aren't evil.  they allow both parties to suit themselves.

"and has non-deterministic characteristics.  I mean what would the options be called:
"exact equality tag matching" or "Random tag matching - you could get
anything"

well that depends, are they releasing a snide ver of SC?

i already alluded what the option should be called:

"Respect diacritics (and maybe case?) as a delineating factor for denoting and sorting."  choices would be "Disabled" the default, and "Enabled" so people like you could opt in.  

my default is ALREADY the way it is and has been, and while i agree there IS an issue, its one that doesn't bother a great many folks, thus the lack of forum action or posts to this bug, a bug that had ONLY ME voting for it until the last two days or so!

"If it were fixed such that the last version of the artist read from tags were
written to the artist record, then at least making a change to an artist would
affect the content of the Squeeze Center database following a scan for changes.
 At the moment, a change to diacritics in an artist tag will never be picked up
until a clear and rescan was performed."

agreed these are problems, but again the hope is these are scanner bugs that Andy's new code fixes and so are really tertiary.

Comment 37 Philip Meyer 2009-07-21 00:39:42 UTC

Chris - no the database doesn't store artists names correctly, which is why the bug occurs.  i.e. the scanner reads an artist name from a tag, then determines whether it needs to create a new artist record or refer to an existing artist.

The artist name loosely matches an existing artist, so it uses that record.

I would imagine that the same issue would happen if importing/syncing music library from iTunes or MusicIP - there could be two similar artist names in one of those sources, but upon importing it could merge them into one.  That could cause follow-on issues (eg. passing rating information back to one of those artists).

Comment 38 Phil Leigh 2009-07-22 12:12:15 UTC

To Mike:
To be clear, I sincerely believe that the DEFAULT behaviour should be to show tags as they really are. Tag conflation is an advanced choice IMHO - I know you see it the other way.

However, I would be happy with the current approach as the default AND an option if you want/need to disengage the totally non-deterministic ("Frank Zappa?", anyone) pseudo-intelligence of tag conflation. At least then the solution to the problems people have in this area would be reasonably straightforward:

"Turn off the Smart Alec tag handling, see your tags the way they really are and fix 'em"
Regards
Phil

I know it ain't easy. :-)

Comment 39 Mike Walsh 2009-07-27 21:28:42 UTC

Phil L,

i respect your view completely, but from where i sit it seems to me that the devs have decided (in the last year and a half or so) to try to make SC friendly to dopes, ie. people who want it to "just work" and don't want all their tagging errors (if u want to call them that) and so on pointed out to them.  maybe i have an incorrect read on that, but its something i think i've correctly noticed.  i also think most people don't care at all to have their differing bjorks discerned, delineated, and dissected.

the display/scanner issue should be fixed, i see that as a separate issue.

i do think and support the notion that you should at the very least have the option to "see tags as they truly are."

thx.

Comment 40 Dennis Mutsaers 2009-08-03 11:46:29 UTC

(In reply to comment #30)
> I'm always appreciative of attempts to simplify and clarify a complex issue,
> Phil!
> I think I am in agreement with your proposal.

I think the one and only correct solution is Phil's approach... (But I'm not from the US, ;-) )

Comment 41 sbessant 2009-08-06 02:15:45 UTC

+1 from me - I had a variant of the "Frank Zappa?" problem, that drove me mad trying to figure it out.

In my case it was "miles Davis" as an artist.  I tried everything to make it "Miles Davis" but failed.  Phil helped me fix it by directing me to it likely being another tag that SC was scanning and using as artist - in this instance, it turned out that it was a composer tag in a track by a different artist (closer to the beginning of the alphabet if that is relevant in terms of priority when scanning), that had the capitalization error in it.  

I wasted a lot of time manipulating the tags on the (correctly tagged) Miles Davis files, when I should have been looking somewhere else - and would have been able to do so immediately if SC had given me two separate artists.

Comment 42 Mike Walsh 2009-08-06 10:47:14 UTC

things at this point are just being repeated.

the fact is, like it or not, most people aren't this anal.  regardless, for those who are, there should be an option, i completely agree with that.

however, the idea there is only "ONE correct" approach is nonsense.  different people have different approaches with differing goals, thats why we have options, and thats why i'm in favor of having both methodologies supported.

Comment 43 Philip Meyer 2009-08-06 11:54:28 UTC

>however, the idea there is only "ONE correct" approach is nonsense.
No, it's not (in this case).

It seems perfectly reasonable that what you put in your tags is what you should expect to see in the app.  This is what happens in other apps (iTunes, Foobar2000 and MusicIP, which I use, all treat Jose and José as different artists), and SC behaving differently leads to problems with sharing library content - eg. iTunes/MusicIP import/export.

I've responded to loads of people recently about this very issue.  It would reduce confusion and calls for help if what you see is what you get.

Comment 44 Mike Walsh 2009-08-06 12:12:01 UTC

and no one would accuse you of being anal, right phil?

winamp doesn't do it like that, neither does WMP (as i recall).

your inability to see that everyone doesn't need to do it like you, on any number of things, is already well established.  some want options for everyone to be happy, [me] some are only happy if everyone does it via the ONLY option they bless and sanctify. [guess who]

you see phil, having an OPTION wouldn't STOP YOU from doing what you're already doing.

Comment 45 Philip Meyer 2009-08-06 13:50:04 UTC

I tried WMP, and that too works the same as iTunes, Foobar2000, MusicIP, etc.  Any difference in artist tag means it is reported as a different artist in the WMP music library. 

I don't have WinAmp installed any more, so I can't try that one.

I have no problem with options, when they are sensible.

But, this is not particularly sensible, I feel.  It currently doesn't work very well.  It's confusing people.  No other app has a similar option for case sensitive/insensitive grouping of metadata - to have such a setting is quite geeky, when Logitech are trying to move away from geekiness.

It's not about me - in fact, I have no current problem to be solved here, because I have already gone through the pain of correcting all my artist tags (I think) to be consistent.  So no, I'm not doing this for myself, but trying to fix the problem for others.

Can't you reply to anything without turning discussions into slagging matches?

Comment 46 Mike Walsh 2009-08-06 14:36:19 UTC

oh gimmie a break.  some people are anal, right?  obviously you are.  and some aren't.  i'd say i am, but not to this degree.  its NOT a slight, its a description of how important details are or aren't to someone.  NO OFFENSE INTENDED.

having said that, you don't respect other POV.  its your way or the highway.  my POV by comparison, allows for both methods, (ie. respect diacritics and/or case, or not) and i might point out, that it ALREADY works the way i think most people would want/expect it, which btw, IS the "less geeky" way.

unlike you, that doesn't stop me from supporting an option to have it work otherwise, for those who want it the way you do.

Comment 47 Philip Meyer 2009-08-06 14:44:33 UTC

You have your view, I have mine.  Stop telling me what I am or am not.  Stop filling this bug report with non-constructive whinging.  Put it in the forum if you must.  This is not the place.

Comment 48 Mikael Nyberg 2010-01-31 03:49:57 UTC

Can this be reopened? Bjork is not Björk; this is an error and therefore a bug that needs fixing.

SBS should not try to second-guess the artist at all; that's up to me and my tagging abilities. All artists should be their own artist, not some strangely merged oxymoron. This is too much "smartness" in the software: accept the artist or album artist tag for what it is, the artist the end user intended it to be.

When i got my 15 different Tjajkovskijs or Bartok after a scan, I could just adjust my tags and rescan, how could this ever be considered a problem that the server had to second guess and fix? this is the bug.

Comment 49 Gordon Harris 2010-01-31 08:43:18 UTC

I agree.  The present system rewards lazy taggers and penalizes careful ones.  As it stands, it's very hard for the conscientious careful tagger to figure out where he/she went wrong.  How hard would it be to have an option to turn off the 'lexical folding'?

Comment 50 Mikael Nyberg 2010-01-31 08:59:11 UTC

(In reply to comment #49)
> I agree.  The present system rewards lazy taggers and penalizes careful ones. 
> As it stands, it's very hard for the conscientious careful tagger to figure out
> where he/she went wrong.  How hard would it be to have an option to turn off
> the 'lexical folding'?

It penalizes lazy taggers even more.

Instead of an obvious problem, for example:

Bjork
Björk

Dang some of my Björk albums have wrong tags, and i can see which ones great.

Now we have a complete mystery.

Bjork

Wot ? all albums have the right tags, except the one tune in a movie soundtrack you ripped before getting any proper albums, you find that album 14 days later after elaborate detective work.

My take on it is that nobody wins as it is now; i can not find any use case that benefits from the way it works now. I'll take the obvious problem over the mystery tag any day. The person with a more relaxed attitude would probably not care if he had several entries for the same artist ("I'll fix that later")?

Comment 51 Mike Walsh 2010-01-31 12:44:56 UTC

there is clearly a difference between europeans and americans on this subject.  (and i'm not saying agreement of the two groups is monolithic either, but in general holds true)

americans simply aren't that concerned with diacritics.  europeans clearly are.

instead of all of us arguing over it, why not have an option?

if europeans want their bjorks differentiated, they should be able to do so, and if americans don't want them differentiated, they should NOT be forced to have them so.

i really don't see whats controversial about that POV.  other issues are in this bug of course, but i see them as separate issues.

Comment 52 Mikael Nyberg 2010-01-31 12:54:56 UTC

An option is ok with me, but Logitech seems to dislike too many settings in the software?
Options are not evil; they could be somewhere under advanced. It could default to what it is now, so as not to disturb everybody happy with the current order.

Comment 53 Chris Owens 2010-02-01 11:06:55 UTC

Obviously the current target of 7.3.3 is not going to work.

Comment 54 Chris Owens 2010-02-08 09:56:51 UTC

These are all bugs that have been marked 'fixed' or 'closed' but still have the bug_meeting keyword.  They were not showing up in James's bug meeting search.

Please let me know (chris_owens@logitech.com) if these need to be brought up at the next bug meeting.  In the meantime I will fix the search.

Comment 55 Philip Meyer 2010-02-08 17:20:47 UTC

This issue is not fixed, and Andy agreed in a recent forum thread that it ought to be changed.

If it wasn't showing up in the bug meeting search before, it certainly won't now that the "bug_meeting" keyword has been removed!  Should the bug be reopened and the keyword added back?

Comment 56 Chris Owens 2010-02-08 18:23:45 UTC

So what is the action here?  To add an option for handling new artists 'fuzzily' or strictly?  New options aren't going in any time soon.

I'd prefer to simplify and simply handle them strictly and create a new DB entry for differently spelled artists.

Comment 57 Michael Herger 2010-02-08 21:37:04 UTC

As I was the one to add the bug_meeting flag (and forget about it...): I think we should stop normalizing artist names. Get your tags straight and forget the magic.

Comment 58 Michael Herger 2010-02-08 21:37:48 UTC

Let's try this again :-)

Comment 59 Philip Meyer 2010-02-14 04:21:14 UTC

In addition, I see that punctuation is being treated specially in genres.

I have been trying to standardise genre tags.  e.g. I have a mix of "Folk-Rock" and "Folk Rock".  These are all lumped into the same genre, which depending on the order of a full rescan will be one or the other (non deterministic).

Whilst this may seem like a minor point, it can be a problem if the genre name is used for other things.  eg. Stored as a favorite, the genre name might not be found after a full rescan.

A better example of the problem is with integration with MusicIP which does differentiate between "Folk Rock" and "Folk-Rock", so a mix on genre="Folk Rock" won't play "Folk-Rock" songs.

It would be better if the scanner didn't ignore punctuation in genre tags - leave it up to the user to standardise the tags.
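
The order dependence described here can be shown with a small Python sketch. The `loose_key` function is a hypothetical stand-in for whatever folding the scanner applied to genres (here: drop punctuation, spaces and case); it is not taken from the actual code. Passing `str.strip` as the key instead models the exact matching requested above.

```python
import string


def loose_key(genre: str) -> str:
    """Hypothetical loose key: drop punctuation and whitespace, ignore case."""
    drop = string.punctuation + string.whitespace
    return "".join(ch for ch in genre if ch not in drop).upper()


def displayed_genres(tags_in_scan_order, key):
    """Return the genre names that would be shown after one full scan."""
    seen = {}
    for tag in tags_in_scan_order:
        seen.setdefault(key(tag), tag)  # first spelling scanned is kept
    return sorted(seen.values())


# Loose matching: the surviving spelling depends on scan order.
print(displayed_genres(["Folk-Rock", "Folk Rock"], loose_key))  # ['Folk-Rock']
print(displayed_genres(["Folk Rock", "Folk-Rock"], loose_key))  # ['Folk Rock']

# Exact matching: both spellings survive regardless of order.
print(displayed_genres(["Folk-Rock", "Folk Rock"], str.strip))  # ['Folk Rock', 'Folk-Rock']
```

With the exact key, a favorite or an external mix keyed on "Folk Rock" would keep resolving to the same genre after every rescan.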

Comment 60 Chris Owens 2010-02-18 09:29:23 UTC

Andy to change the handling so that accented artists are separated into new artists.

Comment 61 Andy Grundman 2010-02-18 09:30:31 UTC

7.5 if there's time.

Comment 62 Chris Owens 2010-03-08 11:17:27 UTC

Moving P3 and lower bugs to next release target

Comment 63 Andy Grundman 2010-08-28 04:58:23 UTC

*** Bug 16488 has been marked as a duplicate of this bug. ***

Comment 64 Jassel 2011-02-09 14:19:26 UTC

This bug is bugging me right now.
I have albums by two different artists, one named "MUM" and the other "múm".
Right now they are displayed as being the same artist.

Comment 65 Lars Simonsen 2011-02-11 04:11:54 UTC

(In reply to comment #12)
> > you should stop SC from ever matching artist names that aren't
> > exactly the same.
> 
> I couldn't have said it better! We should never have tried to match misspelled
> names, as we'll never be able to do it correctly. Let's stop here before it
> gets even worse. Please.

+1 

Fix the error by removing the code that matches names (KISS). Bonus: Get better functionality (WYSIWYG).

Comment 66 SVN Bot 2011-02-11 09:00:51 UTC

 == Auto-comment from SVN commit #31901 to the slim repo by agrundman ==
 == http://svn.slimdevices.com/slim?view=revision&revision=31901 ==

Fixed bug 10324, don't merge artists or genres unless they have the exact same name

Comment 67 Mike Walsh 2011-02-11 13:11:31 UTC

so Bjork and Björk are now different artists as far as SBS is concerned, and its on the user to fix their tags.  gonna annoy some americans but so be it.

but what else?  will SBS now say Bjork and bjork are separate artists?  (meaning, a difference in case; or does it already?)

until now, SBS has always, rightly imo, ignored/truncated trailing spaces, is that now going to change as well?  that would be awful!

Comment 68 Andy Grundman 2011-02-11 13:35:25 UTC

It's now doing an exact match on the artist/genre name.  Can't please everyone...

Comment 69 Mike Walsh 2011-02-11 14:13:19 UTC

meaning that case differences will result in separate artists, and that trailing spaces, which have always been ignored, now suddenly won't be?

if so, respecting trailing spaces not only is dumb, but will not AT ALL be obvious as to whats causing duplicate appearances.

Comment 70 Jim McAtee 2011-02-11 14:31:42 UTC

I have no real problems with saying 'José Muldoon' is not the same as 'Jose Muldoon', although they're almost certainly intended to be the same.  A larger concern is the effect that this change could have on user searches:

- Running SQLite, 'jose' no longer matches 'José'.

- Running MySQL, it still does. :)

Maybe there's a way to fix the behavior in SQLite.  If it can't be fixed then to my thinking this change is creating a usability problem to fix a cosmetic one.
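
The SQLite side of this observation can be reproduced outside the server with Python's built-in `sqlite3` module. The sketch below assumes a hypothetical one-column `contributors` table searched with `LIKE` directly on the display name; whether the server's search actually queries this way is not stated here. SQLite's built-in `LIKE` folds case only for ASCII letters and does not fold accents, so 'jose' finds 'Jose' but not 'José'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table layout, for illustration only.
conn.execute("CREATE TABLE contributors (name TEXT)")
conn.executemany(
    "INSERT INTO contributors VALUES (?)",
    [("José Muldoon",), ("Jose Muldoon",)],
)


def search(term):
    """Substring search on the display name using SQLite's built-in LIKE."""
    rows = conn.execute(
        "SELECT name FROM contributors WHERE name LIKE ? ORDER BY name",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]


print(search("jose"))  # ['Jose Muldoon']
```

One common way around this is to search a separately stored, accent-stripped copy of the name rather than the display name itself.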

Comment 71 Jim McAtee 2011-02-11 14:35:50 UTC

And what does it really _fix_?

On the one hand someone asks why José Muldoon appears as Jose Muldoon in their library and you tell them to check their tags and fix them if they want to correct the problem.  With this change the same user asks why they now have both José Muldoon and Jose Muldoon appearing in their library and you tell them to check their tags and fix them if they want to correct the problem.

You've just traded one problem for another, arguably bigger one. It will bother _many_ more people than the original cosmetic problem. I can guarantee it.

Comment 72 Uwe Schröder 2011-02-11 14:37:31 UTC

I totally welcome this change. There are several artists that are only
distinguished by case (like SToA vs. sToa, or MIA vs. Mia) and therefore
currently mixed up.

Making trailing spaces significant could be used as an unobtrusive way to
distinguish between artists whose names are exactly the same; for example, one
could tag "Morgenstern" and "Morgenstern " instead of "Morgenstern" and
"Morgenstern (2)", so I welcome that too.

However, as far as I know string comparisons are case-insensitive by default in
MySQL, so this might require a change in the database backend as well to
actually work.

Comment 73 Andy Grundman 2011-02-11 14:39:16 UTC

This does not change the way searches work.  The database has both 'name' and
'namesearch' fields, namesearch is an all-caps normalized version of name. 
Previously namesearch was being used when looking up an artist, all I did was
change that lookup to use the name field instead.

Leading/trailing spaces are still stripped before the name is saved, as before.

Please stop spreading FUD in this bug without even trying the change or looking at the checkin.  Take it to the forum if you want to keep whining about it.
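
A rough Python sketch of the change described in this comment, using an in-memory SQLite database. Only the `name` and `namesearch` column names come from the comment; the `contributors` table name, the `normalize` function and the `find_or_create`/`search` helpers are assumptions made for illustration and are not the actual checkin.

```python
import sqlite3
import unicodedata


def normalize(name):
    """Hypothetical namesearch normalization: strip accents, upper-case."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).upper()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contributors (id INTEGER PRIMARY KEY, name TEXT, namesearch TEXT)"
)


def find_or_create(tag, exact=True):
    """Look up an artist by name (new) or by namesearch (old); insert if missing."""
    name = tag.strip()  # leading/trailing spaces are stripped before saving
    if exact:
        query, value = "SELECT id FROM contributors WHERE name = ?", name
    else:
        query, value = "SELECT id FROM contributors WHERE namesearch = ?", normalize(name)
    row = conn.execute(query, (value,)).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO contributors (name, namesearch) VALUES (?, ?)",
        (name, normalize(name)),
    )
    return cur.lastrowid


def search(term):
    """Searches still go through the normalized column."""
    rows = conn.execute(
        "SELECT name FROM contributors WHERE namesearch LIKE ? ORDER BY id",
        (f"%{normalize(term)}%",),
    )
    return [row[0] for row in rows]


a = find_or_create("Bjork")
b = find_or_create("Björk ")   # trailing space is stripped, accent is kept
c = find_or_create("bjork")
print(a, b, c)          # 1 2 3
print(search("bjork"))  # ['Bjork', 'Björk', 'bjork']
```

In this sketch the exact lookup relies on SQLite's default binary comparison, so a difference in case also produces a separate row; a database whose default string comparison is case-insensitive could behave differently.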

Comment 74 Uwe Schröder 2011-02-11 14:41:54 UTC

(In reply to comment #71)
> And what does it really _fix_?

It fixes the case when different artists are correctly tagged by the user, but mixed up by SBS nevertheless.

Comment 75 Mike Walsh 2011-02-11 15:09:01 UTC

Andy, i was just asking * questions * based on what you said.

you said "EXACT" match...  well, that would/could include spaces, so i just sought clarification.

i just wanted to know what the new expectations were for expected SBS behavior with this change, thats all.  i don't consider that whining.  i have no desire to annoy you, you're the best logitech has imo.  SBS behavior is rather complex and undocumented, so i just wanted to know how u intended things to work now.

i'm still confused as to whether case changes will or won't be respected?  i'm happy to hear spaces won't be.

btw, i'll leave you and Jim to sort his issues out, but my reading of what he wrote seemed to say that a search worked one way on mysql, and another on sqlite.

again, i appreciate your efforts and clarifications, as always.

Comment 76 Philip Meyer 2011-02-11 15:22:03 UTC

(In reply to comment #73)
Thanks Andy for finding some time to fix this issue; I will appreciate it (if I ever get 7.6 to work as well as 7.5.x).

Any chance of merging change to 7.5.x in the mean time?

Comment 77 Jim McAtee 2011-02-11 15:45:27 UTC

(In reply to comment #73)
> This does not change the way searches work.

I did test it, but didn't realize the SQLite search bug already existed. Bug 16956.

Comment 78 Mikael Nyberg 2011-02-11 22:02:47 UTC

Yep, i do welcome the changes that make SBS distinguish between Björk and Bjork, and Jose Gonsales or José González.

But the limited search options are annoying.

The controller or IR interface does not have ö or é among other things.

The Web-UI has whatever chars you can find on the keyboard, so it is not entirely impossible to search, which it is on the players right now.

iPeng or SqueezePad can use what you have for keys on your iPad.

Hopefully that other bug is going to deal with how to find Björk with your controller :)

Comment 79 peer 2011-02-12 02:41:42 UTC

(In reply to comment #71)
> And what does it really _fix_?
> 
> You've just traded one problem for another, arguably bigger one. It will bother
> _many_ more people than the original cosmetic problem. I can guarantee it.

I think Philip Meyer, Gordon Harris and others have already explained what the problem is and what it fixes. I'll give another example: 

Would you consider the difference between Dan Johnson and Don Johnson to be insignificant and cosmetic? If so, do you care if SBS presents Don Johnson as Dan Johnson, or Dan Johnson as Don Johnson? Or would you like to at least to control  which spelling wins?

Please understand that there are other letters than a-z. If SBS is to serve people with another native language than English, listening to artists named in another language than English, this fix needs to be applied. It is not a cosmetic issue. 

Thanks for fixing this!

-- Peer

Comment 80 Mike Walsh 2011-02-12 02:53:48 UTC

like i said earlier, americans view this one way, europeans another.

an option [to differentiate diacritics or not] would serve both.

jim is right that a lot of people, over here anyway, will not appreciate having their bjorks differentiated.  as he said, its trading a mostly [but not always] cosmetic issue for a very real functional issue.

btw, other questions remain...  the sqlite/mysql search problems, will case be a differentiation point, will the scanner/DB/display update properly when changes to tags are made, will searches work properly (even if sqlite is fixed), etc...

my intention is not to whine as andy put it, i just seek clarifications and want to add my view, and i think thats more than reasonable.

Comment 81 peer 2011-02-12 03:23:41 UTC

(In reply to comment #80)
> like i said earlier, americans view this one way, europeans another.
> 
> an option [to differentiate diacritics or not] would serve both.
> 
> jim is right that a lot of people, over here anyway, will not appreciate having
> their bjorks differentiated.  as he said, its trading a mostly [but not always]
> cosmetic issue for a very real functional issue.

Yes, and there are other people than americans and europeans :-)

But I agree that one might want a personal setting to drive this; I too would often like to ignore what I consider to be "cosmetic" differences in spelling, either because my tags are in a mess, or because I don't know the proper spelling, or because I lack certain characters on my control (keyboard, remote, ...). 

The issue here is where this is addressed. IMO these "personal preferences" belong in the UI, and apply when you search or browse. Any decent UI will let you search and browse even if you mispel some words ;-)

In absence of a personal setting, I would probably be quite happy if the UI ignored all these differences when searching, as long as the DB respects the differences and all entries are presented properly.

Comment 82 Lars Simonsen 2011-02-12 03:37:16 UTC

Name matching should happen in real time when the user performs a search, not when the system builds the database.

Yes, some people will always be annoyed when error-hiding code is removed. They should not be annoyed that the errors are showing, though; they should be annoyed that the errors were hidden in the first place. 

It may be necessary to explain this to people, but it shouldn't be all that difficult.

Comment 83 Philip Meyer 2011-02-12 03:54:30 UTC

The difference is that it will now be easy to discover problems with tags and find the faulty files to change.  The user can enter what they want in tags and know with confidence that they will find them in the music library exactly the same.

Whereas before when the user discovered a fault in the music library content, it was really hard to find the matching source music file that caused that fault.

SBS now matches the way that most other software treats diacritics - what you tag is what you see, but searching will still be loose approximation, so it is easy to find things.

Comment 84 peer 2011-02-12 15:35:15 UTC

(In reply to comment #83)
(In reply to comment #82)
Agree!

Comment 85 Mike Walsh 2011-02-12 18:20:01 UTC

i just want to make clear that i don't "disagree" ...rather, i simply made the case for an option, but it won't bother me if they don't do it, it will be their problem to contend with, not mine.  i still would like to know how case is/will be handled, and i def think bug 16956 should have gotten resolved prior to this one, since searching is not handled properly/as described here.

Comment 86 Mike Walsh 2011-03-09 22:47:46 UTC

(In reply to comment #73)
> This does not change the way searches work.  The database has both 'name' and
> 'namesearch' fields, namesearch is an all-caps normalized version of name. 
> Previously namesearch was being used when looking up an artist, all I did was
> change that lookup to use the name field instead.
> Leading/trailing spaces are still stripped before the name is saved, as before.
> Please stop spreading FUD in this bug without even trying the change or looking
> at the checkin.  Take it to the forum if you want to keep whining about it.

i haven't checked it, and maybe things happening aren't b/c of this bug, HOWEVER there are reported problems with case, spaces, etc...

http://forums.slimdevices.com/showthread.php?t=86176

can someone please explain whats going on?

Comment 87 Paul Chandler 2011-05-09 10:28:35 UTC

Correctly searched on artists with AND without accent characters. Search found the match with accents even if I did not use them in the search (this is correct).