Bug 14386 - Scanning and playback broken on flacs with file names containing certain character combinations
Status: CLOSED FIXED
Product: Logitech Media Server
Classification: Unclassified
Component: Scanner
Version: 7.4.0
Hardware: All All
Importance: P1 normal with 1 vote
Target Milestone: 7.5.0
Assigned To: Andy Grundman
Depends on: 14328
Blocks:
Reported: 2009-09-28 22:49 UTC by Gordon Harris
Modified: 2010-04-08 17:26 UTC (History)

See Also:
Category: ---


Description Gordon Harris 2009-09-28 22:49:59 UTC
Please see: http://forums.slimdevices.com/showthread.php?t=68419 for examples.

1). For flacs with embedded cuesheets, the 7.4 scanner incorrectly UTF8 percent encodes the url value for tracks IF the full path contains diacritic characters in BOTH the file name and a folder name.  Generally, the scanner is double-UTF8 encoding the folder name portion of the url.

2). For flacs with embedded cuesheets, SBS cannot "see" a flac whose file name contains an "&" ampersand for artwork extraction or playback, even though the url value stored in the db is correct.  However, if the filename containing the ampersand ALSO contains a diacritic, OR if a folder name in the file's path contains a diacritic, THEN the file with the "&" can be "seen".  The exception is when BOTH the filename and a folder name contain diacritics: then bug 1) above gets triggered.
Comment 1 Gordon Harris 2009-09-28 23:46:09 UTC
Here are urls to zip files containing flacs which demonstrate the problem:

http://www.hegardtfoundation.org/flacstuff/linux_problem_flacs.zip

http://www.hegardtfoundation.org/flacstuff/win_problem_flacs.zip
Comment 2 Marc Auslander 2009-09-29 06:13:24 UTC
See bug_14328
Comment 3 Gordon Harris 2009-09-29 10:44:55 UTC
I'm not sure what it means to target the milestone as 7.5.  Does this mean that this won't make it into any 7.4.x releases or the 7.4 trunk?  This bug means that 10% of my library is unusable with 7.4.
Comment 4 Andy Grundman 2009-09-29 10:48:49 UTC
I agree this should be fixed in a 7.4.x release.
Comment 5 Gordon Harris 2009-09-30 11:21:53 UTC
I'm starting to suspect that another part of the problem here is that the FLAC cuesheet parser in the new scanner code is having trouble with the FILE entry.  I believe that in past versions of SC, Dan solved this problem by basically ignoring the FILE data, always substituting the actual filename as picked up by the scanner.
Comment 6 Andy Grundman 2009-09-30 11:26:36 UTC
How can you ignore the FILE entry?  How do you know which file the cue sheet is for if you do that?
Comment 7 Gordon Harris 2009-09-30 11:54:36 UTC
How do you know what the file name actually is?  Because it's an embedded cuesheet.  The scanner is reading it from a flac file. Presumably, the scanner knows what file it's currently reading.  If I was better at searching bugzilla or the check-ins, I'd find the changes that Dan made when he abandoned relying on FILE.  (At least, that's my recollection of what he did.  I remember that it seemed to solve a lot of ills.)

Anyway, it's true that *a few* of my embedded cuesheets contain FILE data that don't match the actual file name.  And in general, I don't have a problem with the scanner enforcing FILE == filename, though on the face of it, this seems a little discriminatory: you don't require an MP3 to be retagged just because someone changes its filename.  But I do think that requiring FILE == filename is potentially a can of worms.  A "compliant" cuesheet embedded in a flac will always be UTF8 encoded.  But the underlying file system where the file resides may not be (e.g. Windows).  It seems like it would be easier for the scanner to just use the actual file name it "knows" vs having to decide to translate the FILE utf8 data or not.

SC 7.0 through 7.3.x haven't had issues with FILE != actual filename and all my embedded cuesheets have been parsed by the scanner flawlessly.  So I have to believe that the old scanner code ignored FILE in favor of the actual filename of the flac being scanned.

BTW, I think FILE==x is a subset of the bug I'm reporting here.  It's the double-UTF8 encoding of folder names in the db track table url field when the file includes diacritics too that's causing most of the problems for me.  You might want to look at what Michael went through with bug 7547 ( https://bugs-archive.lyrion.org/show_bug.cgi?id=7547 ).
Comment 8 Andy Grundman 2009-09-30 12:14:16 UTC
OK, I agree, embedded cue sheets can ignore FILE (standalone ones can't).

I do see the encoding issue with artwork.  Is artwork the only issue with these files?  They seem to scan fine other than that.
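
A minimal sketch of the rule agreed on above (embedded cue sheets ignore FILE, standalone ones keep it). It is written in Python purely for illustration and is not the scanner's Perl code; `tracks_from_cue` and `container_path` are hypothetical names, and the cue grammar is simplified to quoted FILE entries and AUDIO tracks.

```python
import re
from typing import List, Optional, Tuple

# Simplified patterns: only quoted FILE names and AUDIO tracks are recognised.
FILE_RE = re.compile(r'^\s*FILE\s+"(?P<name>.*)"\s+\S+\s*$')
TRACK_RE = re.compile(r'^\s*TRACK\s+(?P<num>\d+)\s+AUDIO\s*$')


def tracks_from_cue(cue_text: str,
                    container_path: Optional[str] = None) -> List[Tuple[int, str]]:
    """Return (track number, audio file) pairs for a cue sheet.

    If container_path is given, the cue sheet is treated as embedded and
    every FILE entry is ignored in favour of the file being scanned.
    """
    current_file = container_path
    tracks = []
    for line in cue_text.splitlines():
        file_match = FILE_RE.match(line)
        if file_match:
            if container_path is None:  # standalone: FILE is authoritative
                current_file = file_match.group("name")
            continue
        track_match = TRACK_RE.match(line)
        if track_match:
            tracks.append((int(track_match.group("num")), current_file))
    return tracks


cue = '''FILE "Old Name.flac" WAVE
  TRACK 01 AUDIO
    INDEX 01 00:00:00
  TRACK 02 AUDIO
    INDEX 01 03:10:00
'''
print(tracks_from_cue(cue, container_path="/music/New Name.flac"))
print(tracks_from_cue(cue))
```

With `container_path` set, a stale FILE value no longer matters, which also sidesteps the question of whether the UTF-8 FILE text matches the file system's own encoding.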
Comment 9 Gordon Harris 2009-10-01 15:05:52 UTC
Artwork + playback for me.  They won't play if the url field points to something non-existent.  But, yes, all the other metadata seems to parse out perfectly.
Comment 10 Andrey 2009-10-06 00:50:37 UTC
I also have a big library of FLACs with non-Latin characters in the URL.
I have all the symptoms described by Gordon. Here is some additional info.
Only FLACs with embedded cue sheets are affected, and only if there are non-Latin characters in the file path. FLACs as individual tracks aren't affected. FLACs with an all-Latin path but non-Latin file names are OK. I think the Slim::Utils::Misc::fixPath procedure is a root cause.
Please check this excerpt from my scanner.log:
00:23:39.0916 - correct path UTF-8 encoded
00:23:39.1031 - correctly encoded path and URL
00:23:39.1046 - correct path and URL
00:23:39.1060 - correct
00:23:39.1390 - correct
00:23:39.1401 - correct
00:23:39.1427 - WRONG. The %D escapes in the directory part of the URL have changed to %C, which is incorrect. The flac file name in the URL is still properly encoded (see the run of %D escapes), but the path is now incorrectly encoded (see the run of %C escapes).
00:23:39.1437 - correct
00:23:39.1657 - path is incorrect, file name is correct
00:23:39.1675 - path is incorrect, file name is correct again

I hope this will help to pinpoint the problem.


[09-10-06 00:23:39.0916] Slim::Utils::Scanner::scanDirectory (328) Scanning: /home/anry/testm/Russian/Браво/1992 - Московский бит/Московский бит.flac
[09-10-06 00:23:39.1031] Slim::Utils::Misc::pathFromFileURL (250) Got /home/anry/testm/Russian/%D0%91%D1%80%D0%B0%D0%B2%D0%BE/1992%20-%20%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82/%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82.flac from file url file:///home/anry/testm/Russian/%D0%91%D1%80%D0%B0%D0%B2%D0%BE/1992%20-%20%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82/%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82.flac
[09-10-06 00:23:39.1046] Slim::Utils::Misc::pathFromFileURL (267) Extracted: /home/anry/testm/Russian/Браво/1992 - Московский бит/Московский бит.flac from file:///home/anry/testm/Russian/%D0%91%D1%80%D0%B0%D0%B2%D0%BE/1992%20-%20%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82/%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82.flac
[09-10-06 00:23:39.1060] Slim::Utils::Scanner::scanDirectory (340) Adding file:///home/anry/testm/Russian/%D0%91%D1%80%D0%B0%D0%B2%D0%BE/1992%20-%20%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82/%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82.flac to database.
[09-10-06 00:23:39.1390] Slim::Utils::Misc::stripRel (594) Original: /home/anry/testm/Russian/Ð�Ñ�аво/1992 - Ð�оÑ�ковÑ�кий биÑ�/Московский бит.flac
[09-10-06 00:23:39.1401] Slim::Utils::Misc::stripRel (600) Stripped: /home/anry/testm/Russian/Ð�Ñ�аво/1992 - Ð�оÑ�ковÑ�кий биÑ�/Московский бит.flac
[09-10-06 00:23:39.1427] Slim::Utils::Misc::fixPath (577) Fixed: Московский бит.flac to file:///home/anry/testm/Russian/%C3%90%C2%91%C3%91%C2%80%C3%90%C2%B0%C3%90%C2%B2%C3%90%C2%BE/1992%20-%20%C3%90%C2%9C%C3%90%C2%BE%C3%91%C2%81%C3%90%C2%BA%C3%90%C2%BE%C3%90%C2%B2%C3%91%C2%81%C3%90%C2%BA%C3%90%C2%B8%C3%90%C2%B9%20%C3%90%C2%B1%C3%90%C2%B8%C3%91%C2%82/%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82.flac
[09-10-06 00:23:39.1437] Slim::Utils::Misc::fixPath (579) Base: /home/anry/testm/Russian/Браво/1992 - Московский бит
[09-10-06 00:23:39.1657] Slim::Utils::Misc::pathFromFileURL (250) Got /home/anry/testm/Russian/%C3%90%C2%91%C3%91%C2%80%C3%90%C2%B0%C3%90%C2%B2%C3%90%C2%BE/1992%20-%20%C3%90%C2%9C%C3%90%C2%BE%C3%91%C2%81%C3%90%C2%BA%C3%90%C2%BE%C3%90%C2%B2%C3%91%C2%81%C3%90%C2%BA%C3%90%C2%B8%C3%90%C2%B9%20%C3%90%C2%B1%C3%90%C2%B8%C3%91%C2%82/%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82.flac from file url file:///home/anry/testm/Russian/%C3%90%C2%91%C3%91%C2%80%C3%90%C2%B0%C3%90%C2%B2%C3%90%C2%BE/1992%20-%20%C3%90%C2%9C%C3%90%C2%BE%C3%91%C2%81%C3%90%C2%BA%C3%90%C2%BE%C3%90%C2%B2%C3%91%C2%81%C3%90%C2%BA%C3%90%C2%B8%C3%90%C2%B9%20%C3%90%C2%B1%C3%90%C2%B8%C3%91%C2%82/%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82.flac
[09-10-06 00:23:39.1675] Slim::Utils::Misc::pathFromFileURL (267) Extracted: /home/anry/testm/Russian/�����°�²�¾/1992 - ���¾���º�¾�²���º�¸�¹ �±�¸��/�о�ков�кий би�.flac from file:///home/anry/testm/Russian/%C3%90%C2%91%C3%91%C2%80%C3%90%C2%B0%C3%90%C2%B2%C3%90%C2%BE/1992%20-%20%C3%90%C2%9C%C3%90%C2%BE%C3%91%C2%81%C3%90%C2%BA%C3%90%C2%BE%C3%90%C2%B2%C3%91%C2%81%C3%90%C2%BA%C3%90%C2%B8%C3%90%C2%B9%20%C3%90%C2%B1%C3%90%C2%B8%C3%91%C2%82/%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82.flac
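
For anyone checking their own track urls against the log above, here is a rough sketch of how the %C3%90... form can be recognised and mapped back to the %D0... form. It is Python for illustration only, not anything from Slim::Utils::Misc; `undo_double_utf8` and `repair_url_path` are made-up names, and the check is a heuristic that assumes the damage is exactly "UTF-8 bytes read as Latin-1, then encoded as UTF-8 again".

```python
from urllib.parse import quote, unquote_to_bytes


def undo_double_utf8(segment: str) -> str:
    """Repair one percent-encoded path segment if it looks double-encoded."""
    raw = unquote_to_bytes(segment)
    try:
        # Reverse the assumed damage: decode the outer UTF-8 layer, then map
        # each resulting character back to the single byte it came from.
        repaired = raw.decode("utf-8").encode("latin-1")
        repaired.decode("utf-8")  # the result must itself be valid UTF-8
    except (UnicodeDecodeError, UnicodeEncodeError):
        return segment  # not double-encoded (or not UTF-8 at all)
    if repaired == raw:
        return segment  # plain ASCII, nothing to repair
    return quote(repaired, safe="")


def repair_url_path(url: str) -> str:
    """Apply the repair to every '/'-separated segment of a URL."""
    return "/".join(undo_double_utf8(part) for part in url.split("/"))


# Directory segment as written by fixPath at 00:23:39.1427, and as expected.
bad_dir = "%C3%90%C2%91%C3%91%C2%80%C3%90%C2%B0%C3%90%C2%B2%C3%90%C2%BE"
good_dir = "%D0%91%D1%80%D0%B0%D0%B2%D0%BE"
print(undo_double_utf8(bad_dir) == good_dir)
print(undo_double_utf8(good_dir) == good_dir)
```

Segments that are already correct are returned unchanged, so running it over a whole url leaves the properly encoded file name alone and only touches the directory part.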
Comment 11 Andrey 2009-10-14 21:53:02 UTC
I saw that a couple of non-Latin character encoding issues were fixed in 7.4.1 r28825
and gave it a try. The scanner bug described here is still present in 7.4.1 r28825.
Can you take a look at it, please? Can you reproduce it? If not, let me know and I can provide more information.
Comment 12 Gordon Harris 2009-11-11 08:32:29 UTC
I'm now not fully confident that my initial issue #2 (about filenames with "&" in them) is 100% reproducible.  Issue #1, though, is completely reproducible and the evidence for the bug can plainly be seen in the track.url field in the db...i.e. you can see the double-utf8 encoding of the directory name part of the path.

I'll file a separate bug for issue #2 if I'm still seeing that with any consistency after issue #1 gets addressed.
Comment 13 Andy Grundman 2009-11-11 09:00:47 UTC
Is the only difference between the Linux and Windows versions of your zip files in the filename encoding?  Can you give more details about how these zips were created, etc?

$ unzip -l win_problem_flacs.zip 
Archive:  win_problem_flacs.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
        0  09-29-09 00:01   Music_test/a_Medieval/
        0  09-29-09 00:19   Music_test/a_Medieval/Du Fay, G/
   200269  09-29-09 00:17   Music_test/a_Medieval/Du Fay, G/L'Arbre de Mai - Chansons & dances au temps de Guillaume Dufay - All?gorie.flac
   182604  09-29-09 00:17   Music_test/a_Medieval/Du Fay, G/O gemma, lux - Huelgas Ensemble.flac
        0  09-29-09 00:03   Music_test/b_Renaissance/
        0  09-29-09 00:19   Music_test/b_Renaissance/Cabez?n, A/
   185601  09-29-09 00:17   Music_test/b_Renaissance/Cabez?n, A/Instrumental Works - Hesp?rion XX, Jordi Savall.flac
   178626  09-29-09 00:17   Music_test/b_Renaissance/Cabez?n, A/Tientos & Glosados - Ensemble Accentus, Thomas Wimmer.flac
        0  09-29-09 00:19   Music_test/b_Renaissance/Kerle, J/
   199432  09-29-09 00:17   Music_test/b_Renaissance/Kerle, J/Missa Da pacem Domine & Motets - Huelgas-Ensemble, Paul Van Nevel.flac
        0  09-29-09 00:02   Music_test/f_French_Baroque/
        0  09-29-09 00:19   Music_test/f_French_Baroque/Cl?rambault, L-N/
   187635  09-29-09 00:18   Music_test/f_French_Baroque/Cl?rambault, L-N/Les d?esses outrag?es - Agn?s Mellon, Barcarole.flac
   202516  09-29-09 00:18   Music_test/f_French_Baroque/Cl?rambault, L-N/The Triumph of Apollo - Les Elments.flac
        0  09-29-09 00:02   Music_test/l_Modern_Central_European/
        0  09-29-09 00:19   Music_test/l_Modern_Central_European/Bart?k, B/
   175195  09-29-09 00:18   Music_test/l_Modern_Central_European/Bart?k, B/Clarinet Trios - Peterkov?, Demeterov?, Cibulkov?.flac
   175022  09-29-09 00:18   Music_test/l_Modern_Central_European/Bart?k, B/Concertos - Berliner Philharmoniker, Pierre Boulez.flac
        0  09-29-09 00:19   Music_test/l_Modern_Central_European/Dohn?nyi, E/
   187608  09-29-09 00:18   Music_test/l_Modern_Central_European/Dohn?nyi, E/Chamber Works - Piano Quintet, Op. 1 & Sextet Op. 37 - Andr?s Schiff, Tak?cs Quartet.flac
   230395  09-29-09 00:18   Music_test/l_Modern_Central_European/Dohn?nyi, E/Orchestral Works - English Sinfonia, John Farrer.flac
        0  09-29-09 00:42   Music_test/
 --------                   -------
  2104903                   22 files

$ unzip -l linux_problem_flacs.zip 
Archive:  linux_problem_flacs.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
        0  09-29-09 02:20   Music_test/
        0  09-29-09 02:20   Music_test/a_Medieval/
        0  09-29-09 02:20   Music_test/a_Medieval/Du Fay, G/
   200269  09-29-09 02:17   Music_test/a_Medieval/Du Fay, G/L'Arbre de Mai - Chansons & dances au temps de Guillaume Dufay - Allégorie.flac
   182604  09-29-09 02:17   Music_test/a_Medieval/Du Fay, G/O gemma, lux - Huelgas Ensemble.flac
        0  09-29-09 02:20   Music_test/b_Renaissance/
        0  09-29-09 02:20   Music_test/b_Renaissance/Cabezón, A/
   185601  09-29-09 02:17   Music_test/b_Renaissance/Cabezón, A/Instrumental Works - Hespèrion XX, Jordi Savall.flac
   178626  09-29-09 02:17   Music_test/b_Renaissance/Cabezón, A/Tientos & Glosados - Ensemble Accentus, Thomas Wimmer.flac
        0  09-29-09 02:20   Music_test/b_Renaissance/Kerle, J/
   199432  09-29-09 02:17   Music_test/b_Renaissance/Kerle, J/Missa Da pacem Domine & Motets - Huelgas-Ensemble, Paul Van Nevel.flac
        0  09-29-09 02:20   Music_test/f_French_Baroque/
        0  09-29-09 02:20   Music_test/f_French_Baroque/Clérambault, L-N/
   187635  09-29-09 02:18   Music_test/f_French_Baroque/Clérambault, L-N/Les déesses outragées - Agnès Mellon, Barcarole.flac
   202516  09-29-09 02:18   Music_test/f_French_Baroque/Clérambault, L-N/The Triumph of Apollo - Les Elments.flac
        0  09-29-09 02:25   Music_test/l_Modern_Central_European/
        0  09-29-09 02:20   Music_test/l_Modern_Central_European/Bartók, B/
   175195  09-29-09 02:18   Music_test/l_Modern_Central_European/Bartók, B/Clarinet Trios - Peterková, Demeterová, Cibulková.flac
   175022  09-29-09 02:18   Music_test/l_Modern_Central_European/Bartók, B/Concertos - Berliner Philharmoniker, Pierre Boulez.flac
        0  09-29-09 02:20   Music_test/l_Modern_Central_European/Dohnányi, E/
   187608  09-29-09 02:18   Music_test/l_Modern_Central_European/Dohnányi, E/Chamber Works - Piano Quintet, Op. 1 & Sextet Op. 37 - András Schiff, Takács Quartet.flac
   230395  09-29-09 02:18   Music_test/l_Modern_Central_European/Dohnányi, E/Orchestral Works - English Sinfonia, John Farrer.flac
 --------                   -------
  2104903                   22 files
Comment 14 Gordon Harris 2009-11-11 09:41:13 UTC
Yes, the only difference is in the filename encoding.  The flac files are identical in both zips.

linux_problem_flacs.zip was created on a Fedora 10 system using 'zip' off of a utf8 filesystem.

win_problem_flacs.zip was created on a Vista64 system using 'WinZip32.exe' from a NTFS drive.

I'm not sure how Winzip stores filenames, but I don't think it's in UTF8 since you can see that unzip seems to have trouble with the diacritics, substituting a "?" in place of the expected character when listing the zip contents.  And as you can see, unzip displays the expected paths when listing linux_problem_flacs.zip.

win_problem_flacs.zip unzips just fine on my Windows 7 system and I tested linux_problem_flacs.zip on my Fedora system before I posted it.  It should unzip just fine on linux or osx systems using unzip.

PS:  All the flacs in the zips are valid and ought to play.  But the ones with diacritics in BOTH the directory names and in the filenames won't play and won't get their artwork extracted with the 7.4.x & 7.5 scanner.  Those with no diacritics anywhere in the directory names scan and play.  Those with diacritics in the directory name but not in the file name also scan & play.  Again, it's only the ones that contain diacritics in both the directory name and the filename that get the mangled track.url value.
Comment 15 Andy Grundman 2009-11-11 09:46:44 UTC
OK, thanks.  I'm adding some of these tracks to my test suite.
Comment 16 Andy Grundman 2009-11-11 10:29:48 UTC
Yep, looks like the following situations are indeed broken:

UTF-8 path, ASCII filename (double-encodes the path)
UTF-8 path, UTF-8 filename (double-encodes the path only)

Cabez%C3%83%C2%B3n,%20A instead of Cabez%C3%B3n,%20A

ASCII path, ASCII filename ok (obviously)
ASCII path, UTF-8 filename ok

Should be easy to fix now. :)
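
The two encodings quoted above can be reproduced with a few lines of Python. This is only an illustration of how a double-encoded path arises, assuming the UTF-8 bytes of the directory name get misread as Latin-1 characters somewhere along the way; it is not the scanner code.

```python
from urllib.parse import quote

name = "Cabez\u00f3n, A"  # "Cabezón, A" with a precomposed o-acute

# Correct: encode the characters as UTF-8 once, then percent-escape.
single_encoded = quote(name, safe=",")

# Broken: the UTF-8 bytes are misread as Latin-1 characters, so each byte
# becomes its own character and is UTF-8 encoded a second time by quote().
misread = name.encode("utf-8").decode("latin-1")
double_encoded = quote(misread, safe=",")

print(single_encoded)  # Cabez%C3%B3n,%20A
print(double_encoded)  # Cabez%C3%83%C2%B3n,%20A
```

The comma is passed as a safe character only so the output matches the form shown in this comment.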
Comment 17 Gordon Harris 2009-11-11 11:14:12 UTC
I hate to throw another twist in here, but now I'm no longer seeing this problem on Windows.

Using Version: 7.4.2 - r29220 @ Tue Nov 10 04:05:23 PST 2009 on Windows 7 and the win_problem_flacs.zip from above, all the flacs scan, play and get their artwork extracted.

I'm still seeing this with 7.4.2 svn 29234 on Fedora 11 with the linux_problem_flacs.zip unzipped to an ext3 file system, though.

Has something changed with the windows code since I filed this bug?  My recollection is that I was seeing this problem on windows too.
Comment 18 SVN Bot 2009-11-11 12:13:18 UTC
 == Auto-comment from SVN commit #29237 to the slim repo by andy ==
 == https://svn.slimdevices.com/slim?view=revision&revision=29237 ==

Fixed bug 14386, FLAC CUE parsing was using a raw UTF-8 path when it needs a decoded version
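
A hedged sketch of what "needs a decoded version" means in practice, in Python rather than the server's Perl and not taken from the commit itself: the directory arrives as raw UTF-8 bytes, the file name as decoded characters, and the bytes are decoded before the two are joined and escaped. `track_url` is a hypothetical helper; the path is the one from comment 10.

```python
from urllib.parse import quote


def track_url(directory: bytes, filename: str) -> str:
    """Build a file URL from a raw UTF-8 directory and a decoded file name."""
    decoded_dir = directory.decode("utf-8")  # decode first, then combine
    return "file://" + quote(f"{decoded_dir}/{filename}")


# Raw bytes, as they would come back from the file system.
raw_dir = "/home/anry/testm/Russian/Браво/1992 - Московский бит".encode("utf-8")
print(track_url(raw_dir, "Московский бит.flac"))
```

Python refuses to join bytes and text directly, so the decode step cannot be skipped by accident; Perl will combine the two silently, which is how a raw path can end up encoded twice.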
Comment 19 SVN Bot 2009-11-11 12:17:02 UTC
 == Auto-comment from SVN commit #29239 to the slim repo by andy ==
 == https://svn.slimdevices.com/slim?view=revision&revision=29239 ==

Bug 14386, added test cases using gharris's test files
Comment 20 Gordon Harris 2009-11-12 08:35:46 UTC
Andy: svn 29237 seems to have fixed this problem.  Thanks very much.  However, the embedded cuesheet FILE != filename on disk problem remains.  Would you prefer I file a separate bug report on that?
Comment 21 Andy Grundman 2009-11-12 08:47:16 UTC
Great, yeah please file a new bug for the other problem.
Comment 22 Andrey 2009-11-13 23:35:16 UTC
I've installed 7.4.2 - r29251 from this link http://downloads.slimdevices.com/nightly/7.4/sc/29251/squeezeboxserver-7.4.2-0.1.29251.noarch.rpm
and the issue with UTF-8 path double encoding is still there. Is this the right URL to get the fixed version?
I also have test files. I tried to attach them to this bug but it didn't go through due to their size (1.2Gb).
Comment 23 Gordon Harris 2009-11-14 07:05:48 UTC
Andrey: you might want to wait a bit more before testing this.  I've proposed a patch for bug 15105 that also addresses this bug.  Hopefully, Andy will assess my patch sometime next week.
Comment 24 Chris Owens 2010-04-08 17:26:13 UTC
This bug has been marked fixed in a released version of Squeezebox Server or the accompanying firmware or mysqueezebox.com release.

If you are still seeing this issue, please let us know!