Bugzilla – Bug 14386
Scanning and playback broken on flacs with file names containing certain character combinations
Last modified: 2010-04-08 17:26:13 UTC
Please see http://forums.slimdevices.com/showthread.php?t=68419 for examples.
1) For flacs with embedded cuesheets, the 7.4 scanner incorrectly UTF-8 percent-encodes the url value for tracks IF the full path contains diacritic characters in BOTH the file name and a folder name. Generally, the scanner is double-UTF-8 encoding the folder name portion of the url.
2) For flacs with embedded cuesheets, SBS cannot "see" a flac whose file name contains an "&" ampersand for artwork extraction or playback, even though the url value stored in the db is correct. However, if the filename containing the ampersand ALSO contains a diacritic, OR if a folder name in the file's path contains a diacritic, THEN the file with the "&" can be "seen". The exception is when BOTH the filename and a folder name contain diacritics: then bug 1) above gets triggered.
Here are urls to zip files containing flacs which demonstrate the problem: http://www.hegardtfoundation.org/flacstuff/linux_problem_flacs.zip http://www.hegardtfoundation.org/flacstuff/win_problem_flacs.zip
See bug_14328
I'm not sure what it means to target the milestone as 7.5. Does this mean that this won't make it into any 7.4.x releases or the 7.4 trunk? This bug means that 10% of my library is unusable with 7.4.
I agree this should be fixed in a 7.4.x release.
I'm starting to suspect that another part of the problem here is that the FLAC cuesheet parser in the new scanner code is having trouble with the FILE entry. I believe that in past versions of SC, Dan solved this problem by basically ignoring the FILE data, always substituting the actual filename as picked up by the scanner.
How can you ignore the FILE entry? How do you know which file the cue sheet is for if you do that?
How do you know what the file name actually is? Because it's an embedded cuesheet. The scanner is reading it from a flac file. Presumably, the scanner knows what file it's currently reading. If I was better at searching bugzilla or the check-ins, I'd find the changes that Dan made when he abandoned relying on FILE. (At least, that's my recollection of what he did. I remember that it seemed to solve a lot of ills.)
Anyway, it's true that *a few* of my embedded cuesheets contain FILE data that don't match the actual file name. And in general, I don't have a problem with the scanner enforcing FILE == filename, though on the face of it, this seems a little discriminatory: you don't require an MP3 to be retagged just because someone changes its filename. But I do think that requiring FILE == filename is potentially a can of worms. A "compliant" cuesheet embedded in a flac will always be UTF8 encoded. But the underlying file system where the file resides may not be (e.g. Windows). It seems like it would be easier for the scanner to just use the actual file name it "knows" vs having to decide whether to translate the FILE utf8 data or not.
SC 7.0 through 7.3.x haven't had issues with FILE != actual filename and all my embedded cuesheets have been parsed by the scanner flawlessly. So I have to believe that the old scanner code ignored FILE in favor of the actual filename of the flac being scanned.
BTW, I think FILE==x is a subset of the bug I'm reporting here. It's the double-UTF8 encoding of folder names in the db track table url field, when the file name includes diacritics too, that's causing most of the problems for me. You might want to look at what Michael went through with bug 7547 ( https://bugs-archive.lyrion.org/show_bug.cgi?id=7547 ).
OK, I agree, embedded cue sheets can ignore FILE (standalone ones can't). I do see the encoding issue with artwork. Is artwork the only issue with these files? They seem to scan fine other than that.
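The rule agreed on above (an embedded cue sheet can ignore FILE, a standalone one cannot) could be sketched roughly as follows. This is an illustrative Python sketch only, not the scanner's actual Perl code; the function name `audio_file_for_cuesheet` and the regular expression are hypothetical.

```python
import posixpath
import re

# Matches the quoted name in a line such as: FILE "album.flac" WAVE
FILE_RE = re.compile(r'^\s*FILE\s+"([^"]*)"', re.MULTILINE)


def audio_file_for_cuesheet(cue_text: str, source_path: str, embedded: bool) -> str:
    """Return the audio file that the tracks of a cue sheet refer to.

    source_path is the file the cue sheet was read from: the flac itself
    for an embedded cue sheet, or the .cue file for a standalone one.
    """
    if embedded:
        # The cue sheet came out of source_path, so the scanner already
        # knows the real file name and FILE adds nothing (it may be stale).
        return source_path
    match = FILE_RE.search(cue_text)
    if match is None:
        raise ValueError("standalone cue sheet has no FILE entry")
    # A standalone cue sheet names its audio file relative to itself.
    return posixpath.join(posixpath.dirname(source_path), match.group(1))


cue = 'FILE "old name.flac" WAVE\n  TRACK 01 AUDIO\n    INDEX 01 00:00:00\n'
print(audio_file_for_cuesheet(cue, "/music/Bartók, B/Concertos.flac", embedded=True))
print(audio_file_for_cuesheet(cue, "/music/album/disc.cue", embedded=False))
```

With this approach a renamed flac keeps working without retagging, because the stale FILE value is never consulted for embedded cue sheets.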
Artwork + playback for me. They won't play if the url field points to something non-existent. But, yes, all the other metadata seems to parse out perfectly.
I also have a big library of FLACs with non-Latin characters in the URL, and I see all the symptoms described by Gordon. Here is some additional info. Only FLACs with embedded cue sheets are affected, and only if there are non-Latin characters in the file path. FLACs stored as individual tracks aren't affected. FLACs with an all-Latin path but non-Latin file names are OK. I think the Slim::Utils::Misc::fixPath procedure is the root cause. Please check this excerpt from my scanner.log:
00:23:39.0916 - correct path, UTF-8 encoded
00:23:39.1031 - correctly encoded path and URL
00:23:39.1046 - correct path and URL
00:23:39.1060 - correct
00:23:39.1390 - correct
00:23:39.1401 - correct
00:23:39.1427 - WRONG. See how the %D sequences in the URL changed to %C; %C is incorrect. The flac file name in the URL is still properly encoded (see the bunch of %D), but the path is now incorrectly encoded (see the bunch of %C).
00:23:39.1437 - correct
00:23:39.1657 - path is incorrect, file name is correct
00:23:39.1675 - path is incorrect, file name is correct again
I hope this will help to pinpoint the problem.
[09-10-06 00:23:39.0916] Slim::Utils::Scanner::scanDirectory (328) Scanning: /home/anry/testm/Russian/Браво/1992 - Московский бит/Московский бит.flac
[09-10-06 00:23:39.1031] Slim::Utils::Misc::pathFromFileURL (250) Got /home/anry/testm/Russian/%D0%91%D1%80%D0%B0%D0%B2%D0%BE/1992%20-%20%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82/%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82.flac from file url file:///home/anry/testm/Russian/%D0%91%D1%80%D0%B0%D0%B2%D0%BE/1992%20-%20%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82/%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82.flac
[09-10-06 00:23:39.1046] Slim::Utils::Misc::pathFromFileURL (267) Extracted: /home/anry/testm/Russian/Браво/1992 - Московский бит/Московский бит.flac from file:///home/anry/testm/Russian/%D0%91%D1%80%D0%B0%D0%B2%D0%BE/1992%20-%20%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82/%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82.flac
[09-10-06 00:23:39.1060] Slim::Utils::Scanner::scanDirectory (340) Adding file:///home/anry/testm/Russian/%D0%91%D1%80%D0%B0%D0%B2%D0%BE/1992%20-%20%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82/%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82.flac to database.
[09-10-06 00:23:39.1390] Slim::Utils::Misc::stripRel (594) Original: /home/anry/testm/Russian/Браво/1992 - Московский бит/Московский бит.flac
[09-10-06 00:23:39.1401] Slim::Utils::Misc::stripRel (600) Stripped: /home/anry/testm/Russian/Браво/1992 - Московский бит/Московский бит.flac
[09-10-06 00:23:39.1427] Slim::Utils::Misc::fixPath (577) Fixed: Московский бит.flac to file:///home/anry/testm/Russian/%C3%90%C2%91%C3%91%C2%80%C3%90%C2%B0%C3%90%C2%B2%C3%90%C2%BE/1992%20-%20%C3%90%C2%9C%C3%90%C2%BE%C3%91%C2%81%C3%90%C2%BA%C3%90%C2%BE%C3%90%C2%B2%C3%91%C2%81%C3%90%C2%BA%C3%90%C2%B8%C3%90%C2%B9%20%C3%90%C2%B1%C3%90%C2%B8%C3%91%C2%82/%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82.flac
[09-10-06 00:23:39.1437] Slim::Utils::Misc::fixPath (579) Base: /home/anry/testm/Russian/Браво/1992 - Московский бит
[09-10-06 00:23:39.1657] Slim::Utils::Misc::pathFromFileURL (250) Got /home/anry/testm/Russian/%C3%90%C2%91%C3%91%C2%80%C3%90%C2%B0%C3%90%C2%B2%C3%90%C2%BE/1992%20-%20%C3%90%C2%9C%C3%90%C2%BE%C3%91%C2%81%C3%90%C2%BA%C3%90%C2%BE%C3%90%C2%B2%C3%91%C2%81%C3%90%C2%BA%C3%90%C2%B8%C3%90%C2%B9%20%C3%90%C2%B1%C3%90%C2%B8%C3%91%C2%82/%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82.flac from file url file:///home/anry/testm/Russian/%C3%90%C2%91%C3%91%C2%80%C3%90%C2%B0%C3%90%C2%B2%C3%90%C2%BE/1992%20-%20%C3%90%C2%9C%C3%90%C2%BE%C3%91%C2%81%C3%90%C2%BA%C3%90%C2%BE%C3%90%C2%B2%C3%91%C2%81%C3%90%C2%BA%C3%90%C2%B8%C3%90%C2%B9%20%C3%90%C2%B1%C3%90%C2%B8%C3%91%C2%82/%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82.flac
[09-10-06 00:23:39.1675] Slim::Utils::Misc::pathFromFileURL (267) Extracted: /home/anry/testm/Russian/Ã�Â�Ã�Â�Ã�°Ã�²Ã�¾/1992 - Ã�Â�Ã�¾Ã�Â�Ã�ºÃ�¾Ã�²Ã�Â�Ã�ºÃ�¸Ã�¹ Ã�±Ã�¸Ã�Â�/Московский бит.flac from file:///home/anry/testm/Russian/%C3%90%C2%91%C3%91%C2%80%C3%90%C2%B0%C3%90%C2%B2%C3%90%C2%BE/1992%20-%20%C3%90%C2%9C%C3%90%C2%BE%C3%91%C2%81%C3%90%C2%BA%C3%90%C2%BE%C3%90%C2%B2%C3%91%C2%81%C3%90%C2%BA%C3%90%C2%B8%C3%90%C2%B9%20%C3%90%C2%B1%C3%90%C2%B8%C3%91%C2%82/%D0%9C%D0%BE%D1%81%D0%BA%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9%20%D0%B1%D0%B8%D1%82.flac
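The %D-to-%C change in the log above is what you get when UTF-8 bytes are re-read as Latin-1 characters and encoded a second time. As a way to check stored track.url values, here is a rough Python sketch that reverses one such extra round per path segment. It is a heuristic under that assumption, not part of Squeezebox Server; the function name `undo_double_encoding` is hypothetical, and a segment that legitimately contains text resembling mojibake could be "repaired" wrongly.

```python
from urllib.parse import quote, unquote_to_bytes


def undo_double_encoding(url_path: str) -> str:
    """Reverse one round of UTF-8 -> Latin-1 -> UTF-8 double encoding.

    Each '/'-separated segment is handled on its own, because in this bug
    only the folder segments are damaged while the file name is fine.
    """
    fixed = []
    for segment in url_path.split("/"):
        raw = unquote_to_bytes(segment)
        try:
            # If the text fits in Latin-1 and those bytes are themselves
            # valid UTF-8, assume the segment was encoded twice.
            candidate = raw.decode("utf-8").encode("latin-1")
            candidate.decode("utf-8")
        except (UnicodeDecodeError, UnicodeEncodeError):
            candidate = raw  # correctly encoded already; leave it alone
        fixed.append(quote(candidate))
    return "/".join(fixed)


broken = ("/home/anry/testm/Russian/"
          "%C3%90%C2%91%C3%91%C2%80%C3%90%C2%B0%C3%90%C2%B2%C3%90%C2%BE/"
          "%D0%B1%D0%B8%D1%82.flac")
print(undo_double_encoding(broken))
```

The file name segment is left untouched because its decoded Cyrillic characters cannot be represented in Latin-1, which is how the sketch tells a correctly encoded segment from a double-encoded one.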
I saw that a couple of non-Latin character encoding issues were fixed in 7.4.1 r28825 and gave it a try. The scanner bug described here is still present in 7.4.1 r28825. Can you take a look at it please? Can you reproduce it? If not, let me know and I can provide more information.
I'm now not fully confident that my initial issue #2 (about filenames with "&" in them) is 100% reproducible. Issue #1, though, is completely reproducible and the evidence for the bug is plain to see in the track.url field in the db, i.e. you can see the double-utf8 encoding of the directory name part of the path. I'll file a separate bug for issue #2 if I'm still seeing that with any consistency after issue #1 gets addressed.
Is the only difference between the Linux and Windows versions of your zip files in the filename encoding? Can you give more details about how these zips were created, etc?

$ unzip -l win_problem_flacs.zip
Archive: win_problem_flacs.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
        0  09-29-09 00:01   Music_test/a_Medieval/
        0  09-29-09 00:19   Music_test/a_Medieval/Du Fay, G/
   200269  09-29-09 00:17   Music_test/a_Medieval/Du Fay, G/L'Arbre de Mai - Chansons & dances au temps de Guillaume Dufay - All?gorie.flac
   182604  09-29-09 00:17   Music_test/a_Medieval/Du Fay, G/O gemma, lux - Huelgas Ensemble.flac
        0  09-29-09 00:03   Music_test/b_Renaissance/
        0  09-29-09 00:19   Music_test/b_Renaissance/Cabez?n, A/
   185601  09-29-09 00:17   Music_test/b_Renaissance/Cabez?n, A/Instrumental Works - Hesp?rion XX, Jordi Savall.flac
   178626  09-29-09 00:17   Music_test/b_Renaissance/Cabez?n, A/Tientos & Glosados - Ensemble Accentus, Thomas Wimmer.flac
        0  09-29-09 00:19   Music_test/b_Renaissance/Kerle, J/
   199432  09-29-09 00:17   Music_test/b_Renaissance/Kerle, J/Missa Da pacem Domine & Motets - Huelgas-Ensemble, Paul Van Nevel.flac
        0  09-29-09 00:02   Music_test/f_French_Baroque/
        0  09-29-09 00:19   Music_test/f_French_Baroque/Cl?rambault, L-N/
   187635  09-29-09 00:18   Music_test/f_French_Baroque/Cl?rambault, L-N/Les d?esses outrag?es - Agn?s Mellon, Barcarole.flac
   202516  09-29-09 00:18   Music_test/f_French_Baroque/Cl?rambault, L-N/The Triumph of Apollo - Les Elments.flac
        0  09-29-09 00:02   Music_test/l_Modern_Central_European/
        0  09-29-09 00:19   Music_test/l_Modern_Central_European/Bart?k, B/
   175195  09-29-09 00:18   Music_test/l_Modern_Central_European/Bart?k, B/Clarinet Trios - Peterkov?, Demeterov?, Cibulkov?.flac
   175022  09-29-09 00:18   Music_test/l_Modern_Central_European/Bart?k, B/Concertos - Berliner Philharmoniker, Pierre Boulez.flac
        0  09-29-09 00:19   Music_test/l_Modern_Central_European/Dohn?nyi, E/
   187608  09-29-09 00:18   Music_test/l_Modern_Central_European/Dohn?nyi, E/Chamber Works - Piano Quintet, Op. 1 & Sextet Op. 37 - Andr?s Schiff, Tak?cs Quartet.flac
   230395  09-29-09 00:18   Music_test/l_Modern_Central_European/Dohn?nyi, E/Orchestral Works - English Sinfonia, John Farrer.flac
        0  09-29-09 00:42   Music_test/
 --------                   -------
  2104903                   22 files

$ unzip -l linux_problem_flacs.zip
Archive: linux_problem_flacs.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
        0  09-29-09 02:20   Music_test/
        0  09-29-09 02:20   Music_test/a_Medieval/
        0  09-29-09 02:20   Music_test/a_Medieval/Du Fay, G/
   200269  09-29-09 02:17   Music_test/a_Medieval/Du Fay, G/L'Arbre de Mai - Chansons & dances au temps de Guillaume Dufay - Allégorie.flac
   182604  09-29-09 02:17   Music_test/a_Medieval/Du Fay, G/O gemma, lux - Huelgas Ensemble.flac
        0  09-29-09 02:20   Music_test/b_Renaissance/
        0  09-29-09 02:20   Music_test/b_Renaissance/Cabezón, A/
   185601  09-29-09 02:17   Music_test/b_Renaissance/Cabezón, A/Instrumental Works - Hespèrion XX, Jordi Savall.flac
   178626  09-29-09 02:17   Music_test/b_Renaissance/Cabezón, A/Tientos & Glosados - Ensemble Accentus, Thomas Wimmer.flac
        0  09-29-09 02:20   Music_test/b_Renaissance/Kerle, J/
   199432  09-29-09 02:17   Music_test/b_Renaissance/Kerle, J/Missa Da pacem Domine & Motets - Huelgas-Ensemble, Paul Van Nevel.flac
        0  09-29-09 02:20   Music_test/f_French_Baroque/
        0  09-29-09 02:20   Music_test/f_French_Baroque/Clérambault, L-N/
   187635  09-29-09 02:18   Music_test/f_French_Baroque/Clérambault, L-N/Les déesses outragées - Agnès Mellon, Barcarole.flac
   202516  09-29-09 02:18   Music_test/f_French_Baroque/Clérambault, L-N/The Triumph of Apollo - Les Elments.flac
        0  09-29-09 02:25   Music_test/l_Modern_Central_European/
        0  09-29-09 02:20   Music_test/l_Modern_Central_European/Bartók, B/
   175195  09-29-09 02:18   Music_test/l_Modern_Central_European/Bartók, B/Clarinet Trios - Peterková, Demeterová, Cibulková.flac
   175022  09-29-09 02:18   Music_test/l_Modern_Central_European/Bartók, B/Concertos - Berliner Philharmoniker, Pierre Boulez.flac
        0  09-29-09 02:20   Music_test/l_Modern_Central_European/Dohnányi, E/
   187608  09-29-09 02:18   Music_test/l_Modern_Central_European/Dohnányi, E/Chamber Works - Piano Quintet, Op. 1 & Sextet Op. 37 - András Schiff, Takács Quartet.flac
   230395  09-29-09 02:18   Music_test/l_Modern_Central_European/Dohnányi, E/Orchestral Works - English Sinfonia, John Farrer.flac
 --------                   -------
  2104903                   22 files
Yes, the only difference is in the filename encoding. The flac files are identical in both zips. linux_problem_flacs.zip was created on a Fedora 10 system using 'zip' off of a utf8 filesystem. win_problem_flacs.zip was created on a Vista64 system using 'WinZip32.exe' from an NTFS drive. I'm not sure how Winzip stores filenames, but I don't think it's in UTF8 since you can see that unzip seems to have trouble with the diacritics, substituting a "?" in place of the expected character when listing the zip contents. And as you can see, unzip displays the expected paths when listing linux_problem_flacs.zip. win_problem_flacs.zip unzips just fine on my Windows 7 system and I tested linux_problem_flacs.zip on my Fedora system before I posted it. It should unzip just fine on linux or osx systems using unzip.
PS: All the flacs in the zips are valid and ought to play. But the ones with diacritics in BOTH the directory names and in the filenames won't play and won't get their artwork extracted with the 7.4.x & 7.5 scanner. Those with no diacritics anywhere in the directory names scan and play. Those with diacritics in the directory name but not in the file name also scan & play. Again, it's only the ones that contain diacritics both in the directory name and the filename that get the mangled track.url value.
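One way to check how an archive stores its names is to look at general purpose bit 11 of each entry, which the ZIP format uses to mark a UTF-8 file name. The Python sketch below uses the standard `zipfile` module and builds a small archive in memory purely for illustration; the helper name `name_encoding_report` is hypothetical, and nothing here is known about what WinZip actually wrote into win_problem_flacs.zip.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general purpose bit 11: file name is UTF-8


def name_encoding_report(zip_bytes: bytes) -> dict:
    """Map each entry name to True if it is flagged as UTF-8."""
    report = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            report[info.filename] = bool(info.flag_bits & UTF8_NAME_FLAG)
    return report


# Build a tiny archive in memory. Python's zipfile sets the flag itself
# whenever a name cannot be stored as plain ASCII.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("Music_test/Cabezón, A/track.flac", b"")
    zf.writestr("Music_test/Kerle, J/track.flac", b"")

for name, is_utf8 in name_encoding_report(buf.getvalue()).items():
    print(is_utf8, name)
```

An entry with a non-ASCII name but without this flag was written in some other, unspecified encoding, which would be consistent with unzip printing "?" for those characters.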
OK thanks I'm adding some of these tracks to my test suite.
Yep, looks like the following situations are indeed broken:
UTF-8 path, ASCII filename (double-encodes the path)
UTF-8 path, UTF-8 filename (double-encodes the path only)
Example: Cabez%C3%83%C2%B3n,%20A instead of Cabez%C3%B3n,%20A
These are fine:
ASCII path, ASCII filename ok (obviously)
ASCII path, UTF-8 filename ok
Should be easy to fix now. :)
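The Cabez%C3%83%C2%B3n value can be reproduced outside the scanner. The Python sketch below is only an illustration of the mechanism (UTF-8 bytes misread as Latin-1 characters, then UTF-8 encoded again), not the scanner's Perl code; both function names are hypothetical.

```python
from urllib.parse import quote


def percent_encode(name: str) -> str:
    """Correct: encode the characters as UTF-8 once, then percent-escape."""
    return quote(name.encode("utf-8"), safe="/,")


def double_encode(name: str) -> str:
    """Buggy: the UTF-8 bytes are treated as Latin-1 text and encoded again."""
    raw = name.encode("utf-8")        # already-encoded path bytes
    misread = raw.decode("latin-1")   # each byte becomes one character
    return quote(misread.encode("utf-8"), safe="/,")


folder = "Cabezón, A"
print(percent_encode(folder))  # Cabez%C3%B3n,%20A
print(double_encode(folder))   # Cabez%C3%83%C2%B3n,%20A
```

An ASCII-only name comes out identical from both functions, which matches the observation that ASCII paths are unaffected.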
I hate to throw another twist in here, but now I'm no longer seeing this problem on Windows. Using Version: 7.4.2 - r29220 @ Tue Nov 10 04:05:23 PST 2009 on Windows 7 and the win_problem_flacs.zip from above, all the flacs scan, play and get their artwork extracted. I'm still seeing this with 7.4.2 svn 29234 on Fedora 11 with the linux_problem_flacs.zip unzipped to an ext3 file system, though. Has something changed with the windows code since I filed this bug? My recollection is that I was seeing this problem on windows too.
== Auto-comment from SVN commit #29237 to the slim repo by andy == == https://svn.slimdevices.com/slim?view=revision&revision=29237 == Fixed bug 14386, FLAC CUE parsing was using a raw UTF-8 path when it needs a decoded version
== Auto-comment from SVN commit #29239 to the slim repo by andy == == https://svn.slimdevices.com/slim?view=revision&revision=29239 == Bug 14386, added test cases using gharris's test files
Andy: svn 29237 seems to have fixed this problem. Thanks very much. However, the embedded cuesheet FILE != filename on disk problem remains. Would you prefer I file a separate bug report on that?
Great, yeah please file a new bug for the other problem.
I've installed 7.4.2 - r29251 from this link http://downloads.slimdevices.com/nightly/7.4/sc/29251/squeezeboxserver-7.4.2-0.1.29251.noarch.rpm and the issue with UTF-8 path double encoding is still there. Is this the right URL to get the fixed version? I also have test files. I tried to attach them to this bug but it didn't go through due to their size (1.2Gb).
Andrey: you might want to wait a bit more before testing this. I've proposed a patch for bug 15105 that also addresses this bug. Hopefully, Andy will assess my patch sometime next week.
This bug has been marked fixed in a released version of Squeezebox Server or the accompanying firmware or mysqueezebox.com release. If you are still seeing this issue, please let us know!