Nero AAC 3.1.0.2 Problem
------------------------

On Saturday, December 24th, 2005, it came to my attention that a bug found by Francis Niechcial (guruboolez on Hydrogenaudio Forums) in Nero AAC 3.1.0.2, and later confirmed by Ivan Dimkovic (real name on Hydrogenaudio Forums) and Juha Laaksonheimo (JohnV on Hydrogenaudio Forums), might lead to an unfair ranking of the encoder.

Francis Niechcial describes the following problem: Nero AAC, while achieving a target bitrate of around 140 kbps across all tested samples, shows an unnatural bitrate distribution curve for the individual samples. The first 10 to 20 seconds of encoded material have bitrates around 155 kbps, followed by a sudden drop to 130 kbps for the remaining duration. The bitrate boost during (approximately) the first two thirds of the samples most likely also results in better perceptual quality of the encoded material. And indeed, Francis noticed that something had to be wrong with the encoder he was testing after listening to the sample LesJoursHeureux (sample 10) and discovering a sudden and unexpected quality drop after the 10th or 11th second of the track. After this sample, Francis paid more attention to possible similar quality drops and thought he had found other cases of this unusual issue. He attached his comments to the result file and asked me to check whether all problems found were linked to the same encoder, which I was able to confirm.

At first, we believed that the bitrate boost was caused by a very long run-in time, but then Ivan Dimkovic said that at an average of 128 kbps, the ABR buffer size should be 300 kilobits, which corresponds to 2.34375 seconds. At this point, I would like to quote the relevant lines from my ICQ conversation with him:

Ivan Dimkovic: ABR buffer size = 300 kbits
Ivan Dimkovic: 2.34375
Ivan Dimkovic: seconds of run-in if the average rate is 128 kbps

After excluding the run-in time as the cause of this behavior, Ivan did some more research and found why the problem occurs.
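The run-in figure quoted from Ivan can be checked with a few lines of Python. This is only an arithmetic sketch: the function name run_in_seconds is made up for illustration, and it assumes that the run-in time is simply the buffer size divided by the average bitrate, which is what the quoted numbers suggest.

```python
def run_in_seconds(buffer_kbits: float, average_kbps: float) -> float:
    """Time needed to move one full ABR buffer at the average bitrate.

    Assumption for illustration: run-in = buffer size / average bitrate.
    """
    if average_kbps <= 0:
        raise ValueError("average_kbps must be positive")
    return buffer_kbits / average_kbps


# Figures quoted by Ivan: 300 kbit buffer at an average of 128 kbps.
quoted = run_in_seconds(300, 128)
print(f"run-in at 128 kbps: {quoted} s")  # run-in at 128 kbps: 2.34375 s

# Boost duration reported by Francis: roughly 10 to 20 seconds.
for boost in (10, 20):
    # Prints 4.3x for 10 s and 8.5x for 20 s.
    print(f"a {boost} s boost is {boost / quoted:.1f}x the run-in time")
```

Under this assumption the observed boost lasts several times longer than the run-in, which is consistent with ruling out the run-in time as the cause.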
The following lines are extracted from an IRC conversation between Ivan, Juha, Francis and me:

JohnV: well.. I'll see it like this: there was a bug in ABR bitreservour allocation.
[...]
idimkovic2: the bug with 3.1.0.2 was
idimkovic2: one parameter in the function was wrong (heh it is always the case, ask developers
idimkovic2: ;)
idimkovic2: and bti reservoir was drained to approx 10-15 kbits

So, as you can see, the issue Francis discovered while listening to a Nero encoding traces back to a bug in the ABR bit-reservoir allocation function that causes the bit reservoir to drain too fast. Instead of using the reservoir more or less evenly across the whole file, Nero is "wasting" all bits on the first two thirds of the sample.

The Nero AAC developers claim that such over-coding at the beginning of the track also implies under-coding for the rest of the sample, since the encoder does not have enough bits left in the bit reservoir for parts where it would normally allocate more. Moreover, the developers claim that this problem, although present in the Nero AAC encoder I had for testing, should not have such a big impact because people are supposed to listen to the whole sample and give a mark for the quality achieved over the whole file. Basically, their point is this: if user A listens to sample 1 and notices a quality drop after 20 to 30 seconds, his mark for the encoder will reflect the more or less "bad" quality the encoder produced.

While this might be true, we have two grave problems. The first one is that none of the samples encoded with Nero AAC represent real-life behavior. There is a serious difference between how Nero encoded the tested samples and how it encodes full tracks. The testing conditions (i.e. direct encoding of short samples) directly interfere with the encoder's performance.
In our case, about two thirds of the encoded material (samples) has an artificially increased bitrate (and thus increased quality), leaving at best only one third of (sub-)optimally encoded data; and when samples are really short, there is no sub-optimal part at all. As an example, let's look at the sample "Yello", which is only 9 seconds long. As a consequence of the short duration, Nero did not even have the time to drain the bit reservoir entirely: the whole sample is actually over-coded. In real life, however, the ~20 seconds of over-coding correspond to only a tiny fraction of the whole data, since we are encoding several minutes (or even more than an hour for complete CD images). Treating Nero like the rest of the contenders is not fair because we only have a very limited correlation with real-life usage. Our testing conditions reproduce the behavior occurring at the beginning of a full track, and in real life the same passages would not get the bonus bitrate that our samples received in the test.

The second problem is a direct consequence of the first one: the tested Nero encodings are not only different from what any user would get, they are also (artificially) better. Therefore, some artifacts at the beginning of the sample might have been reduced or even "masked" by the increased bitrate. My point is that if Nero had encoded the first 20 seconds of a sample using a "normal" bitrate of 130 kbps, as it would normally do, quality would be lower and some artifacts might have been more audible. As a consequence, it would definitely be unfair to compare Nero with the other encoders: Nero's behavior was changed by the testing conditions, and the higher bitrate was not a decision of its psychoacoustic model; it was a bug's fault.
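To put rough numbers on the gap between short samples and full tracks, here is a small Python sketch. It uses a deliberately simplified step model that is my assumption, not the encoder's real behavior: the first 20 seconds are coded at 155 kbps and everything after that at 130 kbps. The function name average_bitrate and the 30 second, 4 minute and 1 hour durations are hypothetical values chosen for illustration; only the 9 second "Yello" sample and the bitrate figures come from the text.

```python
def average_bitrate(duration_s: float,
                    boost_s: float = 20.0,
                    boosted_kbps: float = 155.0,
                    normal_kbps: float = 130.0) -> float:
    """Average bitrate of a file under a simplified two-step model.

    Assumption: the first `boost_s` seconds use `boosted_kbps`,
    the remainder uses `normal_kbps`.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    boosted_part = min(duration_s, boost_s)   # seconds spent over-coded
    normal_part = duration_s - boosted_part   # seconds at the normal rate
    total_kbits = boosted_part * boosted_kbps + normal_part * normal_kbps
    return total_kbits / duration_s


for label, seconds in [("Yello sample (9 s)", 9),
                       ("hypothetical 30 s sample", 30),
                       ("hypothetical 4 min track", 240),
                       ("hypothetical 1 h CD image", 3600)]:
    print(f"{label}: {average_bitrate(seconds):.1f} kbps")
# Yello sample (9 s): 155.0 kbps
# hypothetical 30 s sample: 146.7 kbps
# hypothetical 4 min track: 132.1 kbps
# hypothetical 1 h CD image: 130.1 kbps
```

Under these assumptions, a short test sample ends up well above the bitrate a full track would receive, while for a full track the boost almost disappears in the average.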
For example, if Vorbis, iTunes and LAME had an artifact because they used a sub-optimal bitrate but Nero didn't show a problem, people might rate Vorbis, iTunes and LAME even worse because they would say "Hey, encoder X (Nero) did better!".

Because of the mentioned problems (unfairness, no real-life relevance...) and after thoroughly discussing the issue with Francis, Roberto Amorim (rjamorim on Hydrogenaudio Forums) and Darryl Miyaguchi (ff123 on Hydrogenaudio Forums), I decided, against Ivan's and Juha's suggestion, to exclude Nero from the test. Ivan and Juha told me that the bug has meanwhile been fixed, so I am looking forward to seeing Nero compete again in another listening test.

I would also like to state clearly that I do not accuse the Nero developers of cheating in any way. It was an unfortunate situation, but we learned our lesson: never use an encoder that was released only a few days or even hours before a public listening test. :)

Last but not least, I would like to thank everyone for the great support and interest in making this test as fair as possible. Special thanks go to Francis, Darryl, Roberto, Ivan and Juha!