Nero AAC 3.1.0.2 Problem
------------------------

On Saturday, December 24th, 2005, it came to my attention that a bug found
by Francis Niechcial (guruboolez on Hydrogenaudio Forums) in Nero AAC 3.1.0.2
and later confirmed by Ivan Dimkovic (real name on Hydrogenaudio Forums) and
Juha Laaksonheimo (JohnV on Hydrogenaudio Forums) might lead to an unfair
ranking of the encoder.

Francis Niechcial describes the following problem: Nero AAC, while achieving a
target bitrate of around 140 kbps across all tested samples, shows an unnatural
bitrate distribution curve for the individual samples. The first 10 to 20
seconds of encoded material have bitrates around 155 kbps, followed by a sudden
drop to 130 kbps for the remaining duration. The bitrate boost during the first
two thirds (approx.) of the samples most likely also results in better perceptual
quality of the encoded material.
And indeed, Francis noticed that something had to be wrong with the encoder he
was testing after listening to the sample LesJoursHeureux (sample 10) and
discovering a sudden and unexpected quality drop after the 10th or 11th second
of the track. After this sample, Francis paid closer attention to possible similar
quality drops and thought he had found other cases of this unusual issue. He
attached his comments to the result file and asked me to check whether all the
problems he had found were linked to the same encoder, which I was able to confirm.

At first, we believed that the bitrate boost was caused by a very long run-in
time, but then Ivan Dimkovic said that the ABR buffer size should be 300
kilobits, which at an average of 128 kbps corresponds to a run-in of only
2.34375 seconds. At this point, I would like to quote the relevant lines from my
ICQ conversation with him:

  Ivan Dimkovic: ABR buffer size = 300 kbits
  Ivan Dimkovic: 2.34375
  Ivan Dimkovic: seconds of run-in  if the average rate is 128 kbps

After ruling out the run-in time as the cause of this behavior, Ivan did some more
research and found out why the problem occurs. The following lines are extracted
from an IRC conversation between him, Juha, Francis and me:

  JohnV: well.. I'll see it like this: there was a bug in ABR bitreservour
         allocation.
  [...]
  idimkovic2: the bug with 3.1.0.2 was
  idimkovic2: one parameter in the function was wrong (heh it is always the
              case, ask developers
  idimkovic2: ;)
  idimkovic2: and bti reservoir was drained to approx 10-15 kbits

So, as you can see, the issue Francis discovered while listening to a Nero
encoding traces back to a bug in the ABR bit-reservoir allocation function that
causes the bit reservoir to drain too fast. Instead of using the reservoir more
or less evenly across the whole file, Nero is "wasting" all its bits on the first
two thirds of the sample.
The Nero AAC developers claim that such over-coding at the beginning of the
track also implies under-coding for the rest of the sample, since the encoder
doesn't have enough bits left in the bit reservoir for the parts where it would
normally allocate more.
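
As a rough plausibility check (a toy model, not Nero's actual rate control), one
can ask how long a full reservoir lasts if the encoder keeps spending more than
its normal rate. The sketch below assumes that the reservoir starts full at the
300 kbits Ivan mentioned, that the encoder's normal rate is the 130 kbps seen
after the drop, and that the overspend is constant during the boost. All of these
are simplifying assumptions, and the function name is hypothetical.

```python
def seconds_until_drained(reservoir_kbits, floor_kbits, boosted_kbps, normal_kbps):
    """Seconds a constant overspend takes to pull the reservoir down to floor_kbits."""
    surplus_kbps = boosted_kbps - normal_kbps
    if surplus_kbps <= 0:
        raise ValueError("boosted rate must exceed the normal rate")
    return (reservoir_kbits - floor_kbits) / surplus_kbps


# Figures from the text: 300 kbit reservoir drained to roughly 10-15 kbits,
# about 155 kbps during the boost and 130 kbps afterwards.
for floor in (10, 15):
    print(floor, seconds_until_drained(300, floor, 155, 130))
```

Under these assumptions the reservoir runs down after about eleven and a half
seconds, which is in the same range as the quality drop Francis heard after the
10th or 11th second of LesJoursHeureux.
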
Moreover, the developers claim that this problem, although present in the Nero AAC
encoder I had for testing, should not have such a big impact because people are
supposed to listen to the whole sample and give a mark for the quality achieved
over the whole file. Basically, their point is this: if user A listens to sample 1
and notices a quality drop after 20 to 30 seconds, his mark for the encoder
will reflect the more or less "bad" quality the encoder produced.

While this might be true, we are left with two serious problems.
The first one is that the samples encoded with Nero AAC do not represent real-life
behavior. There is a serious difference between how Nero encoded the tested
samples and how it encodes full tracks. The testing conditions (i.e. direct
encoding of short samples) directly interfere with the encoder's performance. In
our case, about two thirds of the encoded material (samples) have an artificially
increased bitrate (and thus increased quality), leaving at best one third of
(sub-)optimally encoded data; and when samples are really short, there is no
sub-optimal part at all. As an example, let's look at the sample "Yello", which is
only 9 seconds long. As a consequence of the short duration, Nero didn't even have
the time to drain the bit reservoir entirely - the whole sample is actually
over-coded.
In real life, however, the ~20 seconds of over-coding correspond to only a tiny
fraction of the whole data, since we are encoding several minutes (or even more
than an hour for complete CD images). Treating Nero like the rest of the contenders
is not fair because our results have only a very limited correlation with real-life
usage. Our testing conditions reproduce the behavior that occurs at the beginning
of a full track; in real life, the passages we used as samples would, as part of
a full track, not get the bonus bitrate they received in the test.
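
To illustrate how much the length of the material matters, the following sketch
computes the share of an encoding that falls inside the over-coded start. It
assumes a fixed boost of about 20 seconds, as mentioned above; the 30-second
sample, the 4-minute track and the 60-minute CD image are illustrative durations
chosen here, not measurements from the test.

```python
def overcoded_share(duration_s: float, boost_s: float = 20.0) -> float:
    """Fraction of the encoding covered by the boosted start (between 0 and 1)."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return min(duration_s, boost_s) / duration_s


examples = {
    "Yello (9 s sample)": 9,
    "30 s sample (assumed length)": 30,
    "4 min track (assumed length)": 4 * 60,
    "60 min CD image (assumed length)": 60 * 60,
}
for name, seconds in examples.items():
    print(f"{name}: {overcoded_share(seconds):.1%}")
```

With these assumed durations, the boosted share falls from 100% for the 9-second
sample to about 67% for a 30-second sample, about 8% for a 4-minute track and
under 1% for a complete CD image.
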
The second problem is a direct consequence of the first one: the tested Nero
encodings are not only different from what any user would get, they are also
(artificially) better. Therefore, some artifacts at the beginning of the sample
might have been reduced or even "masked" by the increased bitrate. My point is that
if Nero had encoded the first 20 seconds of a sample using a "normal" bitrate of
130 kbps, as it normally would, quality would have been lower and some artifacts
might have been more audible. As a consequence, it would definitely be unfair to
compare Nero with the other encoders, because Nero's behavior was changed by the
testing conditions: it was not the psychoacoustic model that decided to use a
higher bitrate, it was a bug. Therefore, if Vorbis, iTunes and LAME, for
example, had an artifact because they used a sub-optimal bitrate but Nero didn't
show a problem, people might rate Vorbis, iTunes and LAME even worse because they
would say "Hey, encoder X (Nero) did better!".

Because of the problems mentioned (unfairness, no real-life relevance...) and
after thoroughly discussing the issue with Francis, Roberto Amorim (rjamorim on
Hydrogenaudio Forums) and Darryl Miyaguchi (ff123 on Hydrogenaudio Forums), I
decided, against Ivan's and Juha's suggestion, to exclude Nero from the test. Ivan
and Juha told me that the bug has since been fixed, so I am looking forward to
seeing Nero compete again in another listening test.
I would also like to state clearly that I do not accuse the Nero developers of
any cheating whatsoever. It was an unfortunate situation, but we learned our lesson:
never use an encoder that was released only a few days or even hours before a
public listening test. :)

Last but not least, I would like to thank everyone for the great support and
interest in making this test as fair as possible. Special thanks go to Francis,
Darryl, Roberto, Ivan and Juha!