  =============================================
  Snow Video Codec Specification Draft 20080110
  =============================================
  
  Introduction:
  =============
  This specification describes the Snow bitstream syntax and semantics as
  well as the formal Snow decoding process.
  
  The decoding process is described precisely and any compliant decoder
  MUST produce the exact same output for a spec-conformant Snow stream.
  For encoding, though, any process which generates a stream compliant to
  the syntactical and semantic requirements and which is decodable by
  the process described in this spec shall be considered a conformant
  Snow encoder.
  
  Definitions:
  ============
  
  MUST    the specific part must be done to conform to this standard
  SHOULD  it is recommended to be done that way, but not strictly required
  
  ilog2(x) is the base-2 logarithm of x, rounded down
  ilog2(0) = 0
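
  As a non-normative illustration, ilog2 as defined above could be written in C
  like this (the function is a sketch, not part of the specification):

  static int ilog2(unsigned x)
  {
      int n = 0;
      while (x > 1) {   /* floor of the base-2 logarithm; ilog2(0) = 0 */
          x >>= 1;
          n++;
      }
      return n;
  }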
  
  Type definitions:
  =================
  
  b   1-bit range coded
  u   unsigned scalar value range coded
  s   signed scalar value range coded
  
  
  Bitstream syntax:
  =================
  
  frame:
      header
      prediction
      residual
  
  header:
      keyframe                            b   MID_STATE
      if(keyframe || always_reset)
          reset_contexts
      if(keyframe){
          version                         u   header_state
          always_reset                    b   header_state
          temporal_decomposition_type     u   header_state
          temporal_decomposition_count    u   header_state
          spatial_decomposition_count     u   header_state
          colorspace_type                 u   header_state
          if (nb_planes > 2) {
              chroma_h_shift              u   header_state
              chroma_v_shift              u   header_state
          }
          spatial_scalability             b   header_state
          max_ref_frames-1                u   header_state
          qlogs
      }
      if(!keyframe){
          update_mc                       b   header_state
          if(update_mc){
              for(plane=0; plane<nb_plane_types; plane++){
                  diag_mc                 b   header_state
                  htaps/2-1               u   header_state
                  for(i= p->htaps/2; i; i--)
                      |hcoeff[i]|         u   header_state
              }
          }
          update_qlogs                    b   header_state
          if(update_qlogs){
              spatial_decomposition_count u   header_state
              qlogs
          }
      }
  
      spatial_decomposition_type          s   header_state
      qlog                                s   header_state
      mv_scale                            s   header_state
      qbias                               s   header_state
      block_max_depth                     s   header_state
  
  qlogs:
      for(plane=0; plane<nb_plane_types; plane++){
          quant_table[plane][0][0]        s   header_state
          for(level=0; level < spatial_decomposition_count; level++){
              quant_table[plane][level][1]s   header_state
              quant_table[plane][level][3]s   header_state
          }
      }
  
  reset_contexts
      *_state[*]= MID_STATE
  
  prediction:
      for(y=0; y<block_count_vertical; y++)
          for(x=0; x<block_count_horizontal; x++)
              block(0)
  
  block(level):
      mvx_diff=mvy_diff=y_diff=cb_diff=cr_diff=0
      if(keyframe){
          intra=1
      }else{
          if(level!=max_block_depth){
              s_context= 2*left->level + 2*top->level + topleft->level + topright->level
              leaf                        b   block_state[4 + s_context]
          }
          if(level==max_block_depth || leaf){
              intra                       b   block_state[1 + left->intra + top->intra]
              if(intra){
                  y_diff                  s   block_state[32]
                  cb_diff                 s   block_state[64]
                  cr_diff                 s   block_state[96]
              }else{
                  ref_context= ilog2(2*left->ref) + ilog2(2*top->ref)
                  if(ref_frames > 1)
                      ref                 u   block_state[128 + 1024 + 32*ref_context]
                  mx_context= ilog2(2*abs(left->mx - top->mx))
                  my_context= ilog2(2*abs(left->my - top->my))
                  mvx_diff                s   block_state[128 + 32*(mx_context + 16*!!ref)]
                  mvy_diff                s   block_state[128 + 32*(my_context + 16*!!ref)]
              }
          }else{
              block(level+1)
              block(level+1)
              block(level+1)
              block(level+1)
          }
      }
  
  
  residual:
      residual2(luma)
      if (nb_planes > 2) {
          residual2(chroma_cr)
          residual2(chroma_cb)
      }
  
  residual2:
      for(level=0; level<spatial_decomposition_count; level++){
          if(level==0)
              subband(LL, 0)
          subband(HL, level)
          subband(LH, level)
          subband(HH, level)
      }
  
  subband:
      FIXME
  
  nb_plane_types = gray ? 1 : 2;
  
  Tag description:
  ----------------
  
  version
      0
      this MUST NOT change within a bitstream
  
  always_reset
      if 1 then the range coder contexts will be reset after each frame
  
  temporal_decomposition_type
      0
  
  temporal_decomposition_count
      0
  
  spatial_decomposition_count
      FIXME
  
  colorspace_type
      0   unspecified YCbCr
      1   Gray
      2   Gray + Alpha
      3   GBR
      4   GBRA
      this MUST NOT change within a bitstream
  
  chroma_h_shift
      log2(luma.width / chroma.width)
      this MUST NOT change within a bitstream
  
  chroma_v_shift
      log2(luma.height / chroma.height)
      this MUST NOT change within a bitstream
  
  spatial_scalability
      0
  
  max_ref_frames
      maximum number of reference frames
      this MUST NOT change within a bitstream
  
  update_mc
      indicates that motion compensation filter parameters are stored in the
      header
  
  diag_mc
      flag to enable faster diagonal interpolation
      this SHOULD be 1 unless it turns out to be covered by a valid patent
  
  htaps
      number of half pel interpolation filter taps, MUST be even, >0 and <10
  
  hcoeff
      half pel interpolation filter coefficients; hcoeff[0] is the pair of middle
      coefficients, hcoeff[1] the next outer pair, and so on, resulting in a filter
      like: ... hcoeff[2], hcoeff[1], hcoeff[0], hcoeff[0], hcoeff[1], hcoeff[2] ...
      the sign of the coefficients is not explicitly stored but alternates
      after each coefficient, and hcoeff[0] is positive, so ...,+,-,+,-,+,+,-,+,-,+,...
      hcoeff[0] is not explicitly stored either but is found by subtracting the sum
      of all stored coefficients, with their signs, from 32
      hcoeff[0]= 32 - hcoeff[1] - hcoeff[2] - ...
      a good choice for hcoeff and htaps is
      htaps= 6
      hcoeff={40,-10,2}
      an alternative which requires more computations at both encoder and
      decoder side and may or may not be better is
      htaps= 8
      hcoeff={42,-14,6,-2}
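
  As a non-normative illustration, the sign and hcoeff[0] rules above can be
  written as the following C sketch, assuming hcoeff_abs[1..n] holds the decoded
  |hcoeff[i]| magnitudes (all names are illustrative only):

  /* reconstruct signed filter coefficients: signs alternate starting with a
   * positive hcoeff[0], and hcoeff[0] = 32 minus the sum of the signed others */
  static void build_halfpel_filter(int n, const int *hcoeff_abs, int *hcoeff)
  {
      int i, sum = 0;
      for (i = 1; i <= n; i++) {
          hcoeff[i] = (i & 1) ? -hcoeff_abs[i] : hcoeff_abs[i];
          sum += hcoeff[i];
      }
      hcoeff[0] = 32 - sum;   /* e.g. n=2, magnitudes {10,2} -> {40,-10,2} */
  }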
  
  
  ref_frames
      minimum of the number of available reference frames and max_ref_frames;
      for example, the first frame after a key frame always has ref_frames=1
  
  spatial_decomposition_type
      wavelet type
      0 is a 9/7 symmetric compact integer wavelet
      1 is a 5/3 symmetric compact integer wavelet
      others are reserved
      stored as delta from last, last is reset to 0 if always_reset || keyframe
  
  qlog
      quality (logarithmic quantizer scale)
      stored as delta from last, last is reset to 0 if always_reset || keyframe
  
  mv_scale
      stored as delta from last, last is reset to 0 if always_reset || keyframe
      FIXME check that everything works fine if this changes between frames
  
  qbias
      dequantization bias
      stored as delta from last, last is reset to 0 if always_reset || keyframe
  
  block_max_depth
      maximum depth of the block tree
      stored as delta from last, last is reset to 0 if always_reset || keyframe
  
  quant_table
      quantization table
  
  
  Highlevel bitstream structure:
  ==============================
   --------------------------------------------
  |                   Header                   |
   --------------------------------------------
  |    ------------------------------------    |
  |   |               Block0               |   |
  |   |             split?                 |   |
  |   |     yes              no            |   |
  |   |  .........         intra?          |   |
  |   | : Block01 :    yes         no      |   |
  |   | : Block02 :  .......   ..........  |   |
  |   | : Block03 : :  y DC : : ref index: |   |
  |   | : Block04 : : cb DC : : motion x : |   |
  |   |  .........  : cr DC : : motion y : |   |
  |   |              .......   ..........  |   |
  |    ------------------------------------    |
  |    ------------------------------------    |
  |   |               Block1               |   |
  |                    ...                     |
   --------------------------------------------
  | ------------   ------------   ------------ |
  || Y subbands | | Cb subbands| | Cr subbands||
  ||  ---  ---  | |  ---  ---  | |  ---  ---  ||
  || |LL0||HL0| | | |LL0||HL0| | | |LL0||HL0| ||
  ||  ---  ---  | |  ---  ---  | |  ---  ---  ||
  ||  ---  ---  | |  ---  ---  | |  ---  ---  ||
  || |LH0||HH0| | | |LH0||HH0| | | |LH0||HH0| ||
  ||  ---  ---  | |  ---  ---  | |  ---  ---  ||
  ||  ---  ---  | |  ---  ---  | |  ---  ---  ||
  || |HL1||LH1| | | |HL1||LH1| | | |HL1||LH1| ||
  ||  ---  ---  | |  ---  ---  | |  ---  ---  ||
  ||  ---  ---  | |  ---  ---  | |  ---  ---  ||
  || |HH1||HL2| | | |HH1||HL2| | | |HH1||HL2| ||
  ||    ...     | |    ...     | |    ...     ||
  | ------------   ------------   ------------ |
   --------------------------------------------
  
  Decoding process:
  =================
  
                                           ------------
                                          |            |
                                          |  Subbands  |
                     ------------         |            |
                    |            |         ------------
                    |  Intra DC  |               |
                    |            |    LL0 subband prediction
                     ------------                |
                                  \        Dequantization
   -------------------             \             |
  |  Reference frames |             \           IDWT
  | -------   ------- |    Motion    \           |
  ||Frame 0| |Frame 1|| Compensation  .   OBMC   v      -------
  | -------   ------- | --------------. \------> + --->|Frame n|-->output
  | -------   ------- |                                 -------
  ||Frame 2| |Frame 3||<----------------------------------/
  |        ...        |
   -------------------
  
  
  Range Coder:
  ============
  
  Binary Range Coder:
  -------------------
  The implemented range coder is an adapted version of the one described in
  "Range encoding: an algorithm for removing redundancy from a digitised
  message." by G. N. N. Martin.
  The symbols encoded by the Snow range coder are bits (0|1). The
  associated probabilities are not fixed but change depending on the symbol
  mix seen so far.
  
  
  bit seen | new state
  ---------+-----------------------------------------------
      0    | 256 - state_transition_table[256 - old_state];
      1    |       state_transition_table[      old_state];
  
  state_transition_table = {
    0,   0,   0,   0,   0,   0,   0,   0,  20,  21,  22,  23,  24,  25,  26,  27,
   28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  37,  38,  39,  40,  41,  42,
   43,  44,  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  55,  56,  56,  57,
   58,  59,  60,  61,  62,  63,  64,  65,  66,  67,  68,  69,  70,  71,  72,  73,
   74,  75,  75,  76,  77,  78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,
   89,  90,  91,  92,  93,  94,  94,  95,  96,  97,  98,  99, 100, 101, 102, 103,
  104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 114, 115, 116, 117, 118,
  119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 133,
  134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
  150, 151, 152, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
  165, 166, 167, 168, 169, 170, 171, 171, 172, 173, 174, 175, 176, 177, 178, 179,
  180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 190, 191, 192, 194, 194,
  195, 196, 197, 198, 199, 200, 201, 202, 202, 204, 205, 206, 207, 208, 209, 209,
  210, 211, 212, 213, 215, 215, 216, 217, 218, 219, 220, 220, 222, 223, 224, 225,
  226, 227, 227, 229, 229, 230, 231, 232, 234, 234, 235, 236, 237, 238, 239, 240,
  241, 242, 243, 244, 245, 246, 247, 248, 248,   0,   0,   0,   0,   0,   0,   0};
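
  As a non-normative sketch, the state update described by the table above can
  be written as follows (context states are assumed to stay within 1..255 and
  away from the zero entries at the ends of the table; this is not a complete
  range coder):

  /* update one context state after coding/decoding a bit */
  static int update_state(const unsigned char state_transition_table[256],
                          int old_state, int bit)
  {
      if (bit)
          return state_transition_table[old_state];
      else
          return 256 - state_transition_table[256 - old_state];
  }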
  
  FIXME
  
  
  Range Coding of integers:
  -------------------------
  FIXME
  
  
  Neighboring Blocks:
  ===================
  left and top are set to the respective blocks unless they are outside of
  the image in which case they are set to the Null block
  
  top-left is set to the top left block unless it is outside of the image in
  which case it is set to the left block
  
  if this block has no larger parent block or it is at the left side of its
  parent block, and the top-right block is not outside of the image, then the
  top-right block is used for top-right; otherwise the top-left block is used
  
  Null block
  y,cb,cr are 128
  level, ref, mx and my are 0
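
  As a non-normative sketch of the rules above, with NULL standing for "outside
  of the image" and parent_ok standing for "this block has no larger parent
  block or is at the left side of its parent block" (all names are illustrative):

  typedef struct Block { int level, ref, mx, my, y, cb, cr; } Block;

  static const Block null_block = { 0, 0, 0, 0, 128, 128, 128 };

  static void select_neighbours(const Block *raw_left, const Block *raw_top,
                                const Block *raw_topleft,
                                const Block *raw_topright, int parent_ok,
                                const Block **left, const Block **top,
                                const Block **topleft, const Block **topright)
  {
      *left     = raw_left ? raw_left : &null_block;
      *top      = raw_top  ? raw_top  : &null_block;
      *topleft  = raw_topleft ? raw_topleft : *left;        /* fall back to left */
      *topright = (parent_ok && raw_topright) ? raw_topright : *topleft;
  }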
  
  
  Motion Vector Prediction:
  =========================
  1. the motion vectors of all the neighboring blocks are scaled to
  compensate for the difference of reference frames
  
  scaled_mv= (mv * (256 * (current_reference+1) / (mv.reference+1)) + 128)>>8
  
  2. the median of the scaled left, top and top-right vectors is used as
  motion vector prediction
  
  3. the used motion vector is the sum of the predictor and
     (mvx_diff, mvy_diff)*mv_scale
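
  A non-normative C sketch of steps 1-3 for one motion vector component
  (median3 and the other names are illustrative only):

  /* median of three integers */
  static int median3(int a, int b, int c)
  {
      if (a > b) { int t = a; a = b; b = t; }
      if (b > c) b = c;
      return a > b ? a : b;
  }

  /* step 1: scale a neighbouring vector component to the current reference */
  static int scale_mv(int mv, int neighbour_ref, int current_ref)
  {
      return (mv * (256 * (current_ref + 1) / (neighbour_ref + 1)) + 128) >> 8;
  }

  /* steps 2 and 3: median prediction plus the coded difference times mv_scale */
  static int predict_mv(int scaled_left, int scaled_top, int scaled_topright,
                        int mv_diff, int mv_scale)
  {
      return median3(scaled_left, scaled_top, scaled_topright)
             + mv_diff * mv_scale;
  }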
  
  
  Intra DC Prediction:
  ====================
  the luma and chroma values of the left block are used as predictors
  
  the used luma and chroma values are the sum of the predictor and y_diff, cb_diff,
  cr_diff; to reverse this in the decoder apply the following:
  block[y][x].dc[0] = block[y][x-1].dc[0] +  y_diff;
  block[y][x].dc[1] = block[y][x-1].dc[1] + cb_diff;
  block[y][x].dc[2] = block[y][x-1].dc[2] + cr_diff;
  block[*][-1].dc[*]= 128;
  
  
  Motion Compensation:
  ====================
  
  Halfpel interpolation:
  ----------------------
  Halfpel interpolation is done by convolution with the halfpel filter stored
  in the header:
  
  horizontal halfpel samples are found by
  H1[y][x] =    hcoeff[0]*(F[y][x  ] + F[y][x+1])
              + hcoeff[1]*(F[y][x-1] + F[y][x+2])
              + hcoeff[2]*(F[y][x-2] + F[y][x+3])
              + ...
  h1[y][x] = (H1[y][x] + 32)>>6;
  
  vertical halfpel samples are found by
  H2[y][x] =    hcoeff[0]*(F[y  ][x] + F[y+1][x])
              + hcoeff[1]*(F[y-1][x] + F[y+2][x])
              + ...
  h2[y][x] = (H2[y][x] + 32)>>6;
  
  vertical+horizontal halfpel samples are found by
  H3[y][x] =    hcoeff[0]*(H2[y][x  ] + H2[y][x+1])
              + hcoeff[1]*(H2[y][x-1] + H2[y][x+2])
              + ...
  H3[y][x] =    hcoeff[0]*(H1[y  ][x] + H1[y+1][x])
              + hcoeff[1]*(H1[y-1][x] + H1[y+2][x])
              + ...
  h3[y][x] = (H3[y][x] + 2048)>>12;
  
  
                     F   H1  F
                     |   |   |
                     |   |   |
                     |   |   |
                     F   H1  F
                     |   |   |
                     |   |   |
                     |   |   |
     F-------F-------F-> H1<-F-------F-------F
                     v   v   v
                    H2   H3  H2
                     ^   ^   ^
     F-------F-------F-> H1<-F-------F-------F
                     |   |   |
                     |   |   |
                     |   |   |
                     F   H1  F
                     |   |   |
                     |   |   |
                     |   |   |
                     F   H1  F
  
  
  unavailable fullpel samples (outside the picture for example) shall be equal
  to the closest available fullpel sample
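
  A non-normative sketch of the horizontal halfpel interpolation above,
  including the "closest available fullpel sample" rule for samples outside the
  picture (names and parameters are illustrative; e.g. htaps=6, hcoeff={40,-10,2}):

  /* fullpel sample with coordinates clamped to the picture */
  static int fullpel(const unsigned char *F, int stride, int w, int h,
                     int x, int y)
  {
      if (x < 0) x = 0; else if (x > w - 1) x = w - 1;
      if (y < 0) y = 0; else if (y > h - 1) y = h - 1;
      return F[y * stride + x];
  }

  /* horizontal halfpel sample h1[y][x] */
  static int halfpel_h(const unsigned char *F, int stride, int w, int h,
                       int x, int y, const int *hcoeff, int htaps)
  {
      int i, H1 = 0;
      for (i = 0; i < htaps / 2; i++)
          H1 += hcoeff[i] * (fullpel(F, stride, w, h, x - i,     y) +
                             fullpel(F, stride, w, h, x + i + 1, y));
      return (H1 + 32) >> 6;
  }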
  
  
  Smaller pel interpolation:
  --------------------------
  if diag_mc is set then points which lie on a line between 2 vertically,
  horizontally or diagonally adjacent halfpel points shall be interpolated
  linearly, rounding to nearest with halfway values rounded up.
  points which lie on 2 diagonals at the same time should only use the
  diagonal not containing the fullpel point
  
  
  
             F-->O---q---O<--h1->O---q---O<--F
             v \           / v \           / v
             O   O       O   O   O       O   O
             |         /     |     \         |
             q       q       q       q       q
             |     /         |         \     |
             O   O       O   O   O       O   O
             ^ /           \ ^ /           \ ^
            h2-->O---q---O<--h3->O---q---O<--h2
             v \           / v \           / v
             O   O       O   O   O       O   O
             |     \         |         /     |
             q       q       q       q       q
             |         \     |     /         |
             O   O       O   O   O       O   O
             ^ /           \ ^ /           \ ^
             F-->O---q---O<--h1->O---q---O<--F
  
  
  
  the remaining points shall be bilinearly interpolated from up to 4
  surrounding halfpel and fullpel points; again, rounding should be to
  nearest with halfway values rounded up
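
  As a non-normative illustration of the rounding rule only (the fx, fy, den
  parameterization is an assumption, not bitstream syntax), linear and bilinear
  interpolation with round-to-nearest and halves rounded up can be written as:

  /* linear interpolation between a and b at fraction fx/den; assumes
   * non-negative samples and an even den, so +den/2 rounds halves up */
  static int lerp_round(int a, int b, int fx, int den)
  {
      return (a * (den - fx) + b * fx + den / 2) / den;
  }

  /* bilinear interpolation from the up to 4 surrounding points */
  static int bilerp_round(int p00, int p01, int p10, int p11,
                          int fx, int fy, int den)
  {
      return (p00 * (den - fx) * (den - fy) + p01 * fx * (den - fy) +
              p10 * (den - fx) * fy         + p11 * fx * fy +
              (den * den) / 2) / (den * den);
  }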
  
  compliant Snow decoders MUST support at least 1 to 1/8 pel luma and 1/2 to
  1/16 pel chroma interpolation
  
  
  Overlapped block motion compensation:
  -------------------------------------
  FIXME
  
  LL band prediction:
  ===================
  Each sample in the LL0 subband is predicted by the median of the left, top and
  left+top-topleft samples; samples outside the subband shall be considered to
  be 0. To reverse this prediction in the decoder apply the following.
  for(y=0; y<height; y++){
      for(x=0; x<width; x++){
          sample[y][x] += median(sample[y-1][x],
                                 sample[y][x-1],
                                 sample[y-1][x]+sample[y][x-1]-sample[y-1][x-1]);
      }
  }
  sample[-1][*]=sample[*][-1]= 0;
  width and height here are the width and height of the LL0 subband, not of the
  final video
  
  
  Dequantization:
  ===============
  FIXME
  
  Wavelet Transform:
  ==================
  
  Snow supports 2 wavelet transforms: the symmetric biorthogonal 5/3 integer
  transform and an integer approximation of the symmetric biorthogonal 9/7
  Daubechies wavelet.
  
  2D IDWT (inverse discrete wavelet transform)
  --------------------------------------------
  The 2D IDWT applies a 2D filter recursively, each time combining the
  4 lowest frequency subbands into a single subband until only 1 subband
  remains.
  The 2D filter is done by first applying a 1D filter in the vertical direction
  and then applying it in the horizontal one.
   ---------------    ---------------    ---------------    ---------------
  |LL0|HL0|       |  |   |   |       |  |       |       |  |       |       |
  |---+---|  HL1  |  | L0|H0 |  HL1  |  |  LL1  |  HL1  |  |       |       |
  |LH0|HH0|       |  |   |   |       |  |       |       |  |       |       |
  |-------+-------|->|-------+-------|->|-------+-------|->|   L1  |  H1   |->...
  |       |       |  |       |       |  |       |       |  |       |       |
  |  LH1  |  HH1  |  |  LH1  |  HH1  |  |  LH1  |  HH1  |  |       |       |
  |       |       |  |       |       |  |       |       |  |       |       |
   ---------------    ---------------    ---------------    ---------------
  
  
  1D Filter:
  ----------
  1. interleave the samples of the low and high frequency subbands like
  s={L0, H0, L1, H1, L2, H2, L3, H3, ... }
  note that this can end with an L or an H; the number of elements shall be w
  s[-1] shall be considered equivalent to s[1  ]
  s[w ] shall be considered equivalent to s[w-2]
  
  2. perform the lifting steps in order as described below
  
  5/3 Integer filter:
  1. s[i] -= (s[i-1] + s[i+1] + 2)>>2; for all even i < w
  2. s[i] += (s[i-1] + s[i+1]    )>>1; for all odd  i < w
  
  \ | /|\ | /|\ | /|\ | /|\
   \|/ | \|/ | \|/ | \|/ |
    +  |  +  |  +  |  +  |   -1/4
   /|\ | /|\ | /|\ | /|\ |
  / | \|/ | \|/ | \|/ | \|/
    |  +  |  +  |  +  |  +   +1/2
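
  A non-normative sketch of the 5/3 lifting applied in order to one interleaved
  line of w samples, with the mirrored boundary handling from step 1 of the 1D
  filter (the names are illustrative):

  /* s[-1] is read as s[1] and s[w] as s[w-2] */
  static int mirror(const int *s, int i, int w)
  {
      if (i < 0)     return s[-i];
      if (i > w - 1) return s[2 * w - 2 - i];
      return s[i];
  }

  static void lift53(int *s, int w)
  {
      int i;
      for (i = 0; i < w; i += 2)                  /* step 1, even samples */
          s[i] -= (mirror(s, i - 1, w) + mirror(s, i + 1, w) + 2) >> 2;
      for (i = 1; i < w; i += 2)                  /* step 2, odd samples  */
          s[i] += (mirror(s, i - 1, w) + mirror(s, i + 1, w)    ) >> 1;
  }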
  
  
  Snow's 9/7 Integer filter:
  1. s[i] -= (3*(s[i-1] + s[i+1])         + 4)>>3; for all even i < w
  2. s[i] -=     s[i-1] + s[i+1]                 ; for all odd  i < w
  3. s[i] += (   s[i-1] + s[i+1] + 4*s[i] + 8)>>4; for all even i < w
  4. s[i] += (3*(s[i-1] + s[i+1])            )>>1; for all odd  i < w
  
  \ | /|\ | /|\ | /|\ | /|\
   \|/ | \|/ | \|/ | \|/ |
    +  |  +  |  +  |  +  |   -3/8
   /|\ | /|\ | /|\ | /|\ |
  / | \|/ | \|/ | \|/ | \|/
   (|  + (|  + (|  + (|  +   -1
  \ + /|\ + /|\ + /|\ + /|\  +1/4
   \|/ | \|/ | \|/ | \|/ |
    +  |  +  |  +  |  +  |   +1/16
   /|\ | /|\ | /|\ | /|\ |
  / | \|/ | \|/ | \|/ | \|/
    |  +  |  +  |  +  |  +   +3/2
  
  optimization tips:
  the following are exactly identical
  (3a)>>1 == a + (a>>1)
  (a + 4b + 8)>>4 == ((a>>2) + b + 2)>>2
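
  A matching non-normative sketch for Snow's 9/7 lifting, applying steps 1-4 in
  order and using the two identities above for steps 3 and 4 (mirror() is the
  same illustrative boundary helper as in the 5/3 sketch, repeated here so the
  sketch stands alone):

  static int mirror(const int *s, int i, int w)
  {
      if (i < 0)     return s[-i];            /* s[-1] == s[1]  */
      if (i > w - 1) return s[2 * w - 2 - i]; /* s[w] == s[w-2] */
      return s[i];
  }

  static void lift97(int *s, int w)
  {
      int i, a;
      for (i = 0; i < w; i += 2)                               /* step 1 */
          s[i] -= (3 * (mirror(s, i - 1, w) + mirror(s, i + 1, w)) + 4) >> 3;
      for (i = 1; i < w; i += 2)                               /* step 2 */
          s[i] -=  mirror(s, i - 1, w) + mirror(s, i + 1, w);
      for (i = 0; i < w; i += 2) {                             /* step 3 */
          a = mirror(s, i - 1, w) + mirror(s, i + 1, w);
          s[i] += ((a >> 2) + s[i] + 2) >> 2;   /* == (a + 4*s[i] + 8)>>4 */
      }
      for (i = 1; i < w; i += 2) {                             /* step 4 */
          a = mirror(s, i - 1, w) + mirror(s, i + 1, w);
          s[i] += a + (a >> 1);                 /* == (3*a)>>1 */
      }
  }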
  
  16bit implementation note:
  The IDWT can be implemented with 16 bits, but this requires some care to
  prevent overflows; the following list gives the minimum number of bits needed
  for some terms
  1. lifting step
  A= s[i-1] + s[i+1]                              16bit
  3*A + 4                                         18bit
  A + (A>>1) + 2                                  17bit
  
  3. lifting step
  s[i-1] + s[i+1]                                 17bit
  
  4. lifting step
  3*(s[i-1] + s[i+1])                             17bit
  
  
  TODO:
  =====
  Important:
  finetune initial contexts
  flip wavelet?
  try to use the wavelet transformed predicted image (motion compensated image) as context for coding the residual coefficients
  try the MV length as context for coding the residual coefficients
  use extradata for stuff which is in the keyframes now?
  implement per picture halfpel interpolation
  try different range coder state transition tables for different contexts
  
  Not Important:
  compare the 6 tap and 8 tap hpel filters (psnr/bitrate and subjective quality)
  spatial_scalability b vs u (!= 0 breaks syntax anyway so we can add a u later)
  
  
  Credits:
  ========
  Michael Niedermayer
  Loren Merritt
  
  
  Copyright:
  ==========
  GPL + GFDL + whatever is needed to make this an RFC