Why software genlock at 60 FPS matters!

Since MediaMaster 1.1 we have revamped our video engine, in particular the synchronization and the multi-threading.

We now perform what can be called software genlock to ensure the best possible fluidity if your machine has a multi-core processor. Genlock is the action of locking the frequency of a media to a reference signal or clock. There is a nice description of it on Wikipedia.

When the software must present a frame, the work can be cut into 3 parts: getting the video frames from the disk, uploading them to the graphics card, and doing the composition / blending of the pixels for the presentation.

Because of the way disks work, and because the time a codec needs to convert the data from the disk into a decompressed frame is not constant, this can create fluidity problems.

So at each new frame the software wakes up and works sequentially through the 3 phases. In a classical real-time video processing application this works like this:

Traditional video application

This graph shows an application trying to play a video loop encoded at 30 fps on a monitor running at 60 fps. In a perfect world the application would present each frame of the movie exactly twice.
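
In code, that traditional sequential approach boils down to something like this rough sketch (the names are hypothetical, not an excerpt from any real player):

/* Hypothetical sketch, not an excerpt from a real player: the traditional,
   sequential approach does all three phases inside the current frame's
   time budget (about 16.7 ms at 60 fps). */
typedef struct Frame Frame;                /* a decoded video frame, opaque here */

Frame *read_frame_from_disk(void);         /* 1) read + decode; duration varies  */
void   upload_to_graphics_card(Frame *f);  /* 2) copy the pixels to the GPU      */
void   compose_and_present(Frame *f);      /* 3) blend the layers and present    */

void present_next_frame_traditional(void)
{
    Frame *frame = read_frame_from_disk();
    upload_to_graphics_card(frame);
    compose_and_present(frame);
}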

There are 2 problems with this traditional way of doing video processing:

  • The time base is synchronized to the clock of the computer, so there is a drift between the monitor frequency and the internal clock of the computer. This means that even if your computer were extremely powerful, you would still see small hiccups once in a while.
  • When the software starts working to display a new frame it has, in this example, only 1/60th of a second (about 16.7 ms) to read a video frame, upload it and present it to the user. The available time depends on the fps of the monitor, not on the fps of the video source, so the higher the fps of the monitor, the more stress you put on your system.
The processing done by MediaMaster is much more elegant. Since 1.1 we have 3 modes: the original one, a buffered mode and a frame blending mode.

In this article I focus on the buffered mode; I will write more about frame blending later.

So in buffered mode the graphics card always has one frame in advance, ready to be composed. As soon as a frame has been processed and presented to the user, the software reads and uploads the next video frame in advance.

The other thing done in buffered mode is that the clock of the content is no longer taken from the computer clock but rather from the monitor frequency. To my knowledge most other media players do not do anything as subtle as this.
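
As a rough illustration of the idea (hypothetical names, not the actual MediaMaster implementation), the buffered mode boils down to something like this:

/* Hypothetical sketch, not the actual MediaMaster implementation. */
typedef struct Frame Frame;

Frame *read_and_upload_frame(long movie_frame);  /* decode + upload to the GPU   */
void   compose_and_present(Frame *f);            /* blend the layers and present */

static Frame *next_frame;         /* one frame is always ready in advance         */
static long   monitor_frame = 0;  /* counts monitor refreshes, e.g. 60 per second */

/* Called once per monitor refresh; frame 0 is assumed to be preloaded. */
void on_monitor_refresh(long movie_fps, long monitor_fps)
{
    compose_and_present(next_frame);   /* cheap: the frame is already on the GPU */

    /* The content clock is derived from the monitor refresh count, not from the
       computer clock: a 30 fps movie on a 60 fps monitor maps to 0 0 1 1 2 2 ... */
    long current  = monitor_frame * movie_fps / monitor_fps;
    long upcoming = (monitor_frame + 1) * movie_fps / monitor_fps;
    if (upcoming != current)
        next_frame = read_and_upload_frame(upcoming);  /* prepare the next frame */

    monitor_frame++;
}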

So if we keep the same timings as in the first example but simply shuffle them according to the way the buffered mode works, here is the result:

Software genlocked video

Because the movie clock is genlocked to the monitor, and because we always have one frame in advance, we play the movie with perfect regularity and timing: 1 1 2 2 3 3 4 4 … each frame is played exactly twice.

If you are still not too sure why it’s nice to be genlocked at 60 fps, here is a recording we did in real time from MediaMaster running 2 layers. The lower one is at 30 fps and the top one is at 60 fps. The loop running at 60 fps is part of our test content: it runs a ramp that lets us see visually whether the system is perfectly genlocked. We captured the output with Fraps.

If you apply effects to the content you play, the effects will be rendered at 60 fps, and this is why it’s so important to have perfect synchronization: your eyes will have the feeling that everything is crisp and fluid.

In order to see the videos in this article you need to have QuickTime installed; the first video should play smoothly on a recent laptop or on an LCD monitor set at 60 fps.

So here is the original screen grab, just scaled down in order to show it in this article; the frame rate is still 60 fps:

Now, so you can see the difference, here is the same loop at 30 fps:

Lowering the fps further, here it is at 20 fps:

Here is an even more degraded version at 15 fps:

If you are curious to test this with any software video mixer that can play QuickTime movies, here are our test files:

    Horizontal ramp, 2 seconds at 60 fps:

    Here is a vertical ramp at 60 fps:

    And a zooming rectangle at 60 fps:

Those movies can be downloaded so you can test your systems; you just need software that supports QuickTime *.mov files. For best results you should loop those files and let them run for a while. In a VJ application you can simply add them as the last layer of your composition with additive blending, and you will see whether your system is powerful enough and well designed.

Feel free to stress test our software and compare it with others; we think we did a very nice job with MediaMaster, and there is a demo version for you to test on the ArKaos web site.

Preview of an interface to drive a network of players

I have been toying with the idea of a network of players synchronized to a central coordinator for a few months now, and I have already shared these movies:

    New synch experiments at WWDC

    Exploring an idea about distributing content on a network of players…

Our goal at ArKaos is to build a new range of products based on such an architecture, but we want to move forward step by step, and the first public step will be a simple interface to drive a limited pool of players connected to the same network.

Here you can preview what the interface will look like. This application is not even alpha code, but I am happy to share it with those of you who are curious.

I used this project as an opportunity to experiment with the Cocoa tools of Mac OS X. I am a big fan of cross-platform programming and I have worked with wxWidgets for a few years. Unfortunately, because Apple has obsoleted Carbon, we are now looking for new ways of creating our interfaces. I am happy that within just a few days I was able to create this already complex interface without writing too much code. In the end it’s true: Interface Builder rules and Cocoa is a great idea!

OK, now to my prototype. The application is still useless, but it demonstrates how to write a simple cue player. You can create lists of events (cues) and assign them to computer keys. The idea is that when you press those keys, the players listening to the network events will start playing the corresponding video loops.

So at this time, what can be done is:
– importing static pictures into the cells on the left by dragging and dropping from the Finder.
– playing with the + / – buttons of the cue editor to add and remove steps of a cue.
– dragging and dropping visuals from the left cells to the cells of a cue.
– editing the layer position, start time and duration of a cue step. At this time you need to use the Enter key to validate a new time or duration.

    Here is a simple picture to show the interface in action:

Coordinator

Just drag and drop a few pictures onto the left cells, then create a few cue steps with the + and – buttons. The final prototype will be able to play movie loops across 3 zones of a maximum of 6 projectors. It will be possible to stack 4 layers of visuals on each zone. This is why, for example, the popup says z1l1: it means zone 1, layer 1.
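
To make this concrete, here is a rough, hypothetical sketch of what a cue step could look like as a data structure; it is not the prototype’s actual code:

/* Hypothetical sketch of a cue data model; this is not the prototype's code. */
#define MAX_ZONES  3   /* the final prototype targets 3 zones                   */
#define MAX_LAYERS 4   /* up to 4 layers of visuals can be stacked on each zone */

typedef struct CueStep {
    int    zone;          /* 1..MAX_ZONES,  the "z" in z1l1           */
    int    layer;         /* 1..MAX_LAYERS, the "l" in z1l1           */
    char   visual[256];   /* name or path of the picture / movie loop */
    double start_time;    /* seconds after the cue is triggered       */
    double duration;      /* seconds the visual stays on this layer   */
} CueStep;

typedef struct Cue {
    int     key;          /* computer key that triggers the cue       */
    int     step_count;
    CueStep steps[16];    /* arbitrary limit for this sketch          */
} Cue;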

I made a quick build if you want to play with this preview app, for Mac OS X only at the moment; download it here:

    2009_08_CocoaCoordinator.dmg (208 KB)

    Perfectly fluid animations…

At ArKaos we work a lot on the fluidity of the animations produced by our software. To do so we heavily multi-thread the software and make the best use of graphical acceleration.

It looks like on the Mac, under 10.5.6 at least, it is very difficult to have perfectly fluid animations on one monitor while continuing to update a software interface on another monitor. Unfortunately, that is exactly what we want to achieve in all our software.

I was able to reduce the problem to a simple application that draws a band in one window at 60 FPS. As soon as the system slows down you clearly see the speed of the animation becoming irregular. Because the code is very light and uses no textures, I expected it to run very smoothly.

    The function that is called for every frame is as simple as this one:

void display1(void)
{
    static int framecount = 0;

    glClear (GL_COLOR_BUFFER_BIT);
    glColor3f (1.0, 1.0, 1.0);
    glLoadIdentity (); /* clear the modelview matrix */

    /* viewing transformation: slide the camera horizontally one step per frame,
       cycling over 60 frames, so a missed frame is immediately visible */
    float x_pos = -((framecount++ % 60) - 30.0) / 8.0;
    gluLookAt (x_pos, 0.0, 5.0, x_pos, 0.0, 0.0, 0.0, 1.0, 0.0);
    glScalef (1.0, 2.0, 1.0); /* modeling transformation */

    /* draw the white vertical band */
    glBegin(GL_POLYGON);
    glVertex2f(-0.5, 3);
    glVertex2f(-0.5, -3);
    glVertex2f(0.5, -3);
    glVertex2f(0.5, 3);
    glEnd();

    /* block until the GPU has finished, so one call corresponds to one frame */
    glFinish ();
}
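
For context, here is a minimal GLUT harness that could drive display1 at roughly 60 FPS; this is an assumed setup for illustration, and the actual code in GlutSample.zip may differ:

#include <GLUT/glut.h>   /* on Mac OS X; use <GL/glut.h> on other platforms */

void display1(void);     /* the per-frame function shown above */

static void reshape(int w, int h)
{
    /* give the gluLookAt camera a sensible perspective projection */
    glViewport(0, 0, w, h);
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluPerspective(60.0, h ? (double)w / (double)h : 1.0, 1.0, 20.0);
    glMatrixMode(GL_MODELVIEW);
}

static void timer(int value)
{
    (void)value;
    glutPostRedisplay();                  /* schedule the next call to display1 */
    glutTimerFunc(1000 / 60, timer, 0);   /* re-arm for roughly 60 FPS */
}

int main(int argc, char **argv)
{
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_SINGLE | GLUT_RGB);
    glutInitWindowSize(640, 480);
    glutCreateWindow("band");
    glutDisplayFunc(display1);
    glutReshapeFunc(reshape);
    glutTimerFunc(1000 / 60, timer, 0);
    glutMainLoop();
    return 0;
}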

It simply draws a band like this that cycles across the window:

band

Because framecount is incremented for each new frame, a missed frame has a very visible impact, which is exactly what I want to see.

Here is the result of profiling the application with Shark; untitled and untitled copy are the 2 instances running. You can see that the whole machine is almost idle, but very clearly on this picture, after 6 frames the time between 2 ticks on the untitled line starts to double:

Sharking 60fps animations

    I have the feeling that the problem is in the WindowServer process but it’s out of my scope to understand what is wrong there.

If you are curious, my very simple GLUT sample is here: GlutSample.zip

Once you unzip it, just run band1 and band2 and move one window to a second monitor; you should see the animation become irregular.

Here is the Shark file: GlutSample.mshark

    Creating a new video codec based on texture compression

At ArKaos we always fight to get the best out of current computer configurations regarding media playback.

When you are in the show industry and try to pick the best way to compress your content, it is still a bit of black magic. While some codecs compress video well, they are heavy for the machine to handle; worse, the codecs that do the best job at compressing while keeping good quality, such as H.264, are very bad when you need to scratch your media.

The best codec for artists who need to interact a lot with the content should allow playing forward and backward easily and jumping to any point in the content quickly.

Some have experimented with using a file format designed by the companies making the graphics chips of your computer. This file format is based on texture compression: DXT1, DXT3 or DXT5.

I was wondering how hard it would be to add support for that format to QuickTime as a new codec. Having been programming against the QuickTime API since its version 1.0 beta, I considered the challenge fun and interesting.

So I resuscitated an old piece of sample code from 1999 on the Apple web site, and after a few hours I had a codec bearing my name and generating video files that could be played back by the QuickTime player.

I then jumped on the DXT texture compression problem and picked the squish library to handle it. A few more hours and I had a working codec.

ArKaos codec in the QuickTime player

When used by QuickTime this codec is not efficient, because the texture decompression is done on the CPU, so you don’t see the advantage of letting the GPU handle that texture format.

To demonstrate the speed-up I used an experimental player I am working on and patched ffmpeg to handle the new codec I had just created. Thanks to that I can leave the texture data untouched and pass it to the GPU via the OpenGL texture compression extensions.
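
To illustrate that path, here is a simplified sketch (not the player’s actual code) of handing a raw DXT1 payload to the GPU with the standard compressed-texture call:

#include <OpenGL/gl.h>
#include <OpenGL/glext.h>   /* GL_COMPRESSED_RGB_S3TC_DXT1_EXT */

/* Simplified sketch: pass an untouched DXT1 payload to the GPU.
   DXT1 packs every 4x4 pixel block into 8 bytes (DXT5 uses 16).
   A texture object is assumed to be bound to GL_TEXTURE_2D already. */
static void upload_dxt1_frame(const void *payload, int width, int height)
{
    GLsizei size = ((width + 3) / 4) * ((height + 3) / 4) * 8;
    glCompressedTexImage2D(GL_TEXTURE_2D, 0,
                           GL_COMPRESSED_RGB_S3TC_DXT1_EXT,
                           width, height, 0, size, payload);
}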

When this was done I could enjoy the show and run some performance tests. File size is in MB and CPU load is the percentage reported by running top in the console:

name                 resolution   data        H264   Photo JPEG   Dxt1   Dxt5    Dxt1 Compressed   Dxt5 Compressed
Alternate Rotate DI  640*480      File Size   9,3    14,3         22     43,9    17,5              19,7
                                  CPU Load    31     30           7      9       15                18
1920 HD              1920*1080    File Size   36,5   41,8         90     180     55,7              63,8
                                  CPU Load    99     95           13     22      42                56
T25 Random Twirls    1280*720     File Size   35,9   73,2         65,9   131,8   51,5              58,4
                                  CPU Load    78     72           10     16      32                41

If you are curious about frame quality, here is a part of an original frame:

Original frame (Apple Intermediate)

Here is a part of the same frame using Dxt1:

Part of an HD frame Dxt1 compressed

And here is a part of the same frame with Dxt5 compression:

Part of the same frame with Dxt5 compression

The result of my tests is that pure Dxt1 is the best format regarding CPU load; Dxt5 is better regarding quality but generates files twice as heavy. Using Dxt1 I could play 7 HD loops at the same time on my laptop … if the data rate of my hard drive allowed it!

The problem with pure Dxt1 or Dxt5 files is that they are huge, 4 times bigger than Photo JPEG or H.264. To work around that I experimented with adding data compression on the CPU, because the CPU is now almost unused!

So using Dxt1 Compressed textures is the best solution regarding both CPU usage and disk usage. The size is almost the same as Photo JPEG and a little bigger than H.264. Even when using the CPU to decompress the texture data from disk, the codec is still twice as fast as any other codec I tested.
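
The article does not detail which general-purpose compressor the codec uses, but as an illustration the principle can be sketched with zlib: inflate the frame’s payload on the CPU, then hand the resulting DXT blocks to the GPU exactly as shown earlier.

#include <zlib.h>

/* Hypothetical sketch using zlib; the compressor actually used by the codec
   is not specified here. Inflate one frame's payload on the CPU, then the
   resulting DXT blocks can be uploaded to the GPU as before. */
static long inflate_dxt_payload(const Bytef *src, uLong src_len,
                                Bytef *dst, uLong dst_capacity)
{
    uLongf dst_len = dst_capacity;
    if (uncompress(dst, &dst_len, src, src_len) != Z_OK)
        return -1;        /* corrupt data or destination buffer too small */
    return (long)dst_len; /* number of DXT bytes ready for upload */
}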

With the other codecs, on my MacBook Pro I can’t play a full HD loop, and even at 1280*720 my machine struggles. Using Dxt1 Compressed I can play 2 HD layers and 3 720p layers.

I have made available the test player I used and a set of video loops. This dmg file is for Mac OS X x86 only (no PowerPC), so if you are curious here are the files:

    CompressedTextureDemo.dmg (435 MB)

    We will bundle that codec experiment in the next update of GrandVJ and MediaMaster, that’s for sure!