Monday, July 25, 2016

Machine Learning to Improve 360 Video Playback

Project Status: Still Dreaming

Problem

Watching a 360 video in any setting other than virtual reality can lead to a poor experience due to a non-optimal viewing path. Viewers have to work pretty hard to be their own directors of photography, or else they may end up staring at a wall while something super awesome is going on in the other direction.


Workaround

Currently you have to back up and try to find what you missed, assuming you even realize that you missed something in the first place. Using any type of virtual reality headset helps immensely due to the simple fact that the viewer can naturally look around in the environment.

Solution

Storing viewer paths as people watch 360-degree videos will give us the data needed to learn what path is optimal to suggest to a viewer who has never seen the video before.

So while you won't be physically pushing a person's head around as they wear a virtual reality headset, you can push the view around for someone watching from an online source such as YouTube. Meanwhile, for those VR headset users, we can still suggest directions they may want to look in, much the same way video games do. If you are as horrible a gamer as I am, you will know right away that a flashing red area on the side of the view means that is the direction you are being attacked from.

When a new 360 video is uploaded to a site such as YouTube, one cool option would be to give the person uploading a tool that allows him/her to set a default viewing path through the video. Honestly, that should be done anyway but, as you know, we can do even better than that.

Now to be clear, what I mean when I say a path is this: imagine recording not a whole copy of the 360-degree video in every direction, but the actual frame-by-frame video that a viewer saw. While watching, the viewer may have the ability to turn in any direction, but ultimately they can only look in one direction at a time, so whatever they choose to see becomes a video experience of its own. Even if they back up and play through parts again while checking out other directions, their perspective is still limited to one direction of view at a time.

So obviously it wouldn't be very efficient to store a new copy of the video each time it is viewed. Instead, what we actually need to store is the camera angle at each point in time. We store this frame-by-frame camera angle and call it a path.
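To make that concrete, here is a minimal sketch of how a path might be recorded, assuming the player can report the camera's yaw (horizontal) and pitch (vertical) angles each frame. All of the names here (ViewPath, record_frame) are hypothetical, not any real player's API:

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class ViewPath:
        """A viewer's path: one camera angle per video frame."""
        video_id: str
        # (yaw, pitch) in degrees per frame index; yaw is the horizontal
        # look direction, pitch is up/down.
        angles: List[Tuple[float, float]] = field(default_factory=list)

        def record_frame(self, yaw: float, pitch: float) -> None:
            self.angles.append((yaw, pitch))

    # As the player renders each frame, log where the camera is pointing.
    path = ViewPath(video_id="abc123")
    path.record_frame(yaw=0.0, pitch=0.0)    # frame 0: straight ahead
    path.record_frame(yaw=15.5, pitch=-2.0)  # frame 1: turned slightly right

In practice you would probably downsample (say, one angle every few frames) since adjacent frames differ very little, but per-frame storage keeps the idea clear.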

For now we'll keep the idea simple with camera angle data, but it is interesting to note that, with the advances and expanding use of virtual reality technology, tracking eye movement will soon be extremely useful as well.

Maybe other features could help us learn faster, such as taking into account when the user backed up to view something else. We could also note when the viewer is using a virtual reality headset, because in that case we will want to weight the path's importance differently, given that they have an easy means of looking around. Some paths may be more traveled depending on screen size or viewer demographics. We should consider anything we can use as a feature to learn faster; maybe viewers who liked the video had a better viewing experience than those who disliked it.
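As a sketch of what such a session record might look like, reusing the hypothetical ViewPath from above; every field and the weighting rule here are placeholder assumptions for illustration:

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class ViewingSession:
        """Per-session features that might help weight or filter paths."""
        path: ViewPath                 # the camera-angle path sketched above
        used_vr_headset: bool          # headset viewers look around naturally
        screen_diagonal_inches: Optional[float]  # None when unknown
        rewind_timestamps: List[float]           # seconds where the viewer backed up
        liked: Optional[bool]          # True/False rating, None if not rated

    def path_weight(session: ViewingSession) -> float:
        """A hypothetical weighting rule: trust headset viewers and likers more."""
        weight = 1.0
        if session.used_vr_headset:
            weight *= 2.0    # easy motion, so the path is more deliberate
        if session.liked is True:
            weight *= 1.5    # a like suggests a good viewing experience
        elif session.liked is False:
            weight *= 0.5
        return weight

The specific multipliers are made up; the point is just that each session carries features we could eventually learn weights for instead of guessing them.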

Quite simply, the idea is that we start by recording the paths taken by every user who watches the video, and with that aggregate data we can begin to find the paths most traveled. We then suggest those paths to future viewers, learn from their deviations from the previously learned path, and repeat the cycle.
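Here is a minimal sketch of that aggregate-and-suggest step, assuming each path is a per-frame list of (yaw, pitch) angles with a weight like the one above. It takes a weighted circular mean per frame, so yaws of 359 and 1 degrees average to 0 rather than 180; the function name and structure are my own assumptions:

    import math
    from typing import List, Tuple

    Path = List[Tuple[float, float]]  # (yaw, pitch) in degrees, per frame

    def most_traveled_path(paths: List[Path], weights: List[float]) -> Path:
        """Aggregate many viewer paths into one suggested path."""
        n_frames = min(len(p) for p in paths)
        suggested = []
        for f in range(n_frames):
            sin_y = cos_y = pitch_sum = total_w = 0.0
            for path, w in zip(paths, weights):
                yaw, pitch = path[f]
                sin_y += w * math.sin(math.radians(yaw))
                cos_y += w * math.cos(math.radians(yaw))
                pitch_sum += w * pitch
                total_w += w
            mean_yaw = math.degrees(math.atan2(sin_y, cos_y)) % 360
            suggested.append((mean_yaw, pitch_sum / total_w))
        return suggested

One caveat worth noting: a plain average of two popular but opposite directions points somewhere nobody actually looked, so a real system would likely cluster the paths first and suggest the most popular cluster's path instead.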