
Thursday, February 9, 2017

Tango - I Spy

Project Status: App Idea

Requirements

This app would require a Google Tango device to run. The development skill level required is quite a bit higher than for previous app ideas posted here on this blog. If you are looking to buy a Tango device, check out this post, which links to the newest devices on the market.


The Idea

The gameplay is based on the children's game often called I Spy. The app would see/scan/learn the room the player is in and use computer vision to secretly select objects within the environment. The player then tries to guess which object the app is thinking of, much like in the real-life version of the game. Not only could this app be fun to play, it could also be used to help train computer vision neural networks.

Gameplay

When you open the app, the game may take a second to acclimate. This is basically how Tango works now, as it often says "Hold Tight," only with the added step that the app will have to use computer vision to quickly find an object in the field of view that it recognizes.

The very first object selected by the app each time it opens will likely be super obvious and easy to guess, which serves several purposes. For one thing, the quicker it can find an object, the faster the game can start. This saves the player from having to scan large portions of the room before playing; we know the player will naturally move the device around, so we can scan the room during gameplay. However, if no objects are visible or recognizable right away, we may need to ask the player to slowly look around a bit with the device until one is found.

Another benefit of this quick and easy first recognition is that it will help us understand our player. If he/she is not able to quickly guess an obvious object within a small field of vision, then we will know to keep the object selection and gameplay easy, as it is likely this is a child or someone who would have a hard time guessing in general. On the other hand, if he/she quickly identifies the object, we will know that we can safely step up the difficulty when possible.
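
As a rough sketch, this adaptive start could be as simple as mapping the time to the first correct guess onto a difficulty level (the thresholds and level names below are illustrative assumptions, nothing settled):

```python
from enum import Enum

class Difficulty(Enum):
    EASY = 1
    MEDIUM = 2
    HARD = 3

def initial_difficulty(seconds_to_first_guess: float) -> Difficulty:
    """Pick a starting difficulty from how fast the obvious object was found."""
    if seconds_to_first_guess < 10:
        return Difficulty.HARD    # found it fast; safe to step things up
    if seconds_to_first_guess < 30:
        return Difficulty.MEDIUM
    return Difficulty.EASY        # likely a child or a struggling player
```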

The app will need to provide a hint to start the player off, such as "I spy something red." The player can walk/look around and touch objects to guess whether they have found the object the app is thinking of. When they are wrong, more hints may be provided over time to help guide them. If the player is having a hard time, the app could even use visual feedback to show whether the player is getting "warmer" or "colder" as the device is moved around.
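
One way the "warmer/colder" feedback could work, assuming we have the Tango device pose and the secret object's position in the same coordinate frame, is to compare the device's forward vector with the direction to the object. This is a minimal sketch; the 0-to-1 warmth scale is my own convention:

```python
import numpy as np

def warmth(device_pos, device_forward, object_pos) -> float:
    """1.0 when the device points straight at the object, 0.0 directly away."""
    to_object = np.asarray(object_pos, dtype=float) - np.asarray(device_pos, dtype=float)
    to_object = to_object / np.linalg.norm(to_object)
    forward = np.asarray(device_forward, dtype=float)
    forward = forward / np.linalg.norm(forward)
    cos_angle = float(np.clip(np.dot(forward, to_object), -1.0, 1.0))
    return (cos_angle + 1.0) / 2.0  # map [-1, 1] onto [0, 1]
```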

Hints could be basic, like "look down" or "try higher," or more complex, like "it is shiny," "it may be wooden," or "the object is soft and fuzzy." Furthermore, if it happens to be a pretty common object that the app understands, the clues could be more specific. For a door, for example: "you use it," "it swings," or "it's closed."
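
One simple way to organize these escalating hints is to store tiers per recognized object, from vague to specific, and release the next tier after each wrong guess (a sketch; the objects and wording are just examples):

```python
HINTS = {
    "door": ["I spy something you use.", "It swings.", "It's closed."],
    "teddy bear": ["I spy something soft.", "It is fuzzy.", "Try looking lower."],
}

def next_hint(obj: str, wrong_guesses: int) -> str:
    """Return a progressively more specific hint as wrong guesses pile up."""
    tiers = HINTS.get(obj, ["Keep looking!"])
    return tiers[min(wrong_guesses, len(tiers) - 1)]
```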

It is possible to do a little machine learning here. At the end of each round we could let the player point out if any hints were misleading or flat-out wrong. Of course, we could playfully apologize while noting that we are a simple-minded AI character that he/she is helping to teach, promising to try and do better in the future.

When the correct object is selected, the app could continue with the next object or let the player think of one instead. Reversing the roles is quite a bit more challenging for the developers but could always be added later on as a feature of the game.

Training Computer Vision

By reversing the roles, allowing the player to pick objects, we now have a nice opportunity to train our neural networks and make our computer vision results better over time.

There are several creative ways to allow the player to provide hints to the app, as it now must guess the object. One way is to start out by allowing the user to pick from a dynamic color palette specific to this environment, essentially saying "I spy something the color..." by tapping a selection.
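
One plausible way to build that environment-specific palette is to cluster the pixels of the current camera frame; here is a minimal sketch with OpenCV's k-means (the choice of five clusters is arbitrary):

```python
import cv2
import numpy as np

def palette(frame_bgr: np.ndarray, k: int = 5) -> np.ndarray:
    """Return k dominant colors (BGR) from a camera frame."""
    pixels = frame_bgr.reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    _, _, centers = cv2.kmeans(pixels, k, None, criteria, 3,
                               cv2.KMEANS_RANDOM_CENTERS)
    return centers.astype(np.uint8)
```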

After that point, the app could lead the player to objects it wants to guess and provide the player with three options to select from: yes, no, or I don't know. "I don't know" could be worded many other ways, but it's basically the option for "what you are asking or guessing doesn't make sense, so please try something else or clarify." The app could continue to select objects or pose questions for the player to answer, such as "Is it food?" The questions would always be geared towards a yes/no answer and would often teach us something about the object in question.
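
The reversed-role loop could then be a twenty-questions-style narrowing of candidates, where "I don't know" simply leaves the candidate pool untouched. A sketch under stated assumptions (the attribute names and the ask() callback are hypothetical):

```python
from typing import Callable, List, Optional

CANDIDATES: List[dict] = [
    {"name": "apple", "color": "red", "is_food": True},
    {"name": "book", "color": "red", "is_food": False},
    {"name": "mug", "color": "blue", "is_food": False},
]

def guess_object(color: str, ask: Callable[[str], str]) -> Optional[str]:
    """ask(question) returns 'yes', 'no', or 'unknown'."""
    pool = [c for c in CANDIDATES if c["color"] == color]
    answer = ask("Is it food?")
    if answer == "yes":
        pool = [c for c in pool if c["is_food"]]
    elif answer == "no":
        pool = [c for c in pool if not c["is_food"]]
    # 'unknown' leaves the pool as-is; ask about a different attribute next turn
    return pool[0]["name"] if pool else None
```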


Other ways of letting the player identify objects would be to use speech recognition or to provide a keyboard and an input box. But typing doesn't sound very fun in this game format, and we are already making our workload hard enough without having to deal with speech recognition for answers that would often fall into the same three categories of yes, no, or I don't know.

The Development

While this would be a really fun project to work on or be involved with, it is way over my skill level at the moment. Hopefully, though, someone will read this and take off running with the idea. Good luck to you, and please keep me posted on the progress. Right now I am simply following tutorials and trying to get anything working with Tango. Let me know if you would like more clarification on this idea; I often tend to rattle off parts of an idea that make sense in my head but lose people along the way by leaving out major portions of the inner workings.

Thursday, August 25, 2016

Typeface Detection App for Font Finding

Project Status: App Idea



Problem

Sometimes we see the use of an interesting or unique typeface in print or in graphics and we would like to know which font that is exactly. This way we can download it or purchase it for our own use.

Workaround

The current way of tracking down a font is by searching through many fonts online, hoping to randomly come across a matching or similar one. Another way is to try to track down the person or team who designed the graphics or print and ask them directly.

Solution

I would like to use computer vision, such as OpenCV, and design an app that will not only recognize that there is text in an image/video but will discover the name of the font type used and link to it.

The app will likely be used over and over for some very common fonts, and a small database can be made to handle this. By nature, though, the app will also be used on very obscure typefaces, and possibly even on lettering that isn't a font at all but a one-off design made for a specific application.
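
For the common-font database, one lightweight approach (a sketch, not a settled design) is to reduce a cropped glyph image to a tiny binary fingerprint and find the closest stored font by Hamming distance; the 16x16 grid size is an assumption:

```python
import cv2
import numpy as np

def fingerprint(glyph_gray: np.ndarray) -> np.ndarray:
    """Binarize and downsample a grayscale glyph crop to a 16x16 bit grid."""
    small = cv2.resize(glyph_gray, (16, 16), interpolation=cv2.INTER_AREA)
    _, bits = cv2.threshold(small, 0, 1, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    return bits.astype(np.uint8)

def closest_font(query_gray: np.ndarray, database: dict) -> str:
    """database maps font names to precomputed fingerprints of the same glyph."""
    q = fingerprint(query_gray)
    return min(database, key=lambda name: np.count_nonzero(database[name] != q))
```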

When a font cannot be determined, it can be logged. The app user can choose to be notified if the font is labeled at some time in the future.
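
The log entry for an unidentified font might look something like this (the field names are hypothetical):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class UnknownFontRequest:
    image_id: str          # reference to the uploaded sample
    fingerprint_hash: str  # lets a future label be matched back to old queries
    subscribers: List[str] = field(default_factory=list)  # users to notify once labeled
```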

If you have ever used the Shazam mobile app, this would work in much the same way, but instead of music it would focus on fonts.

I would also like it to be a useful tool for people who design fonts. This app could help them sell and distribute fonts they own or popularize ones they give away for free. If done correctly, it could also provide them with user-interest feedback that may help them design more fonts. For example, if they see the unlabeled fonts users are searching for, they may be able to develop similar fonts that those users would also like.


The tagging process, calculating the probabilities of type matches and even finding similar font types could each largely be done or helped by machine learning. The algorithms and bots could carry the largest parts and the most tedious portions of the workload while their human counterparts could guide them and help to improve the results they generate.
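
As one sketch of the similar-fonts piece, each font's rendered alphabet could be embedded as a feature vector and looked up with nearest neighbors; the random vectors below are stand-ins for embeddings a neural network would actually learn:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

font_names = ["Helvetica", "Arial", "Futura", "Garamond"]
features = np.random.rand(len(font_names), 64)  # placeholder embeddings

index = NearestNeighbors(n_neighbors=2).fit(features)

def similar_fonts(query_vec: np.ndarray) -> list:
    """Return the names of the fonts closest to the query embedding."""
    _, idx = index.kneighbors(query_vec.reshape(1, -1))
    return [font_names[i] for i in idx[0]]
```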

There are several easy-to-find and easy-to-scrape font repositories on the web, but it would eventually be interesting and useful to write bots to hunt down and include the more obscure ones out there.

Monday, July 25, 2016

Machine Learning to Improve 360 Video Playback

Project Status: Still Dreaming

Problem

Watching a 360 video in any setting other than virtual reality can lead to a poor experience due to a non-optimal viewing path. Viewers have to work pretty hard to be their own directors of photography, or else they may end up staring at the wall while something super awesome is going on in the other direction.


Workaround

Currently you have to back up and try to find what you missed, assuming you even realize that you missed something in the first place. Using any type of virtual reality headset helps immensely, for the simple reason that the viewer can naturally look around the environment.

Solution

Storing viewer paths as people watch 360 degree videos will give us the data needed to learn what path is optimal to suggest to a viewer who has never seen this video before.

So while you won't be physically pushing a person's head around as they wear a virtual reality headset, you can be pushing the view around for someone watching from an online source such as YouTube. Meanwhile, for those VR headset users, we can still suggest directions they may want to look in, much the same way video games do. If you are as horrible a gamer as I am, you will know right away that a flashing red area on the side of the view means that is the direction you are being attacked from.

When a new 360 video is uploaded to a site such as YouTube, one cool option would be to give the uploader a tool that allows him/her to set a default viewing path through the video. Honestly, that should be done anyway, but, as you know, we can do even better than that.

Now to be clear, what I mean by a path is not a whole copy of the 360-degree video in every direction, but the actual frame-by-frame view that a viewer saw. While watching, the viewer may have the ability to turn in any direction, but ultimately they can only look in one direction at a time, and so whatever they choose to see becomes a video experience of its own. Even if they back up and play through parts again while checking out other directions, their perspective is still limited to one direction of view at a time.

So obviously it wouldn't be very efficient to store a new copy of the video each time it is viewed. Instead, what we actually need to store is the camera angle at each point in time. We store this frame-by-frame camera angle and call it a path.
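
A path record could then be as small as one orientation per frame; here is a minimal sketch (storing yaw/pitch in degrees is my assumption, and a production system might prefer quaternions):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ViewPath:
    video_id: str
    used_headset: bool                 # VR paths may be weighted differently
    angles: List[Tuple[float, float]]  # (yaw, pitch) per frame, in degrees
```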

For now we'll keep the idea simple with camera angle data, but it is interesting to note that, with the advances and expanding use of virtual reality technology, tracking eye movement will very soon be extremely useful as well.

Other features could help us learn more quickly, such as taking into account when the user backed up to view something else. We could also note when the viewer is using a virtual reality headset, because then we will weigh the path's importance differently, given that they have an easy means of motion. Some paths may be more traveled depending on screen sizes or viewer demographics. We should use anything we can as a feature to learn from; maybe viewers who liked the video had a better viewing experience than those who disliked it.

Quite simply, the idea is that we start by recording the paths taken by every user who watches the video, and with that aggregate data we can begin to find the paths most traveled. We then suggest paths to future viewers, learn from their variations from the suggested path, and repeat the cycle.
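
One simple version of that aggregation step, assuming each path is reduced to a yaw angle per frame, is a per-frame circular mean (circular so that 359 degrees and 1 degree average to 0, not 180):

```python
import numpy as np

def consensus_yaw(paths: np.ndarray) -> np.ndarray:
    """paths: (num_viewers, num_frames) yaw angles in degrees.
    Returns one suggested yaw per frame."""
    radians = np.deg2rad(paths)
    mean_sin = np.sin(radians).mean(axis=0)
    mean_cos = np.cos(radians).mean(axis=0)
    return np.rad2deg(np.arctan2(mean_sin, mean_cos)) % 360.0
```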