Monday 8 October 2012

Code Update

I have recently added some modifications to the avatar code. William has also added some code to the face tracker application (specifically so that the window stays on top). This was needed so that the virtual webcam can capture the window via the WinAPI GetClientRect function and display it in webcam programs such as Skype. The virtual webcam code has been uploaded to the SVN repository.

Wednesday 19 September 2012

2D Caricature Animation Using the Kinect

Recently I was informed about the new changes to the Kinect SDK and have seen that it now includes tutorials showing facial feature detection. Using these (with the Kinect sample as a base), I have created a simple program which creates a 2D caricature that animates according to the user's face. Below is a picture of me using it. I am also trying to add a third screen that will create a 3D mesh that moves in much the same way as the 2D caricature. (This code has been added to the SVN.)






Thursday 6 September 2012

Update: faceoverlay

This is a new library I have found in the gst-plugins-bad set of plugins. It uses OpenCV along with GStreamer to locate the face and overlay an image. At the moment I have not got it completely integrated, because the library needs a dependency which I have not been able to locate.

Update: v4l2loopback

This is a loopback device that can act as a webcam for any program that can access a webcam. Essentially it takes in an input and presents that same input as its output to a program like Skype or Windows Messenger. This means that I can modify the input from a "real" camera with an overlay and transmit it to the v4l2loopback device, which Skype then uses.

If I have time after the first prototype, I will try to integrate this with my prototype and possibly show a live demonstration.

Links

http://code.google.com/p/v4l2loopback/

Lack of Blog Updates

I apologise for the lack of updates on this blog. Essentially I am in a coding phase, so at present I am just coding, trying to put forward a prototype to show at the next meeting.

There is an update on my environment, however. I have changed my coding environment from Windows to Linux, for the sole reason that it is easier to install and use libraries there.

Thursday 23 August 2012

Issues in Windows

Currently I am having some issues coding with GStreamer on Windows. I get a few errors relating to creating a pipeline for the program to work with; I am not sure if it is my environment or some other issue. So I am currently getting a Linux distribution working on a computer for an easier coding environment setup, and should have a prototype done in a week or two.

If anyone viewing this blog has any experience with GStreamer on Windows (www.gstreamer.com - SDK) and has any information for me, it would be really appreciated.

For a little more information about the prototype: if I get the time for a UI, it will be done with the Qt Framework.

Thursday 2 August 2012

Current Modifications and What I'm Doing

Over these past few weeks, I have decided on a few changes to my system.

UI

The previous UI that I had prepared now seems a little too "clunky", so I have decided to streamline it to make it easier to use. Essentially there will be no video screen in the actual program; instead it will present options that the user can set. I decided on this because the program, put simply, changes a webcam stream by replacing the face with a 2D caricature or 3D avatar, so the UI does not need any screens of its own. The settings the user can modify will include the type of avatar (2D or 3D) and then the avatar itself, along with an "activate" checkbox which, when checked and the apply button is pressed, activates the program overlay. Other options will be added if I find they would be useful.

System Structure

The system itself will be:
  • GStreamer on the frontend, which the UI code deals with
  • OpenCV on the backend, which does the algorithmic functions for face detection etc., called only via the GStreamer functions I have made.
Essentially, the UI will call the GStreamer functions when the "activate" checkbox (explained in the paragraph above) is checked and the user clicks apply. At the moment I am trying to find the best method to pass frames to OpenCV so that the OpenCV libraries can do the algorithmic work.

My current view is that I send, via GStreamer, a Mat (or the relevant format) to OpenCV, which then runs face detection and feature detection. OpenCV will return a two-dimensional array containing the coordinates of the features on the face, relative to the Mat that was sent to it. GStreamer will then draw an overlay onto the webcam video according to the coordinates given by OpenCV.
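A minimal sketch of this planned handoff, in pure Python with stub functions standing in for the real OpenCV and GStreamer calls (all names here are hypothetical, and the "frame" is just a 2D list of pixel values):

```python
# Sketch of the planned frame handoff: GStreamer hands a frame to OpenCV,
# OpenCV returns feature coordinates, and the overlay is drawn from them.

def detect_features(frame):
    """Stub for the OpenCV stage: return (row, col) feature coordinates."""
    h, w = len(frame), len(frame[0])
    # Pretend the eyes and mouth were found at fixed relative positions.
    return [(h // 3, w // 3), (h // 3, 2 * w // 3), (2 * h // 3, w // 2)]

def draw_overlay(frame, features, marker=255):
    """Stub for the GStreamer drawing stage: mark each feature coordinate."""
    for (r, c) in features:
        frame[r][c] = marker
    return frame

frame = [[0] * 8 for _ in range(9)]       # 9x8 dummy grey frame
features = detect_features(frame)         # "OpenCV" stage
frame = draw_overlay(frame, features)     # "GStreamer" stage
print(features)                           # coordinates relative to the frame
```

The point of the sketch is only the direction of data flow: the frame goes one way, and a small coordinate array comes back, so the drawing side never needs to know how detection works.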

Tuesday 3 July 2012

Update: 03/07/2012

During these last couple of days I have been preparing my computer environment for proper coding, which includes GStreamer.

My current environment now has :
  • Windows SDK (which took longer to install than i originally thought)
  • GStreamer Binaries (Built using CMake)
Currently I am doing code testing with GStreamer to verify its capability for my project. If GStreamer works for what I intend to use it for, then I shall continue with the project code, updating this blog when I can.

Thursday 28 June 2012

Resuming Work

As I have finished my exams, progress on this project will now continue.

I will update my progress on Monday 2nd July

Wednesday 13 June 2012

Postponed Work

As of now I am in my study period for final exams. Because of this, minimal to no work will be done on this until after my last exam on the 27th of June. Work on the project will then continue.

Sorry for the delay in the project.

Saturday 26 May 2012

GStreamer

During the week I have been looking for ways to implement stream modification and came across a library called GStreamer. It contains functions that allow direct manipulation of a webcam stream even while the stream is being used by a program such as Skype, which means it does not take over the webcam stream the way OpenCV does. However, OpenCV will still most likely be used, since facial recognition is still needed to drive the filters that GStreamer applies.

http://gstreamer.freedesktop.org/

Monday 21 May 2012

Recent Activities and Progression

I realise that this blog post is overdue, but for good reason.

Firstly, my project has changed from a Waterfall development project to an Agile development project. This change was made because of the way the code has been developed. William and I are currently sharing code and splitting the parts appropriately. While I had previously done more research than coding, I am now doing more coding than research, as a result of my research into how my design can be implemented.

In my research I have found that to implement my design of sitting between the video conferencing program and the Kinect (or webcam), I need to implement a filter system. The filter uses the Windows SDK DirectDraw methods to hook into the webcam stream and modify it on the fly. This is exactly how I wanted to do this project, but there have been quite a few speed bumps along the way. The first was finding out how to modify a webcam stream at all: since OpenCV is unable to do this, an alternate route had to be found. That is where I found the filter method. While it is very promising, there are very few open-source examples of this type of implementation, not to mention that installing the Windows SDK was more than a little hassle, though I will not go into that on the blog.

Currently I am re-evaluating my design around the filter method, finding out how to use the DirectDraw methods in the SDK to change webcam streams, and doing more coding.

Tuesday 1 May 2012

Facial Recognition Implementation

With the previous posts on the two different ways of facial tracking, there arise three different ways to implement facial tracking in this project:

  1. Using AAM tracking for facial recognition
  2. Using Haar for facial recognition
  3. Using both AAM and Haar

The third option arose from my discovery of the paper "Fast AAM Face Recognition with Combined Haar Classifiers and Skin Color Segmentation", written by various authors. It explains that since AAM tracking is quite sensitive to the initial starting position of the model relative to the image, it is possible to use Haar classifiers to give the starting position of the model on the image, which is then fed into the AAM algorithm.
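The combination can be sketched very simply: the detector's bounding box places the AAM's mean shape before the iterative fit begins. Here both stages are stubs (the real versions would come from OpenCV's Haar cascade and an AAM library; the numbers and function names are illustrative only):

```python
# Sketch of the combined approach: a Haar-style detector supplies a face
# bounding box, which is used to place the AAM mean shape before fitting.

def haar_detect(frame_size):
    """Stub Haar detector: return a face bounding box (x, y, w, h)."""
    fw, fh = frame_size
    # Pretend the face fills the central half of the frame.
    return (fw // 4, fh // 4, fw // 2, fh // 2)

def init_aam_shape(mean_shape, bbox):
    """Scale and translate the AAM mean shape (unit coords) into the box."""
    x, y, w, h = bbox
    return [(x + px * w, y + py * h) for (px, py) in mean_shape]

mean_shape = [(0.25, 0.5), (0.75, 0.5), (0.5, 0.75)]  # eyes, mouth (unit coords)
bbox = haar_detect((640, 480))                        # detection stage
start = init_aam_shape(mean_shape, bbox)              # AAM starting positions
print(start)
```

The AAM search then begins from `start` instead of an arbitrary position, which is exactly what the paper argues makes the fit robust.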

The implementation will be discussed with William Qi before anything is done, due to how we are co-operating on the code.

Reference

http://www.jofcis.com/publishedpapers/2012_8_7_2799_2806.pdf

Active Appearance Model

Active Appearance Model (AAM) is an algorithm that uses a statistical model of the shape and grey-level appearance of an object. During the training phase, the algorithm learns the relationship between model parameter displacements and the residual errors induced between a training image and a synthesised model. It is able to give a good overall match in just a few iterations, even with poor starting estimates (to a certain degree).

However, AAM is very sensitive to the initial matching position of the model relative to the image, and without a good starting place there could be problems with both the computational expense of the algorithm and its accuracy.
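The "learn the displacement-to-residual relationship, then iterate" idea can be shown with a toy one-parameter version. This is purely illustrative: a real AAM has many shape and appearance parameters and the relation is a learned matrix, not a single scalar gain.

```python
# Toy illustration of the AAM search loop: "training" estimates how a
# parameter displacement relates to the residual error, and the run-time
# search repeatedly applies that relation to correct the current guess.

TRUE_PARAM = 5.0   # the parameter value that best matches the "image"

def residual(param):
    """Residual between synthesised model and image for a given parameter."""
    return param - TRUE_PARAM

# "Training": estimate the gain from a known displacement of the model.
displacement = 1.0
gain = displacement / residual(TRUE_PARAM + displacement)

def fit(start, iterations=10):
    """Iteratively correct the parameter using the learned gain."""
    p = start
    for _ in range(iterations):
        p -= gain * residual(p)
    return p

print(fit(0.0))   # converges toward TRUE_PARAM even from a poor start
```

In this linear toy the fit lands in one step; the real algorithm needs a few iterations because the image residual is only locally linear in the parameters, which is also why a bad starting position can break it.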

References

T.F. Cootes, G.J. Edwards, C.J. Taylor. Active Appearance Models. 1998. Proc European Conference on Computer Vision.
(http://www.cs.cmu.edu/~efros/courses/AP06/Papers/cootes-eccv-98.pdf)

Haar Object Detection for Facial Detection

This detection method is also called the "Viola-Jones object detection framework", named after Paul Viola and Michael Jones, and uses Haar-like features (which derive from Haar wavelets) to detect objects. Haar-like features are simple patterns, such as lines and edges, that are used in object recognition. The Haar classifier uses these patterns to detect objects by examining the change in contrast between adjacent rectangular groups of pixels; these changes in contrast determine relative light and dark areas. These features are used because they are easily scaled, simply by increasing or decreasing the size of the pixel group being analysed.


Figure 1. Haar Features

In the Viola-Jones framework, the features used involve sums of the image pixels within rectangular areas. While there are other classifiers based on Haar features, such as the Haar basis functions, Viola-Jones uses more than one rectangular area per feature, making the features more complex and therefore able to capture more facial detail.
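The rectangle sums are also what makes these features cheap to evaluate: with a precomputed integral image, the sum of any rectangle costs just four lookups, regardless of its size. A small pure-Python sketch of that trick (the tiny image and the particular two-rectangle "edge" feature are made up for illustration):

```python
# Why Haar-like features are fast: precompute an integral image, then any
# rectangle sum takes four lookups, so a two-rectangle feature (one area
# minus the adjacent area) costs eight lookups at any scale.

def integral_image(img):
    """ii[r][c] = sum of img over all rows < r and cols < c."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        for c in range(w):
            ii[r + 1][c + 1] = img[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c]
    return ii

def rect_sum(ii, r, c, h, w):
    """Sum of the h x w rectangle with top-left corner (r, c)."""
    return ii[r + h][c + w] - ii[r][c + w] - ii[r + h][c] + ii[r][c]

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
ii = integral_image(img)
# A two-rectangle "edge" feature: left half minus right half.
feature = rect_sum(ii, 0, 0, 3, 2) - rect_sum(ii, 0, 2, 3, 2)
print(feature)   # negative here: the right half is brighter
```

This constant cost per feature is what lets the cascade evaluate thousands of candidate windows at many scales in real time.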

This Viola-Jones framework is the method included with the current OpenCV libraries for facial detection.


References

Michael Jones, Paul Viola. Robust Real-time Object Detection. 2001. Second International Workshop on Statistical and Computational Theories of Vision - Modeling, Learning, Computing and Sampling.
(http://research.microsoft.com/en-us/um/people/viola/Pubs/Detect/violaJones_IJCV.pdf)

Dr John Fernandez, Phillip Ian Wilson. Facial Feature Detection Using HAAR Classifiers. 2006. JCSC 21, 4.

Saturday 28 April 2012

Week 6/7

Design



Since my previous post there has been only one main change to the design of the main system: instead of using OpenGL libraries such as GLUT or FreeGLUT, my associate student and I have concluded that it would be better to keep it as a DirectX program, since our platform for this thesis will just be Windows.

Coding

The main part of these 2 weeks has been coding.

The first part was coding the Kinect interface properly to receive data from the Kinect. This proved more challenging than originally expected, but with the help of William and Hamed (a PhD student/researcher at the university), I now have more understanding of how it works.

The second part of the coding is for the face detector. This is where my research of these two weeks comes in (written below).

The coding for this thesis has been more challenging than originally expected.

Research

During these past two weeks the main bulk of my research was to look into face detectors. Since there is more than one algorithm that could be used, I narrowed them down to two main ones:

  1. Haar object detection - a face detection algorithm built into OpenCV
  2. AAM tracking - an external library that has been built to integrate with OpenCV.
My research so far has shown me that AAM tracking is more refined and is able to detect the face with better accuracy than Haar. I have not completed all my research into this; when I have, a blog post will be devoted to just this.


Current Implementation

Currently I'm in the process of implementing Haar detection because it comes ready with OpenCV. If my research shows that AAM is a better algorithm in all respects, steps will be taken to integrate AAM tracking.

Friday 13 April 2012

Week 5

For convenience, all blog updates will be done on a weekly basis on the Friday of each week. Each update will be split into three sections: Design, where I report any design changes that may have occurred during the week; Coding, where I report on any code that has been written or is in progress; and Literature, where I add updates on any papers that I have read and deem useful in this specific context.

Design

As of the meeting held on the 4th of April 2012, these were my designs:

Data Flow Diagram




HIPO Diagram

UI



The design has not been modified since that meeting.


Coding

Through my constant communication with my colleague William Qi (http://anonskypewilliam.blogspot.com.au/?view=classic), we have concluded that we will use the Kinect SDK with OpenCV as our coding base for the Kinect, for such things as collecting data, face tracking, etc.

We have also concluded that as a base for our thesis we would like to have 2D deformation done, since 3D deformation could be quite CPU- and graphics-intensive, and possibly unviable until further research is done.

My current coding is in progress; it involves converting the input code in the Skeletal Viewer (included in the Kinect SDK) from DirectX to OpenGL (OpenCV).

Current Literature


All the current literature that I have read and analysed includes:

  1. Automatic reconstruction of personalized avatars from 3D face scans by Michael Zollhöfer, Michael Martinek, Günther Greiner, Marc Stamminger and Jochen Süßmuth
  2. Realtime performance-based facial animation by Thibaut Weise, Sofien Bouaziz, Hao Li and Mark Pauly
  3. Computer-Based Analysis of Facial Action in Schizophrenic and Depressed Patients by Frank Schneider, Hans Heimann, Waldemar Himer, Dietmar Huss, Regina Mattes, and Birgitta Adam
I am also currently reading several papers that include information about 3D expression accuracy and Kinect facial recognition algorithms.

Wednesday 14 March 2012

AAM Tracking

AAM tracking here refers to an Active Appearance Model face tracker using OpenCV. It is a library that was made by Dr. Radhika Vathsan as a way to track a face from a camera.

There are several examples of this being used to control a 2D image, such as the YouTube video below.

I am in the process of researching this code to see if it is possible to make it work with a 3D model.

http://www.youtube.com/watch?v=eSS4GFIH94w

Wednesday 7 March 2012

Face Deformation with OpenGL

While doing research on possible ways to control a 3D face with OpenGL, I found a paper (http://paper.ijcsns.org/07_book/200704/20070420.pdf) named "3D Face Deformation Using Control Points and Vector Muscles", published in the International Journal of Computer Science and Network Security.

This paper describes a method of deforming a 3D face using control points and vector muscles.

While reading it, I have thought about using control points as base points to control a 3D face while reading data from the user's face. This should allow the 3D face to appear to convey the same emotions as the user.
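The control-point idea can be sketched in a few lines: each mesh vertex follows nearby control points, weighted by a distance falloff. Note this is only an illustration of the general idea in 2D with a made-up linear falloff, not the paper's vector-muscle model:

```python
# Sketch of control-point deformation: each vertex is displaced by the
# falloff-weighted displacements of the control points near it, so moving
# one tracked point (e.g. a mouth corner) smoothly drags its neighbourhood.

import math

def deform(vertices, controls, displacements, radius=2.0):
    """Move vertices by falloff-weighted control-point displacements."""
    out = []
    for (vx, vy) in vertices:
        dx = dy = 0.0
        for (cx, cy), (mx, my) in zip(controls, displacements):
            dist = math.hypot(vx - cx, vy - cy)
            weight = max(0.0, 1.0 - dist / radius)   # linear falloff to zero
            dx += weight * mx
            dy += weight * my
        out.append((vx + dx, vy + dy))
    return out

vertices = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]   # three mesh vertices
controls = [(0.0, 0.0)]                           # e.g. a mouth-corner point
displacements = [(0.0, 1.0)]                      # tracked from the user's face
print(deform(vertices, controls, displacements))
```

The per-frame loop would then be: read the user's feature positions, compute the displacements of the control points, and re-deform the mesh, which is how the avatar's face would mirror the user's expression.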


I have not yet seen whether a Kinect or a webcam would suit this method, or which one would be better to use.

NB: All information used from this paper will be quoted and acknowledged.



Tuesday 6 March 2012

Preliminary Ideas

There are a few ideas I have thought about regarding anonymity; these are:

  1. Using the Microsoft Kinect to help generate a complete 3D avatar which takes over as the user during an online video conference
  2. Using a webcam or a Microsoft Kinect to overlay a 2D avatar's face on top of the user's face. This would involve no other alterations to the video except that the user's face would now be replaced
  3. Using a webcam or a Microsoft Kinect to overlay a 2D avatar's face again, but with the rest of the video replaced by a pre-made 3D base head from a 3D modelling program. In this way the 3D base head will always be the same and only the 2D face would change, ensuring that the user is fully anonymised

The Thesis

This is my thesis topic for 2012 at the University of Sydney. There are several reasons why anonymity might be used while conversing over the internet, and this thesis will explore the possible ways in which it can be achieved with either a Microsoft Kinect or an ordinary webcam, found on most computers in this day and age.

However, simple forms of anonymity like blurring, while they would work, would not be able to convey any emotion over the internet, so other avenues need to be explored for different circumstances.