This video shows some of the image processing we are working on for analysing traffic video streams in Kampala. The aim of the project is to use the cameras in phones as the basis for an ultra-low-cost congestion monitoring system, and in particular one that can cope with the unusual features of developing-world city traffic. In this example we first use SURF features to find correspondences between consecutive frames, giving us a set of motion vectors in image coordinates. We then project those vectors into 'real-world' coordinates, which lets us calculate speeds in km/h. The last part of the video shows how a regular grid in real-world coordinates compares to the motion vectors.
In earlier stages of the project, we worked with Uganda Police to use their network of CCTV cameras. We were at least able to get some good example images from those cameras, showing why traffic monitoring is not as straightforward here as it is in some other places:
The problem is that because those cameras can pan, tilt and zoom, the field of view can change very frequently. Rose Nakibuule tried several ways of automatically calibrating the camera projection, using the motion vectors, features of common vehicles that help us estimate scale, and so on; the conclusion was that this is difficult to pull off reliably. In the end we decided that some manual calibration is probably unavoidable anyway, since we need to know the different regions of interest in the frame (for example, lanes going in different directions, which must be processed separately). So we now use camera phones in a fixed position, and setup involves the user clicking on four points in the image corresponding to a square of known size on the road. This seems to work well.
Including the speed estimates, the results look like this (screenshots from a Python/OpenCV port of the Matlab code used above):
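Once vector endpoints are in metres, the speed estimate is just displacement per frame scaled by the frame rate. A hedged sketch (the function name and the assumption that matched points are exactly one frame apart are mine, not from the original code):

```python
import numpy as np

def speed_kmh(p0_world, p1_world, fps):
    """Per-vector speed in km/h, given (N, 2) arrays of matched points
    in road-plane metres, observed one frame apart."""
    # Displacement in metres over one frame interval (1/fps seconds),
    # so metres/second is displacement * fps, and km/h is that * 3.6.
    d = np.linalg.norm(np.asarray(p1_world) - np.asarray(p0_world), axis=1)
    return d * fps * 3.6
```

For example, a point that moves 0.5 m between frames at 30 fps is travelling at 54 km/h.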