In this project, we built a robot dog based on the Raspberry Pi. First, the dog can find your face and follow your steps. This function is achieved by detecting the user's face and using the position and size information to control the motors. Second, the dog can recognize a "go" command, which is the spoken word "go", and a "stop" command, which is the sound of a whistle, and make the corresponding response. This is achieved by detecting the sound's frequency with zero-crossing detection. Third, the dog makes various sounds when it cannot find a face, or in reply to the "go" and "stop" commands.
Build a robot dog with the following functions:
 Face detection
 Move and track user's face
 Simple voice recognition of two commands
 Make sound response
In the final system setup, we use the Pi Camera and a mount we bought from Amazon. The Pi Camera is connected to the Pi via the dedicated connector. After enabling the camera option in the configuration and rebooting, the Pi Camera was ready to use.
The official user manual suggests that the mount be plugged into the USB port of the Raspberry Pi. However, the camera would then shake together with the Pi when the robot dog moves: the Pi is mounted on the robot frame with VELCRO tape, which is not stable enough for the camera. As the camera shakes, the image becomes blurred, degrading the performance of the face detector. Thus, we fixed the mount to the base of the robot frame, and the performance of the face detection improved.
The installation of the microphone is as simple as plugging it into a USB port.
The robot frame used for this project is the same one we used in Lab 3. For the detailed installation procedure, please refer to the Lab 3 Instruction Note.
We followed Adrian Rosebrock's tutorial to install OpenCV 3.1.0 on the system. The installation process can be summarized as downloading the source code, installing dependencies, compiling, and installing. One thing to point out is that we did not install OpenCV in a Python virtual environment. A virtual environment is a useful tool for isolating development environments; however, we use this Raspberry Pi exclusively for this project, so there was no need to install OpenCV in a virtual environment.
We followed an online tutorial to install PyAudio on the system:
$ sudo apt-get install git
$ git clone http://people.csail.mit.edu/hubert/git/pyaudio.git
$ sudo apt-get install libportaudio0 libportaudio2 libportaudiocpp0 portaudio19-dev
$ sudo apt-get install python-dev
$ sudo python pyaudio/setup.py install
We use two servo motors for the two wheels. They are connected to the Pi via pins 20 and 21 (BCM numbering). The motors are powered by dedicated batteries with a separate power switch. The ground is shared with the Pi, as Figure 3 shows.
The face detection function is achieved with the face detector module of OpenCV. There are generally two algorithms for this purpose: Haar Cascades and Local Binary Patterns (LBP). Haar Cascades are used for this project because of their higher detection accuracy; LBP is faster but less accurate. A faster algorithm means more frames can be processed per unit time, giving a higher frame rate. However, the frame rate requirement is not very high for this project, while detecting a face in the frame accurately matters more. Thus, the Haar Cascade was chosen.
A face detector instance can be set up with:
facedetector = cv2.CascadeClassifier(cascade_path)
where cascade_path is the file path of the face cascade XML file. A grayscale image is passed into the detector, and the position, width, and height of each detected face are returned. The center of the face can be determined from this information.
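The detection step described above might be sketched as follows. The cascade file name is the standard OpenCV frontal-face cascade, and the detectMultiScale parameters are typical values, not necessarily the project's exact settings:

```python
def face_center(x, y, w, h):
    """Return the center of a detected face box as (cx, cy)."""
    return (x + w // 2, y + h // 2)

def detect_largest_face(gray_image, cascade_path="haarcascade_frontalface_default.xml"):
    """Run the Haar cascade on a grayscale frame and return (x, y, w, h)
    of the largest detected face, or None if no face was found."""
    import cv2  # imported here so the geometry helper above needs no OpenCV
    detector = cv2.CascadeClassifier(cascade_path)
    faces = detector.detectMultiScale(gray_image, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    return max(faces, key=lambda f: f[2] * f[3])  # largest by area
```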
The face detection function serves as the navigator of the system. The position of the user is detected in the camera frame and a relative position is returned. The robot moves according to this relative position.
During the first phase of development, the camera used was a Logitech C525 USB webcam. However, the camera frame rate was only about 7 fps, and after processing the system could only return about 5 relative positions per second. This could be adequate for navigation. However, there was a lag of around 1 s, as Figure 4 shows, which fatally affects navigation. The latency was measured by photographing a stopwatch directly alongside the image preview of the processed frame.
As one of the TAs, Jingyao, suggested, we could make use of the 4 cores of the Raspberry Pi 3 to process frames in parallel. A faster processing speed might reduce the latency.
The program was then edited to use multiple processes. The frame rate increased to around 9 fps, which is 2 fps faster than the single-process version. However, the latency got worse with multiple processes, as Figure 7 shows: it increased to 4 s, possibly caused by the overhead introduced by multiprocessing.
The tests above indicated that the latency was caused by procedures before face detection, which makes the image-capturing process the most suspicious. Besides USB webcams, the most popular camera solution on the Raspberry Pi platform is the Pi Camera, which is designed specifically for Raspberry Pis. The Pi Camera connects to the Pi with a dedicated connector, which may allow faster operation.
After switching to the Pi Camera, there was no improvement in frame rate; however, the delay was almost eliminated, dropping to around 0.3 s, which is enough for tracking slow movements of the face.
The relative position between the robot and the user is read from the camera. Two ways of interpreting the position signal were tested during development.
If the user stands straight and looks down at the robot, the position can simply be interpreted as x and y values.
However, during development, some problems occurred with this method. First, moving toward or away from the robot did not cause much change in the y direction. This could be solved by adjusting the gain when translating the error signal into a movement adjustment signal. A more serious problem is that when the user stands straight, the face in the frame is small; a small face leads to degraded accuracy and more computing time. Besides, the camera tilts up toward the lights on the ceiling. If there is an intense light source in the image, the Pi Camera lowers the exposure, producing a dark image in which the face detection algorithm cannot find the face. We tried turning off the auto exposure adjustment, but it is hard to find a good manual value for this application. During testing, the system frequently failed to detect the face.
Instead of using the y-axis value, the other method we proposed to represent the distance from the face to the camera is the size of the detected face.
After obtaining the position of the face in the frame, an error signal is generated to control the motors. The x-axis error is the deviation of the detected face position from 0, the central position. The size error is the deviation of the detected face size from 55, a value selected after testing a range of values. If the target size is too small, the robot stops too far from the user, and the user's face also becomes hard for the detector to find. If the target value is too large, the robot stops close to the user; however, during testing we found that when the robot stops too close, the face moves out of the frame. 55 was the best value found.
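The error-signal computation described above can be sketched as follows. The target size of 55 pixels is the report's tuned value; the 320-pixel frame width is an assumption about the capture resolution:

```python
TARGET_SIZE = 55  # face width (pixels) at the desired stopping distance

def error_signal(face, frame_width=320):
    """Given a detected face (x, y, w, h), return (x_error, size_error).
    x_error is the deviation of the face center from the frame center;
    size_error is the deviation of the face width from TARGET_SIZE."""
    if face is None:
        return (None, None)   # no face found
    x, y, w, h = face
    x_error = (x + w // 2) - frame_width // 2
    size_error = w - TARGET_SIZE
    return (x_error, size_error)
```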
This method was tested, and the system responded correctly to changes in the face position.
Figure 12 shows the testing result of the position interpretation function. When the tester's face was on the left or right side of the frame, the error message changed according to the position.
Compared with Figure 12, the tester's face was closer to the camera. The size error changed from around 70 to around 110, as expected.
The error signal needs to be sent to the main program of the system to control the wheels. The signal is sent over a UDP socket. As a later section will mention, two modules of the system need to send instructions to the main program. Instead of setting up another socket, the two sets of instructions are differentiated by a 'name tag'. The structure of the signal is:
The first element of the signal tuple is the name tag, which indicates that this is the error signal. The second element is the error signal itself, a tuple containing the x-axis error and the size error. If no face is found, the error signal is (None, None). The message is serialized with Python pickle before sending.
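The tagged-tuple message described above might be built and sent like this; the localhost address and port 5005 are assumptions, not the project's actual values:

```python
import pickle
import socket

def pack_error(x_error, size_error):
    """Build the ('error', (x_error, size_error)) tuple and pickle it."""
    return pickle.dumps(("error", (x_error, size_error)))

def send_error(sock, x_error, size_error, addr=("127.0.0.1", 5005)):
    """Send one pickled error message over an existing UDP socket."""
    sock.sendto(pack_error(x_error, size_error), addr)
```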
This part of the design refers to the 'face_m_picam.py' code.
In Lab 3, our group had already implemented the motor controller class. The class used for this project is an edited version of the Lab 3 code. According to the servo datasheet, "As the length of the pulse decreases from 1.5 ms, the servo will gradually rotate faster in the clockwise direction" and "Likewise, as the length of the pulse increases from 1.5 ms, the servo will gradually rotate faster in the counter-clockwise direction". The pulse length is the base pulse (1.5 ms) plus an adjustment. The pulse ranges from 1.3 ms to 1.7 ms, divided into 21 speed stages ([-10, 10]).
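A minimal sketch of the stage-to-pulse mapping implied by these figures (1.5 ms base, 1.3-1.7 ms range, 21 stages in [-10, 10]); the linear step of 0.02 ms per stage and the clamping are assumptions, not the project's exact code:

```python
BASE_MS = 1.5   # stop pulse from the datasheet
STEP_MS = 0.02  # (1.7 - 1.3) ms spread over 20 steps

def pulse_ms(stage):
    """Map a speed stage in [-10, 10] to a servo pulse width in ms."""
    stage = max(-10, min(10, stage))  # clamp to the valid stage range
    return BASE_MS + stage * STEP_MS
```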
where L_pulse is the calculated pulse width for the given speed stage. This code had been tested during Lab 3. However, when it ran alongside the face recognition code, the wheels started shaking even when the commanded speed was 0. In Lab 3, the motors were driven by the PWM module of RPi.GPIO, which is a software PWM. When this program runs alongside other processes, control of the PWM wave can be delayed and lose accuracy, leading to shaking wheels. The course instructor suggested using the pigpio module, which is already built into the Raspberry Pi. There are two options for PWM with pigpio. The first is the hardware PWM module. There are 2 hardware PWM channels on the Raspberry Pi 3, accessible via pins 13 and 18 (BCM numbering) with:
hardware_PWM(gpio, PWMfreq, PWMduty)
However, as mentioned in lecture, using the two hardware PWM channels conflicts with the audio output function of the Raspberry Pi. Instead of the hardware PWM channels, we used hardware-timed PWM, which is accessible from all the GPIOs. This method uses the DMA hardware of the Raspberry Pi, so its performance is not affected by CPU workload the way software PWM is.
The first line connects to the pigpio daemon, which should be started in the terminal, before running any pigpio-based program, with the command:
The second line changes the default range of a full PWM cycle from 255 to 10000 to increase control precision. For example, a 10% duty cycle is now 1000 rather than 26. The last two lines set the frequency and duty cycle to the expected values.
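The four pigpio calls described above might look like the sketch below. Pin 20 and the 10000-step range come from the report; the 50 Hz servo frequency and the variable names are assumptions. The helper converts a pulse width in milliseconds to a duty-cycle value under the widened range:

```python
def servo_duty(width_ms, pwm_range=10000, freq_hz=50):
    """Convert a servo pulse width in ms to a pigpio duty-cycle value."""
    period_ms = 1000.0 / freq_hz              # 50 Hz -> 20 ms period
    return int(round(width_ms / period_ms * pwm_range))

def setup_wheel(pin=20):
    """Connect to the pigpio daemon and configure one wheel servo
    using hardware-timed PWM (available on any GPIO)."""
    import pigpio                              # requires pigpiod to be running
    pi = pigpio.pi()                           # line 1: connect to the daemon
    pi.set_PWM_range(pin, 10000)               # line 2: 255 -> 10000 steps
    pi.set_PWM_frequency(pin, 50)              # line 3: 20 ms servo period
    pi.set_PWM_dutycycle(pin, servo_duty(1.5)) # line 4: 1.5 ms pulse = stop
    return pi
```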
After switching to the pigpio library, the two wheels performed stably.
Making the wheels move with the face's x-axis movement (refer to Figure 10) is achieved by driving the two wheels in different directions (forward/backward) at a certain speed.
At first, we designed a scheme in which the speed of the wheels is proportional to the error signal value, as:
However, if the gain is too high, the control overshoots. The frame rate of the face detection process is not high enough to keep up with rapidly changing frames. As we only expect the system to track a slowly moving face, we simplified the x-axis control scheme to:
The new scheme was tested and gave satisfying results.
For the y-axis, with size as the error signal, proportional control worked fine. The control signals for the x-axis and y-axis are combined to drive the two wheels. If no face is detected and the error signal is (None, None), the system simply sits still.
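The combined control law described above (fixed-speed turning on the sign of the x error, proportional forward/backward speed on the size error, sit still when no face is found) can be sketched as follows. The dead band of 20 pixels and gain of 0.2 are illustrative values, not the project's tuned constants:

```python
DEAD_BAND = 20   # ignore small x errors (pixels) - illustrative value
SIZE_GAIN = 0.2  # proportional gain for the size error - illustrative value
MAX_STAGE = 10   # speed stages run from -10 to 10

def wheel_command(x_error, size_error):
    """Return (left_stage, right_stage) speed stages in [-10, 10]."""
    if x_error is None:                 # no face found: sit still
        return (0, 0)
    if abs(x_error) > DEAD_BAND:        # turn in place at a fixed speed
        turn = 3 if x_error > 0 else -3
        return (turn, -turn)
    # face centered: drive forward/backward proportional to the size error
    # (a small face means the user is far away, so drive forward)
    speed = int(-SIZE_GAIN * size_error)
    speed = max(-MAX_STAGE, min(MAX_STAGE, speed))
    return (speed, speed)
```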
This part of the design refers to motor_control_pigpio.py, wheel.py, and car.py.
The zero-crossing rate is the rate of sign changes along a signal, i.e., the rate at which the signal changes from positive to negative or back. This feature has been used heavily in both speech recognition and music information retrieval, and is a key feature for classifying percussive sounds.
ZCR is defined formally as

zcr = (1 / (T - 1)) * Σ_{t=1}^{T-1} 1{ s_t · s_{t-1} < 0 }

where s is a signal of length T and 1{·} is an indicator function that equals 1 when consecutive samples have opposite signs and 0 otherwise.
In this project, we use the zero-crossing method to estimate the fundamental frequency of a sound.
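A minimal sketch of this estimation: each sign change is half a period, so over a frame of N samples at sampling rate fs, the fundamental frequency is roughly crossings * fs / (2 * N). This assumes the signal is approximately periodic within the frame:

```python
def zero_cross_freq(samples, fs):
    """Estimate the fundamental frequency (Hz) of one frame of audio
    samples, using the zero-crossing count."""
    crossings = 0
    for prev, cur in zip(samples, samples[1:]):
        if (prev >= 0) != (cur >= 0):   # sign change between samples
            crossings += 1
    # each full period contributes two crossings
    return crossings * fs / (2.0 * len(samples))
```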
Since all we needed was to distinguish two different sounds, we used one low-frequency sound and one high-frequency sound to achieve high accuracy. We used the spoken word "Go", whose fundamental frequency is below 500 Hz, for the move command, and a whistle, whose fundamental frequency is above 2000 Hz, for the stop command.
In our voice recognition program, we use two thresholds. If the fundamental frequency of the sound is higher than 2000 Hz, we treat it as a whistle and send a stop command through the socket. If the frequency is below 500 Hz, we send a move command.
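The two-threshold classifier can be sketched as follows. The thresholds come from the report; ignoring frequencies at or below 0 Hz (e.g., silent frames with no crossings) is an assumption, since in practice an amplitude gate would also be needed:

```python
GO_MAX_HZ = 500     # "go" word: fundamental frequency below 500 Hz
STOP_MIN_HZ = 2000  # whistle: fundamental frequency above 2000 Hz

def classify(freq_hz):
    """Map an estimated fundamental frequency to a command, or None."""
    if freq_hz > STOP_MIN_HZ:
        return "stop"
    if 0 < freq_hz < GO_MAX_HZ:
        return "go"
    return None  # in-between frequencies and silence are ignored
```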
One thing we need to point out is that the zero-crossing method is easily affected by noise, which means the robot will sometimes mistakenly recognize noise as a command, for instance, the sound produced by the robot itself (it plays a dog-barking audio clip when it receives a command) or the voices of other people. As a result, it needs a relatively quiet testing environment, and the amplitude of the response audio played by the robot must be low.
Before we tested the voice recognition program on the Pi, we first tested it on our own computer, using the internal microphone of a PC. And before testing with real sounds, we tested it with specific sine waveforms produced by a waveform generator. At this stage, the voice recognition worked well: the difference between the recognized value and the real value was less than 30 Hz when the amplitude was large enough.
Then we moved the voice recognition program onto the Pi and tested it there. Several problems occurred. The first was that we needed to configure the Pi for some specific libraries used in the voice recognition program.
The second problem was a frame overflow on the Pi, caused by the original frame size (1024 points) being too small. To solve this, we increased the frame size to 8192 points.
The third problem was that when the power source of the microphone was low, performance degraded. After we recharged the power bank, the microphone returned to its normal performance.
The fourth bug was that if a voice lasted longer than one recognition period, the program would send repeated commands. The solution was to store a timestamp each time we sent a command. Before sending the next command, we check the interval between the current timestamp and the last one: if the interval is larger than 3 seconds, we treat the current recognition as a new command and send it; if it is less than 3 seconds, we ignore it.

The fifth point is that since the zero-crossing method accumulates over a frame, a recognized frequency below the low threshold has two possible causes: either it represents a genuine low-frequency command, or it is the beginning of the accumulation of a high-frequency signal. To distinguish these two possibilities, after recognizing a low-frequency command, we check the two recognitions that follow it. If both are high, we treat it as the second case and send the stop command; otherwise, we can be sure it was a low-frequency command and send the go command.
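The 3-second debounce described above can be sketched as a small helper; the class name and interface are illustrative, not the project's actual code:

```python
import time

class Debouncer:
    """Allow a command through only if at least `interval` seconds
    have passed since the last command was sent."""

    def __init__(self, interval=3.0):
        self.interval = interval
        self.last_sent = None   # timestamp of the last command sent

    def should_send(self, now=None):
        """Return True (and record the timestamp) if a command may be sent."""
        if now is None:
            now = time.time()
        if self.last_sent is None or now - self.last_sent >= self.interval:
            self.last_sent = now
            return True
        return False
```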
The last problem was that the robot's sound response to a command could itself act as noise: sometimes the robot mistakenly treated its own response sound as a command. Since the voice recognition is highly affected by the amplitude of the sound, we decreased the loudness of the response to a level that does not influence normal recognition.
After solving the problems mentioned above, the voice recognition performed well at distances ranging from 0 to about 4 meters.
The testing results of this module are shown in Figure 14:
The system responded correctly to the sound commands we generated. Also, the system only responded to one command within any 3-second window, as expected.
This part of design refers to hytersis_fre.py.
The main program acts as a UDP socket server that receives signals from the face and frequency detection modules. A received signal is first de-serialized with Python pickle, then processed as an error or command signal according to the first element of the tuple. The structure of the tuple is:
(‘error’, error) or (‘command’, command)
As there are only two commands involved, 'go' (move) and 'stop', a Boolean flag is controlled by these two commands.
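The receiving side might be sketched as follows. The tuple format matches the ('error', …)/('command', …) structure above; port 5005, the buffer size, and the state dictionary are assumptions:

```python
import pickle
import socket

def handle_message(data, state):
    """Dispatch one pickled message onto the state dict, which holds
    'error' (the last face error) and 'move' (the Boolean flag)."""
    tag, payload = pickle.loads(data)
    if tag == "error":
        state["error"] = payload            # (x_error, size_error) or (None, None)
    elif tag == "command":
        state["move"] = (payload == "go")   # 'go' sets the flag, 'stop' clears it
    return state

def serve(port=5005):
    """Receive and dispatch messages forever."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    state = {"error": (None, None), "move": False}
    while True:
        data, _ = sock.recvfrom(4096)
        handle_message(data, state)
```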
The sound playback function used in this project is the PyGame mixer. It is initialized with:
and plays a sound with:
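The PyGame mixer usage described above might look like this sketch; 'bark.wav' is a placeholder file name, not the project's actual sound file:

```python
def play_sound(path="bark.wav"):
    """Initialize the mixer if needed, then play a sound file
    (playback is non-blocking)."""
    import pygame  # imported here; requires pygame to be installed
    if not pygame.mixer.get_init():
        pygame.mixer.init()              # initialize the mixer once
    sound = pygame.mixer.Sound(path)     # load the sound file
    sound.play()                         # start non-blocking playback
    return sound
```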
The sound files used for this project were downloaded from the following sources:
Cannot find face:
The movement control is gated by the 'move' flag: the robot only moves when the flag is true. Also, if no face is detected by the face detection program, the robot will not move.
This part refers to the car.py code.
There are 3 programs involved in this project. A script was created to start all three. Besides, to ensure the pigpio daemon starts before any GPIO-related code runs, the script starts the pigpio daemon at the beginning.
The system keeps running until all three programs are stopped. To stop them, a stop script was created: first the PIDs of the involved processes are found, then kill commands are sent to each of them. As the motors are controlled by the pigpio daemon, if the last instruction sent before the programs stop is not 'stop', the motors will keep running even after the programs have been killed. Instead of shutting down the daemon, we wrote a simple Python script that sends a stop command to the daemon at the end of the stop script.
This part refers to start, stop and stop_motors.py.
For the final testing, we used an iPhone to set up a mobile hotspot and used SSH to connect to the Raspberry Pi wirelessly. As the final system is hard to demonstrate with data or graphs, the demo video serves as a good record of the testing process.
The face detection had decent performance once we switched to detecting the distance by face size. The system functioned well when the tester moved slowly; however, the robot dog loses track of the face if the tester moves rapidly.
The frequency detection module functioned as expected when we said 'go' or blew the whistle. However, the 'go' command sometimes got confused with surrounding noise, as the noise may contain a frequency component that triggers the detection. The whistle detection worked well because the whistle's frequency is much higher than the noise.
All the sounds played correctly for each situation. However, when the volume was high, the 'dog bark' could trigger the 'go' command, so we had to lower the volume of the speaker.
The robot dog can follow the slow-moving tester and respond to the sound command most of the time.
All the planned work and proposed functions were finished by the end of the project, and every function was shown to work.
The current system uses the Python version of OpenCV, which is less efficient; switching to the C++ version should improve performance. Also, the cascade kernel used in this project was trained on frontal faces, while the robot sits on the ground and the user looks down at it. Retraining the kernel for this particular relative position may improve the performance of the system.
For the sound recognition part, the system suffers from the low quality of the USB microphone and from the varying distance to the speaker. A handheld commander would solve this problem. Also, with improved frequency accuracy, more commands could be added to the system, making it more interesting.
|Part||Source||Cost||Quantity|
|Raspberry Pi 3||Amazon||$37.79||1|
|Raspberry Pi Camera||Amazon||$26.55||1|
|Oracle Speaker||Career Fair Gift||Free||1|
|Servo Motor||Borrowed from lab||Free||2|
|Robot Frame||Borrowed from lab||Free||1|
|Zhuo Chen||Rui Min|
|Background Research||Background Research|
|Overall Hardware Design||Webpage Design|
|Face Detection||Voice Recognition|
|Servo Motor Control||Servo Motor Control|
|Test and Debugging||Test and Debugging|
|Project Report||Project Report|
 Gouyon, F., F. Pachet, and O. Delerue. "Classifying percussive sounds: a matter of zero-crossing rate." Proceedings of the COST G-6 Conference on Digital Audio Effects, Verona, Italy. 2000.
 Parallax Continuous Rotation Servo Motor Datasheet
 pigpio hardware PWM example http://abyz.co.uk/rpi/pigpio/python.html#hardware_PWM
 Face Detection using Haar Cascades
 ECE 5725 Lab 3 Instruction Note
 Install guide: Raspberry Pi 3 + Raspbian Jessie + OpenCV 3 http://www.pyimagesearch.com/2016/04/18/install-guide-raspberry-pi-3-raspbian-jessie-opencv-3/
 [Raspberry Pi] Using PyAudio & External Audio Card for Recording http://raspberrypirecipes.blogspot.com/2014/02/raspberry-pi-using-pyaudio-external.html
We'd like to thank our professor, Joseph Skovira, for his advice and help not only in class but also in everyday life. We also want to thank our TAs, Jingyao Ren and Jay Fetter.
Zhuo Chen firstname.lastname@example.org
Rui Min email@example.com