One day, when Yuchong and Jie were discussing what they planned to do for their ECE 5725 project, Jie received an email from Cornell Alert reporting a home burglary in which the criminal sneaked into a house. An idea popped into their minds: how about building a home security system to protect their homes and the loved ones in them?
This system should be powerful enough that Jie and Yuchong could watch real-time CCTV video and audio of their front doors from anywhere. The system should also be smart enough to automatically recognize their faces and open the door, while denying the faces of intruders. Even more, the system should allow remote control, so that even if Yuchong forgets to bring the key, Jie could still open the door remotely while sitting in Phillips Hall.
Home Pi, our project, is a security system designed to protect the front door of our home. The system streams live CCTV video and audio, captured by a Pi Camera and a microphone, to a web server that an Android phone can access. It provides fast face recognition and real-time semantic segmentation built on TensorFlow and OpenCV and accelerated by a Coral TPU. An Android application can access the system remotely.
Home Pi is an RPi-based embedded system consisting of a user interface, an Android phone for remote control, a CCTV camera and its streaming server, a TPU and a webcam for smooth face recognition, and a servo module for door control. Our system enables users to monitor the CCTV remotely by capturing video and audio through a Pi Camera and a microphone. The video and audio are transmitted over WiFi to an Android phone, which can also control the system remotely. In addition, a face recognition module powered by the USB Edge TPU enables extremely smooth face recognition and face registration.
The system allows users to control it remotely via an Android app. The app lets users log in with a username and password. Authorized users can access the live CCTV stream and the live semantic segmentation results, and can control the door through socket communication with the system.
In our project, the RPi has to perform several tasks at the same time: face recognition and semantic segmentation, video and audio streaming, a multithreaded UI, TCP socket communication, and a Flask web application. To reduce latency, we used Python multiprocessing to take full advantage of all four cores of the RPi.
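As a minimal sketch of how the workload could be split across processes (the worker functions here are simplified placeholders for our actual face recognition, streaming, UI, and socket modules):

    from multiprocessing import Process, Queue

    # Placeholder workers; in the real system these run face recognition,
    # CCTV streaming, the PiTFT UI, and the TCP socket server.
    def run_recognition(result_q):
        result_q.put("recognition result")

    def run_streaming():
        pass

    def run_ui(result_q):
        print(result_q.get())

    def run_socket_server():
        pass

    if __name__ == "__main__":
        results = Queue()
        workers = [
            Process(target=run_recognition, args=(results,)),
            Process(target=run_streaming),
            Process(target=run_ui, args=(results,)),
            Process(target=run_socket_server),
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()

Each worker runs as its own process, so the OS can schedule them on separate cores.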
In this project, a user interface is implemented on the PiTFT. To control the workflow of the GUI, a finite state machine (FSM) is designed. The state machine makes the interface behave correctly given the current state and the user input, and performs the state transitions. The FSM also helps the multithreaded algorithms behave correctly.
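A minimal sketch of such a table-driven FSM (the state and event names here are illustrative, not our actual GUI states):

    # Transition table: (current state, event) -> next state
    TRANSITIONS = {
        ("IDLE", "face_detected"): "RECOGNIZING",
        ("RECOGNIZING", "match"): "DOOR_OPEN",
        ("RECOGNIZING", "no_match"): "DENIED",
        ("DOOR_OPEN", "timeout"): "IDLE",
        ("DENIED", "timeout"): "IDLE",
    }

    def next_state(state, event):
        # Stay in the current state if the event is not handled there.
        return TRANSITIONS.get((state, event), state)

    state = "IDLE"
    for event in ["face_detected", "match", "timeout"]:
        state = next_state(state, event)
        print(state)

Because every transition is looked up from one table, each thread only needs to agree on the current state and the event it observed, which keeps the multithreaded behavior predictable.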
Thanks to the powerful Coral TPU, real-time semantic segmentation can be achieved in our project. To show the results, a web video server based on Flask was developed to stream the semantic segmentation output, which can be accessed from the Android app or a web browser.
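A minimal Flask sketch of such a video stream (here the raw camera frames stand in for the segmentation output, and the route name and port are our own choices):

    from flask import Flask, Response
    import cv2

    app = Flask(__name__)
    camera = cv2.VideoCapture(0)  # assumed camera index

    def gen_frames():
        # Encode each frame as JPEG and yield it as one part of an MJPEG stream.
        while True:
            ok, frame = camera.read()
            if not ok:
                break
            ok, buf = cv2.imencode(".jpg", frame)
            if not ok:
                continue
            yield (b"--frame\r\n"
                   b"Content-Type: image/jpeg\r\n\r\n" + buf.tobytes() + b"\r\n")

    @app.route("/video_feed")
    def video_feed():
        return Response(gen_frames(),
                        mimetype="multipart/x-mixed-replace; boundary=frame")

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)

The Android app or a browser can then open http://<pi-address>:5000/video_feed to view the stream.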
Our system first runs semantic segmentation to obtain the region of interest (ROI) corresponding to people. The ROI is then fed into a classification model to determine whether the current user is registered. The classification result is sent to the UI via a message queue and is used to decide whether to open the door.
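The sketch below illustrates this pipeline; run_segmentation and classify_face are hypothetical stand-ins for the TPU models, not their real APIs:

    import numpy as np

    def run_segmentation(frame):
        # Hypothetical stand-in: returns a boolean mask of "person" pixels.
        return np.zeros(frame.shape[:2], dtype=bool)

    def classify_face(roi):
        # Hypothetical stand-in: returns a registered user's name or "unknown".
        return "unknown"

    def process_frame(frame, result_queue):
        mask = run_segmentation(frame)
        ys, xs = np.where(mask)
        if len(xs) == 0:
            return  # no person in the frame
        # Crop the bounding box of the person pixels as the ROI.
        roi = frame[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
        result_queue.put(classify_face(roi))  # the UI decides whether to open the door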
The above describes the tasks that each core executes, but we still need to figure out how the processes communicate with each other via queues. In our project, the producer processes produce results and frames. If a producer ceaselessly puts results into the queue, unexpected behavior can appear. For example, if the face recognition module continuously feeds results into the queue while the UI only asks for results occasionally, the recognition results pile up in the queue. The next time the UI makes a request, it only gets the outdated, piled-up results from the front of the queue, since a queue is First-In-First-Out. To resolve this problem, we had two options: 1. the producers examine the status of the message queue and clear the outdated results; 2. use a LIFO (Last-In-First-Out) queue so that the consumer always gets the newest result. After weighing the pros and cons, we took the first option, having the producers clear outdated results in time, since it helps lower the burden on system memory and resources. This is a classic producer-consumer design pattern in which the queue serves as the message pipe and the cache, and the producer process needs to match the speed of the consumer process.
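A minimal sketch of the first option, assuming q is the multiprocessing.Queue shared with the UI process:

    import queue

    def publish_latest(q, result):
        # Producer side: discard anything the consumer has not read yet,
        # so the front of the queue always holds the newest result.
        try:
            while True:
                q.get_nowait()
        except queue.Empty:
            pass
        q.put(result)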
In our project, we tracked the issues we met along a timeline. Below is a record of the issues we discovered during the project and how we resolved them.
At first, we tested every piece of hardware we had for the project, and we found that the PiCam turned out to be a little fragile.
For the installation of Python modules on the Raspbian system, we need to pay special attention to the user and the Python version. In our project, we launch our programs with sudo python3, so we need to use sudo pip3 install to make sure every module is installed for the superuser and for Python 3.
Since multiple processes need to communicate with each other, we researched several solutions for inter-process communication, such as FIFOs (named pipes) and queues. We decided to use the multiprocessing Queue to implement our multiprocessing algorithms, since it avoids frequent interaction with the OS and therefore improves efficiency.
In our project, we used two cameras. However, the camera index is assigned randomly, while OpenCV requires the specific device number assigned to the webcam. To avoid this randomness, we designed a method to confirm which number is assigned to the webcam using the command v4l2-ctl -d /dev/videoN -D.
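A sketch of how this check can be scripted (the name_hint substring is an assumption; it should match the card name that v4l2-ctl actually reports for the webcam):

    import glob
    import subprocess

    def find_webcam(name_hint="USB"):
        # Query every /dev/video* node and return the index whose
        # v4l2-ctl -D output contains the expected card name.
        for dev in sorted(glob.glob("/dev/video*")):
            info = subprocess.run(["v4l2-ctl", "-d", dev, "-D"],
                                  capture_output=True, text=True).stdout
            if name_hint in info:
                return int(dev.replace("/dev/video", ""))
        return None

    print(find_webcam())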
When we developed the Android application for viewing the live CCTV, the communication protocol was HTTP, which is unencrypted and not permitted by default since Android 8. We also need to grant the app Internet access before launching it. Thus, we need to include android:usesCleartextTraffic="true" and the Internet permission in the AndroidManifest.xml file.
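As a sketch, the relevant manifest entries look roughly like this (the uses-permission element is our assumption of what granting Internet access refers to; the rest of the manifest is omitted):

    <manifest xmlns:android="http://schemas.android.com/apk/res/android">
        <uses-permission android:name="android.permission.INTERNET" />
        <application android:usesCleartextTraffic="true">
            <!-- activities omitted -->
        </application>
    </manifest>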
Since our project relies heavily on an Internet connection, we included a piece of shell script to ensure the Pi has connected to WiFi before launching the main program.
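A minimal sketch of such a launch script, assuming a simple ping to an external host is an acceptable connectivity check and that main.py stands in for our actual entry script:

    #!/bin/bash
    # Wait until the Pi can reach the network, then start the main program.
    until ping -c 1 -W 2 8.8.8.8 > /dev/null 2>&1; do
        echo "Waiting for WiFi..."
        sleep 2
    done
    sudo python3 main.py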
For this project, we also use a continuous-rotation servo to control the door. However, the servo was not calibrated and kept rotating at a slow speed even when given a stop signal. Looking through the servo's datasheet, we found that the servo can be calibrated by adjusting its potentiometer, so we calibrated it by sending a stop signal and finding the potentiometer position at which the servo moves the least.
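A sketch of the stop signal held during calibration, assuming the servo is driven from BCM pin 13 with a standard 50 Hz signal whose 1.5 ms pulse (7.5% duty cycle) is the nominal stop; the actual pin and values depend on the wiring and servo model:

    import time
    import RPi.GPIO as GPIO

    SERVO_PIN = 13                  # assumed BCM pin
    GPIO.setmode(GPIO.BCM)
    GPIO.setup(SERVO_PIN, GPIO.OUT)

    pwm = GPIO.PWM(SERVO_PIN, 50)   # 50 Hz -> 20 ms period
    pwm.start(7.5)                  # 1.5 ms pulse = nominal stop signal
    try:
        time.sleep(30)              # adjust the potentiometer during this window
    finally:
        pwm.stop()
        GPIO.cleanup()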
The on-board GPU on the Pi may run out of memory since we use multiple cameras. Upon investigation, we found that the Pi allows users to change the memory allocation for the GPU, so we increased the default 128 MB to 256 MB.
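One common way to do this is to raise the GPU memory split, for example via raspi-config or by setting the following line in /boot/config.txt (the exact method may vary with the OS version):

    gpu_mem=256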
We followed an incremental testing approach in our development. This enabled our team to parallelize our work, while ensuring that each of us developed a fully functional component before it was integrated into the system at large. The videos below record how we tested each module and integrated them into the full system.