Paolo Arguelles (pa394@cornell.edu)
Mike DiDomenico (md848@cornell.edu)
Previous projects that have implemented gesture-based interfaces use smart gloves or colored tape to make the fingers easier to track. We wanted to realize gesture-based control simply, and in the most organic way possible.
For us, this meant tracking the actual hand with no additional hardware. We wanted a user experience similar to that of the Leap Motion Controller, the current industry standard in gesture control. One way in which our system differs from the Leap Motion Controller is that none of the intensive processing is done on the device itself; rather, it is consigned to a host computer whose minimum specifications are consistent with the performance of an Intel i3 processor.
We wanted to do all the processing in situ. We used the simplest setup we could think of - just a single Raspberry Pi and camera module - and saw where that took us. We added a PiTFT to act as a convenient user interface, and to demonstrate the computational ability of the Raspberry Pi by showing the output of the real-time image processing.
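Our final script gets frames onto the PiTFT by writing a temporary JPEG and reloading it with pygame (see the full listing at the end of this page). The sketch below shows one lighter-weight way to push a processed OpenCV frame to the piTFT framebuffer; the placeholder frame is purely illustrative.

# Minimal sketch: push a processed OpenCV frame to the piTFT through pygame.
# Assumes the piTFT is exposed as /dev/fb1; the all-zero frame is a placeholder.
import os
import cv2
import numpy as np
import pygame

os.putenv('SDL_VIDEODRIVER', 'fbcon')   # render to the framebuffer console
os.putenv('SDL_FBDEV', '/dev/fb1')      # piTFT framebuffer
pygame.init()
screen = pygame.display.set_mode((320, 240))

frame = np.zeros((120, 160, 3), dtype=np.uint8)               # placeholder processed frame
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)                   # pygame expects RGB order
surface = pygame.surfarray.make_surface(rgb.swapaxes(0, 1))    # surfarray wants (w, h, 3)
surface = pygame.transform.scale(surface, (320, 240))
screen.blit(surface, (0, 0))
pygame.display.flip()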
We also wanted to leverage the Bluetooth capability of the Raspberry Pi. Our project tricks a host computer into thinking that the Raspberry Pi is a Bluetooth mouse. Using this scheme, the mouse on any Bluetooth-enabled desktop computer or laptop can be controlled using hand movements.
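At the HID level, each mouse update is just a small input report handed to the Bluetooth HID service over D-Bus. The sketch below mirrors the send_move helper in the full listing at the end of this page; the org.yaptb.btkbservice names come from the Bluetooth keyboard-emulation project we adapted, and the example movement values are arbitrary.

# Sketch: package a relative mouse movement as a Bluetooth HID input report
# and hand it to the HID emulation service over D-Bus.
import dbus

bus = dbus.SystemBus()
btkservice = bus.get_object('org.yaptb.btkbservice', '/org/yaptb/btkbservice')
dev = dbus.Interface(btkservice, 'org.yaptb.btkbservice')

def send_mouse(buttons, dx, dy):
    # 0xA1 = input report, 0x01 = report ID; dx/dy are signed 8-bit deltas
    dx &= 0xFF
    dy &= 0xFF
    dev.send_array(0, [0xA1, 0x01, buttons, dx, dy, 0x00, 0x00, 0x00])

send_mouse(0x00, 5, -3)   # nudge the cursor right 5 and up 3, no buttons pressed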
Towards a more organic setup, we wanted to minimize background interference. A simple way to avoid processing a noisy background is to point the camera upwards: an image of the blank ceiling is stored and subtracted from all successive frames. Each frame is then processed to detect the hand and to determine which of two gestures the user is making. The motion of the hand and the detected gesture are analyzed to send Bluetooth mouse commands to a remote machine.
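A stripped-down sketch of that idea (subtract a stored reference frame, threshold the difference) is below. Our real pipeline subtracts edge-magnitude images rather than raw intensities, as the full listing shows, and the placeholder frames here are purely illustrative.

# Sketch: static-background subtraction against a stored ceiling frame.
import cv2
import numpy as np

def preprocess(bgr):
    # grayscale + blur so small sensor noise does not survive the subtraction
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    return cv2.blur(gray, (7, 7)).astype(np.float32)

def foreground_mask(current_bgr, background_bgr, thresh=30):
    diff = cv2.absdiff(preprocess(current_bgr), preprocess(background_bgr))
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    return mask.astype(np.uint8)

if __name__ == '__main__':
    background_bgr = np.zeros((120, 160, 3), dtype=np.uint8)   # placeholder ceiling frame
    current_bgr = background_bgr.copy()
    current_bgr[40:80, 60:100] = 200                           # fake "hand" region
    mask = foreground_mask(current_bgr, background_bgr)
    print('foreground pixels:', int(np.count_nonzero(mask)))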
When starting the project, we experimented with OpenCV on both our laptops and on the Raspberry Pi. We began on the Pi by simply reading frames from the PiCam.
#Import required libraries for PiCamera
from picamera.array import PiRGBArray
from picamera import PiCamera

resScale = 1   # resolution multiplier (160x120 at resScale = 1)

#Initialize PiCam, turn off automatic exposure adjustments
camera = PiCamera()
camera.resolution = (160*resScale, 120*resScale)
camera.framerate = 30
camera.exposure_mode = 'off'
rawCapture = PiRGBArray(camera, size=(160*resScale, 120*resScale))

#Capture frames from PiCamera
for frame in camera.capture_continuous(rawCapture, format="bgr", use_video_port=True):
    #MAIN CODE GOES IN FOR LOOP#
    rawCapture.truncate(0)   # clear the stream for the next frame

#Release the camera when done
camera.close()
# Compute image moments and, if the frame is non-empty, the centroid of the hand
M = cv2.moments(frame)
if M['m00'] != 0:
    cx = int(M['m10']/M['m00'])
    cy = int(M['m01']/M['m00'])
# Build a 3-channel copy of the binary frame for drawing, then find contours
f = np.zeros((frame.shape[0], frame.shape[1], 3), original.dtype)
f[:,:,0] = frame
f[:,:,1] = frame
f[:,:,2] = frame
_, contours, h = cv2.findContours(frame, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# Merge all contours, compute the convex hull and its defects, and draw the hull
cnts = np.vstack([contours[i] for i in range(len(contours))])
hull = cv2.convexHull(cnts)
defectHull = cv2.convexHull(cnts, returnPoints=False)
defects = cv2.convexityDefects(cnts, defectHull)
f = cv2.drawContours(f, [hull], 0, hullColor, 5)
matches = cv2.matchShapes(hull,matchContour[1],cv2.CONTOURS_MATCH_I2,0)
When matches is printed, the console outputs a floating-point number characterizing the similarity of the two polygons; numbers closer to 0 indicate greater similarity.
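In practice we simply threshold this score to decide whether the current hull matches the stored click gesture; a trimmed sketch of that decision, using the same matchThreshold of 2 as the full script:

# Sketch: turn the matchShapes score into a click / no-click decision.
# hull is the current convex hull; clickHull is the hull stored when the user
# pressed the "set click gesture" button.
import cv2

matchThreshold = 2

def is_click(hull, clickHull):
    if clickHull is None:
        return False          # no click gesture has been recorded yet
    score = cv2.matchShapes(hull, clickHull, cv2.CONTOURS_MATCH_I2, 0)
    return score < matchThreshold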
Convex hull bounding a "move mouse" gesture
We compiled OpenCV 3.3.0 from source, configuring the build with the following cmake command:
$ cd ~/opencv-3.3.0/
$ mkdir build
$ cd build
$ cmake -D CMAKE_BUILD_TYPE=RELEASE \
    -D CMAKE_INSTALL_PREFIX=/usr/local \
    -D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib-3.3.0/modules \
    -D ENABLE_NEON=ON \
    -D ENABLE_VFPV3=ON \
    -D BUILD_TESTS=OFF \
    -D INSTALL_PYTHON_EXAMPLES=OFF \
    -D BUILD_EXAMPLES=OFF ..
Note that the ENABLE_NEON and ENABLE_VFPV3 flags are flipped on, enabling the ARM NEON SIMD and VFPv3 floating-point optimizations.
Our final demonstration system was implemented with a Raspberry Pi, PiCam, and PiTFT. When the user powers on the system, the application starts immediately and performs background subtraction to compensate for whatever background it is facing.
SET CLICK GESTURE (BUTTON 22)
- The user can press button 22 on the PiTFT to designate a gesture as the "click" or "mouse down" state. Once set, the convex hull will turn green to indicate that a click state has been detected.
The convex hull turns green upon click gesture detection
RESET BACKGROUND SUBTRACTION (BUTTON 23)
- If the camera placement has been perturbed since program start, the image on the PiTFT may show residual edges consistent with the background. Pressing button 23 on the PiTFT will reset the background image based on the new camera placement. Users should take care not to let any part of their body enter the frame as the new background is recorded.
HOLD CURSOR (BUTTON 17)
- If the user decides to momentarily transfer control of their mouse back to the trackpad (and temporarily disengage the mouse emulation activity from the Raspberry Pi), we have implemented button 17 as a "hold" button. Pressing this button once toggles a "hold" state, which allows the user to control their computer normally. Pressing the button again brings the system out of its "held" state. It should be noted that the system still operates normally while in its hold state; all the mouse emulation activities are simply overridden as if no motion were being detected.
BAILOUT (BUTTON 27)
- The script exits immediately upon button press. (A condensed sketch of how these four buttons map onto GPIO callbacks is shown below.)
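For reference, here is a condensed sketch of the button wiring; the full version appears in the complete listing at the end of this page.

# Condensed sketch of the piTFT button wiring (full version in the listing below).
# portFcn holds the hold / set-click / reset-background flags polled by the main loop.
import RPi.GPIO as GPIO

portFcn = [0, 0, 0]   # [hold cursor, set click gesture, reset background]

def hold_cb(channel):          # button 17: toggle cursor hold
    portFcn[0] = not portFcn[0]

def set_click_cb(channel):     # button 22: store the current hull as the click gesture
    portFcn[1] = 1

def reset_bg_cb(channel):      # button 23: recapture the background image
    portFcn[2] = 1

def bailout_cb(channel):       # button 27: quit the program
    exit()

GPIO.setmode(GPIO.BCM)
for pin, cb in [(17, hold_cb), (22, set_click_cb), (23, reset_bg_cb), (27, bailout_cb)]:
    GPIO.setup(pin, GPIO.IN, pull_up_down=GPIO.PUD_UP)
    GPIO.add_event_detect(pin, GPIO.FALLING, callback=cb, bouncetime=300)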
The scope of our project changed, gradually but drastically, over its course. The primary shift occurred when we changed the direction of the camera. At that point, we decided to use a static background (the ceiling) instead of a variable background containing the user’s face. This decision was largely made because it made it much easier to distinguish the user’s hand from the background in each image frame. It also meant that we could move forward with methods of hand tracking and identification other than using the depth of the hand. Because of this change, we did not meet our initial goal of tracking the hand in a dynamic environment, but we still met the goal of hand tracking. We also met our goal of simple gesture detection: we are able to detect whether the user is making one of two gestures. Our project was successful in meeting our outlined objectives and running at a reasonable speed.
The main performance issue we encountered throughout the project was image noise. We found it somewhat difficult to denoise the image spatially, but had some success. Toward the end of the project, when we had built a working system, we had difficulty with temporal noise. Small changes from one frame to another could cause large mouse movements, as well as “phantom” clicking or unclicking. It was not possible to implement heavy filtering operations, since they would have introduced delay into the system, which would have diminished the user experience. We attempted temporal filtering with some success, but at the cost of some delay. This problem resulted in a user interface that was not as smooth as we would have hoped.
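The temporal filter we ended up with is a short moving average over the emitted mouse commands (three frames in the final script). A standalone sketch of that filter is below; it also shows where the added delay comes from, since each output still contains the older samples.

# Sketch: 3-frame moving average over (button, dx, dy) mouse commands.
# Smooths single-frame spikes at the cost of roughly (window - 1) frames of lag.
import numpy as np

WINDOW = 3
history = np.zeros((WINDOW, 3), dtype=np.int16)   # rows hold [button, dx, dy]

def filtered_command(button, dx, dy):
    history[1:, :] = history[:-1, :]              # age the previous samples
    history[0, :] = (button, dx, dy)              # newest sample on top
    avg = history.mean(axis=0).astype(np.int16)
    if len(np.unique(history[:, 0])) != 1:        # button state changed in the window,
        avg[1] = 0                                # so suppress motion to avoid dragging
        avg[2] = 0                                # during a click transition
    return avg

print(filtered_command(0, 12, -4))                # averaged with the two previous frames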
Upward camera placement resulted in easier background removal
pa394@cornell.edu
Paolo compiled and implemented an optimized version of OpenCV to achieve an estimated ~30% speedup, and worked with Mike to debug some symlink problems that resulted from the compilation. He incorporated absolute mouse tracking emulation on Raspbian using xdotool, and wrote a Python script using PyGame to demonstrate centroid tracking by moving an on-screen sprite. He worked with Mike to display the filtered image and hull on the PiTFT external display, and wrote code to map background subtraction, mouse hold, click gesture set, and bailout functionality onto the PiTFT's four buttons. He also investigated the potential for GPU speedup, offloading some blurring operations to the PiCam hardware to reduce code runtime. Before we had finalized our hardware to a single PiCam, Paolo wrote code to interface two USB cameras with the Raspberry Pi and OpenCV, and worked with Mike to attempt to implement a stereo camera setup, an approach we ended up scrapping in favor of a single camera.
md848@cornell.edu
Mike started working on this project by experimenting with different edge detection methods, exploring various filters and techniques as well as different parameters and thresholds. He also experimented with many variations of hand detection algorithms, including calibrating to the skin color of the user using region growing, and generating edge images from HSV-format images instead of RGB-format images. Mike wrote code for, and worked with Paolo on, different depth detection methods. He implemented center-of-mass object tracking, and worked with Paolo to develop a good method of background subtraction. Paolo and Mike worked together to build and compile the optimized OpenCV library, and Mike modified the Bluetooth keyboard emulation project to instead emulate a Bluetooth mouse with commands sent from an external Python script.
#Stop the background process
sudo /etc/init.d/bluetooth stop
# Turn on Bluetooth
sudo hciconfig hci0 up
# Update mac address
#./updateMac.sh
#Update Name
#./updateName.sh RPi_Mouse
#Get current Path
export C_PATH=$(pwd)
#Create Tmux session
tmux has-session -t mlabviet
if [ $? != 0 ]; then
    echo "starting tmux commands"
    tmux new-session -s mlabviet -n os -d
    tmux split-window -v -t mlabviet:os.0
    tmux split-window -v -t mlabviet:os.1
    tmux send-keys -t mlabviet:os.0 'cd $C_PATH && sudo /usr/sbin/bluetoothd --nodetach --debug -p time' C-m
    tmux send-keys -t mlabviet:os.1 'cd $C_PATH/server && sudo python btk_mouse.py' C-m
    tmux send-keys -t mlabviet:os.2 'cd $C_PATH && sudo /usr/bin/bluetoothctl' C-m
    echo "tmux done"
fi
# source:
# https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_gui/py_video_display/py_video_display.html

# import the necessary packages
from picamera.array import PiRGBArray
from picamera import PiCamera
import RPi.GPIO as GPIO
import time
import cv2
import numpy as np
import math
import os                   # for OS calls
import pygame               # Import pygame graphics library
import dbus
import dbus.service
import dbus.mainloop.glib

resScale = 1
postthreshold = 200
matchThreshold = 2
cxp = 0
cyp = 0

# setup pygame drivers and screen
if True:
    os.putenv('SDL_VIDEODRIVER', 'fbcon')   # Display on piTFT
    os.putenv('SDL_FBDEV', '/dev/fb1')
    #os.putenv('SDL_MOUSEDRV', 'TSLIB')     # Track mouse clicks on piTFT
    #os.putenv('SDL_MOUSEDEV', '/dev/input/touchscreen')

# initialize the camera and grab a reference to the raw camera capture
camera = PiCamera()
camera.resolution = (160*resScale, 120*resScale)
camera.framerate = 30
camera.exposure_mode = 'off'
rawCapture = PiRGBArray(camera, size=(160*resScale, 120*resScale))

# allow the camera to warmup
time.sleep(0.1)

key = None
bg = None
portFcn = [0, 0, 0]
mouseEm = np.asarray([0, 0, 0])
prevMouseEm = np.asarray([0, 0, 0])
frameCount = 0
thrs = 0.09
et = 0

# connect to the Bluetooth HID emulation service over D-Bus
bus = dbus.SystemBus()
btkservice = bus.get_object('org.yaptb.btkbservice', '/org/yaptb/btkbservice')
dev = dbus.Interface(btkservice, 'org.yaptb.btkbservice')
time.sleep(2)

def to_binary(i):
    # clamp to [-127, 127] and convert to a signed 8-bit (two's complement) byte
    if i >= 0:
        if i > 127:
            i = 127
        i = i & 0xFF
    else:
        if i < -127:
            i = -127
        i = abs(i) & 0xFF
        i = ~i + 1
        i = i & 0xFF
    return i

def send_move(dev, buttons, x, y):
    # send a single HID mouse input report over the Bluetooth link
    x = to_binary(int(x))
    y = to_binary(int(y))
    wheel = 0
    dev.send_array(0, [0xA1, 0x01, buttons, x, y, wheel, 0x00, 0x00])

def send_state(dev, buttons, x, y):
    # break large movements into reports that fit in a signed byte
    while abs(x) > 127 or abs(y) > 127:
        if abs(x) > 127:
            x -= x/abs(x) * 127
        if abs(y) > 127:
            y -= y/abs(y) * 127
        send_move(dev, buttons, x, y)
    send_move(dev, buttons, x, y)

def GPIO17_callback(channel):
    portFcn[0] = not portFcn[0]
    print("HOLD CURSOR")

def GPIO22_callback(channel):
    portFcn[1] = 1
    print("SET CLICK GESTURE")

def GPIO23_callback(channel):
    portFcn[2] = 1

def GPIO27_callback(channel):
    print("QUITTING PROGRAM")
    exit()

# INITIALIZE GPIO
GPIO.setmode(GPIO.BCM)
pull_up_ports = [17, 22, 23, 27]
quit_port = 27
for port in pull_up_ports:
    GPIO.setup(port, GPIO.IN, pull_up_down=GPIO.PUD_UP)
GPIO.add_event_detect(17, GPIO.FALLING, callback=GPIO17_callback, bouncetime=300)
GPIO.add_event_detect(22, GPIO.FALLING, callback=GPIO22_callback, bouncetime=300)
GPIO.add_event_detect(23, GPIO.FALLING, callback=GPIO23_callback, bouncetime=300)
GPIO.add_event_detect(27, GPIO.FALLING, callback=GPIO27_callback, bouncetime=300)

# INITIALIZE PYGAME STUFF
pygame.init()
clock = pygame.time.Clock()
size = width, height = 320, 240
black = 0, 0, 0
screen = pygame.display.set_mode(size, pygame.HWSURFACE)
startTime = time.time()
pygame.mouse.set_visible(False)

def edges(frame, thresh):
    # Sobel gradient magnitude, blurred and thresholded into a binary edge image
    sobelx = cv2.Sobel(frame, cv2.CV_32F, 1, 0, ksize=1)
    sobely = cv2.Sobel(frame, cv2.CV_32F, 0, 1, ksize=1)
    mag = np.power(np.power(sobelx, 2) + np.power(sobely, 2), 1/2)
    # processing on edge image
    frame = cv2.blur(mag, (3, 3))
    #frame = cv2.medianBlur(frame, 5)
    # thresholding
    mm = (np.amax(frame) * thresh)
    frame = (mm < frame)
    frame = np.float32(frame)
    return frame

def center_of_mass(img):
    # brute-force centroid of all nonzero pixels
    h = img.shape[0]
    w = img.shape[1]
    mx = np.amax(img)
    mn = np.amin(img)
    yc = 0
    xc = 0
    total = 0
    for y in range(h):
        for x in range(w):
            v = img[y, x]
            if v > 0:
                yc += y
                xc += x
                total += 1
    if total == 0:
        yy = 0
        xx = 0
    else:
        yy = int(yc/total)
        xx = int(xc/total)
    return yy, xx

bg = None
matchContour = [None] * 10
nbg = 1
Start = 1
hullColor = (0, 0, 255)
cx = 0
cy = 0
mouseL_len = 3
mouseL = np.asarray([[0, 0, 0]] * mouseL_len)

for frame in camera.capture_continuous(rawCapture, format="bgr", use_video_port=True):
    wk = cv2.waitKey(1)
    frame = frame.array
    original = np.copy(frame)
    frame = cv2.blur(frame, (7, 7))

    # Our operations on the frame come here
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    thrs = 0.09
    frame = edges(frame, thrs)
    frame = (frame - np.amin(frame))/(np.amax(frame) - np.amin(frame))
    frame[frame < 0.1] = 0

    # capture the background edge image at startup or when button 23 is pressed
    if Start:
        bg = frame
        nbg = 1
        Start = 0
    if portFcn[2]:
        bg = frame
        nbg = 1
        print("BACKGROUND SUBTRACTED")
        portFcn[2] = 0
    if wk & 0xFF == ord('n'):
        bg = frame/(nbg+1) + bg*nbg/(nbg+1)
        nbg += 1
    if wk & 0xFF == ord('q'):
        break
    if type(bg) != type(None):
        frame = frame - bg
        et = 0.5
        frame[frame < et] = 0

    # clean up the foreground edge image
    frame = frame * 255
    frame = frame.astype(np.uint8)
    kernel = np.ones((3, 3), np.uint8)
    kernel[0, 0] = 0
    kernel[0, 2] = 0
    kernel[2, 0] = 0
    kernel[2, 2] = 0
    frame = cv2.erode(frame, kernel, iterations=1)
    frame = cv2.blur(frame, (3, 3))
    frame[frame < postthreshold] = 0

    # build a color copy for drawing, then find contours and the convex hull
    f = np.zeros((frame.shape[0], frame.shape[1], 3), original.dtype)
    f[:,:,0] = frame
    f[:,:,1] = frame
    f[:,:,2] = frame
    _, contours, h = cv2.findContours(frame, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    if len(contours) > 0:
        cnts = np.vstack([contours[i] for i in range(len(contours))])
        hull = cv2.convexHull(cnts)
        defectHull = cv2.convexHull(cnts, returnPoints=False)
        defects = cv2.convexityDefects(cnts, defectHull)
        f = cv2.drawContours(f, [hull], 0, hullColor, 5)
        dists = []
        if chr(wk & 0xFF) in '12':
            matchContour[int(chr(wk & 0xFF))] = np.copy(hull)
        #if portFcn[0]:
            #matchContour[0] = np.copy(hull)
            #portFcn[0] = 0
        if portFcn[1]:
            matchContour[1] = np.copy(hull)
            portFcn[1] = 0
        # compare against the stored click gesture once one has been recorded
        if type(hull) != type(None) and type(matchContour[1]) != type(None):
            matches = cv2.matchShapes(hull, matchContour[1], cv2.CONTOURS_MATCH_I2, 0)
            print(matches)
            prevMouseEm[0] = mouseEm[0]
            if matches < matchThreshold:
                print("DRAG")
                mouseEm[0] = 1
            else:
                mouseEm[0] = 0

    # centroid of the foreground (hand) pixels
    M = cv2.moments(frame)
    if M['m00'] != 0:
        cx = int(M['m10']/M['m00'])
        cy = int(M['m01']/M['m00'])

    # display frame
    f = cv2.resize(f, (160*4, 120*4), fx=0, fy=0, interpolation=cv2.INTER_NEAREST)
    cv2.imwrite('tmp.jpg', f)
    f = pygame.image.load('tmp.jpg')
    f = pygame.transform.scale(f, (320, 240))
    f = pygame.transform.flip(f, 0, 1)
    rawCapture.truncate(0)
    screen.blit(f, [0, 0])
    pygame.display.flip()

    # if the cursor is held (button 17), skip mouse emulation for this frame
    if portFcn[0] == 1:
        continue

    # Check jump
    if abs(prevMouseEm[2]) > 100 or abs(prevMouseEm[1]) > 100 or prevMouseEm[0] != mouseEm[0] or len(contours) == 0:
        rx = 0
        ry = 0
    else:
        rx = cx - cxp
        ry = cy - cyp
        if abs(rx) > 100 or abs(ry) > 100:
            rx = 0
            ry = 0
    cxp = cx
    cyp = cy
    prevMouseEm[1] = mouseEm[1]
    prevMouseEm[2] = mouseEm[2]
    if rx != 0:
        #mouseEm[1] = abs(rx**2.5)*(rx/abs(rx))
        mouseEm[1] = rx*15
    else:
        mouseEm[1] = 0
    if ry != 0:
        #mouseEm[2] = -abs(ry**2.5)*(ry/abs(ry))
        mouseEm[2] = -ry*30
    else:
        mouseEm[2] = 0

    # moving average over the last few commands to suppress temporal noise
    mouseL[1:,:] = mouseL[0:(mouseL_len-1),:]
    mouseL[0,:] = mouseEm[:]
    print(mouseL)
    mouseEm1 = np.mean(mouseL, axis=0).astype(np.int16)
    if len(np.unique(mouseL[:,0])) != 1:
        print('CLICKING')
        mouseEm1[1] = 0
        mouseEm1[2] = 0
    if int(mouseEm1[0]) == 1:
        hullColor = (0, 255, 0)
    else:
        hullColor = (0, 0, 255)

    # Send the compiled mouse emulation information over the Bluetooth link
    send_state(dev, mouseEm1[0], mouseEm1[1], mouseEm1[2])

# Release the camera and clean up any OpenCV windows
camera.close()
cv2.destroyAllWindows()