A Project By Minze Mo, Ramita Pinsuwannakub, and Wenjie Cao
Our claw machine, which can catch dolls inside, has three modes. The first mode is hand mode. In this mode, a user controls lever switches to move the claw; the control signals are sent to the Raspberry Pi, which performs the movement accordingly. The second mode is voice mode. In this mode, a user controls the claw by voice: the Raspberry Pi records the input voice, performs speech-to-text, and moves the claw in the specified direction for the specified duration. The last mode is auto mode. In this mode, a user selects a doll on the piTFT touchscreen. The Raspberry Pi performs real-time object detection to locate the selected doll, and closed-loop control moves the claw to that location and captures the doll.
The goal of this project is to build a claw machine and extend its functions. The claw machine has three modes: hand mode, voice mode, and auto mode. In hand mode, the claw machine is controlled by lever switches. In voice mode, the claw machine is controlled by voice. In auto mode, the claw machine can capture a doll specified by the user.
The main hardware part is the machine itself, which consists of 3 lever switches, 3 DC motors, and limit switches. Since the DC motors run at 4.5 V, the Raspberry Pi cannot drive them directly from its 3.3 V GPIO pins. We used an L293D motor driver IC to drive the motors from the signals received from the Raspberry Pi. Moreover, we connected flyback diodes between each DC motor and the 4.5 V supply or ground.
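The L293D drives each motor from two logic inputs per channel pair. As a minimal sketch, the pin-level logic can be written as a small lookup; the direction names here are illustrative, not our exact wiring:

```python
# Sketch of the L293D input logic used to drive one DC motor.
# Direction names are illustrative; the enable pin is tied high.

def l293d_inputs(direction):
    """Return the (IN1, IN2) logic levels for one L293D channel pair.

    IN1=1, IN2=0 spins the motor one way; swapping the levels reverses
    it; both low stops the motor.
    """
    levels = {
        "forward": (1, 0),
        "reverse": (0, 1),
        "stop": (0, 0),
    }
    return levels[direction]
```

On the Raspberry Pi, these levels would be written to the GPIO pins wired to the L293D inputs.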
The Claw Machine
From the top view, there are four limit switches: left-end, right-end, front-end, and back-end.
These small black switches stop the motor moving in that direction. Two of them limit left/right motion and the other two limit forward/backward motion. Each pair is connected to one motor.
In the original machine wiring, two limit switches were connected to each motor.
We rewired them because the original connection did not let us control the movement of the motor box. We connected the limit switches to high voltage and the DC motors to the L293D.
piTFT and GPIO pins
Two of the pins we used, GPIO 24 and GPIO 25, affect the performance of the piTFT. We reached the limit of available GPIO pins on the Raspberry Pi:
only 14 pins were usable as GPIO. Because of this limit, we have no bail-out button to quit the piTFT touchscreen and no enable pin for the L293D; we tied the enable pin permanently high.
Hand Mode
This mode requires all the hardware parts and runs the code described in the software section. A user can catch a doll with the same actions as in an arcade.
Voice Mode
We use a USB microphone plugged into the Raspberry Pi.
We use the Pi Camera to capture video inside the machine. The camera is mounted beneath the motor box, facing the ground.
This is the chosen spot. From this view, we can see only half of the frame because the claw blocks the other half, but it is enough to search for a doll.
There are three main functions in our project: manual control of the claw machine, voice control, and auto control. A mode selection menu lets users select the desired mode.
The hand mode allows a user to play the claw machine using three lever switches. Each lever switch corresponds to one pair of directions: left/right, forward/backward, or up/down.
When the claw machine touches the limit switch in a direction, it stops. The three lever switches are connected to the Raspberry Pi. When the Raspberry Pi receives an input signal from a lever switch,
it outputs signals on the corresponding GPIO pins to perform the corresponding movement.
The challenge in this part is that the left and right sides share a limit switch, as do the front and back ends. When the claw machine touches the left/right limit switch and stops,
we cannot tell which end it has reached, so we do not know which direction it may move in the next iteration. We designed a state machine to fix this issue. Taking the left/right direction as an example,
there are three states: left limit, intermediate, and right limit. When the machine is in the left limit state, it has already reached the left end, so it ignores 'move left' signals; if a 'move right' signal comes in, it changes to the intermediate state and moves right.
When the machine is in the intermediate state, it can go either left or right. In the right limit state, the machine has already reached the right end, so it ignores 'move right' signals; it can only go left, changing to the intermediate state when a 'left' signal arrives.
The following picture shows the state transitions between left and right.
Similarly, there are three states for the forward/backward direction, governed by the same kind of state machine as left/right. The following is the state diagram for forward and backward.
In general, the Raspberry Pi receives signals from the lever switches, recognizes the direction, and performs the corresponding movement by driving specific GPIO pins. The claw is initialized at the left and backward side, i.e., in the left limit and backward limit states. As a user moves the claw, the state transitions according to the state transition graphs. Any unpermitted movement is not performed; in that case the claw machine remains stopped.
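The left/right state machine above can be sketched as a small transition function. State names follow the text; the actual motor calls are omitted, and the `limit_hit` flag (true when the shared limit switch fires) is an illustrative assumption:

```python
# Minimal sketch of the left/right limit state machine described above.
LEFT_LIMIT, INTERMEDIATE, RIGHT_LIMIT = "left_limit", "intermediate", "right_limit"

def next_state(state, signal, limit_hit=False):
    """Return the new state given a 'left' or 'right' signal.

    A limit state is entered only when the corresponding limit switch
    fires (limit_hit=True); signals toward an already reached limit
    are ignored, as described in the text.
    """
    if signal == "left":
        if state == LEFT_LIMIT:
            return LEFT_LIMIT            # ignore: already at the left end
        return LEFT_LIMIT if limit_hit else INTERMEDIATE
    if signal == "right":
        if state == RIGHT_LIMIT:
            return RIGHT_LIMIT           # ignore: already at the right end
        return RIGHT_LIMIT if limit_hit else INTERMEDIATE
    return state
```

The forward/backward axis uses the same structure with its own pair of limit states.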
Voice Control Mode
The Claw Pi allows users to control its movement with voice commands. We use a USB microphone to record the speech. Users press the 'record' button to start recording. When the button is pressed, the program runs "arecord -D plughw:1,0 -d 3 --rate 44100 --format S16_LE -c1 out_123_machine.wav", a Linux command that records 3 seconds of audio from the microphone and saves it as a WAV file.
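The recording step can be sketched as a small subprocess call; the argument list mirrors the arecord command quoted above:

```python
# Sketch of invoking arecord from Python. The device and file names
# match the command quoted above.
import subprocess

def record_command(device="plughw:1,0", seconds=3,
                   out="out_123_machine.wav"):
    """Build the arecord argument list for a 16-bit mono recording."""
    return ["arecord", "-D", device, "-d", str(seconds),
            "--rate", "44100", "--format", "S16_LE", "-c1", out]

# On the Pi this would be run as:
# subprocess.call(record_command())
```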
Users can say anything they want. Although there is no required format, a valid voice command must contain a direction word: left, right, back/backward, forward, up, or down. Users can also specify the moving duration. This is optional; each valid voice command has a default moving duration of 2 seconds. A custom duration may range from 1 to 7 seconds, inclusive. Examples of valid voice commands: "go right for 5 seconds", "go back", "go left for 1 second".
We use the wit.ai service for speech recognition. With the valid audio file from the previous step, we use post() to send a speech recognition request to https://api.wit.ai/speech and json.load() to read the returned result. Here are some examples of the returned results; the u'_text' field is the recognition result. We extract the direction word and the moving time (if present) and pass them to the claw machine.
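Once the recognized text comes back, the direction word and optional duration must be pulled out of it. This is an illustrative parser, not our exact code; the direction words and the 1-7 second clamp follow the rules stated above:

```python
# Illustrative parser for the recognized text: extract the direction
# word and an optional duration (default 2 s, clamped to 1-7 s).
import re

DIRECTIONS = {"left", "right", "back", "backward", "forward", "up", "down"}

def parse_command(text, default_seconds=2):
    words = text.lower().split()
    direction = next((w for w in words if w in DIRECTIONS), None)
    match = re.search(r"(\d+)\s*seconds?", text.lower())
    seconds = int(match.group(1)) if match else default_seconds
    seconds = min(max(seconds, 1), 7)   # valid range is 1-7 s inclusive
    return direction, seconds
```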
The steps in the voice control part are: press the 'record' button to start recording; record the voice; then show what the user said on the piTFT while the claw machine performs the corresponding movement.
The last mode is auto mode. In this mode, a user chooses a doll, and the claw machine captures that specific doll automatically. To detect and locate the position of an object, a Pi camera captures real-time images and OpenCV performs the image processing. We tried setting up the Pi camera in different locations: the top-left corner of the claw machine, the top of the claw, and outside the claw machine. We decided to attach the Pi camera to the top of the claw for three reasons. Although a camera fixed at the corner can capture the whole box of the claw machine, it still has blind spots covered by the claw, and it cannot move with the claw. A camera attached to the top of the claw moves with the claw, which gives us feedback on the location of the doll and allows closed-loop control. In this case, however, due to the position of the claw, the camera can only see half of the frame; it cannot capture the whole box of the claw machine in one image. Figure 24 shows the position of the camera, which is attached to and moves with the claw.
There are four steps. The claw is initially at the left and backward position of the machine, i.e., in the left limit and backward limit states, as mentioned before. The first step is to move the claw to the middle of the forward/backward axis. The midpoint of the forward/backward axis makes the image captured by the camera symmetric, which avoids unnecessary pixel transformations and extreme situations such as the object lying at the edge of the image, which could increase detection failures. It is important to note that the orientation of the camera attached to the claw differs from the frame of the claw machine: the camera frame is rotated 90 degrees clockwise relative to the claw frame. All frames stated below have already been transformed into the claw frame.
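Because the camera is mounted 90 degrees clockwise relative to the claw frame, pixel offsets measured from the image center must be rotated before they drive the motors. The axis convention below is an assumption for illustration:

```python
# Sketch of the camera-to-claw frame transform. The camera is mounted
# 90 degrees clockwise relative to the claw frame, so an offset from
# the image center is rotated 90 degrees counter-clockwise to undo it.

def camera_to_claw(dx_cam, dy_cam):
    """Rotate a (dx, dy) offset from the image center into the claw frame."""
    return -dy_cam, dx_cam
```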
The second step is to search for the specified doll. The claw moves to the right and performs real-time object detection. Our initial plan was to use SIFT for template matching. SIFT stands for scale-invariant feature transform. Given a reference image, it transforms the image into feature vectors that are invariant to scale and orientation and partially invariant to illumination changes. For template matching, we perform the same procedure on a query image to extract features, then detect an object matching the reference image by comparing the feature vectors of the two images using Euclidean distance. If an object in the query image matches the reference image, feature vectors at certain locations must match the features of the reference image. Using this method we could locate the doll in an image. We use OpenCV for image processing; however, the SIFT function we needed is only available in OpenCV 3. We tried to follow the given instructions to install and compile OpenCV 3 on our Raspberry Pi, but it failed due to issues such as insufficient memory and missing files. The compilation process even changed the configuration of our Raspberry Pi so that it failed to display on the monitor, and we had to fall back to the backup SD card. We then used color detection to find the object in the query image. We first assign a unique color to each doll. Each color has a color range. We filter out all the pixels outside the range by setting them to zero, and set all pixels inside the range to 255. To filter out disturbances, we say an object is detected if more than 5000 pixels are in range. The figure below shows the color detection of an object; the red point marks the center of the detected object.
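The thresholding, 5000-pixel rule, and centroid computation can be sketched with NumPy alone (in the project this is done with OpenCV on HSV images, e.g. cv2.inRange; the function below is a simplified stand-in):

```python
# NumPy-only sketch of the color-range detection described above.
# Pixels in range become 255, the rest 0; an object counts as detected
# when more than 5000 pixels survive, and its center is the mean
# position of the surviving pixels.
import numpy as np

PIXEL_THRESHOLD = 5000

def detect(img, lo, hi):
    """img: HxWx3 array; lo/hi: per-channel inclusive color bounds."""
    in_range = np.all((img >= lo) & (img <= hi), axis=2)
    mask = np.where(in_range, 255, 0).astype(np.uint8)
    ys, xs = np.nonzero(mask)
    if xs.size <= PIXEL_THRESHOLD:
        return mask, None                 # nothing detected
    center = (int(xs.mean()), int(ys.mean()))
    return mask, center
```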
If an object is detected, the claw stops and proceeds to step 3. However, if the claw reaches the right limit and no
object is found, the claw goes back to its initial position.
If an object is found in step 2, step 3 moves the claw to the location of the found object. From object detection, we get the pixels of the object that match the reference image. The center of the found object is computed as the mean of the x and y coordinates of those pixels. The camera's position in the captured image is the image center, so the claw can move according to the relative position of the found object from the center. The camera keeps tracking the relative position of the object as the claw moves toward it. Ideally, the claw stops when the object is underneath the claw, i.e., when the center of the object is at the center of the image. However, the object disappears as the claw gets close: because the camera is mounted on top of the claw, the claw covers the object once it is underneath. Therefore, we say an object is underneath the claw if one of the following conditions is met. First, the object disappears from the image. Second, the relative x and y distances of the object from the center are small enough. The third condition handles an extreme case: if the object is located at the edge of the box, a limit switch could be triggered before the claw gets close. So the third condition is that the left/right limit switch is triggered and the forward/backward distance is close enough, or the forward/backward limit switch is triggered and the left/right distance is close enough, or both limit switches are triggered.
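The three stop conditions above can be sketched as one predicate; the tolerance value and flag names are illustrative assumptions:

```python
# Sketch of the three stop conditions for deciding the object is
# underneath the claw. The pixel tolerance is an assumed value.

CLOSE_ENOUGH = 20   # pixels

def object_underneath(visible, dx, dy, lr_limit=False, fb_limit=False):
    """visible: object still seen; dx/dy: left-right / forward-backward
    offsets from the image center; lr_limit/fb_limit: limit switch hit."""
    if not visible:                                          # condition 1
        return True
    if abs(dx) < CLOSE_ENOUGH and abs(dy) < CLOSE_ENOUGH:    # condition 2
        return True
    if lr_limit and abs(dy) < CLOSE_ENOUGH:                  # condition 3
        return True
    if fb_limit and abs(dx) < CLOSE_ENOUGH:
        return True
    return lr_limit and fb_limit
```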
As soon as the claw reaches the location of the found object, step 4 is performed: grab the doll and return it to the exit. Since the height of the doll relative to the claw is fixed, the claw is moved down and then up for a fixed time. The exit is at the same position as the initial location of the claw. To move back to the initial position, we need to know the current position. For this, we use the same state machines as in the hand mode and voice control parts, keeping track of the claw's states through all the steps above. To return to its initial position, the claw moves left and backward until it reaches the left limit and backward limit states.
One thing to point out is that it is harder to distinguish two colors if their color ranges overlap. We use the HSV color space instead of RGB to better separate colors. Even so, some colors still have overlapping ranges, such as brown and yellow. We performed a large number of experiments to determine color ranges that can be reliably distinguished.
As a result of our project, we have a fully working claw machine with a user interface on the piTFT.
There are three modes of the claw machine. Hand Mode: when a user selects hand mode on the piTFT, he or she can freely control the claw with the three lever switches:
left/right, forward/backward, and up/down.
Voice Mode: in this mode, when a user presses the record button on the piTFT, his or her voice is recorded.
The voice is then parsed into a command and the claw machine moves accordingly. The user can specify not only the direction but also the duration the
claw moves. For example, if the user says 'go right for three seconds' into the microphone, the claw moves right for three seconds.
Auto Mode: in this mode, a user specifies the doll he or she wants. The claw machine then performs real-time object detection
to locate the doll. When the claw reaches the doll's location, it captures the doll, moves to the drop zone, and drops it.
A user interface is built on the piTFT. A user freely selects the desired mode. In each mode, the user may restart
by pressing the restart button on the piTFT; if the restart button is pressed, the claw moves back to its original position. The user can also quit the current mode by pressing
the quit button; the claw then moves back to its initial position and the user interface goes back to the previous level. In voice mode, the piTFT shows the recognized
command after recording, which helps the user know how the machine interpreted the voice command and correct it if needed.
The result of our project meets most of the goals we proposed. All three modes of the claw machine (hand mode, voice mode, and auto mode) were successfully built. However, some goals did not work out due to the configuration of the claw machine and the Raspberry Pi. According to our initial plan, the user could choose either an area for the claw to search or a specific doll to capture.
Due to the configuration of the claw machine, there is no place to mount the camera such that it can capture an image covering the whole area without being concealed by the claw. Another issue is the Raspberry Pi itself:
OpenCV 3 could not be compiled on the Raspberry Pi, since OpenCV 3 does not support the architecture the Raspberry Pi uses. Therefore, some newer functions such as SIFT in OpenCV 3 were unavailable, which forced us to change our object detection algorithm.