by Victor Fuentes (vmf24) and Katherine Fernandes (kaf245)
Wednesday Lab - May 12th Demo
Robo Buddy is a friendly robot assistant who is always ready to help answer any questions and share his love for potatoes. Robo Buddy runs on NixOS and uses various tools and APIs, including Whisper, OpenWakeWord, and Mimic, to listen, understand, and respond to users. He went through multiple prototypes before we settled on a laser-cut acrylic body. He has the ability to dance and wave with the help of servo and DC motors.
Create a Robot Buddy that you can talk to and have a conversation with. The Robot Buddy performs limited actions, such as dancing and waving. It is able to understand questions asked to it and provide audible and understandable answers.
We opted to use NixOS on the Raspberry Pi rather than the default Raspberry Pi OS, as it offers an easier workflow: code and dependencies developed on a personal laptop (x86_64) work as expected when transferred to the Raspberry Pi (aarch64). NixOS also provides a large variety of packages (more than 80,000 at the time of writing, currently the most of any Linux distribution), many of which support aarch64 and therefore the Raspberry Pi. Another advantage of using NixOS is having the Nix package manager set up. A common issue with older projects is that, with untracked dependencies, they become impossible to build and use in the future. Nix solves this by pinning the exact version of every dependency, so that down the line anyone else can fire up Nix and instantly have a working version of the project (no hunting for old dependencies, no downgrading the system version of SDL).
Another advantage of NixOS is that rather than needing to store multiple 16 GB backups of the SD card for every OS change, NixOS creates a "generation" whenever anything at the system level is changed, and a generation can be chosen at boot time. If some change breaks the system, all that is needed is a reboot and choosing the previous generation (new update broke everything? Just reboot and everything works again!). This saved us some time when some changes caused the Raspberry Pi to not boot correctly. The system state is also defined by a few .nix files. The entire operating system is built from these definitions, meaning that with only a few text files, the state of the operating system is completely reproducible. Combined with version control via git, this means that with every commit to these text files we can completely rebuild the operating system in some previous state, and anyone with these configurations can also build and deploy the system on their own Raspberry Pi.
The one disadvantage of using NixOS is that all of the setup work we had done over ECE 5725 Labs 1-4 was invalidated. In order to use the piTFT screen and pigpio, we needed to start over. First we installed NixOS on the Raspberry Pi and confirmed that basic functionality worked. Next, in order to get the piTFT screen working, we looked over the Raspberry Pi 4 device tree overlay documentation for the overlays necessary to make the piTFT function correctly. For the piTFT to function, we added the pitft28-resistive-overlay to our device tree. We also needed a few more overlays for proper GPIO motor support; these included pwm-2chan-overlay and w1-gpio-overlay. Since NixOS doesn't have an editable config.txt file in /boot, we made a custom overlay to enable SPI (the same functionality as adding dtparam=spi=on to config.txt). With all of the overlays in place, the piTFT and pigpio worked as expected.
In order for the Robo Buddy to be able to interact with people, we needed some way for it to listen to and understand what was being said to it, and some way for it to speak its responses. One of the first hurdles we tackled was figuring out how to implement speech to text (STT) on the Raspberry Pi. From looking online and on GitHub, we found that OpenAI's Whisper is, as of writing, one of the best STT programs. However, the reference implementation from OpenAI runs quite slowly on the Raspberry Pi. In looking for alternatives, we found a faster reimplementation of Whisper called faster-whisper. Faster-whisper still runs fairly slowly on the Pi, but by combining it with the least accurate faster-whisper-tiny.en model, we were able to run speech to text locally on the Raspberry Pi. As it was still fairly slow, we also added an option to use the OpenAI web API instead for faster transcription.
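As a rough illustration, the local STT path looks something like the sketch below. The model name matches what we used; the audio filename and the int8 quantization setting are just illustrative assumptions.

```python
from faster_whisper import WhisperModel

# Load the smallest English-only model; int8 keeps memory and CPU use low on the Pi.
model = WhisperModel("tiny.en", device="cpu", compute_type="int8")

# Transcribe a short recording captured from the USB microphone.
segments, info = model.transcribe("question.wav")
text = " ".join(segment.text.strip() for segment in segments)
print(text)
```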
Next we needed a way for Robo Buddy to know when a person is addressing it specifically. To do this, we looked for existing projects implementing wake-word functionality. The first one we tested was howl. However, after trying for some time to get it working with our project with little success, we decided to move on to another project, OpenWakeWord. Installation of OpenWakeWord was very straightforward, and it worked well with our project out of the box. We did not have time to fully implement a custom wake word, but the included wake words covered our needs. We also added the ability to tap the piTFT screen to skip needing to speak the wake word.
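A minimal sketch of the wake-word loop is shown below. It assumes OpenWakeWord's bundled pre-trained models and uses PyAudio for microphone capture, which stands in for whatever audio capture the chat program actually uses; the 0.5 score threshold is also just an illustrative choice.

```python
import numpy as np
import pyaudio
from openwakeword.model import Model

oww = Model()  # loads the bundled pre-trained wake-word models

# OpenWakeWord expects 16 kHz, 16-bit mono PCM; 1280 samples = 80 ms per frame.
pa = pyaudio.PyAudio()
stream = pa.open(rate=16000, channels=1, format=pyaudio.paInt16,
                 input=True, frames_per_buffer=1280)

while True:
    frame = np.frombuffer(stream.read(1280), dtype=np.int16)
    scores = oww.predict(frame)  # dict of {wake word name: confidence}
    if any(score > 0.5 for score in scores.values()):
        print("Wake word detected - start listening for a question")
        break
```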
Finally, in order for Robo Buddy to speak, we needed some form of text to speech (TTS). One TTS engine readily available on Linux is espeak, however it sounds very robotic and is barely intelligible. Another option we looked into was the ElevenLabs Voice AI API. Its voices sound very good; however, the request to the API adds extra waiting time before a response is spoken. We settled on another tool available on NixOS called Mimic. mimic still sounds robotic, but not to the extent of espeak. It also generates voice lines quickly when called. We implemented mimic as the default TTS engine for our robot, with the option to use ElevenLabs Voice AI.
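Calling mimic from the chat program is essentially a one-liner; the sketch below assumes mimic is on the PATH and accepts text with the flite-style -t flag.

```python
import subprocess

def speak(text: str) -> None:
    # Shell out to the mimic CLI and let it synthesize and play the line.
    subprocess.run(["mimic", "-t", text], check=True)

speak("Hello! I am Robo Buddy, and I love potatoes.")
```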
Program Flags (--web, --fast, --local, --gpt4)
Within the main program, there are four flags that determine which services are used for the robot's different functions. --web uses web APIs as much as possible: the OpenAI Whisper API for STT and ElevenLabs Voice AI for TTS. --fast uses the fastest methods available for STT and TTS: the OpenAI API for STT and the mimic command for TTS. --local uses as many local options as possible: faster-whisper for STT and the mimic command for TTS. --gpt4 enables the use of GPT-4 for the chat responses rather than the default GPT-3.5-turbo. This makes the quality of the responses a lot better, however it makes responses significantly slower.
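A hedged sketch of how these flags could be wired up with argparse is shown below; the backend variable names are illustrative, not the exact ones in our code.

```python
import argparse

parser = argparse.ArgumentParser(description="Robo Buddy chat assistant")
parser.add_argument("--web", action="store_true",
                    help="use web APIs: OpenAI Whisper API for STT, ElevenLabs for TTS")
parser.add_argument("--fast", action="store_true",
                    help="fastest options: OpenAI Whisper API for STT, mimic for TTS")
parser.add_argument("--local", action="store_true",
                    help="local options: faster-whisper for STT, mimic for TTS")
parser.add_argument("--gpt4", action="store_true",
                    help="use GPT-4 instead of GPT-3.5-turbo for chat responses")
args = parser.parse_args()

# Pick the backends based on the flags (names here are illustrative).
stt = "openai-api" if (args.web or args.fast) else "faster-whisper"
tts = "elevenlabs" if args.web else "mimic"
chat_model = "gpt-4" if args.gpt4 else "gpt-3.5-turbo"
```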
This was our initial sketch of the robot when we created our project proposal.
The original plan was to 3D print a body for the robot. Neither of us had experience designing 3D models in Fusion360, so we followed some tutorials to learn the basics. After familiarizing ourselves with Fusion360, we made an example body.
Afterwards, we decided to prototype with cardboard boxes to find the correct dimensions and finalize how we wanted to attach the motors. There were a couple of different iterations as shown below:
The first prototype consisted of two boxes for the head and the body. These boxes were very small and could not fit the Raspberry Pi or piTFT, so we decided a bigger box would be better for the body. At this point we decided to focus on prototyping the body of the robot.
The second cardboard box prototype was a cube of 120 mm by 120 mm by 120 mm. The DC motors were attached with screws and the acrylic spacers from Lab 3. This was a good starting box; however, we felt it was too wide and needed to be narrower but deeper. In addition, we decided to move the motors to the outside, since the wheels were rubbing against the box and could not be attached properly to the motors.
For the third box prototype we accidentally cut windows into two of the sides. This turned into a useful accident, letting us see how everything fit inside the box. As shown in the photo, the Raspberry Pi and the piTFT were able to fit inside, but we decided to make the box shorter and wider to account for the USB microphone and speaker attached to the Raspberry Pi.
The final cardboard prototype included cardboard boxes for the head, body, and arms. There is a cutout in the head for the piTFT and one in the body for the wires connecting the piTFT to the Raspberry Pi.
Since we were making hollow boxes, we were told it would be easier and faster to laser cut acrylic rather than 3D print. We used MakerCase to generate the laser cut designs after inputting the proper dimensions and specifying that we wanted fingered edges to help hold the box together. Smith from the MakerClub helped us use the laser cutter.
We used three different types of motors for the robot. The "legs" were DC motors driven by a SparkFun TB6612FNG dual-channel motor driver, the same one used in Lab 3. The arms were controlled by Tower Pro SG92R micro servos, and the head was controlled by a standard Parallax servo motor. We used hardware PWM from the pigpio library to control the DC motors, and the initial plan was to continue using software PWM from Lab 3 to control the servos. After deciding to combine all of our servo code into one file, we decided it would be best to use pigpio for all of the servos, as it caused the fewest issues when running everything together. The microphone and speaker were connected to the Raspberry Pi over USB-A.
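The sketch below shows the general pattern we followed with pigpio: hardware PWM for the DC motor speed pin and servo pulses for the arms and head. The GPIO pin numbers and duty cycle are illustrative, not our exact wiring.

```python
import pigpio

pi = pigpio.pi()  # requires the pigpiod daemon to be running

# DC "leg" motor through the TB6612FNG: two direction pins, a standby pin,
# and a hardware-PWM-capable speed pin (illustrative pin numbers).
AIN1, AIN2, STBY, PWMA = 5, 6, 16, 12
for pin in (AIN1, AIN2, STBY):
    pi.set_mode(pin, pigpio.OUTPUT)
pi.write(STBY, 1)                    # take the motor driver out of standby
pi.write(AIN1, 1)                    # drive forward
pi.write(AIN2, 0)
pi.hardware_PWM(PWMA, 1000, 500000)  # 1 kHz at 50% duty (range 0..1,000,000)

# Arm micro servo and head servo: pigpio generates the servo pulses itself,
# so no software PWM loop is needed. Roughly 1500 us centers the servo.
ARM_SERVO = 17
pi.set_servo_pulsewidth(ARM_SERVO, 1500)
```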
We decided to use the piTFT to display the face of Robo Buddy. One issue we ran into was how to connect the piTFT to the Raspberry Pi while still having access to the GPIO pins needed for the motors. Originally we used a cable to connect the piTFT to the Raspberry Pi and then connected the piTFT directly to a Raspberry Pi cobbler on a breadboard. This made the robot very front-heavy. Afterwards we planned to use two cables to connect the piTFT to both the Raspberry Pi and the Pi cobbler, however for some reason the piTFT did not function in that configuration. Instead, we ended up removing the Pi cobbler and wiring directly to the end of one of the cables.
We used Piskel to create different facial expressions for the robot to display on the piTFT.
Since we used Nix to set up the main project and its dependencies, it works just as well on a laptop as on the Raspberry Pi. To test all of the voice and chat response parts of the project, we ran and tested them on a laptop first before putting them onto the Raspberry Pi itself. This allowed us to make changes and test the chat features while not in the lab and without the Raspberry Pi. We also initially tested with a more sensitive microphone while making adjustments, to ensure that any potential problems were not microphone related.
We tested the motors on the Raspberry Pi first one at a time and later all together. The first motors tested were the DC motors we had used as part of the robot built during Lab 3. We already knew the circuit and parts necessary to run the motors, so aside from an issue where we needed to debug the standby connection of the motor controller, the DC motors worked as expected. Afterwards, we tested the micro servos and the Parallax servo, which also worked as expected.
There were a few issues that arose when integrating all the parts of the robot, both software and hardware, together. The first we noticed was an issue where some of the servos would randomly start or stop moving. At this point we were still using the software PWM code from Lab 3 using rpi-gpio. We switched to using pigpio in order to fix this issue. Another issue we ran into was that when using PyGame to display the Robot Buddy's face on the screen, sound input could not be received by the chat program. In order to fix this without spending too much time debugging, we stopped using PyGame entirely, opting to draw the face image directly to the piTFT framebuffer using the fbi command from fbida. Since root is needed to write to the framebuffer, we created a systemd service that runs a Python script and receives input from a FIFO available to the pi user. When the pi user writes a string corresponding to some face into the FIFO, the service script displays that face on the piTFT display. Displaying the face in this manner eliminates the need to use PyGame for the most part, however we still need a way to capture touch from the display. To do this we use the Python bindings for evdev directly, with no need for any old version of SDL, TSLIB, or PyGame.
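A minimal sketch of the two pieces described above is shown below: the root-owned service loop that reads face names from a FIFO and draws them with fbi, and an evdev loop that waits for a tap on the piTFT. The FIFO path, face image paths, and touchscreen device node are illustrative assumptions.

```python
import os
import subprocess
from evdev import InputDevice, ecodes

FIFO_PATH = "/home/pi/face_fifo"                    # written to by the pi user
FACES = {"happy": "/home/pi/faces/happy.png",
         "talking": "/home/pi/faces/talking.png"}

def show_face(name):
    # Stop any fbi instance already holding the framebuffer, then draw the
    # new face straight onto the piTFT framebuffer (/dev/fb1). This is why
    # the loop runs as root from a systemd service.
    subprocess.run(["killall", "-q", "fbi"], check=False)
    subprocess.Popen(["fbi", "-d", "/dev/fb1", "-T", "1",
                      "-noverbose", "-a", FACES[name]])

def face_service():
    if not os.path.exists(FIFO_PATH):
        os.mkfifo(FIFO_PATH)
    while True:
        with open(FIFO_PATH) as fifo:               # blocks until a face name is written
            name = fifo.read().strip()
            if name in FACES:
                show_face(name)

def wait_for_touch(device="/dev/input/touchscreen"):
    # evdev reports touch events directly, with no SDL, TSLIB, or PyGame needed.
    dev = InputDevice(device)
    for event in dev.read_loop():
        if event.type == ecodes.EV_KEY and event.code == ecodes.BTN_TOUCH and event.value == 1:
            return
```

With this in place, changing the face from the chat program is as simple as writing a name into the FIFO, e.g. `echo happy > /home/pi/face_fifo` (using the illustrative path above).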
When putting together the acrylic box, we used hot glue and tape on the inside to hold everything together. On some parts, such as the face, we needed to use tape on the outside of the head in order to hold it together. When adding batteries to run the robot untethered, we needed to mount the battery pack on the outside, as it was too long to fit inside the robot as initially planned. We used long strips of tape that acted as backpack straps to mount the battery pack on its back. With everything assembled, we still had the ability to SSH into the Pi in order to debug software components.
Overall we met all of the initial goals we set out to complete. The Robot Buddy is able to listen until it is called, understand speech, and provide relevant answers. It is also able to perform some actions such as dancing and waving. We did not reach any of our potential extended goals; however, overall the robot functioned very well for our target scope.
If we had more time to work on the project, we would have implemented some of the extensions from our initial project proposal. These included adding a sensor for the robot to detect the general direction of the person speaking to it and rotate its body to face them. We could also have given Robo Buddy more commands and movements, such as moving forward a certain distance, or allowed it to roam its area freely, listen to its surroundings, and make decisions and reactions based on what it hears. Another aspect we would explore is making the software 100% declarative and reproducible using Nix; unfortunately, some Python packages currently still need to be installed from pip.
kaf245@cornell.edu
vmf24@cornell.edu
| Part | Cost |
|---|---|
| Raspberry Pi 4b | Provided by lab |
| PiTFT Display | Provided by lab |
| DC Motors | Provided by lab |
| Dual Channel Motor Controller | Provided by lab |
| Micro Servo Motors | Provided by lab |
| Parallax Standard Servo | Provided by lab |
| Raspberry Pi Case Fan | Provided by lab |
| USB Microphone | Provided by lab |
| USB Speaker | Provided by lab |
| Wires | Provided by lab |
| Cardboard | Provided by lab |
| Tape | Provided by lab |