by Victor Fuentes (vmf24) and Katherine Fernandes (kaf245)
Wednesday Lab - May 12th Demo
Robo Buddy is a friendly robot assistant who is always ready to help answer any questions and share his love for potatoes. Robo Buddy runs on NixOS and uses various tools and APIs, including Whisper, OpenWakeWord, and Mimic, to listen, understand, and respond to users. He went through multiple prototypes before we settled on a laser-cut acrylic body. He has the ability to dance and wave with the help of servo and DC motors.
Create a Robot Buddy that you can talk to and have a conversation with. The Robot Buddy performs limited actions, such as dancing and waving. It is able to understand questions asked to it and provide audible and understandable answers.
We opted to use NixOS on the Raspberry Pi rather than the default Raspberry Pi OS, as it offers an easier workflow: code and dependencies developed on a personal laptop (x86_64) work as expected when transferred to the Raspberry Pi (aarch64). NixOS also provides a large variety of packages (more than 80,000 at the time of writing, currently the most of any Linux distribution), many of which support aarch64 and therefore the Raspberry Pi. Another advantage of using NixOS is having the Nix package manager set up. A common issue with older projects is that, with untracked dependencies, they become impossible to build and use in the future. Nix solves this by pinning the exact version of every dependency, so that down the line anyone else can fire up Nix and instantly have a working version of the project (no hunting for old dependencies, no downgrading the system version of SDL).
Another advantage of NixOS is that rather than needing to store multiple 16 GB backups of the SD card for every OS change, NixOS creates a "generation" whenever anything at the system level is changed, and a generation can be chosen at boot time. If some change breaks the system, all that is needed is a reboot and choosing the previous generation (new update broke everything? Just reboot and everything works again!). This saved us some time when some changes caused the Raspberry Pi to not boot correctly. The system state is also defined by a few .nix files. The entire operating system is built from these definitions, meaning that with only a few text files, the state of the operating system is completely reproducible. Combined with version control via git, this means that with every commit to these text files we can completely rebuild the operating system in some previous state, and anyone with these configurations can also build and deploy the system on their own Raspberry Pi.
The one disadvantage of using NixOS is that all of the setup work we had done over ECE 5725 Labs 1-4 was invalidated. In order to use the piTFT screen and pigpio, we needed to start over. First we installed NixOS on the Raspberry Pi and confirmed that basic functionality worked. Next, in order to get the piTFT screen working, we looked over the Raspberry Pi 4 device tree overlay documentation for the overlays necessary to make the piTFT function correctly. For the piTFT to function, we added the pitft28-resistive-overlay to our device tree. We also needed a few more overlays for proper GPIO motor support; these included pwm-2chan-overlay and w1-gpio-overlay. Since NixOS doesn't have an editable config.txt file in /boot, we made a custom overlay to enable SPI (the same functionality as adding dtparam=spi=on to config.txt). With all of the overlays in place, the piTFT and pigpio worked as expected.
In order for the Robo Buddy to be able to interact with people, we needed some way for it to listen to and understand what was being said to it, and some way for it to speak its responses. One of the first hurdles we tackled was figuring out how to implement speech to text (STT) on the Raspberry Pi. From looking online and on GitHub, we found that OpenAI's Whisper is, as of writing, one of the best STT programs. However, the reference implementation from OpenAI runs quite slowly on the Raspberry Pi. In looking for alternatives, we found a faster reimplementation of Whisper called faster-whisper. Faster-whisper still runs fairly slowly on the Pi, but by combining it with the least accurate faster-whisper-tiny.en model, we were able to run speech to text locally on the Raspberry Pi. As it was still fairly slow, we also added an option to use the OpenAI web API instead for faster transcription.
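As a rough illustration, the local STT path looks something like the sketch below. The model name matches what we used; the audio filename and the int8 quantization setting are just illustrative assumptions.

```python
from faster_whisper import WhisperModel

# Load the smallest English-only model; int8 keeps memory and CPU use low on the Pi.
model = WhisperModel("tiny.en", device="cpu", compute_type="int8")

# Transcribe a short recording captured from the USB microphone.
segments, info = model.transcribe("question.wav")
text = " ".join(segment.text.strip() for segment in segments)
print(text)
```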
Next we needed a way for Robo Buddy to know when a person is addressing it specifically. To do this, we looked for existing projects implementing wake-word functionality. The first one we tested was howl. However, after trying for some time to get it working with our project with little success, we decided to move on to another project, OpenWakeWord. Installation of OpenWakeWord was very straightforward, and it worked well with our project out of the box. We did not have time to fully implement a custom wake word, but the included wake words covered our needs. We also added the ability to tap the piTFT screen to skip needing to speak the wake word.
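A minimal sketch of the wake-word loop is shown below. It assumes OpenWakeWord's bundled pre-trained models and uses PyAudio for microphone capture, which stands in for whatever audio capture the chat program actually uses; the 0.5 score threshold is also just an illustrative choice.

```python
import numpy as np
import pyaudio
from openwakeword.model import Model

oww = Model()  # loads the bundled pre-trained wake-word models

# OpenWakeWord expects 16 kHz, 16-bit mono PCM; 1280 samples = 80 ms per frame.
pa = pyaudio.PyAudio()
stream = pa.open(rate=16000, channels=1, format=pyaudio.paInt16,
                 input=True, frames_per_buffer=1280)

while True:
    frame = np.frombuffer(stream.read(1280), dtype=np.int16)
    scores = oww.predict(frame)  # dict of {wake word name: confidence}
    if any(score > 0.5 for score in scores.values()):
        print("Wake word detected - start listening for a question")
        break
```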
Finally, in order for Robo Buddy to speak, we needed some form of text to speech (TTS). One TTS engine readily available on Linux is espeak, however it sounds very robotic and is barely intelligible. Another option we looked into was the ElevenLabs Voice AI API. Its voices sound very good; however, the request to the API adds extra waiting time before a response is spoken. We settled on another tool available on NixOS called Mimic. mimic still sounds robotic, but not to the extent of espeak. It also generates voice lines quickly when called. We implemented mimic as the default TTS engine for our robot, with the option to use ElevenLabs Voice AI.
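Calling mimic from the chat program is essentially a one-liner; the sketch below assumes mimic is on the PATH and accepts text with the flite-style -t flag.

```python
import subprocess

def speak(text: str) -> None:
    # Shell out to the mimic CLI and let it synthesize and play the line.
    subprocess.run(["mimic", "-t", text], check=True)

speak("Hello! I am Robo Buddy, and I love potatoes.")
```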
Program Flags (--web, --fast, --local, --gpt4)
Within the main program, there are four flags that determine which services are used for the robot's different functions. --web uses web APIs as much as possible: the OpenAI Whisper API for STT and ElevenLabs Voice AI for TTS. --fast uses the fastest methods available for STT and TTS: the OpenAI API for STT and the mimic command for TTS. --local uses as many local options as possible: faster-whisper for STT and the mimic command for TTS. --gpt4 enables the use of GPT-4 for the chat responses rather than the default GPT-3.5-turbo. This makes the quality of the responses a lot better, however it makes responses significantly slower.
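A hedged sketch of how these flags could be wired up with argparse is shown below; the backend variable names are illustrative, not the exact ones in our code.

```python
import argparse

parser = argparse.ArgumentParser(description="Robo Buddy chat assistant")
parser.add_argument("--web", action="store_true",
                    help="use web APIs: OpenAI Whisper API for STT, ElevenLabs for TTS")
parser.add_argument("--fast", action="store_true",
                    help="fastest options: OpenAI Whisper API for STT, mimic for TTS")
parser.add_argument("--local", action="store_true",
                    help="local options: faster-whisper for STT, mimic for TTS")
parser.add_argument("--gpt4", action="store_true",
                    help="use GPT-4 instead of GPT-3.5-turbo for chat responses")
args = parser.parse_args()

# Pick the backends based on the flags (names here are illustrative).
stt = "openai-api" if (args.web or args.fast) else "faster-whisper"
tts = "elevenlabs" if args.web else "mimic"
chat_model = "gpt-4" if args.gpt4 else "gpt-3.5-turbo"
```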
This was our initial sketch of the robot when we created our project proposal.
The original plan was to 3D print a body for the robot. Neither of us had experience designing 3D models in Fusion360, so we followed some tutorials to learn the basics. After familiarizing ourselves with Fusion360, we made an example body.
Afterwards, we decided to prototype with cardboard boxes to find the correct dimensions and finalize how we wanted to attach the motors. There were a couple of different iterations as shown below:
The first prototype consisted of two boxes for the head and the body. These boxes were very small and could not fit the Raspberry Pi or piTFT, so we decided a bigger box would be better for the body. At this point we decided to focus on prototyping the body of the robot.
The second cardboard box prototype was a cube of 120 mm by 120 mm by 120 mm. The DC motors were attached with screws and the acrylic spacers from Lab 3. This was a good starting box; however, we felt it was too wide and needed to be narrower but deeper. In addition, we decided to move the motors to the outside, since the wheels were rubbing against the box and could not be attached properly to the motors.
For the third box prototype we accidentally cut windows into two of the sides. This turned into a useful accident, letting us see how everything fit inside the box. As shown in the photo, the Raspberry Pi and the piTFT were able to fit inside, but we decided to make the box shorter and wider to account for the USB microphone and speaker attached to the Raspberry Pi.
The final cardboard prototype included cardboard boxes for the head, body, and arms. There is a cutout in the head for the piTFT and one in the body for the wires connecting the piTFT to the Raspberry Pi.
Since we were making hollow boxes, we were told it would be easier and faster to laser cut acrylic rather than 3D print. We used MakerCase to generate the laser cut designs after inputting the proper dimensions and specifying that we wanted fingered edges to help hold the box together. Smith from the MakerClub helped us use the laser cutter.
We used three different types of motors for the robot. The "legs" were DC motors driven by a SparkFun TB6612FNG dual-channel motor driver, the same one used in Lab 3. The arms were controlled by Tower Pro SG92R micro servos, and the head was controlled by a standard Parallax servo motor. We used hardware PWM from the pigpio library to control the DC motors, and the initial plan was to continue using software PWM from Lab 3 to control the servos. After deciding to combine all of our servo code into one file, we decided it would be best to use pigpio for all of the servos, as it caused the fewest issues when running everything together. The microphone and speaker were connected to the Raspberry Pi over USB-A.
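The sketch below shows the general pattern we followed with pigpio: hardware PWM for the DC motor speed pin and servo pulses for the arms and head. The GPIO pin numbers and duty cycle are illustrative, not our exact wiring.

```python
import pigpio

pi = pigpio.pi()  # requires the pigpiod daemon to be running

# DC "leg" motor through the TB6612FNG: two direction pins, a standby pin,
# and a hardware-PWM-capable speed pin (illustrative pin numbers).
AIN1, AIN2, STBY, PWMA = 5, 6, 16, 12
for pin in (AIN1, AIN2, STBY):
    pi.set_mode(pin, pigpio.OUTPUT)
pi.write(STBY, 1)                    # take the motor driver out of standby
pi.write(AIN1, 1)                    # drive forward
pi.write(AIN2, 0)
pi.hardware_PWM(PWMA, 1000, 500000)  # 1 kHz at 50% duty (range 0..1,000,000)

# Arm micro servo and head servo: pigpio generates the servo pulses itself,
# so no software PWM loop is needed. Roughly 1500 us centers the servo.
ARM_SERVO = 17
pi.set_servo_pulsewidth(ARM_SERVO, 1500)
```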
We decided to use the piTFT to display the face of Robo Buddy. One issue we ran into was how to connect the piTFT to the Raspberry Pi while still having access to the GPIO pins needed for the motors. Originally we used a cable to connect the piTFT to the Raspberry Pi and then connected the piTFT directly to a Raspberry Pi cobbler on a breadboard. This made the robot very front-heavy. Afterwards we planned to use two cables to connect the piTFT to both the Raspberry Pi and the Pi cobbler, however for some reason the piTFT did not function in that configuration. Instead, we ended up removing the Pi cobbler and wiring directly to the end of one of the cables.
We used Piskel to create different facial expressions for the robot to display on the piTFT.
Since we used Nix to set up the main project and its dependencies, it works just as well on a laptop as on the Raspberry Pi. To test all of the voice and chat response parts of the project, we ran and tested them on a laptop first before putting them onto the Raspberry Pi itself. This allowed us to make changes and test the chat features while not in the lab and without the Raspberry Pi. We also initially tested with a more sensitive microphone while making adjustments, to ensure that any potential problems were not microphone related.
We tested the motors on the Raspberry Pi first one at a time and later all together. The first motors tested were the DC motors we had used as part of the robot built during Lab 3. We already knew the circuit and parts necessary to run the motors, so aside from an issue where we needed to debug the standby connection of the motor controller, the DC motors worked as expected. Afterwards, we tested the micro servos and the Parallax servo, which also worked as expected.
There were a few issues that arose when integrating all the parts of the robot, both software and hardware, together. The first we noticed was an issue where some of the servos would randomly start or stop moving. At this point we were still using the software PWM code from Lab 3 using rpi-gpio. We switched to using pigpio in order to fix this issue. Another issue we ran into was that when using PyGame to display the Robot Buddy's face on the screen, sound input could not be received by the chat program. In order to fix this without spending too much time debugging, we stopped using PyGame entirely, opting to draw the face image directly to the piTFT framebuffer using the fbi command from fbida. Since root is needed to write to the framebuffer, we created a systemd service that runs a Python script and receives input from a FIFO available to the pi user. When the pi user writes a string corresponding to some face into the FIFO, the service script displays that face on the piTFT display. Displaying the face in this manner eliminates the need to use PyGame for the most part, however we still need a way to capture touch from the display. To do this we use the Python bindings for evdev directly, with no need for any old version of SDL, TSLIB, or PyGame.
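A minimal sketch of the two pieces described above is shown below: the root-owned service loop that reads face names from a FIFO and draws them with fbi, and an evdev loop that waits for a tap on the piTFT. The FIFO path, face image paths, and touchscreen device node are illustrative assumptions.

```python
import os
import subprocess
from evdev import InputDevice, ecodes

FIFO_PATH = "/home/pi/face_fifo"                    # written to by the pi user
FACES = {"happy": "/home/pi/faces/happy.png",
         "talking": "/home/pi/faces/talking.png"}

def show_face(name):
    # Stop any fbi instance already holding the framebuffer, then draw the
    # new face straight onto the piTFT framebuffer (/dev/fb1). This is why
    # the loop runs as root from a systemd service.
    subprocess.run(["killall", "-q", "fbi"], check=False)
    subprocess.Popen(["fbi", "-d", "/dev/fb1", "-T", "1",
                      "-noverbose", "-a", FACES[name]])

def face_service():
    if not os.path.exists(FIFO_PATH):
        os.mkfifo(FIFO_PATH)
    while True:
        with open(FIFO_PATH) as fifo:               # blocks until a face name is written
            name = fifo.read().strip()
            if name in FACES:
                show_face(name)

def wait_for_touch(device="/dev/input/touchscreen"):
    # evdev reports touch events directly, with no SDL, TSLIB, or PyGame needed.
    dev = InputDevice(device)
    for event in dev.read_loop():
        if event.type == ecodes.EV_KEY and event.code == ecodes.BTN_TOUCH and event.value == 1:
            return
```

With this in place, changing the face from the chat program is as simple as writing a name into the FIFO, e.g. `echo happy > /home/pi/face_fifo` (using the illustrative path above).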
When putting together the acrylic box, we used hot glue and tape on the inside to hold everything together. On some parts, such as the face, we needed to use tape on the outside of the head in order to hold it together. When adding batteries to run the robot untethered, we needed to mount the battery pack on the outside, as it was too long to fit inside the robot as initially planned. We used long strips of tape that acted as backpack straps to mount the battery pack on its back. With everything assembled, we still had the ability to SSH into the Pi in order to debug software components.
Overall we met all of the initial goals we set out to complete. The Robot Buddy is able to listen until it is called, understand speech, and provide relevant answers. It is also able to perform some actions such as dancing and waving. We did not reach any of our potential extended goals; however, overall the robot functioned very well for our target scope.
If we had more time to work on the project, we would have implemented some of the extensions from our initial project proposal. These included adding a sensor for the robot to detect the general direction of the person speaking to it and rotate its body to face them. We could also have given Robo Buddy more commands and movements, such as moving forward a certain distance, or allowed it to roam its area freely, listen to its surroundings, and make decisions and reactions based on what it hears. Another aspect we would explore is making the software 100% declarative and reproducible using Nix; unfortunately, some Python packages currently still need to be installed from pip.
kaf245@cornell.edu
vmf24@cornell.edu
| Part | Cost |
|---|---|
| Raspberry Pi 4b | Provided by lab |
| PiTFT Display | Provided by lab |
| DC Motors | Provided by lab |
| Dual Channel Motor Controller | Provided by lab |
| Micro Servo Motors | Provided by lab |
| Parallax Standard Servo | Provided by lab |
| Raspberry Pi Case Fan | Provided by lab |
| USB Microphone | Provided by lab |
| USB Speaker | Provided by lab |
| Wires | Provided by lab |
| Cardboard | Provided by lab |
| Tape | Provided by lab |