Developed an iOS app that uses the iPhone X TrueDepth camera to record a 3D face mesh of hospital patients in pain. The app also records video to be processed by OpenPose as an alternate source of facial keypoints; the original video is not accessible to the researchers for privacy reasons. Participated in creating a machine learning model that uses the patients' self-reported pain scores as ground truth to predict pain levels from facial expressions.
Technical challenges. Created an app that uses Apple's augmented reality framework ARKit to access and store a face mesh at 30 fps. Recorded mesh data and video without the gaps that multi-threading issues would otherwise cause. To protect patients' privacy, set up a pipeline for downloading the data from the iPhone to an offline Linux server located at the hospital, and configured that server to perform GPU computations without requiring on-site access by experts. Implemented scripts that processed the video with OpenPose and encrypted the results with GPG for transport on a USB drive. Developed a WebGL-based player that replays a face mesh recording as a video without access to the original footage.
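The encryption step of such a pipeline can be sketched in Python by shelling out to GnuPG. This is a minimal illustration, not the actual scripts; the file names, passphrase-file path, and helper names are hypothetical.

```python
import subprocess

def gpg_encrypt_cmd(src: str, dst: str, passphrase_file: str) -> list[str]:
    """Build the argv for symmetric GPG encryption of `src` into `dst`.

    --batch and --yes keep gpg non-interactive so the script can run
    unattended on the offline server; symmetric encryption avoids key
    exchange for a USB-drive transfer.
    """
    return [
        "gpg", "--batch", "--yes",
        "--symmetric",                      # passphrase-based encryption
        "--cipher-algo", "AES256",
        "--passphrase-file", passphrase_file,
        "--output", dst,
        src,
    ]

def encrypt(src: str, dst: str, passphrase_file: str) -> None:
    # Raises CalledProcessError if gpg fails, so a broken archive
    # never silently ends up on the USB drive.
    subprocess.run(gpg_encrypt_cmd(src, dst, passphrase_file), check=True)
```

The recipient decrypts with the shared passphrase (`gpg --decrypt`), so no private keys ever leave the hospital network.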
Technologies. iOS, Swift, ARKit, UIKit, Python, OpenPose, GPG encryption.
In preparation for a collaboration with a university on locating lung nodules in CT images, created a web-based viewer and labeler for CT images. Computed image slices in two additional dimensions so that a CT image cube can be navigated through three synchronized slice views. Supported marking 3D bounding boxes of lung nodules, navigating to previously marked bounding boxes, and linking passages of radiology reports to those marked regions.
Technical challenges. Loaded DICOM images in batches to display progress. Computed horizontal and vertical image slices on demand in off-screen canvases, mapping the selected grayscale window of the 12-bit source images to the 8-bit display range. Implemented zooming by drawing the off-screen canvases into the viewer canvases for the three slices. Synchronized panning, zooming, and slice changes across the three viewers.
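The 12-bit-to-8-bit grayscale windowing can be sketched as follows. The viewer did this in JavaScript when drawing into an off-screen canvas; this pure-Python version with a hypothetical function name shows only the mapping itself.

```python
def window_to_8bit(pixels, lo, hi):
    """Map 12-bit pixel values to 8-bit display values.

    Values at or below `lo` become 0, values at or above `hi` become
    255, and values in between are scaled linearly. `lo` and `hi` are
    the user-selected grayscale window over the 12-bit range (0-4095).
    """
    scale = 255.0 / (hi - lo)
    out = []
    for p in pixels:
        clamped = min(max(p, lo), hi)
        out.append(int(round((clamped - lo) * scale)))
    return out
```

Narrowing the window (`hi - lo`) spreads a small intensity range over all 255 display levels, which is how radiologists bring out soft tissue or lung detail from the same source data.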
Technologies. DICOM medical images, React, XMLHttpRequest, HTML canvas, JavaScript.
Worked on an image-based approach for localization in the gastrointestinal tract, in support of non-invasive surgery and other medical applications.
Technical challenges. Developed an approach for automatically traversing the 3D model of a gastrointestinal tract, with minor random variations between traversals. Collected images at regular intervals in different directions from the current camera position. Maintained smooth camera rotations to ensure good overlap between collected images and a pleasant viewing experience. Discussed different image collection approaches to best support the machine learning algorithms.
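Smooth camera rotation between headings comes down to spherical linear interpolation (slerp) of orientation quaternions. The project used Unity's C# API for this; the Python sketch below shows the underlying math, with quaternions as `(w, x, y, z)` tuples.

```python
import math

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions q0 and q1.

    Interpolating orientation with slerp (rather than snapping between
    headings) keeps the rotation rate constant, which preserves overlap
    between consecutively collected images.
    """
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                 # take the shorter arc
        q1 = tuple(-b for b in q1)
        dot = -dot
    if dot > 0.9995:              # nearly parallel: lerp and renormalize
        out = tuple(a + t * (b - a) for a, b in zip(q0, q1))
        n = math.sqrt(sum(c * c for c in out))
        return tuple(c / n for c in out)
    theta = math.acos(dot)
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
```

In Unity this corresponds to `Quaternion.Slerp`, applied each frame with a small `t` to ease the camera toward its next viewing direction.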
Technologies. Unity video game engine, quaternions, C#.
Participated in creating an approach for accurately forecasting future activity from the activity observed in a video. Applied the approach to an open furniture-assembly dataset, achieving higher accuracy than previously published results on the same dataset. Explored how this technology could be used for fall prediction in hospital settings.
Technical challenges. Created a Python wrapper for Dense Trajectories to produce the input for a deep learning model. Explored whether Dense Inverse Search (DIS) would speed up the optical flow computation of dense trajectories, but found that the OpenCV implementation of Farnebäck dense optical flow has much better GPU acceleration. Developed an approach for running OpenPose in a Docker container for an improved version of our activity forecasting. Explored how to speed up a PyTorch model using ResNet and other layers. Worked around a bottleneck in reading HDF5 data by switching to a faster compression algorithm.
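The compression trade-off behind the HDF5 fix can be illustrated with the standard library alone. The actual project changed the compression filter on HDF5 datasets (e.g. via h5py); this sketch just demonstrates the general idea that a faster codec or lower compression level costs some ratio but cuts (de)compression time.

```python
import time
import zlib

def compress_stats(data: bytes, level: int):
    """Return (seconds, compressed_size) for one zlib compression level.

    Level 1 is the fast end, level 9 the slow/tight end; the HDF5
    bottleneck was resolved by the same trade: accept a slightly
    larger file in exchange for much faster reads.
    """
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    return time.perf_counter() - start, len(packed)

# 1 MiB of mildly repetitive data, standing in for dense-trajectory features.
data = bytes(range(256)) * 4096
fast = compress_stats(data, 1)
tight = compress_stats(data, 9)
```

In h5py the equivalent knob is the `compression`/`compression_opts` arguments of `create_dataset`; the chunk size interacts with the codec, so both were worth tuning together.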
Technologies. Python, NumPy, HDF5, C++, TensorFlow, PyTorch, ResNet.