First Consulting Contract

This week, I closed on the first-ever consulting contract for Tarada Consulting, LLC. While I can’t currently discuss the client, I can say that I will be working on an interesting machine learning problem as a consultant.

I’m hoping that Tarada Consulting will continue to win consulting contracts and allow me to gain more experience in machine learning. There are also some potential opportunities to collaborate with researchers in other fields, applying machine learning in areas where it has never been used before.

Once this contract is complete, I’ll post an update if I receive permission to publish my client’s name.


Research Update and Consulting

I spent the past few weeks working very hard to prepare my lab for important upcoming deadlines. After the long hours were over, I decided to leave my current position. However, I will be reviewing all papers I co-authored as they continue to move through the publishing pipeline. I am co-author on a paper that the lab is set to submit to ICRA 2018, which will be on multi-task learning of behavioral modes in autonomous driving. Another paper I am co-author on will be submitted to ICLR 2018, whose deadline is a bit further out. That paper will be on my work on autonomous driving with SqueezeNet and LSTMs.

In the meantime, I have formed a consulting firm, Tarada Consulting, LLC, through which I will be doing deep learning consulting. I have a number of projects in the works, though I may not be able to discuss some of them here due to their confidentiality. I will be sure to detail any projects I do that are not encumbered by NDAs or other confidentiality requirements.

Switching Dataset Formats

During the last week at Karl’s autonomous RC car lab, we made significant progress in fixing the slow training speed and memory leaks. Essentially, there were two main problems. One was that the internal structure of the dataset within our HDF5 files was much too complicated. The other was that our autonomous driving dataset is simply too large and complex for any substantial on-the-fly data processing during training.

Improving HDF5 Layout

To address the first problem, Karl created a new, somewhat simplified layout inside the HDF5 files containing the dataset, which made random access significantly more efficient. Previously, random access within the dataset required multiple dictionary lookups, which are significantly slower than indexing into an array. All of this was flattened into a single large multi-dimensional array containing all of the data, indexed with integers.
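As a purely illustrative sketch (plain Python stand-ins, not the lab’s actual code or HDF5 layout), here is the difference between the two access patterns:

```python
# Old-style layout: nested dictionaries, one lookup per level.
old_layout = {
    "run_003": {"camera": {"left": [[0.1, 0.2], [0.3, 0.4]]}},
}
sample_old = old_layout["run_003"]["camera"]["left"][1]

# New-style layout: one large multi-dimensional array, so every sample
# is addressed by integer indices alone -- no hashing, no dict lookups.
new_layout = [  # conceptually shaped (run, camera, frame, feature)
    [[[0.1, 0.2], [0.3, 0.4]]],
]
sample_new = new_layout[0][0][1]

assert sample_old == sample_new  # same data, cheaper access path
```

The run, camera, and feature names above are made up for the example; the point is only that integer indexing replaces chains of dictionary lookups.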


The second issue had only one answer: pre-processing. We finally created pre-processing code that worked: a pipeline made up of multiple stages. I won’t go through all of them here, but I’ll discuss the crucial part.

We made a complete pass through the dataset to make it fully ready for training. In the old system, we had a couple hundred HDF5 files called “runs.” Each run is a set of data collected in an uninterrupted timeline. However, the entire timeline wasn’t necessarily good for training. When the car was picked up or not moving, data was still recorded in the run, but it would not have been useful to train on. Instead, we had a system for converting runs into “segments” on the fly. Each segment is a set of data collected in an uninterrupted timeline, consisting entirely of usable data. In the new system, when we passed through the data during pre-processing, we broke each run HDF5 file into several segment HDF5 files, each containing a continuous stream of trainable data. Any data that wasn’t trainable was discarded. We ended up with clean, compact files, each containing only continuous, usable data.
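The splitting step itself boils down to finding maximal contiguous stretches of usable frames. Here is a minimal sketch with made-up names (the real pipeline operates on HDF5 files, not Python lists, and “usable” is determined by the car’s state):

```python
def split_into_segments(frames, usable):
    """Yield each maximal contiguous list of frames where usable[i] is True."""
    segment = []
    for frame, ok in zip(frames, usable):
        if ok:
            segment.append(frame)
        elif segment:
            # Hit an unusable frame: close out the current segment.
            yield segment
            segment = []
    if segment:
        yield segment  # don't drop a segment that runs to the end

frames = list(range(10))
# Frames 2, 6, and 7 stand in for moments when the car was picked up
# or stationary; they get discarded, splitting the run into segments.
usable = [True, True, False, True, True, True, False, False, True, True]
segments = list(split_into_segments(frames, usable))
# → [[0, 1], [3, 4, 5], [8, 9]]
```

Each yielded list corresponds to one segment HDF5 file in the new scheme.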

ONNX for Neural Networks

Just in the last few days, I’ve been seeing a lot about a new open source format for neural network models called Open Neural Network Exchange, or ONNX. I haven’t yet gotten a chance to try it out myself, but it looks very promising.

ONNX appears to be a way to save neural network models from multiple deep learning frameworks in a universal format that is cross-compatible. If this turns out to really work, then it would be a major advancement, as currently models made in one deep learning framework are very hard to translate to another.

I can think of several cases in the projects I’ve already worked on where having a format like ONNX would have been immensely helpful. The way I’d use it, ONNX would let me take advantage of the strengths of each deep learning framework I work in while avoiding the weaknesses: whenever I encountered a framework-specific issue, I could load my weights file into a different framework.

I hope that ONNX ends up being integrated into every major deep learning framework. Their GitHub page claims that Caffe2, PyTorch, and Cognitive Toolkit will all support ONNX. However, in order for it to take off, I’d expect that TensorFlow/Keras support would be absolutely crucial. This will be an interesting project to watch. When I have some time to try out ONNX, I may test to see if I can transfer some simple networks between PyTorch and Caffe2.

Research and Logistics Update

For the past week, I’ve been running the same experiments I outlined before. It turned out that I had an error in my code that was only visible after running for an epoch, which takes 12 hours. As a result, I had to restart training from scratch. I will need 20 or more epochs of training for these networks, so I won’t be getting full results any time soon.

In the meantime, I’m working on the manuscript for the paper on this work. The submission deadline I’m working toward is at the end of October, so I’m starting the manuscript now; that way, when the experimental results come in, I’ll already have some of the writing done.

From a logistics perspective, I also managed to get one of our training machines into a Berkeley data center. Now, the machine is properly cooled and ventilated, has 1 Gbps download speeds, and has proper UPS backup. I’m hoping to start getting the rest of the machines into that data center as well since we continue to have unnecessary outages and other issues with the machines.

HDF5 Memory Leak

For the past month or so, my colleague Sauhaarda and I have been trying to solve a strange problem in our training codebase. The longer we train our network, the more RAM it uses, even though there’s no in-scope variable that’s getting accumulated in our own code. After lots of memory profiling, we figured out that the issue wasn’t in our own code, but in the Python library h5py. This is the library that we use to read our data files. So why was it leaking and what was the solution?


It turns out that the underlying HDF5 library libhdf5 and the Python bindings h5py don’t properly clean up as they accumulate accesses to our data files. I haven’t yet tracked down whether this is an actual bug in the library or if it’s caused by the way the data files are organized. As a result, as training continues, RAM usage increases more and more until the entire program crashes because it runs out of memory.
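For anyone chasing a similar leak: a crude but effective first check, before reaching for a full memory profiler, is to watch the process’s resident memory between epochs. This generic sketch is my own, not code from our training repo; it uses the POSIX-only `resource` module, and on Linux `ru_maxrss` is reported in kilobytes:

```python
import resource

def peak_rss_kb():
    """Peak resident set size of this process (KB on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss_kb()
# ... one epoch of HDF5 reads would go here ...
after = peak_rss_kb()

# If this number keeps climbing epoch after epoch even though no Python
# variable is accumulating, the leak is below your own code.
print(f"peak RSS grew by {after - before} KB this epoch")
```

In our case, the growth tracked the number of h5py accesses, which is what pointed us below our own code and into the library.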


I initially went with pre-processing to try to solve this problem, which is where that post on pre-processing came from. However, in practice, I was never able to get it to work without having some sort of bug that I couldn’t track down. As a result, my colleague came up with a stop-gap measure.


While pre-processing did speed up training, as I explained in the post linked above, there was some bug in the code that I couldn’t find quickly enough: the validation loss was coming out two orders of magnitude worse than without pre-processing. Since I couldn’t track down the bug, I ended up shelving this method until I no longer have a paper deadline coming up.

Stop-gap Script

For a few days, Sauhaarda and I tried to use the code without pre-processing and just hope that the RAM wouldn’t run out, but that quickly became impossible as we started training more and more experiments. Sauhaarda then came up with a stop-gap measure in which the training code runs only for one epoch each time it’s called, and a Bash script calls the training program in a loop. This way, the memory usage is reset after each epoch. As long as the RAM doesn’t run out before even one epoch can complete, our code runs. This is the method that I’m using to train the experiments I wrote about yesterday.
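The actual stop-gap is a Bash loop around the training program; here is an equivalent sketch in Python. The idea is that because each epoch runs in a fresh process, the OS reclaims whatever h5py leaked the moment that process exits. The script name and flags in the usage comment are hypothetical:

```python
import subprocess

def train_in_fresh_processes(cmd, num_epochs):
    """Run `cmd` once per epoch; stop early if an invocation fails."""
    for epoch in range(num_epochs):
        # Each invocation trains exactly one epoch (resuming from the
        # last checkpoint), then exits -- releasing the leaked memory.
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return epoch  # number of epochs that completed successfully
    return num_epochs

# Hypothetical usage -- the script name and flags are made up:
# train_in_fresh_processes(
#     ["python", "train.py", "--resume", "--train-epochs", "1"], 20)
```

The one requirement is that the training program checkpoints at the end of each epoch and can resume from that checkpoint, since all in-process state is lost between invocations.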


Although the memory issue bothers me a lot, I’ve resigned myself to using the stop-gap script for now so that research can continue without being bogged down by the bug. I will fix it after my paper deadline passes and I have some time to spend on it. For now, I will go with what my professor told me about this problem: “Do what works.”