For the past week, I’ve been running the same experiments I outlined before. It turned out that I had an error in my code that only became visible after a full epoch of training, which takes 12 hours. As a result, I had to restart training from scratch. I will need to train these networks for 20 or more epochs, which at 12 hours per epoch is roughly 10 days or more per run, so I won’t be getting full results any time soon.
In the meantime, I’m working on the manuscript for the paper on this work. The submission deadline I’m working towards is at the end of October, so I’m starting on the manuscript now. That way, when the experiment results come in, I’ll already have some of the writing done.
From a logistics perspective, I also managed to get one of our training machines into a Berkeley data center. Now, the machine is properly cooled and ventilated, has a 1 Gbps download speed, and has proper UPS backup. I’m hoping to move the rest of the machines into that data center as well, since we continue to have avoidable outages and other issues with them.