Ph.D. Defense Post Mortem and Advice for Others

NOTE 1: This is part 1 in a series that will probably contain 3 or 4 parts. Then I will return to the usual data science etc. posts.
NOTE 2: This post was intentionally delayed until I received final approval on the submission of my final dissertation.

On March 14, I passed my final oral defense for my Ph.D. in Statistics.

It was the moment I had been waiting for. The moment I dreaded for so many years and the moment that I thought would never come. I had very few days and times to choose from. It came down to the 13th (bad luck?) and the 14th (Pi Day). My defense began at 10am and was supposed to take a max of 2 hours. I left home around 7am to be sure I arrived in enough time after what could be a long ride. I arrived at 8:45am after sitting in rush hour traffic for almost 2 hours. I printed color copies of my slides and four copies of my dissertation and set up each committee member’s spot nicely: one water, slides, dissertation, a plate and a napkin. I placed a big tray of color-sprinkled cookies in the middle of the table equidistant from each member’s “assigned” seat. The IT manager set up my laptop so it would project to the portable projector, and set up a second laptop which would serve as one of my committee members — my original advisor retired and was participating remotely via Skype.

Then the waiting started. At 9:50am my original advisor rang in on Skype and gave me a quick wave while chewing on a toothpick, a signature quirk. At 9:59am, nobody else was there and I started to panic. A few minutes later I got an email that one member was sitting outside a locked door on the other side of the hallway. I went to meet him, and the rest of my committee all arrived together, including my advisor, who I was warned by others is always late. I was then kicked out of the room while the committee deliberated my case and decided who would ask what questions and what behavioral tests might be used (I am convinced this is a thing in PhD defenses, and I suppose it’s useful). Then a problem arose. Nobody could hear my remote member. I ran down the hallway to get the IT manager and he brought in some speakers that really didn’t help. “Sigh,” I thought. We had no choice but to continue.

My advisor had joked that I had chosen a very “nice” committee. My advisor himself is very soft-spoken, but knows how to deliver constructive criticism. He has this down to an art. I had never known this about him before working with him in this capacity. Another committee member is on the younger end and is perhaps even more soft-spoken than my advisor. I had heard that he is very rigorous in terms of theory and has high expectations in that regard. Many of his comments trailed off and I couldn’t really hear them. My outside member is known for being very friendly and I always felt this way about him (but I had forgotten something; more on that later). And nobody could hear my previous advisor, which was a shame. The day before my defense, my advisor was trying to pep me up, speaking very highly of my work and saying that this should be pretty easy, if not a “slam dunk.” So I was feeling pretty good, though he warned me, “X really appreciates theory, so it would be helpful if you can frame your research theoretically.” The color left my face as I thought to myself, “you’re telling me this now?” I had become accustomed to minor surprises though. During each of our meetings, my advisor would at one point put his clenched knuckles to his lower lip and stare deep into space for about 60 seconds or so, and return with a suggestion that required significant thought. It was almost like he was running through all combinations and permutations of how things could go down in an effort to be “preemptive” (that’s the word he used).

I won’t go into a narrative of everything that happened during the defense because that would take all day. It was a tough experience, but in hindsight it was a positive one. Actually, it is a good thing I am writing this now because the committee’s feedback makes a lot more sense now and made my dissertation stronger. I consider myself open to constructive criticism (and even flat-out criticism), but I was so caught off guard and confused that I sort of wish I could have seen the look on my face.

Fast forward to the aftershow:

My advisor was smiling, but there was no “congratulations” which scared me. He invited me into the room, closed the door, and we all quickly discussed some small changes. I took notes, and soon everyone left. On their way out they congratulated me on passing the defense.

After all that, nobody ate any of the cookies or drank the water.

My advisor said that my defense went “very smoothly,” and was “gentle” though it felt “brutal.” I admit it could have been a ton worse. These are some of the comments I received during my defense, which kind of made me think I had failed, but I had to keep up a facade that I was confident about what I was doing:

  1. My dissertation title, and the name I gave to the method I developed, was rejected.
  2. “The mathematical notation is very critical, and doesn’t seem dissertation-quality.”
    • Thought in my head: “I am meticulous about notation. Are you serious?” His arguments made sense, though, which I will discuss later.
  3. “Your method is very ad-hoc and makes a lot of assumptions.”
    • Thought in my head: “So are 99% of the methods in the literature!”
  4. “You have not really proven that your method works.”
    • Thought in my head: “I had a very easy to understand baseline, and my method exceeded that baseline.”
  5. I got a lot of questions from a committee member about what TF-IDF is which was bizarre because the person that asked me… was the person that taught me what TF-IDF is. I figured out what he was doing. He was doing his job and testing me. I think he was genuinely confused, but I also feel he intentionally stayed on the same topic, to see how I would respond. This is important for any academic that does research or teaches… to be able to communicate effectively.
  6. “Please don’t think I hate your research because I don’t, I’m just confused!” This made everyone laugh and was a great moment of levity.
  7. My advisor commented that I could address many concerns by adding some more literature review on data augmentation and he explained how it is used on the MNIST dataset in computer vision applications with neural networks (his research area).
    • Thought in my head: “Please no. How am I going to relate computer vision (a field that kind of scares me) methods to NLP?” and “But, neural networks are not my strong suit…”
  8. A newer faculty member in my field asked, “wait, is that a truncated SVD?”
    • Thought in my flustered head: “WTF is a truncated SVD? This has nothing to do with truncated or censored data!” He then mentioned the k in the subscript in my SVD and immediately: “Oh, yeah, it’s a truncated SVD.”
  9. And even the quintessential, “Why didn’t you use Deep Learning or a GAN?”
    • I won’t comment on the thought in my head. 😉
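
For context on item 8: a truncated SVD keeps only the k largest singular values of a matrix and discards the rest, which is what a subscript k on an SVD usually signals. A sketch in generic textbook notation (the symbols A, U, Σ, V and r are assumptions for illustration here, not the notation from the dissertation):

```latex
% Full SVD of an m x n matrix A of rank r, singular values sorted in decreasing order
A = U \Sigma V^{\top} = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{\top},
\qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0

% Truncated SVD: keep only the first k < r terms
A_k = U_k \Sigma_k V_k^{\top} = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\top}
```

Here U_k and V_k hold the first k left and right singular vectors and Σ_k is the k × k diagonal matrix of the largest singular values. A_k is the best rank-k approximation of A in the Frobenius norm, and the name has nothing to do with truncated or censored data in the survival-analysis sense.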

Most of the feedback above came from two of my members, with my advisor basically agreeing with them which made me nervous, but I found out why later. Sometimes if you can agree with people on the small stuff, they won’t be so eager to move on to harder things. I’ve been told that I am humble, so I know that my job as a student and professional in a setting like this is to shut up and listen because these people know much more than I do about what I am doing in terms of theory. I am actually much more comfortable in a technical-explained-practically setting. Anyway, I had seven strategies to get through the defense:

  • Be confident. You did a lot of work and you know this work better than anyone else. Besides, there is a quote that the dissertation is “supposed to be the worst research you ever write, and is the beginning of your research career, not the end.”
  • Be cooperative. (My advisor actually told me this one, but it’s obvious). The faculty want you to pass (my committee was very friendly, not all are), and they want you to also write a good manuscript.
  • Be humble and patient. It’s possible someone on your committee will be in the dark (possibly the outside member) and/or you may assume the committee is more familiar with your specialization (e.g., machine learning, finance, whatever) than they are.
  • Defend yourself, and push back (hard) if needed. Everybody of course has an opinion, but each should be respected (especially in this setting). My goal was to acknowledge and appreciate their suggestion, but push back if I felt someone’s suggestion was infeasible, was confusing, or did not make sense for my work. (YMMV here, as I think I gave my advisor the wrong idea, but that’s always bound to happen depending on personality…)
  • Readily admit “I Don’t Know.” I actually never had to use this, but considering the wringer I was put through in terms of questioning, I would have had no problem saying “I don’t know” if I truly did not know the answer to the question.
  • Take tons of notes to show you care about their feedback (you should!).
  • While organizing my research and writing the dissertation, I questioned myself about everything. “Why did you choose a topic model instead of a regular classifier?” “What kind of cross-validation did you do and why?” “What is the point of this? Why is this method not stupid?” “Why didn’t you use Deep Learning?” Except for Deep Learning, none of the questions on my “list” were asked.

I had a few big hiccups that I could have avoided:

  • My advisor suggested that I meet with my committee with my slides and have them go through them with me. Some of the issues I faced (notation) could have been fixed much earlier. Also, the slides were not in the sequence that one of the committee members was expecting. It led to the awkward “what problem are you solving?” question. I had this exact same problem in my preliminary oral. If your advisor tells you to do something, do it, even if it sounds like a “mere suggestion.”
  • Somehow, I must have given the wrong impression that I took the feedback personally. The thing is, I can be soft-spoken and intimidated when talking with faculty, but I am not that way in general, so I think they may have been surprised when I was a little more assertive than I usually was with them. Also, I figured out that my advisor may have interpreted the word “brutal” literally when all I meant was that it was “tough.” Because of that, he thought I was offended. Later on, he had used another word quite literally and I figured out what was going on.
  • It had been years since I had witnessed a final oral defense, and it was in a different department (Computer Science), not my own (Statistics). I do recall the one Stats defense I attended was very tense (even compared to CS) and I genuinely thought the guy was going to fail. The committee sounded like they hated his research, one member was late and the student called him out on it, and at one point the candidate told one of the members (paraphrased) “all of your questions could have been answered by actually reading my dissertation when I sent it to you a month ago” and his advisor laughed. He passed. And everyone was all smiles and laughter afterwards. Go see a dissertation defense in your own department before you defend!
  • Not really a hiccup, but it had been a while since I had seen my outside member in a research setting. At the last seminar of his I presented at, I discussed a fun project I was working on, something I did not intend to publish. I remember he was very vocal and asked me really tough questions about my work that I was not expecting at all. This is what faculty are supposed to do. I wish I had taken that project more seriously and looked at it through the lens of publishing rather than engineering curiosity. I had forgotten about that experience, but boy did the memories come back during the defense.

My advisor met with me for about 20 minutes after my defense to go over some minor revisions and he spent quite a bit of time explaining why the committee had the feedback they did. I think he realized how surprised I was at the amount of “tension” in the room. I did not take any of it personally because a defense is not supposed to be easy, but since he was much more critical of my work compared to the past ten weeks, it threw me for a loop. We discussed the changes to be made and he explained that this is like submitting a paper to a journal. If you want the paper to be published and a referee says to do something, just do it, even if it may not make sense. I agree with him and after thinking about their feedback more, I realize that there is a kernel of truth in each suggestion, and part of this process is learning how to incorporate feedback into something that makes sense:

  • The title was confusing and suggested that I was researching something very theoretical, and it seems the committee wasn’t satisfied that enough theory had gone into the work, assuming the title was correct as it was. I had been lazy with my terminology, and I changed the title to something more believable.
  • The notation was on par with some research papers I have read, but I remember feeling so confused about their notation. I had made a lot of assumptions and placed a lot of conditions on my sampling method, and those conditions were not reflected in the notation. So, this was a fair point and I ended up redoing all of the notation (a frustrating process) and adding a “notation guide” in the appendix.
  • The ad-hoc nature ended up being fine. The feedback came from the theoretical member and our program graduates dissertations that are applied in nature.
  • Not proving the method works was the hardest one to agree with but even as I was doing the research, I felt like something was missing in my comparison because it was my method of creating the baseline. I quickly did another experiment with someone else’s baseline.
  • Data augmentation was an area I had completely missed in my literature review. I avoided it because it seemed to be used only with neural networks and computer vision. I instead focused on “query expansion,” which is still used to this day, but the theory behind it is quite dated. I found through my expanded literature review that data augmentation was the other side of the coin and filled in some of the gaps in my thought process. I saw other researchers who had done things similar to (though still quite different from) what I did, and their work was much more relatable than query expansion. Honestly, it got me a bit more interested in computer vision and neural networks because it suggested that there were some principled methods and not just magic black boxes.

I took about a week off and then started making my edits. I used several different color pens to mark where I needed to make changes and what those changes should be. After all was said and done, that draft looked like it had open heart surgery performed on it. It seemed like no word went untouched.

Anyway, I hope this advice can help someone. “You’ll get through it” is what I say with some trepidation because it is an experience I do not want to go through again!

For anyone interested, my defense slides are here and my final dissertation is here.

Some New Year Resolutions for (this) Data Scientist in 2017

I’ve never been very big on New Year’s resolutions. I’ve tried them in the past, and while they are nice to think about, they are always overly vague, difficult to accomplish in a year, trite, or just don’t get done (or attempted). This year I decided to try something different instead of just not making resolutions at all. I set out some professional goals for myself as a Data Scientist. So without further ado…

1. Don’t Complain about It, Fix It: Contribute to Open Source Software (More)

Open source software is only as good as its community and/or developer(s). Developers are human and typically cannot manage all bugs and feature requests themselves. My goal is to routinely contribute back to the community either with new features, or by fixing bugs that I discover. This not only helps the community at large, but also helps me as a software engineer. There is no better way to become an even better engineer than by wading through someone else’s code. While this is something I did all day every day at my $DAYJOB, I do it less while on my sabbatical.

Some of the projects I use the most and that I hope to contribute to are scikit-learn and […]

It’s Been a While

These past three years have really flown. It’s time for me to finally get back to my roots and also start blogging more, like I did previously.

My last post was about Strata 2013. During this time period, I was taking a break from working full-time to finish a Ph.D. dissertation that I had neglected during my previous two positions. I learned my lesson the hard way: never work externally if you want a Ph.D. in a reasonable amount of time! I quickly got my dissertation from an intro to the first 65 pages or so during this gap. I then received an offer from Facebook. I was ready to move to Silicon Valley and enjoy all the things I had been envious of for so many years: the perks, the culture of innovation and intelligence, and the technology community. This was an opportunity I could not pass up, and the dissertation went on the back-burner for another two years as I spent the majority of my waking hours, both during the week and the weekend… and on holidays… coding into a frenzy. I was looking forward to living in a world where I was entrenched in the technology and data ecosystem. […]

Summary of My First Trip to Strata #strataconf

In this post I am going to summarize some of the things that I learned at Strata Santa Clara 2013. For now, I will only discuss the conference sessions as I have a much longer post about the tutorial sessions that I am still working on and will post at a later date. I will add to this post as the conference winds down.

The slides for most talks will be available here but not all speakers will share their slides.

This is/was my first trip to Strata so I was eagerly awaiting participating as an attendee. In the past, I had been put off by the cost and was also concerned that the conference would be an endless advertisement for the conference sponsors and Big Data platforms. I am happy to say that for the most part I was proven wrong. For easier reading, I am summarizing talks by topic rather than giving a laundry list schedule for a long day, and I am also skipping sessions that I did not find all that illuminating. I also do not claim 100% accuracy of this text as the days are very long and my ears and mind can only process so much data when I am context […]

Merry Christmas and Happy Holidays!

Wishing you all a very Merry Christmas, Happy Holidays and Happy New Year!

An update on me. In October, I began working at Riot Games, the developers of League of Legends. It has been an amazing experience and has occupied the majority of my free time as has my dissertation. My New Year’s resolution this year is to dust the cobwebs off this blog!

Have a safe holiday season!

Here in California, I will be having Christmas in the Sand

A New Data Toy -- Unboxing the Raspberry Pi

Last week I received two Raspberry Pis in the mail from AdaFruit and just now have some time to play with them. The Raspberry Pi is a minimal computer system that is about the size of a credit card. In the embedded systems community, the excitement is for obvious reasons, but I strongly believe that such a device can help collect and use data to help us make better decisions because not only is it a computer, but it is small and portable.

For development, Raspberry Pi can connect to a television (or other display) via HDMI or composite video (the “yellow” plug for those still stuck in the 1900s haha). A keyboard, mouse and other devices can be connected via two USB ports. A powered hub can provide support for even more devices. There are also various pins for connecting to a breadboard for analyzing analog signals, for a camera or for an external (or touchscreen) display. An SD Card essentially serves as the hard disk and probably a portion of the RAM. The more recent Model B ships with 256MB RAM. Raspberry Pi began shipping in February 2012 and these little guys have been very difficult to get a […]

Adventures at My First JSM (Joint Statistical Meetings) #JSM2012

During the past few decades that I have been in graduate school (no, not literally) I have boycotted JSM on the notion that “I am not a statistician.” Ok, I am a renegade statistician, a statistician by training. JSM 2012 was held in San Diego, CA, one of the best places to spend a week during the summer. This time, I had no excuse not to go, and I figured that in order to get my Ph.D. in Statistics, I have to have been to at least one JSM. […]

OpenPaths and a Progressive Approach to Privacy

OpenPaths is a service that allows users with mobile phones to transmit and store their location. It is an initiative by the New York Times that allows users to use their own data, or to contribute their location data for research projects and perhaps startups that wish to get into the geospatial space. OpenPaths brands itself as “a secure data locker for personal location information.” There is one aspect where OpenPaths is very different from other services like Google Latitude: only the user has access to his/her own data and it is never shared with anybody else unless the user chooses to do so. Additionally, initiatives that wish to use a user’s location data must ask the user personally via email (pictured below), and the user has the ability to deny the request. The data shared with each initiative provides only location, and not other data that may be personally identifiable such as name, email, browser, mobile type etc. In this sense, OpenPaths has provided a barebones platform for the collection and storage of location information. Google Latitude is similar, but the data stored on Google’s servers is obviously used by other Google services without explicit user permission.

The service is also opt-in, that […]

SIAM Data Mining 2012 Conference

Note: This would have been up a lot sooner but I have been dealing with a bug on and off for pretty much the past month!

From April 26-28 I had the pleasure to attend the SIAM Data Mining conference in Anaheim on the Disneyland Resort grounds. Aside from KDD2011, most of my recent conferences had been more “big data” and “data science” oriented, and I wanted to step away from the hype and just listen to talks that had more substance.

Attending a conference on Disneyland property was quite a bizarre experience. I wanted to get everything I could out of the conference, but the weather was so nice that I also wanted to get everything I could out of Disneyland. Seeing adults wearing Mickey ears carrying Mickey-shaped balloons, and seeing girls dressed up as their favorite Disney princesses screams “fun” rather than “business”, but I managed to make time for both.

The first two days started with a plenary talk from industry or research labs. After a coffee break, there were the usual breakout sessions followed by lunch. During my free 90 minutes, I ran over to Disneyland and California Adventure both days to eat lunch. I managed to […]

My Interview about the Statistics Major

Recently, I participated in an email interview about what being a Statistics major entailed, how I got interested in the field and the future of Statistics. I figured this might be of interest to those that are contemplating majoring in Statistics, or considering a career in Data Science.

Q1: Why did you decide to pursue a major in statistics in college?

A: “When I was a kid, I really enjoyed looking at graphs, plots and maps. My parents and I could not make out what was behind the interest. At the same time, I was also heavily interested in education. My mother was a teacher and the first set of statistics I ever encountered were standardized test scores. I strived to understand what the scores attempted to say about me, and why such scores and tests are so trustworthy. When the stakes increased with the AP and SAT exams, I began reading articles published by the Educational Testing Service and learned a ton about how these tests are constructed to minimize bias, and how scores are comparable across forms. It fascinated me how much science goes into these tests, but at the end of the day they are still just one factor […]