Highlights from My First NIPS


The first few hundred registrations received a mug.

As a machine learning practitioner in the Los Angeles area, I was ecstatic to learn that NIPS 2017 would be held in Long Beach this year. The conference sold out in a day or two. It took place at the Long Beach Convention Center (and Performing Arts Center), very close to the Aquarium of the Pacific and about a mile from the Queen Mary. The venue itself was beautiful, probably the nicest place where I’ve ever attended a conference. It was also the most expensive: $5 for a bottle of Coke? $11 for two cookies? But I digress.

I attended most of the conference, but as someone who has been to many conferences, I’ve learned that attending everything is unnecessary and counterproductive to one’s sanity. I attended the main conference and one workshop day, but skipped the tutorials, the Saturday workshops, and the industry demos. The conference talks were livestreamed via Facebook Live on the NIPS Foundation’s Facebook page, and the recordings are archived there as well.

This may make some question why one would actually want to attend the conference in person, but there are several reasons to go:

  1. to talk with the authors of interesting papers during the poster sessions;
  2. to meet up with like-minded people — a reunion of sorts. I had dinner with the LA Data Science crew;
  3. to be surrounded by like-minded people and perhaps meet some of the big names in machine learning, or people whose work has been valuable to you. During the week, I saw Yann LeCun, Ian Goodfellow, Hal Daumé III, Judea Pearl, and others. There were so many people at this NIPS that I never managed to spot many others I knew were present;
  4. As my friend Rob pointed out “THE WORKSHOPS!” Yes, the workshops are legit and are not recorded. You can also buy a special ticket for only the tutorials or only the workshops. That is something to keep in mind if your time is limited;
  5. The sponsor and employer expo can be useful for those looking for internships, full-time jobs, or post-docs. Unfortunately, the opportunities were heavily focused on research fellowships and researcher positions in labs, not the standard applied roles I usually gravitate toward. This was a real bummer.
  6. There are also plenty of parties for the TensorBros. I kid. If convex optimization, TPUs and functional programming are too boring for you, you could have chilled with Flo Rida instead. Wait, who??

I usually write very long, drawn-out blog posts about these conferences, but I am getting old, so I will try to just summarize the sessions and research I found most interesting. It looks like this one turned out just as long as usual.


Usually I nod off during keynote and plenary talks, as I tend to find them too general, but not this time: honestly, these were the most interesting and motivating talks of the entire conference. The presenters spoke about issues facing the community without getting hung up on deep learning or on particular ways of doing AI.

Ali Rahimi, this year’s winner of the Test of Time award, delivered an acceptance speech that earned a standing ovation and gave all of us a reality check about the direction of AI. He described a “self-congratulatory” aura in the AI community, likened our current deep learning discourse to alchemy, and encouraged a return to rigor, something NIPS seemed almost religious about in its earlier days. He also seemed to take issue with Andrew Ng’s tweet, “AI is the new electricity.” My take is that we are currently in a hype cycle, one that I believe morphed out of the “data science” hype cycle. I admit I have not embraced deep learning in my own work, and Ali’s claim that we are treating AI as alchemy struck me to the point that I feel a bit vindicated. I am not a proofs or theory person, but I cannot use methods that do not seem to have some sort of mathematical basis… and to use such methods for life-changing decisions would be unethical and irresponsible. Yann LeCun posted a response disagreeing with some of Ali’s points here.

Kate Crawford spoke about fairness and bias in machine learning models, and about how many models are biased against particular groups because they are trained on data shaped by preconceived notions about race, gender roles, and more. Her concern is that if we allow these biases to affect models that make life-changing decisions, machine learning will suffer a backlash leading to another AI winter. Kate listed several examples, a few of which I found very surprising. She noted one study showing that when Googling a name that sounds African-American, Google’s ad server chose to display an ad for criminal background checks. To address the problem, Kate suggested pre-release testing: carefully studying how a model treats each subpopulation before it ships. This is commonplace in the world of educational testing (I originally studied psychometrics), where a field test procedure is always performed on new test items: if a particular subpopulation performs significantly better or worse than the others, net of all other factors, the test item is dropped. This phenomenon can be described mathematically as differential item functioning (DIF). Anyway, back to Kate. What I appreciated about her talk is that, while the problem is obvious to anyone who works in machine learning, she went into a level of detail that we had not heard before.
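The field-test idea maps neatly onto a quick computation. Below is a minimal sketch of one classical DIF screen, the Mantel-Haenszel common odds ratio, where each stratum conditions on total test score; the item-response counts are entirely made up for illustration, and a real DIF analysis would layer significance tests and effect-size rules on top of this.

```python
# Toy illustration of screening a test item for differential item
# functioning (DIF) via the Mantel-Haenszel common odds ratio.

def mantel_haenszel_or(strata):
    """strata: list of 2x2 tables (a, b, c, d) where
       a = reference group correct,  b = reference group incorrect,
       c = focal group correct,      d = focal group incorrect.
       Each table conditions on one level of total test score."""
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

# Two ability strata; the focal group answers about as well as the
# reference group once ability is held constant, so the OR is near 1.
fair_item = [(40, 10, 38, 12), (20, 30, 19, 31)]
# Here the focal group does much worse at every ability level: evidence of DIF.
dif_item = [(40, 10, 20, 30), (20, 30, 8, 42)]

print(round(mantel_haenszel_or(fair_item), 2))
print(round(mantel_haenszel_or(dif_item), 2))
```

An OR far from 1 (in either direction) flags the item for review, which is roughly the "drop the item" trigger described above.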

Main Conference

The main conference was divided into two parallel tracks, each session starting with four to six 15-minute talks followed by 12-20 “spotlight” (i.e. lightning) talks of 5 minutes each. The tracks were: Algorithms, Optimization, Algorithms/Optimization (a 2-for-1!), Theory (goodness, no), Deep Learning Applications, Probabilistic Methods, Theory, Deep Learning, and Deep Reinforcement Learning. The track boundaries were very blurred – I mean, the entire conference was theory, and there is a lot of optimization involved in deep learning – so the main conference was sort of a grab bag involving a lot of walking back and forth between rooms depending on the topic… or, for me, on whether the air conditioning was on.

Most of the talks involved deep learning, obviously. I found that the majority of applications focused on images, video, and speech… the usual. I would love to see more talks focused on language/text, music, and motion, though I am sure those are coming. There was some discussion of art and style transfer, which is cool, but, well, cool. There were a lot of interesting talks, but the one that stood out to me (and to many others) was actually a 5-minute spotlight talk on interpreting models using a technique called SHapley Additive exPlanations, or SHAP (paper, code). The method boils down to an importance score for each feature and each observation, which can be studied after prediction to determine why a particular observation was labeled as it was and which feature(s) were responsible. There was a similar talk focused on image processing, where a proposed algorithm “highlights” the parts of an image that “encouraged” the model to attach a certain label (such as the ears and nose for a dog).
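To make the Shapley idea concrete, here is a brute-force sketch computing exact Shapley values for a single prediction of a hypothetical toy model; the scoring function, input, and baseline below are all made up, and the actual SHAP paper and library use much faster approximations rather than this exponential enumeration over coalitions.

```python
from itertools import combinations
from math import factorial

def model(x):
    # hypothetical scoring function: 3*x0 + 2*x1 - x2
    return 3 * x[0] + 2 * x[1] - x[2]

def shapley_values(model, x, baseline):
    """Exact Shapley value of each feature for one prediction.
    Absent features in a coalition take their baseline values."""
    n = len(x)
    phi = [0.0] * n
    features = list(range(n))
    for i in features:
        others = [j for j in features if j != i]
        for k in range(n):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in features]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in features]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi

x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
# The per-feature contributions sum to model(x) - model(baseline).
print(shapley_values(model, x, baseline))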

Many, if not all, of the spotlight/lightning talks are associated with a poster and also have slides, code, and sometimes a video associated with it. Check the NIPS 2017 schedule to find resources for each poster.


The symposia just seemed like another “main conference” track, but with a panel discussion. I attended the symposium on Interpretable Machine Learning, which seems like a hot topic right now… though we statisticians have been doing it for years, having stuck to the unsexy methods of “regression” and “decision trees.” Many of the talks involved causality and interventions, which initially came out of left field for me, but makes sense in the grand scheme of things: if one can “prove” that x causes y, interpreting models becomes easier. Although we can demonstrate correlations, many are spurious and meaningless, and such a model is likely not interpretable. This whole issue seems to have arisen from the medical community (my opinion/observation), as AI is being used more and more in medicine for diagnoses and recommendations. If we are going to deploy models that prescribe certain medicines or procedures, we (and doctors) need to be able to debug model errors, or we will injure or kill people. For machine learning practitioners, this “debugging” is conceptual and mechanical. For doctors, the debugging must be in terms of their original training… in other words, the model must be interpretable. Another area where AI-in-a-box can cause problems is driverless cars. Honk, honk!

One talk I found very interesting in this session was called On Fairness and Calibration (unfortunately, I don’t remember which author spoke). The speaker rehashed the importance of looking at metrics other than accuracy, such as the true positive rate, false positive rate, and true negative rate, particularly among subpopulations. He suggested analyzing performance across groups and looking at the gap between how we expect the model to perform for a subpopulation and the performance we actually observe. What was amusing to me as a statistician is that this paper essentially “rediscovers” ROC and PR curves (calibration in general), hypothesis testing (observed vs. expected results), and the mixed effects model (a different analysis for each group) from statistics. Of course, the audience was not from statistics, and it was a very impactful talk.
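The kind of per-group audit the speaker described can be sketched in a few lines: compute the true positive rate and false positive rate separately for each subpopulation and look at the gaps. The groups and labels below are made up for illustration; a real audit would add confidence intervals and calibration curves on top of the point estimates.

```python
# Minimal per-group fairness audit with made-up labels and predictions.

def rates(y_true, y_pred):
    """Return (TPR, FPR) for binary labels/predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)

groups = {
    "A": ([1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 0, 0]),
    "B": ([1, 1, 1, 0, 0, 0], [1, 0, 0, 0, 0, 0]),
}
for name, (y_true, y_pred) in groups.items():
    tpr, fpr = rates(y_true, y_pred)
    print(name, round(tpr, 2), round(fpr, 2))
```

In this fabricated example, group A enjoys a much higher true positive rate than group B (at the cost of some false positives): exactly the kind of observed-vs-expected gap the talk asked us to look for.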

In statistics we are taught that interpretable models are extremely important. This is why some machine learning competitions on platforms such as Kaggle are bothersome for aspiring data scientists. The problem statements and datasets often encourage extremely complicated models that really have no meaning but seem to “just work.” I suppose these models are fine for products, but they are dangerous in high stakes situations.

The panel discussion showed that we have a long way to go on interpretable models. Much of the discussion involved the participants’ definitions of the word interpretability and statements of “it depends on your definition of interpretable.” I think this whole issue is going to end up being resolved by: “try to make models interpretable, but if you can’t, just don’t make them so complicated that nobody knows what they’re doing.”

I ended up leaving early and grabbing dinner with some fellow attendees. By the time I left, my ears had become completely numb to the words interpretable and interpretability — a cacophony of syllables that just ran into each other.

Friday Workshop: Machine Learning Systems

I’ve never been to a conference where 27 workshops were going on at the same time, on the same day. Then there were another 26 the following day, again all at the same time. This was a bummer because there were so many good ones to choose from. One might as well just throw their hands in the air and go to the one with the best seating.

Since I mainly work as a machine learning engineer, and have experienced the usual issues building and monitoring machine learning systems, I decided to attend the ML Systems workshop. Of course, that’s not what the workshop was about, but it was still very interesting. Ion Stoica presented Ray, a distributed execution system for AI and reinforcement learning. Another interesting talk was on DLVM, a compiler framework for creating neural network DSLs. There was also a series of talks giving updates on current AI systems: TensorFlow (project), PyTorch (project), Caffe2 (project), CNTK (project), MXNet (project), TVM (project), Clipper (project), MacroBase (project), and ModelDB (project). There was also a presentation about ONNX (project), an ecosystem for interchangeable models that can be used across deep learning systems (which reminds me of the seldom-used PMML). Most, if not all, of these systems are based on Python. Woof!

There were two very interesting talks that did not involve deep learning frameworks. Alex Beutel from Google presented The Case for Learned Index Structures, which focused on using the distribution of the data within an index to speed up common database operations such as lookups, presumably via percentiles and other statistical measures. The premise of the talk was that the B-Tree induced by an index can itself be considered a machine learning model mapping a key to a position, one that a learned model of the key distribution can approximate. Further work is required for data that changes over time. Virginia Smith presented Federated Multi-Task Learning (paper), which discussed a framework for building models from data provided by several heterogeneous devices, all of which have their own failure rates and communication limitations.
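The learned-index trick can be sketched in a toy form: fit a simple model to the key-to-position mapping (the empirical CDF of the keys) and correct the model's guess with a bounded local search. The keys below are made up, and this is only an illustration of the idea, not the paper's recursive model hierarchy.

```python
import bisect

# Toy "learned index": predict a key's position in a sorted array with a
# least-squares linear fit of position ~ key, then fix up the guess locally.
keys = sorted([2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37])

n = len(keys)
mean_k = sum(keys) / n
mean_p = (n - 1) / 2
slope = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys)) / \
        sum((k - mean_k) ** 2 for k in keys)
intercept = mean_p - slope * mean_k

def lookup(key):
    """Return the index of key in keys, using the model as a starting guess."""
    guess = int(round(slope * key + intercept))
    guess = max(0, min(n - 1, guess))
    # search a small window around the predicted position
    lo, hi = max(0, guess - 3), min(n, guess + 4)
    i = bisect.bisect_left(keys, key, lo, hi)
    if i < n and keys[i] == key:
        return i
    return bisect.bisect_left(keys, key)  # fall back to a full search

print(lookup(13))
```

The window size stands in for the model's maximum prediction error; the paper's observation is that when the model tracks the key distribution well, this window (and hence the lookup cost) can be much smaller than a full B-Tree traversal.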

Takeaway Lessons and Learnings

I had a good time at NIPS, but because I have not yet embraced deep learning, I did not dive in as much as I could have. My first lesson is that I can no longer put off reading Ian Goodfellow’s book and some of the other deep learning books I’ve collected, such as Josh Patterson’s book and François Chollet’s new book. NIPS is a very academic conference, and I do not believe I have been to a conference this academic before (and I’ve been to IJCAI, KDD, CIKM, and JSM). That is not a bad thing, but as a more applied person, I think KDD et al. are more my cup of tea. At those conferences, I felt the application was the star, and the methods and theory were discussed in detail as a means to an end. At NIPS, the methods and theory are the star.


datascience.LA dinner at Padre. Source: Szilard Pafka


It was empowering to be surrounded by the best minds in machine learning who rise above the hype. By listening to the talks, following the discussion on Twitter, and speaking with others, I could sense a level of frustration with the hype train and with some of the charlatans who are going to crash that train if the community lets them. It was also great to meet up with several folks from the LA Data Science organizations. I am looking forward to WSDM, the ACM International Conference on Web Search and Data Mining, which is more my cup of tea, in February in Los Angeles. I hope to see you there!

Photo captions: Long Beach Performing Arts Center; $11 for 2 holiday-themed cookies!; the spectacular Hyatt Regency next to Rainbow Lagoon; the Queen Mary; Rainbow Lighthouse; the main promenade along the entrance to the Long Beach Convention Center.

Ph.D. Defense Post Mortem and Advice for Others

NOTE 1: This is part 1 in a series that will probably contain 3 or 4 parts. Then I will return to the usual data science etc. posts.
NOTE 2: This post was intentionally delayed until I received final approval on the submission of my final dissertation.

On March 14, I passed my final oral defense for my Ph.D. in Statistics.

It was the moment I had been waiting for. The moment I had dreaded for so many years and the moment I thought would never come. I had very few days and times to choose from. It came down to the 13th (bad luck?) and the 14th (Pi Day). My defense began at 10am and was supposed to take a max of 2 hours. I left home around 7am to be sure I would arrive with time to spare after what could be a long ride. I arrived at 8:45am after sitting in rush hour traffic for almost two hours. I printed color copies of my slides and four copies of my dissertation and set up each committee member’s spot nicely: one water, slides, dissertation, a plate, and a napkin. I placed a big tray of color-sprinkled cookies in the middle of the […]

Some New Year Resolutions for (this) Data Scientist in 2017

I’ve never been very big on New Year’s resolutions. I’ve tried them in the past, and while they are nice to think about, they are always overly vague, difficult to accomplish in a year, trite, or just don’t get done (or attempted). This year I decided to try something different instead of just not making resolutions at all. I set out some professional goals for myself as a Data Scientist. So without further ado…

1. Don’t Complain about It, Fix It: Contribute to Open Source Software (More)

Open source software is only as good as its community and/or developer(s). Developers are human and typically cannot manage all bugs and feature requests themselves. My goal is to routinely contribute back to the community either with new features, or by fixing bugs that I discover. This not only helps the community at large, but also helps me as a software engineer. There is no better way to become an even better engineer than by wading through someone else’s code. While this is something I did all day every day at my $DAYJOB, I do it less while on my sabbatical.

Some of the projects I use the most and that I hope to contribute to are scikit-learn and […]

It’s Been a While

These past three years have really flown by. It’s time for me to finally get back to my roots and start blogging more, like I did previously.

My last post was about Strata 2013. During this period, I was taking a break from working full-time to finish the Ph.D. dissertation that I had neglected during my previous two positions. I learned my lesson the hard way: never work externally if you want a Ph.D. in a reasonable amount of time! During this gap, I quickly got my dissertation from an intro to the first 65 pages or so. I then received an offer from Facebook. I was ready to move to Silicon Valley and enjoy all the things I had been envious of for so many years: the perks, the culture of innovation and intelligence, and the technology community. This was an opportunity I could not pass up, and the dissertation went on the back-burner for another two years as I spent the majority of my waking hours, during the week and on weekends… and on holidays… coding in a frenzy. I was looking forward to living in a world where I was entrenched in the technology and data ecosystem. […]

Summary of My First Trip to Strata #strataconf

In this post I am going to summarize some of the things that I learned at Strata Santa Clara 2013. For now, I will only discuss the conference sessions, as I have a much longer post about the tutorial sessions that I am still working on and will post at a later date. I will add to this post as the conference winds down.

The slides for most talks will be available here but not all speakers will share their slides.

This is/was my first trip to Strata, so I was eagerly awaiting participating as an attendee. In the past, I had been put off by the cost and was also concerned that the conference would be an endless advertisement for the conference sponsors and Big Data platforms. I am happy to say that, for the most part, I was proven wrong. For easier reading, I summarize talks by topic rather than giving a laundry-list schedule of a long day, and I also skip sessions that I did not find all that illuminating. I also do not claim 100% accuracy of this text, as the days are very long and my ears and mind can only process so much data when I am context […]

Merry Christmas and Happy Holidays!

Wishing you all a very Merry Christmas, Happy Holidays and Happy New Year!

An update on me. In October, I began working at Riot Games, the developers of League of Legends. It has been an amazing experience and has occupied the majority of my free time as has my dissertation. My New Year’s resolution this year is to dust the cobwebs off this blog!

Have a safe holiday season!

Here in California, I will be having Christmas in the Sand

A New Data Toy -- Unboxing the Raspberry Pi

Last week I received two Raspberry Pis in the mail from Adafruit and only now have some time to play with them. The Raspberry Pi is a minimal computer system about the size of a credit card. In the embedded systems community, the excitement is there for obvious reasons, but I strongly believe that such a device can also help collect and use data to help us make better decisions: not only is it a computer, but it is small and portable.

For development, the Raspberry Pi can connect to a television (or other display) via HDMI or composite video (the “yellow” plug, for those still stuck in the 1900s, haha). A keyboard, mouse, and other devices can be connected via the two USB ports, and a powered hub can provide support for even more devices. There are also various pins for connecting to a breadboard for reading analog signals, for a camera, or for an external (or touchscreen) display. An SD card essentially serves as the hard disk (and can also host swap space). The more recent Model B ships with 256MB of RAM. The Raspberry Pi began shipping in February 2012, and these little guys have been very difficult to get a […]

Adventures at My First JSM (Joint Statistical Meetings) #JSM2012

During the past few decades that I have been in graduate school (no, not literally), I have boycotted JSM on the notion that “I am not a statistician.” OK, I am a renegade statistician, a statistician by training. JSM 2012 was held in San Diego, CA, one of the best places to spend a week during the summer. This time, I had no excuse not to go, and I figured that in order to get my Ph.D. in Statistics, I had to have been to at least one JSM. […]

OpenPaths and a Progressive Approach to Privacy

OpenPaths is a service that allows users with mobile phones to transmit and store their location. It is an initiative by the New York Times that allows users to use their own data, or to contribute their location data to research projects and perhaps startups that wish to get into the geospatial space. OpenPaths brands itself as “a secure data locker for personal location information.” There is one aspect where OpenPaths is very different from other services like Google Latitude: only the user has access to his/her own data, and it is never shared with anybody else unless the user chooses to share it. Additionally, initiatives that wish to use a user’s location data must ask personally via email (pictured below), and the user has the ability to deny the request. The data shared with each initiative provides only location, and not other data that may be personally identifiable such as name, email, browser, mobile device type, etc. In this sense, OpenPaths provides a barebones platform for the collection and storage of location information. Google Latitude is similar, but the data stored on Google’s servers is obviously used by other Google services without explicit user permission.

The service is also opt-in, that […]

SIAM Data Mining 2012 Conference

Note: This would have been up a lot sooner but I have been dealing with a bug on and off for pretty much the past month!

From April 26-28 I had the pleasure to attend the SIAM Data Mining conference in Anaheim on the Disneyland Resort grounds. Aside from KDD2011, most of my recent conferences had been more “big data” and “data science” oriented, and I wanted to step away from the hype and just listen to talks that had more substance.

Attending a conference on Disneyland property was quite a bizarre experience. I wanted to get everything I could out of the conference, but the weather was so nice that I also wanted to get everything out of Disneyland as I could. Seeing adults wearing Mickey ears carrying Mickey shaped balloons, and seeing girls dressed up as their favorite Disney princesses screams “fun” rather than “business”, but I managed to make time for both.

The first two days started with a plenary talk from industry or research labs. After a coffee break, there were the usual breakout sessions followed by lunch. During my free 90 minutes, I ran over to Disneyland and California Adventure both days to eat lunch. I managed to […]