NIPS 2017

I'm continuing my tradition of summarising conferences I attend. Previous posts: NIPS 2016, NIPS 2015, AAAI 2016, ICML 2016. I also went to AAAI 2017 to present my work on unitary recurrent neural networks, but didn't write a summary.

This was my third time attending NIPS, but my first time attending NIPS with jetlag. The advantage of jetlag is that it provides a topic of small talk less agonisingly self-aware than the weather (weather being readily avoided by waking up at 6am). The downside of jetlag is me standing glassy-eyed in front of a poster, trying to formulate intelligent thoughts but just yawning really, really obviously.

After a few days of complaining about the jetlag I realised I was probably exhausted because NIPS is exhausting. The problem is early mornings, listening to talks, bumping into people I know, talking to people I don't know, having meetings, talking to recruiters, talking over dinner, going to poster sessions, talking at posters, finding people who I had previously talked to as strangers but who are now acquaintances and talking to them again, and so on. Having gone twice before did not teach me moderation, and I was hoarse by Thursday. I also experienced an interesting fluctuation in my desire to do research, which I have depicted in the following graph: (enthusiasm has since returned, luckily)

Figure 1: We observe that the research enthusiasm of the PhD student is a nonlinear function of Days of NIPS (dNIPS), with two local maxima attained towards the ends of days 3 and 5. Data beyond day 6 could not be reliably collected due to hostility from the test subject.

This analysis clearly indicates that the optimal length of NIPS (including tutorials and workshops) is three days. Recent work (private communication) suggests that "taking breaks" can prolong the research-excitement peak, sustaining the individual beyond the end of the conference, providing hope for 2018. When I got back to Zurich I slept for 7 hours, arose for 7 (during which time I did a roller derby exam, but that's another blog post), then went back to bed for another 10. My body had no idea what was going on.

As in 2016, I'll organise this by topic. This post is rather long, but each section is largely independent so feel free to pick and choose.

Tutorials

The first day of WiML actually coincided with the tutorials, so I was only able to attend tutorials in the morning. I went to A Primer on Optimal Transport. I then got Baader-Meinhof'd about it for the rest of the conference.

I was twenty minutes late to the tutorial. This is because I decided to commute to the conference on roller skates (see frame from snapchat selfie video, right), and on the first day I misjudged how long it would take (my Airbnb was about 3 miles away).

Unfortunately, missing the start of a tutorial, especially a mathematical tutorial, can be fatal. I arrived in the middle of an explanation of how Kantorovich's formulation of optimal transport relates to Monge's formulation, and I had no reference for what was going on. I tried to sneakily look things up on Wikipedia to catch up, but all was lost and I came away from the tutorial with only an intuitive understanding of the optimal transport problem, and the knowledge that Wasserstein barycentres are usually better than l2 averages for combining distributions. In case you missed it, here are the slides (pdf). I said to myself that I'd go and learn optimal transport real quick and give a coherent summary of it here, but I also want to write this post before I graduate.
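To give some intuition for that barycentre claim, here's a minimal sketch (mine, not the tutorial's) for the 1D case, where the Wasserstein-2 barycentre of two distributions can be computed by averaging their quantile functions - for equal-sized samples, that just means averaging the sorted samples. The l2 average of two densities is their mixture, which ends up bimodal and looks like neither input; the W2 barycentre stays a single bump sitting between them. The particular Gaussians and sample sizes below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two 1D distributions: Gaussians centred at -3 and +3.
a = rng.normal(-3.0, 1.0, size=10_000)
b = rng.normal(+3.0, 1.0, size=10_000)

# "l2 average": average the densities, i.e. mix the two sample sets.
# The result is bimodal -- it resembles neither input distribution.
l2_average = np.concatenate([a, b])

# Wasserstein-2 barycentre in 1D: average the quantile functions, which
# for equally-sized samples amounts to averaging the sorted samples.
# The result is a single Gaussian-shaped bump centred at 0.
w2_barycentre = 0.5 * (np.sort(a) + np.sort(b))

print("l2 average mean/std:   ", l2_average.mean(), l2_average.std())
print("W2 barycentre mean/std:", w2_barycentre.mean(), w2_barycentre.std())
```

The printed standard deviations make the point: the mixture spreads out to roughly 3.2, while the barycentre keeps the width (roughly 1) of the original distributions.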

Women in Machine Learning Workshop

WiML took place on the tutorial day (Monday), and also on symposia day (Thursday). I am not sure why they split it up like this.

Last year I said that 15% of the 6000 NIPS attendees were women. I don't recall them releasing statistics about attendee demographics this year, but apparently 10% of unique authors amongst submissions were women (amongst accepted submissions? unknown), so the gender situation in ML is still pretty dire. Fixing this is a hard problem and not really my area of expertise (except for what I know from invariably being involved in conversations about Women in STEM), but I'm pretty sure events like this help. Why do I think that? Well, this year was the first instance of the Black in AI workshop, and while I didn't attend (I was at the healthcare workshop), even seeing people tweeting about it made me way more aware of the work being done by Black researchers. So hopefully WiML also alerts people to the existence of good work getting done by women. Oh, and travel grants! I could imagine in this era of NIPS-selling-out-rapidly that pre-purchasing tickets to redistribute to minority groups could also play a part in promoting diversity. It's weird to think of women as a minority group, but apparently we only comprise 49.58% of the world's population these days.

Interesting talks/posters:

Regarding the poster sessions, I spent all of the Monday session presenting my poster (see the Healthcare workshop below), and much of the Thursday session talking at my friend's poster (Learning the Probability of Activation in the Presence of Latent Spreaders) and sneaking peeks at the Interpretable Machine Learning symposium - video of a panel session here, and video of the debate here.

Roundtables

As in previous years, the roundtables were one of the highlights of WiML for me. It's a great opportunity to meet senior scientists I might not otherwise be able to, and also to get to know some of the other WiML attendees.

I ended up going to four tables - two career-based, two topic-based:

The Main Conference

Invited Talks

Long Beach in December: not bad

John Platt spoke about Powering the next 100 years (video), which was less environmentalist than I was hoping, and more about economics (also important, less exciting). He also spoke about nuclear fusion, which is very exciting, and possibly important (in the future). One issue I had with the premise of this talk is that I don't think we should be trying to expand US power usage to the rest of the world - the US uses a disproportionate amount of energy relative to other developed nations (even those with high standards of living, see also the 2000-watt society), so while it would be nice if we could, I would personally rather focus on minimising our energy consumption until it is sustainable to consume more. But anyway, assuming the premise, they use machine learning both to optimise the economics of power usage and to identify promising (and safe) experiments to run on fusion reactors.

I missed Brendan Frey's talk about reprogramming the human genome, and also Ali Rahimi's talk for the Test of Time Award. I sorely regret missing the latter talk because people kept asking me about it. I had to wait until I got back to Zurich to rectify the matter, but having now watched it (available here), I get the fuss.

So, regarding Rahimi's talk: Yann LeCun quickly posted a response, and Ferenc Huszár posted another response, and I should make a separate blog post to add my incredibly important opinions on the matter, but I'll just cram them right in here. Ali Rahimi's talk claimed that much of machine learning these days is alchemy - people are building models that work, seemingly by magic, which we don't quite understand. As a relative newcomer (remember, only my third NIPS) I can't hark back to any golden days of rigour and understanding, but I can certainly say that the things he suggested - simple experiments, simple theorems - are appealing.

My take: We should not make unsubstantiated claims in science. We should design experiments to test the claims we make about our models, and we should not accept speculative claims from others as fact. How often do papers today fail by these measures? Rahimi's talk implies this happens often enough to be worth calling out. I feel like I have read papers which make unsubstantiated claims, or over-interpret their results, or introduce poorly-defined concepts, but I can't call any to mind, so my claim must remain purely speculative.

What really resonated with me from Rahimi and also Huszár's points is that empiricism does not imply lack of rigour. A lot of what I do is quite empirical. A lot of what I do is somewhat applied. I've struggled with feeling like it's less scientific as a result. I've felt like I am "just" doing engineering. But the best way I have come to understand this work, which was captured in this point about empiricism, is that rigour does not need to be mathematical (forgive me, I am a former theoretical physicist, so this has taken me some time to realise). Experimental design is also rigorous when done well. Building a model to solve a problem may be a kind of engineering, but trying to understand it afterwards, forming hypotheses about its behaviour and then testing them - this can, and indeed should, be done rigorously. Otherwise, you show that a model exists which can achieve a certain performance on a certain task on a certain dataset, and little else.

The next talk I actually attended was The Trouble with Bias from Kate Crawford (video here). This was a great talk, and I'm glad it got a prime spot in the program. Not only were her public speaking skills commendable (the slides just vanished near the end and she barely skipped a beat), but the talk was really interesting. I admit I was worried I'd already know most of the contents, since I read things about bias on a semi-regular basis (somehow). Even if I'd known everything she was going to say (which I didn't), I'd still consider this talk a good distillation and overview of the pressing issues. She made an illuminating distinction which I shall now paraphrase.

When it comes to bias, there are harms of allocation and harms of representation. Biased allocation is easy to see - someone got a loan someone else didn't, someone got bail and someone else didn't, etc. These harms are concrete, tangible, immediate, and easy to quantify. Representation, on the other hand, relates to impressions and stereotypes. Google searches for 'CEO' returning all white men is a representational bias, and its effect is much harder to measure. Images of Black people being labelled as 'gorillas' is a representational bias, and while clearly hurtful, its impact on allocation is not immediately obvious. Most people accept that this kind of representation is bad, but can we blame it for any particular instance of allocation bias? Usually not. Representational bias is diffuse across culture, difficult to measure, and may not have any immediately obvious impacts. An example from me: we as a society are starting to suspect that something about how women are represented in society may be influencing the rates of women going on to study STEM subjects. This representational bias may be slowly manifesting as a tangible absence of female engineers, but it is difficult to formalise or prove that these observations are causally related. And of course, machine learning algorithms (like literally any algorithm) can be biased in either of these ways (and presumably more). Once again: watch the talk.

Pieter Abbeel spoke about Deep Learning for Robotics - really, (deep) reinforcement learning for robotics. Probably the most important takeaway from this talk was the 1-second clip of Dota 2 1v1 mid he showed, establishing an important moment in both Dota 2 and NIPS keynote history. The non-Dota content of the talk was largely focused on meta-reinforcement learning, or 'learning to reinforcement learn', and architectures to achieve this. The idea is that you want to build agents which can adapt quickly to new environments, as humans do. One interesting idea was 'Hindsight Experience Replay', which treats whatever ended up happening as if it had been the goal all along, and derives reward from that.

Reinforcement learning agent re-evaluating its experience.

This converts the usually sparse reward in RL into plentiful reward signals, provided the Q-function is augmented with a notion of a goal. He used the cake metaphor that everyone loved from Yann LeCun's keynote at NIPS last year, converting the cherry on top of a cake to multiple cherries on a cake. People can't get enough of the cake joke. It's Portal all over again.
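For concreteness, here's a minimal sketch of the relabelling trick as I understood it. This is my own illustration with made-up function names and a toy scalar-state reward, not code from the talk or the paper: each transition stores its goal explicitly, and extra copies of the transition are added in which a state actually reached later in the episode is pretended to have been the goal.

```python
import random
from collections import deque

# Toy replay buffer; each entry is (state, action, reward, next_state, goal).
replay_buffer = deque(maxlen=100_000)

def sparse_reward(state, goal, eps=1e-3):
    """0 if the goal was (approximately) reached, else -1 -- the kind of
    sparse signal that makes vanilla RL so slow to learn from."""
    return 0.0 if abs(state - goal) < eps else -1.0

def store_episode(episode, goal, k_hindsight=4):
    """Store an episode twice over: once with the real goal, and again with
    'hindsight' goals sampled from states actually reached later on."""
    for t, (state, action, next_state) in enumerate(episode):
        # Standard transition: reward with respect to the intended goal.
        replay_buffer.append(
            (state, action, sparse_reward(next_state, goal), next_state, goal))
        # Hindsight transitions: pretend a future achieved state was the goal
        # all along, so the reward signal fires far more often.
        future_states = [s for (_, _, s) in episode[t:]]
        for achieved in random.sample(future_states,
                                      min(k_hindsight, len(future_states))):
            replay_buffer.append(
                (state, action, sparse_reward(next_state, achieved),
                 next_state, achieved))
```

The goal-conditioned Q-function then trains on these relabelled transitions exactly as it would on real ones.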

I missed the talks from Lise Getoor, Yael Niv, and Yee Whye Teh because there is only so much time in a day.

Spotlights and Orals

First, a brief rant.

I was quite impressed by the quality of the spotlights and orals this year. Coming from the rather low bar of 'mumbling at a slide covered in equations' of previous years, I was glad to see that many presenters really put time into preparing their talk. These talks give people the opportunity to explain their work to potentially thousands of fellow researchers, so giving a terrible talk is insulting both to the audience and to the people who didn't get that opportunity.

I've thought about the implications of having an additional selection process for determining orals and spotlights. There's a trade-off between highlighting really good papers (with possibly terrible speakers) and highlighting less meritorious work (with a good communicator). There's also a challenge of being fair to non-native English speakers when assessing presentation quality - it would not be acceptable to condemn a talk on the basis of the speaker's command of English.

I try to assess talks by how much they have considered the audience - what the audience already knows, what may be obvious (or not, usually), what the really important things in the work are, and what can be skipped without degrading the story. But how to do this without (subconsciously) judging the fluency of the speaker's language and delivery is not entirely clear. I'm sure there is already bias in how the quality of one's English influences paper acceptance (either through clarity or through unknowingly discriminatory reviewers), so adding an additional layer on presentation quality may exacerbate the issue. On the other hand, communication is really important for scientists, and the conference should do what it can to ensure the content is high quality. Maybe some sort of (optional) pre-conference speaking workshop for those invited to give orals and spotlights?

Ranting aside, a selection of talks I took note of:

Posters

The quiet poster hall on Monday morning.

My experience of the poster sessions suffered the most as a result of jetlag, so I ended up looking at far fewer posters than I would have liked (even accounting for my eternally overambitious plans for poster sessions). This was also the first year where I got invited to ~cool parties~, so I went to some of those, too.

The hall for the posters included what seemed like gratuitous space between rows, but it filled up rapidly (the crowd at the Capsules poster was sizeable). I admit I always think about roller derby these days when I'm trying to get past crowds of people, but hip checking strangers isn't a great way to do poster sessions (I assume).

My poster session strategy is the following:

A humble plea to poster presenters: please don't stand directly in front of your poster while you're talking about it - I can't see, and I don't want to get so close to you that you start talking to me.

Here's a little caveat about this part of the blog post: I didn't visit all these posters. I'm just taking the opportunity to mention more interesting papers.

Example of a typical "meme"

Machine Learning for Health (ML4H)

I speculate that they're moving away from the previous acronym (MLHC - machine learning for health care) due to a collision with the MLHC conference (previously abbreviated MUCMD). Apparently MLHC (the conference) will be at Stanford in 2018, which is a shame because I feel I should attend it, but I really didn't enjoy travelling to/from California for NIPS. Also, I think conference organisers should be avoiding the USA (or any other country with restrictive or racist visa policies) if at all possible right now.

Anyway. The workshop, unrelated to the MLHC conference, was an all-day affair on the Friday of NIPS. There were all the usual things: invited talks, spotlight talks, (frustratingly short) poster sessions, and people crammed into one room for 8 hours. I missed the panel because I was stuck at lunch, and I missed Jure Leskovec's talk because I was ~networking~. For the rest, I took some notes.

Talks:

As I mentioned, there were two poster sessions. I spent the first one presenting my poster, and much of the second one talking to people, so I didn't get to see too many posters. I've described a lot of work from other people in this post, so let me do the same for myself. At WiML and ML4H I was presenting (variations on) this poster: (right)

Summary of the related paper:

I held a small reading group in my lab about interesting contributions from the ML4H workshop, so I'll briefly summarise two papers of interest to me:

Conclusion

This has been an exceedingly long blog post and I hope you're not as exhausted as I am, but this is basically an accurate depiction of my experience of NIPS. A lot of stuff, all the time. I have not even mentioned the Bayesian Deep Learning workshop. During the lunch break on the final day I grabbed a burrito and almost fell asleep. I was not the only one. The convention centre by that point was gradually emptying, with scattered people dozing off in chairs, and a prominent left-luggage zone where the registration tables had been. There was a clear sense of winding down, perhaps because the process had already begun for me. I stayed only briefly at the closing party (missing some unpleasantness, it sounds like), and instead walked/skated thoughtfully back to my Airbnb along the beach, pausing to look at the stars and listen to the Pacific Ocean.