NIPS 2016

We return for another installment of Stephanie Summarises a Conference. My previous work in this area is NIPS 2015, AAAI 2016, and ICML 2016. I was pleasantly surprised at NIPS to be asked if I was going to write one of these again. Apparently someone somehow found my blog. Ignorance of this is one of the downsides (??) of not having creepy tracking analytics.

This time we get a table of contents so I can be guiltlessly verbose (I fear how long my PhD thesis is going to be):

Women in Machine Learning Workshop

"What are women and how can machine learning stop them?"

I didn't register for WiML in time last year, so this was my first time attending. I also managed to miss all the Sunday events by arriving to Barcelona at midnight that night. There was a workshop on Effective Communication where I could perhaps have learned how to write shorter blog posts.

My feelings about having 'woman-only/woman-centric' events are complex, poorly-understood and otherwise beyond the scope of this particular post, but the reality is that women are wildly underrepresented in computer science and machine learning is no exception (about 15% of the 6000-odd NIPS attendees were women, and I don't know what fraction of those were recruiters). I'm so used to being surrounded by men that I barely notice it (except for the occasional realisation that I'm the only woman in a room), so having a large conference hall full of women for this workshop was a bit surreal.

Interesting talks/posters:

I accidentally presented my poster for most of the poster session and therefore missed out on going around to others. This is a compelling argument for having co-authors who can share the load. For the record, the work I was presenting was Learning Unitary Operators with Help from u(n), which I did with my advisor Gunnar Rätsch, and which will be appearing in AAAI-17. I also presented it at the Geometry in ML workshop at ICML, see my post here.


What I found especially valuable and unique about WiML were the roundtables - one for research advice, one for career guidance. In each one there were subtables for specific topics, with 'experts' to extract wisdom from.

I shamelessly hogged space at the healthcare research roundtable in the first session to listen to Jennifer Healey. She's a researcher at Intel Labs working on using sensor data for human health. That is, if you have continuous audio recording (as one can get from a phone microphone), you can identify a person coughing, measure qualities of it, its frequency, onset and so on. This information is incredibly valuable for making diagnoses and treatment decisions, and it's the kind of data that one could reasonably imagine everyone collecting in the future. One thing I really enjoyed about the discussion was that she was quite aware of the HORRIFYING PRIVACY IMPLICATIONS of this kind of data, and the need to avoid storing (and calculating on) this data on The Cloud. I'm really excited about this avenue of healthcare (as I say every time it comes up) and I'm really glad to hear a senior researcher from a big company talking about the importance of the privacy considerations. As was mentioned in the ML and Law symposium, all personal data you collect is a privacy vulnerability. But collecting this data could have such massive positive healthcare implications that 'solving' the privacy problem is really important. Especially if the data is going to end up getting collected anyway...

The second roundtable I went to (about careers/advice), I spoke to some people at Deepmind about working there (me and everyone else at NIPS, it feels like...), and some other people about how to decide between industry (that is, industrial research) and academia. Both experts at the industry/academia table were in industry, so I'm not sure I got an unbiased perspective on it. The context for all of this is that I'm a 'late-stage' PhD student (the idea of that is rather scary to me - there's still so much to learn!), so I'm looking for internships (got any spare internships? contact me) and thinking about post-PhD land. The most concrete difference I learned about was that in companies, you may need to send your paper to the legal team before submitting it to a conference, in case they want to patent something first. I'd imagine this also applies to preprints and code and so on. Otherwise, the level of intellectual freedom one enjoys seems to vary, but everyone I spoke to (from a biased sample) seemed largely unconstrained by their industrial ties.

I'd imagine there's a gulf of misery between brand-new startups that have yet to become overly concerned with Product, and established tech companies with the luxury of blue-skies research labs, where you don't get to do cool things and instead must live in a box desperately trying to demonstrate the commercial viability of your research. I'd also imagine that said box-dwellers don't attend roundtables (how do you fit a round table in a square box?).

The final notable thing that happened at WiML was me apparently winning a raffle, but being shamefully absent. I was upstairs charging my laptop and catching up with a friend from MLSS, blissfully ignorant of the prize I would never receive.

The Main Conference

Invited Talks

The main conference opened with a talk (the Posner Lecture) from Yann LeCun. LeCun is famous enough in machine learning that people were excitedly acquiring and then sharing selfies taken with him (a practice I find puzzling), so the things he said will likely echo around the community and I need not repeat them in detail here. In gist he was talking about unsupervised learning (although focusing on a subtle variant he called 'predictive learning'). He used a cake analogy which spawned parodies and further cake references throughout the conference/social media. The analogy is that reward signals (as in reinforcement learning) are the cherry, labels for supervised learning is the icing, and the rest of the cake is essentially unlabelled data which requires unsupervised learning. The growing importance of unsupervised learning is not new, I can say from my intimidating one year of previous NIPS conferences.

Marc Raibert from Boston Dynamics gave an entertaining talk about dynamic legged robots. This featured many YouTube videos I'd already seen, but was happy to gormlessly rewatch. One amusing thing is the fact that they can't use hydraulics in domestic robots, because they leak. That's a great example of a real-world problem. It might be common knowledge amongst roboticists, but 'you can't use hydraulics because nobody wants oil and stuff on their carpet' would not have occurred to me if I for some reason needed to design a robot. Now, maybe I would not need to design a robot directly, but it's not entirely unlikely that I could design an algorithm making assumptions about the kinds of movements, or the cost of those movements, that a robot could make. And this is why 'domain experts' will always be needed. Probably.

At the end of the talk, someone asked if Boston Dynamcis uses machine learning. They do not. Maybe they should?

Saket Navlakha spoke about 'Engineering Principles from Stable and Developing Brains'. Part of this talk was based on this PLoS CB paper where they compare neural network development in the brain to that of engineered networks. In brains, connections are created rapidly and excessively, and then pruned back over time dependent on use (they demonstrate this in mouse models). This is to be contrasted with engineered networks, where adding and removing edges in this way would be seen as wasteful. They demonstrate however that the hyper-creation and then aggressive pruning results in improved network function. They're particularly interested in routing networks, so the applicability to artificial neural networks is not immediately apparent.

Susan Holmes gave the Brieman Lecture, which exists to bridge the gap between the statistics and machine learning communities. This was the single talk of the conference where I took notes, because the relevance of the topic to me and others in my lab overwhelmed the need to preserve precious limited laptop battery. The title of the talk was "Reproducible Research: the case of the Human Microbiome", and so was mostly a story about how to do reproducible research, in the context of microbiome analysis. One really cool thing she mentioned was a web application called shiny-phyloseq, which seems to be an interactive web interface to their phyloseq package. However, it also (I think) records what you do with the data as you explore, which you can then export as a markdown file to include with your paper. I try to emulate this by pipelining my analysis in bash scripts (or within python), but having something to passively record as you interactively explore data seems additionally very beneficial. The garden of forking paths is a risk during any data exploration. Also, the garden of forgetting exactly what preprocessing steps you did.

There was a touching memorial to Sir David MacKay during one of the sessions. It's easy, as an early-stage scientist, to get swept up in the negative aspects of academic culture (looking at you, Publish or Perish) and lose sight of the reasons for doing any of this. Hearing about scientists like MacKay, who both think and care deeply, is genuinely inspirational. The only book on my Christmas wishlist this year is "Information Theory, Inference, and Learning Algorithms".

Interesting Papers/Posters

Necessarily, a subset of the interesting work.


Reinforcement Learning

Recurrent Neural Networks

Machine Learning and the Law Symposium

I was and remain to be confused by the choice of symposia. The options were: Deep Learning, Recurrent Neural Networks, and ML and the Law. RNNs aren't deep? What was the DL symposium covering? Deep but Not Recurrent Learning? Weight-Sharing Is OK but Not Over Time, Never Over Time? As evidenced by the title of this section, I didn't attend either of them, and I also didn't attend enough of the Counterfactual Reasoning workshop on Saturday to say what would have happened if I had gone to them, but there seems to be a naming/scope issue here. Whatever it was, the RNN Symposium was Hot Shit and had to switch rooms with us ML+Law people during the lunch break. As soon as the room change was announced, people started appearing at the fringes of the Law symposium and may have been inadvertently exposed to some meta-ethics. I'm not sure how this planning error occurred - it is natural to assume that most of the growth in NIPS attendance is coming from DEEP LEARNING, which should (??) include RNNs, so that symposium was likely to be popular. Maybe they thought enough people would go to the other DL symposium.

The real question is - did non-DL non-justice machine learners feel cheated of a symposium? Am I wrong to try to place the RNN symposium inside the DL one?

Having just published a paper (arguably) about RNNs, I should have gone to the RNN symposium, but I can't resist thinking about the broader social impact of machine learning. I've also found myself thinking about morality and justice (and therefore law) more than usual lately, so I had to attend this. Discussions of normative ethics at a machine learning conference? Yes.

I'd consider this symposium a law-oriented follow-on to the 'Algorithms Among Us: the Societal Impacst of Machine Learning' symposium at NIPS 2015 (see my summary here. Having a focus is good. The impacts of machine learning on society are widespread, so trying to cover too many all forces a shallower treatment. High level talk is well and good, but getting stuff done requires being specific. This is actually a point that was raised during one of the panel discussions: how do we balance the need in computational science to formulate very specific, quantified definitions of things (like discrimination) with the requirement of margin of interpretation in law? I was surprised, as a non-lawyer, to hear that such ambiguity could be tolerated, much less desired. The example given for this was in discussions where compromise may only be attained through baking some ambiguity into an agreement, which would then (I suppose) later be argued over as necessary. This leads to another point which was made - law is not a monolith, laws are not absolute immutable statements - law is a process, an argumentative tradition (at least in the US), evolving and iterating and requiring justification at all times (get it - justice pun!). How to integrate algorithms into this process is not as simple as treating them as Truth Functions (shout out to my main man Wittgenstein) on Evidence ... or is it? I get ahead of myself.

Legal Perspectives

Technical Perspectives

There were more talks, but I was drifting into the semi-delirious pre-fever stages of the Conference Flu at this point.

Panel Discussions

The discussion spotlight was 'Regulation by Machine' from Benjamin Alarie. A question - how to use AI to make better laws? My notes are sparse but a recurring theme (also in MLHC) is that we should use machine learning to help and augment humans, not to replace them. So he was speaking about using ML to - for example - help to predict if it's 'worth' taking a case to court. Apparently many cases go to court which are 'overdetermined given the facts', and it's somewhat easy (citation needed) for an algorithm to identify which these are.

My notes on the actual panel are sketchy at best. It may have been the time or how sick I was but, it felt like people were saying a lot of interesting things without obvious argumentative structure or direction, so it's hard to summarise any salient points. Here are some decontextualised, paraphrased snippets:

And a final shoutout to Chief Justice John Roberts is a Robot - Ian Kerr and Carissima Mathen.

Machine Learning for Healthcare Workshop

With the caveat that these are workshop contributions, here are some interesting papers/posters (with accompanying arXiv papers, so I have a chance to remember anything about them):

Mandatory shout-out to my contribution to the workshop: - Neural Document Embeddings for Intensive Care Patient Mortality Prediction - Paulina Grnarova, Florian Schmidt, Stephanie L. Hyland, Carsten Eickhoff - we used document embeddings to predict patient mortality in MIMIC-III, purely using text notes. The embedding procedure uses two layers of CNNs - word vectors are combined into sentence vectors (with a CNN), and sentence vectors are combined into patient vectors (with a CNN), and we use target replication to improve predictive accuracy. This was fairly preliminary (there are many other factors to consider, as ever), but we beat previous work using topic modelling on the task, which is encouraging, and perhaps unsurprising given LDA's inability to deal with multi-word phrases.

This is only a snippet of the interesting work presented at the workshop. I unfortunately came down with Conference Flu about half way through NIPS, and was at my sickest during the MLHC workshop (ironically), so I didn't get to speak to as many poster presenters as I would have liked.

Miscellaneous Comments/Observations


I feel less obviously exuberant about NIPS than I did last year, which I attribute to a combination of having been (and continuing to be somewhat) ill, and being in the development stage of several new projects where I just want to be getting stuff done.

As I've mentioned before, I think about approaching research in an exploration-exploitation framework. At this NIPS I realised that even within the exploration mode, one can explore exploitatively. That is, you can distinguish between diversity-increasing exploration (seeing areas of the state space/field you've never been in before) and depth-increasing exploration (refining your knowledge of partially-explored states/topics). The latter is arguably a kind of exploitation, because it's exploration with the aim to increase knowledge of things you are intending to use later. You hope.

Bringing this strained analogy back to conferences, this makes the difference between going to talks on things you already sort of know and going to totally new topics. I tried a bit of the latter, because chances are I'm going to read papers relevant to me regardless, but I found spotlight talks suboptimal for learning new ideas without sufficient background knowledge. An alternative approach would be to be incredibly exploitative, pre-emptively read the relevant papers and then talk to the authors at the poster sessions. Perhaps next year I'll be organised enough to do that, because unless you go to the tutorials, 15-minute talks of questionable presentation quality on cutting edge research are not good ways to learn new topics.

What is a good way to learn a new topic (personally), is to write about it. I've been working on a pedagogical post about sparse Gaussian process classification, which will be up next, after a brief diversion into roller derby.