apeirotope

dudley dodos: everyone can learn about the environment

2021-04-16T00:00:00+01:00

When I was born in 1989, the CO2 concentration in the atmosphere was 350 parts per million. 350 parts per million is also considered the safe upper limit for atmospheric CO2, giving rise to the climate group 350.org, which was founded in 2007. Today atmospheric CO2 is around 407 ppm. In 2007 it was about 380 ppm. When my parents were born, it was less than 315 ppm. You can read more numbers like this here.

1989 is also the year a "1.5/10" board game called Dudley Dodos was published. As recognised by BoardGameGeek reviewers, the game is essentially an "eco-friendly" Trivial Pursuit knockoff. I don't know how old I was or how often we played this game when I was a child, but it's lodged in my memory alongside The Animals of Farthing Wood and stacks of National Geographic magazines, filed under "reasons I ended up like this".

I was lucky to be in Dublin over Christmas this year and cracked open the box (my parents still have it, of course). The gameplay consists of answering environment-themed trivia questions and moving around a board, presumably for some end goal I no longer remember.

It has a Monopoly-esque mechanic of giving you random cards when you land on certain spaces. In Monopoly these cards say things like "Go back three spaces" or "You have won second prize in a beauty contest, collect $10". In Dudley Dodos they're called "environment" cards, and they're a little different. In this post I'm going to go through them all. Let's start with the nice ones.

The protagonist of the game is the eponymous Dudley Dodo, a dodo who has evidently emerged from hibernation to help make everyone more aware of our fragile environment. He wears cute boots and has a big heart on his chest.

The ban on CFCs was introduced by an international 1987 treaty called the Montreal Protocol, which came into force shortly before I was born. Incredibly, the treaty was brought into effect a mere ~14 years after CFCs were first discovered to be harmful to the ozone layer, and 2 years after a hole in the ozone layer was discovered over Antarctica. Today in 2021, the hole is still here, but its size has stabilised in the last 20 years or so, and there is evidence it is beginning to recover. You can look at it here.

and

Holes in the ozone layer, acid rain! It certainly feels like environmentalist discourse has moved on since 1989. I guess there was that mention of the greenhouse effect on the previous card, but that's probably not going to come up again.

Acid rain is one of those things I remember hearing about as a child (I wonder where from), and which doesn't get mentioned very often any more. I'm not the only one who's confused by this. According to this guy, acid rain was basically caused by emissions from coal-fired power stations. Since there are now ~~zero~~four coal-fired power stations in the UK, I guess it's just not a problem any more. To be more specific, in 1989 the UK was generating 67 Mtoe (Mtoe = millions of tonnes of oil equivalent) of energy from coal, whereas in 2017 it was 10.1 Mtoe and is presumably only going down. At least if there's one thing we can all agree on in 2021, it's that coal is on the way out. What's that, the UK government is building a new coal mine in Cumbria?

Love the cold war energy off this one. Beyond being an affront to humanity, the machinery of war contributes to the climate crisis and is terrible for the environment.

"The U.S. Department of Defense has a larger annual carbon footprint than most countries on earth. With a sprawling network of bases and logistics networks, the U.S. military is the single biggest emitter of carbon dioxide in the world aside from whole nation-states themselves." Murtaza Hussain citing Neta Crawford.

I don't know if the level of ambient warfare has decreased since 1989, but we seem to have avoided having any more world wars.

Okay, so we might still be losing forests but the rate of deforestation has decreased since 1990, and legally protected areas have been created, which currently shelter 18% of the world's forest area. The game comes across as a bit naive here - the biggest cause of permanent deforestation isn't our lust for fresh paper - it's the conversion of forest land to something else, like agricultural monoculture (beef, palm oil, soy). If you're reading my blog you probably already know this, but just to be extra clear - most soy is grown to feed animals, not humans. Leave me and my beer-battered tofish and chips alone. Speaking of fish...

In 1989, an estimated 86.9 million tons of fish were captured from the wild. Apparently yields have been declining ever since, and things basically look fucked - although if you want nuanced details, you could read this UNFAO report into sustainable fishing. On the upside, humpback whales have returned to South Georgia after losing over 90% of their population to commercial whaling in the 20th century.

A lot of people in the world depend on the sea for their protein... do you?

hahahahahaha next

Okay but seriously, the UN Framework Convention on Climate Change came into being in 1992 with its inaugural summit in Rio. In 1997 they created the Kyoto Protocol, which legally bound signing countries to reduce their greenhouse gas emissions. The US didn't participate in this because they are big babies (except that real babies would support climate action as a matter of survival). My subjective assessment is that governments have been making lots of claims in recent years ("net zero by 2050", "petrol car sales ban by 2030", "leopards will eat some faces, but not yours"). According to the Climate Action Tracker, current pledges put us somewhere near 2.1 to 2.6 degrees of warming by 2100, while current policies put us closer to 2.9C. It's too bad we won't live. But then again, who does?

An international ban on the trade of ivory came into force in 1989, after an exposé into the trade and the poaching of African elephants. The topic is more complicated than "ivory bad, therefore must be banned" - arguments have been made that legal, controlled ivory trade can be used to support elephant conservation efforts, and existing bans are undermined by ongoing illegal trade and (seemingly) widespread poaching. Unfortunately, from what I can gather the African elephant remains in decline. When the CITES ban came into effect in 1989, the population of African elephants was no more than 600,000 (down from twice that in 1979), and in 2016 it was estimated at 352,000.

On the upside, if elephants go extinct it brings us (humans) one step closer to being the largest, most important animal. Whales: we haven't forgotten you.

When this game came out, the internet was functionally nonexistent. Social media wasn't even a distant nightmare. I grew up alongside the internet (I have been online for over two decades now, and I will never log off), so it's hard for me to even think of a non-internet world. I take it for granted that I can have friends in different countries, that I can learn about things happening around the world in near-real-time, and access information on essentially any topic.

I don't think this is sufficient for a proper internalisation of "interdependence", however. Late-capitalist society effectively separates us from knowing about other people and countries. Modern supply chains for goods are complex and opaque. You may be able to learn where something you're purchasing came from, but this knowledge is not designed into the system. Intermittent scandals about slave labour, child labour, label fraud, and sweatshops operating in countries they're not expected in underscore the reality that many people don't know much about how the things they own were produced. These details are not considered important to know.

Properly understanding the extent to which we are all interdependent requires more than being passively exposed to the internet at large. It necessitates (I think) more fundamental shifts in the emphasis of society. I don't know exactly what these would be, and I do not want to be misunderstood as having a romanticised version of a simpler past to which return is possible or desirable. But I do know that I want something that's better than this. So I choose to believe that the optimism of the game designers was well-placed, even if we haven't achieved ecotopia yet.

We have now come to the end of the positive cards. Many of the themes covered by the positive cards are also covered by the negative cards. I was going to write something snarky about this, but instead I quietly contemplated how often "wins" are really just avoided harms (less killing, less acid rain, more promises to avoid doing bad things). Is environmentalism a fundamentally conservative ideology? The answer, you may be surprised to hear, is "what does 'conservative' even mean? Let's agree on semantics before we have a discussion."

As far as I can tell, dumping nuclear waste in the sea has been banned since 1993 (or maybe earlier, or later) by the London Convention (or maybe Protocol). More confusingly, the environmental impacts of nuclear waste dumping seem to be not very clear and hard to measure. This is like the time (summer 2019) I wanted to learn about the impact of Chernobyl (original event, not 2019 HBO show) and found it killed between 60 and 60,000 people depending on methodology. This is not surprising given radiation has somewhat diffuse effects and causal inference is hard in general. Adding the literal diffusion of radioactive material in the ocean no doubt compounds the issue. In the absence of glowing barrels washing up on beaches to be picked over by three-eyed sea birds, I'm taking this one as a win! Thanks Dudley, you made us aware of our fragile environment.

As per the above, determining the direct impacts of nuclear fallout is challenging, but we can look at some examples.

Bikini Atoll is an atoll (a ring-shaped coral reef) in the Marshall Islands that was used extensively for nuclear weapon testing by the US between 1946 and 1958. A 2008 study (Richards et al., Marine Pollution Bulletin) indicates that about 70% of previously-observed coral species can still be counted at the site:

"The case of Bikini Atoll demonstrates that coral reef communities can recover from and exhibit resilience to major disturbance events. In this situation, the visible impact and recovery of the reefs from the anthropogenic impact of atomic testing can be compared to those following natural disturbance events such as cyclone/hurricane damage. Bikini Atoll’s reefs undoubtedly benefited from the post-testing absence of human disturbance, the presence of uninhabited and non-impacted neighbouring atolls, and a supportive prevailing hydrodynamic regime for larval import"

The Semipalatinsk Test Site, aka "The Polygon", is a (shockingly large) region in Kazakhstan that the Soviet Union used for its nuclear weapons testing. Overground testing ended in 1963, with underground testing continuing until 1989. Overall about 400 Hiroshimas' worth of nuclear weapons were tested at the site, with seemingly little regard for human health. Here is a Nature news feature on the health impacts on people living in the area. The USSR set up a nondescript medical centre in the region for monitoring radiation exposure, the data from which now serve as a valuable resource for understanding the impact of long-term low-dose radiation exposure on people, as well as their children. Here are some photographs.

As an interesting aside, the US also tested some nuclear weapons at the Semipalatinsk Test Site in the 90s, for the purpose of calibrating instruments for detecting nuclear weapons tests. I can feel myself sliding into a tangent about nuclear weapons and I am going to claw back to the matter at hand: unrelentingly depressing environmentalism.

There are five species of rhino. According to the WWF, there are "over 20,000" white rhinos left (although, if you consider just the northern white rhino sub-species there are exactly 2 individuals remaining from that species), around 5000 black rhinos, "more than 3,500" greater one-horned rhinos, around 80 Sumatran rhinos, and around 60 Javan rhinos. That's it for the rhino count. They mostly live in reserves or zoos because people are obsessed with killing them for their horns. It's not uncommon, to want to kill animals for their body parts. But "hunting to extinction" seems like a bad business strategy.

On the upside, walruses don't appear to be at any immediate risk of extinction due to poaching.

(see above) Not any more, mostly!

DDT is one of those environmentalist things I hear mentioned a lot but don't specifically know much about, similar to glyphosate (except for this incredible video where Patrick Moore claims it's safe to drink before immediately refusing to drink it). DDT is an insecticide and a "persistent organic pollutant" (i.e. it does not degrade in the environment and will accumulate in organisms). This bioaccumulation makes it particularly nasty to apex predators, and apparently it was responsible for the near-extinction of the bald eagle in the USA. Its insecticidal powers make it incredibly potent against malaria and typhus (both of which are spread by insects), and it was seemingly used quite effectively to reduce or eradicate malaria in the US and parts of Europe.

DDT is famous I think in large part due to the seminal environmentalist book "Silent Spring" by Rachel Carson. I haven't read it because even the name of it makes me sad, but it seemingly kicked off a fairly widespread (in the US) movement against DDT, culminating in a 1972 ban on DDT for agricultural use. DDT is largely globally banned as of the 2004 Stockholm Convention on Persistent Organic Pollutants, but it's still used for vector control in some countries, albeit at declining rates. Continuing the trend of the bad cards being more optimistic than the good cards, DDT is mostly not a thing any more! I will not take further questions on persistent organic pollutants at this time.

See above re: "what happened to acid rain?". I went looking for an example of a forest ruined by acid rain, and I learned about the Black Triangle (distinct from the Polygon, which had an unspecified number of sides). The Black Triangle is a famously polluted region in the borderlands between Germany, Poland and Czechia. Up until 1989, the area was characterised by heavy industry powered by locally-strip-mined brown coal (the worst coal), as well as uranium mining, and concomitant environmental devastation and impact on human life. Allegedly since the dissolution of the USSR there has been an attempt to restore the area. However, the Turów Coal Mine (a large open pit lignite mine in this area) just got its licence renewed until 2026 (if not longer), so it's not clear how seriously everyone is taking this idea. On the upside, there are campaigns to stop this dreadful mine.

An aside about brown coal: also known as lignite, this is some terrible-quality coal. It's got a relatively low carbon content, high moisture content, and high ash content. I'm not a coal scientist (although I may be by the time I finish this post), but all signs point to a low-quality product. Its heat content is low enough that you have to burn a lot to get any useful power out of it, it's hard to transport, and it's really bad for people. Lignite was also used extensively in East Germany for energy sovereignty, and Germany to this day remains its largest producer.

Musical interlude about (children) working in coal mines: School Day's Over - Luke Kelly.

The earth's population has risen by ~2.5 billion since 1989, now standing at around 7.7 billion. The rate of growth seemingly peaked in 1963 at 2.1%, and has been decreasing somewhat since then, now currently around 1%. Forecasts indicate that we might settle out at around 11 billion people by 2100, assuming things don't deteriorate rapidly (thunderdome-style) over the next century. There are many things to be said about population growth in the context of environmentalism, and I've started just linking to this article when it comes up. It's not not an issue, but it's also not the main issue, and making it the focal point has a way of getting kinda racist real fast.

"The problem is extreme inequality, the excessive consumption of the world’s ultra-rich, and a system that prioritises profits over social and ecological well-being. This is where we should be devoting our attention".

This isn't to say that farming and fishing haven't been under pressure. Looking just at meat: between 1989 and 2018, global meat production went from 172.8 to 341.2 million tonnes per year. This includes 69 billion chickens, 1.5 billion pigs, and 300 million cows per year. This ~100% increase is disproportionate to the population increase (~50%). A lot of this is driven by China going from a country with relatively low meat consumption per capita to something approaching European countries. China's consumption was ~22kg/person/year in 1989, and was ~61kg/person/year in 2017. Compare this to the UK which went from ~72kg/person/year to ~80kg/person/year over the same period. Surprising to me, the US has a much higher per-capita meat consumption than the UK, at ~124kg/person/year in 2017. I'm happy to report that Ireland's meat consumption has decreased since I was born, from ~86kg/person/year to ~79kg/person/year. While it's easy to blame China for its increasing meat-lust, considering per-capita consumption is important for remembering who the real villains are: Australians.
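For anyone who wants to check the "~100% vs ~50%" comparison, here is a minimal sketch of the arithmetic using only the figures quoted in this post. The 1989 population is not quoted anywhere above, so it is inferred here by subtracting the ~2.5 billion rise from the ~7.7 billion total (an assumption), and the variable and function names are purely illustrative.

```python
# Figures quoted in the post. The 1989 population is inferred, not quoted.
meat_1989_mt = 172.8  # million tonnes of meat produced per year, 1989
meat_2018_mt = 341.2  # million tonnes of meat produced per year, 2018
pop_now_bn = 7.7      # "around 7.7 billion"
pop_rise_bn = 2.5     # "risen by ~2.5 billion since 1989"
pop_1989_bn = pop_now_bn - pop_rise_bn  # assumption: simple subtraction


def percent_increase(old, new):
    """Relative growth from old to new, as a percentage of old."""
    return (new - old) / old * 100


meat_growth = percent_increase(meat_1989_mt, meat_2018_mt)
pop_growth = percent_increase(pop_1989_bn, pop_now_bn)
print(f"meat: {meat_growth:.0f}%, population: {pop_growth:.0f}%")
```

The two figures don't cover exactly the same years (the meat numbers stop at 2018), so treat the result as a rough sanity check rather than a precise comparison.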

and

These go together because, as I mentioned already, agriculture (specifically cattle) is the leading cause of deforestation.

Extinction is hard for me, so I put this last. I don't know how to deal with the fact that hedgehogs are now at risk of extinction in Britain alongside water voles and red squirrels. I don't want to read about how a quarter of known bee species have not been sighted since 1990, or how melting glaciers are driving Alpine plants to extinction.

I distantly accept that extinction is a normal part of ecology and a necessary part of evolution. If nothing went extinct, the world would be teeming with goo or something (I don't know I'm not a biologist, leave me alone). But I still approach the thought of the dodo and the thylacine with a sense of grief. I look at the last photos of the thylacine as the last remaining fragments of a world we can never return to. Extinction to me, like death, is bound up with "never again". A door closing forever, a flame going out, something lost not to be found. Perhaps it is some immature refusal to accept the finiteness of life, a desire to run from endings and the loss of potential futures. I feel it when I think about an ancient forest being felled (you can't just grow that back!), or the (likely) loss of coral reefs, or the destruction of ancient cities and art, sacred sites, the loss of endangered languages. It is so much easier to destroy than to create. And humans are so, so good at destroying. The current rate of extinction is estimated to be 100-1000 times the "normal background" rate. We've built incredible technology to dredge the oceans and raze forests, to replace diversity with monoculture, to try to control nature, as if we were not ourselves part of it.

Gee, I wonder how I ended up like this.

And now for a message of hope:

Hopelessness is not an option. In the words of Rebel Steps:

Believe in yourself, trust one another, and get organised.

Good luck!

redesign

2021-01-04T00:00:00+00:00

I have updated the design of the website! This started off as 'drop in a new Pelican theme and be done' and grew into a micro-project to do while confined to my parents' house over Christmas (pandemic etc). I also watched lots of medium-to-terrible TV (GYNOTOPIA, SERIOUSLY?), which is a good accompaniment to making tiny tweaks in CSS and seeing what happens.

For posterity, the previous version of apeirotope looked like this:

... which was an edited version of the medius theme, which was designed to imitate Medium. Post example:

The new version, also for posterity (the internet is a flimsy place for memories) looks like this:

Post example:

The new design is a modified version of the voce theme - my fork is here. You can do a diff between my fork and the original one for the most comprehensive account of the edits I made, but I'll describe them in English here. Also, for comparison, here's what apeirotope would look like in mostly-plain voce:

It's not really a fair comparison because the voce theme uses tags carefully and sparingly, whereas I use them freely, and instead use categories (which it doesn't really use) to categorise my posts. You can see a much better version of the voce theme on its creator's personal page.

Particular edits I made:

Obviously, get rid of the milieu of tags. I experimented with replacing them with my categories, but I didn't like the result. I settled with fully removing that section, and adding pages for Tags and Categories to the navigation bar.
Removing more elements, I got rid of the "logo" area and (not shown) the copyright stuff at the bottom. As far as I can tell from 10s of searching, those types of copyright declarations are mostly for scaring people off plagiarising your stuff, but aren't actually necessary to legally assert copyright. Following up with a further 20s of searching, I'd probably put a CC-BY-SA license on this site if I had to. I haven't had any legal issues in the 5 years or so I've had the website, so maybe copyright will be abolished soon and I won't need to think further about this.
It looks like I added an italic serif font, but he (Benjamin Lin, voce creator) seems to have done a nontrivial style update four days ago, which included removing it. This is funny to me because the previous update was in April, so I guess we were both spending part of our new years time editing the voce design. Hi Benjamin!
It's ideally not visible at all, but I ripped out a bunch of surveillance-flavoured internals, including: anything to do with google analytics, and (hopefully) all calls to external sources, including fonts, bootstrap CSS, etc. Webfonts are hosted locally for apeirotope. Possibly didn't catch everything and maybe my website is slower now because a weak little OVH VPS is serving everything. My guess is that this site doesn't do enough traffic for that to be a real concern, but given the lack of analytics/me bothering to look at nginx logs ever, I have no idea.
Introducing my "colour-coding categories" nonsense (which I already had in medius, but required rejigging given the new templates in voce). This was probably the most complicated part because I sort of understand CSS, but not well enough to readily diagnose why certain styles are overriding each other. Concessions had to be made, but I am fairly happy with the result.
Directly edited minified bootstrap CSS to remove a weird navy colour that kept appearing in my unflavoured links. Numerous other minor style edits, including removing that (admittedly lovely) turquoise colour, and making things generally smaller.
The tags page sorts by number of posts, not alphabetically. Nobody should look at the tags page, though. I have a link-rot issue where I don't want to modify categories (since they define absolute paths to posts), but it's open season on tags and some day I may do a pass on de-duplicating them.
The archives page (new!) only segregates posts by month if there are more than 4 in a year. I am most proud of this edit because it required writing some Jinja, which first required learning the word "Jinja" so as to upgrade my search queries from "if statement pelican".
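The archives rule in the last item can be sketched in plain Python. This is not the Jinja from my fork, just an illustration of the logic: the "more than 4" threshold comes from the description above, while the function name `group_for_archive`, the `(title, date)` tuples, and the placeholder posts are all made up for the example.

```python
from collections import defaultdict
from datetime import date

MONTH_THRESHOLD = 4  # split a year by month only if it has more than 4 posts


def group_for_archive(posts):
    """Group (title, date) pairs by year, and by month within busy years.

    Returns a dict mapping year -> a flat list of titles (quiet year)
    or a dict of month number -> list of titles (busy year).
    """
    by_year = defaultdict(list)
    for title, published in posts:
        by_year[published.year].append((title, published))

    archive = {}
    for year, entries in by_year.items():
        entries.sort(key=lambda entry: entry[1], reverse=True)  # newest first
        if len(entries) > MONTH_THRESHOLD:
            by_month = defaultdict(list)
            for title, published in entries:
                by_month[published.month].append(title)
            archive[year] = dict(by_month)
        else:
            archive[year] = [title for title, _ in entries]
    return archive


# Three real post dates from this site plus hypothetical placeholders.
posts = [
    ("dudley dodos", date(2021, 4, 16)),
    ("redesign", date(2021, 1, 4)),
    ("podcast review 2020", date(2020, 11, 29)),
    ("placeholder a", date(2020, 11, 2)),
    ("placeholder b", date(2020, 6, 10)),
    ("placeholder c", date(2020, 3, 5)),
    ("placeholder d", date(2020, 1, 20)),
]
archive = group_for_archive(posts)
```

With this sample data, 2021 (two posts) stays a flat list, while 2020 (five posts) is broken down by month.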

In summary, this was NOT procrastination from generating other types of content for the website.

podcast review 2020

2020-11-29T00:00:00+00:00

One of the nice things about having your own website is that you can do whatever you want. Today I am going to recommend/describe some podcasts I listen to. I usually do this directly to whoever speaks to me after I find a new one, but now I can do it to people on the internet too.

This will be more of a review than strictly a list of recommendations, so I'm not saying all of these are great. But I did listen to them.

Warning: some of these podcasts will try to sell you a mattress.

Reply All is probably the most generic podcast recommendation I could give. It's slickly produced, it's reliably entertaining, it has interesting trivia and moments of heart and touches on contemporary issues in a way which is not especially challenging. You sort of know what you're getting, even if you don't know what it'll be about. I have a strange memory of listening to the first few episodes of this when it started back in 2014 and being like "what is this podcast supposed to be about". Then I stopped listening for 5 years and I still don't entirely know. It's just about... stuff, you know? Stuff on the internet. Episode rec: Bedbugs and Aliens
99% Invisible is another reliable recommendation. The host (Roman Mars) has a strange and specific way of speaking which subtly aggravates me, but the content is really great. It's about "design", which is pretty broad - they talk about cities, infrastructure, architecture, objects, history, etc. The stories are interesting and hopefully mostly factual and the urbanism bias is very much up my alley. Episode rec: Palaces for the People (spoiler: they are LIBRARIES)
The Adventure Zone is an RPG podcast, which is a genre of podcast where you listen to people play role-playing games (with some editing). TAZ features known podcasters "the McElroy family", who seem to be lovely people and also very funny. I listened to TAZ Amnesty, which was a Monster of the Week campaign. They managed to blend an interesting world, heartfelt story, and absurd humour. I understand there to be many RPG podcasts out there, but I have mostly only listened to this one. Episode rec: Episode 1 of Amnesty I guess.
Mythology (on the Parcast Network!) is dramatisations of stories from mythology, in that they get voice actors to play out stories from mythology. I'm not a mythology buff and suspect this is some entry-level mythology content. Please do not @ me with mythology-related screeds. It's lightweight storytelling, although the reuse of voice actors gets a bit confusing if you listen for too long at once. The oddest thing about this podcast is how I get the feeling it's made by aliens who want to blend in, but who are too enthusiastic about mythology to pull it off. I will not elaborate on what I mean by this. Episode rec: The Abduction of Persephone.
The Anthill is a podcast from The Conversation. Their shtick is getting academics directly to write articles (or be interviewed on a podcast) on topics of their expertise. The format seems to be multi-episode mini-series with long gaps, so it's not a "reliable weekly" sort of thing. Direct academic involvement means the quality of information should be pretty high, and they better not have lied to me. I've only listened to three of the mini-series (India Tomorrow, Conspiracy Theories, Recovery) and they were all great.

This concludes the list of reasonably generic podcasts I listen to. Now comes the more esoteric stuff, I guess.

Lingthusiasm is about linguistics. It's a podcast hosted by two linguists, talking about linguistics. The enthusiasm they have for linguistics is really charming. I don't listen to this one all the time because whenever I learn things about linguistics I forget things about computers, but it's another podcast where I will reliably come away having learned something. About linguistics. Episode rec: How to Rebalance a Lopsided Conversation or Sounds You Can't Hear.
Drilled is a "True Crime Podcast about Climate Change". It focuses on fossil fuel propaganda and it's absolutely enraging. I actually stopped listening to it because I don't like making myself so simultaneously miserable and angry, but I did find it genuinely very informative. The second season (about crab fishermen) was a very different sort of climate story to what I'm used to. Overall, it's got that true crime investigative energy and it will make you want to burn Exxon to the ground.
Maintenance Phase is about debunking wellness culture, health and weight-loss fads. It just started so I don't have a lot to go by, but the first few episodes have been strong. I find the discussion of dieting and diet culture very enlightening, because I'm a lucky person who eats whatever I want without thinking about it really. The hosts have something of a similar conversational style as Lingthusiasm, which is very enthusiastic in a way I feel is quite American, but neither of the hosts of Lingthusiasm are American, so that's on me and I need to think about that. Episode rec: Moon Juice.
Revolutions is my newest podcast addition and therefore the direct inspiration for this post. This is a history podcast about "great political revolutions" and has been a hard sell to my friends. I am shamefully ignorant of history and find myself going on occasional history splurges trying to catch up (I was really obsessed with World War 1 for a while after reading The Guns of August). It turns out that history is really interesting when it's told to you like a story, and not as a list of events to memorise. I've started right at the start of this podcast (in 2013) so am still learning about the English Civil War of 1642. I have good reason to believe there will be more revolutions to come.
Imaginary Worlds is about sci-fi and fantasy, with a focus (ostensibly) on world-building. I think it's broader than that though, and the episodes often touch on wider issues relating to the media, or more fundamental ideas which are explored in sci-fi/fantasy. I only listen when I've read/seen the thing, so I don't listen to this one all that often. I do love imaginary worlds, though. Episode rec: The Power of the Makeover Mage.

This concludes the list of reasonably generic esoteric podcasts I listen to. Now for the flagrantly political ones.

Mass for Shut-ins is the podcast version of Gin and Tacos, a somewhat snarky political scientist who writes about American politics. I mostly view the podcast as the extension to his blog, which I've been reading on and off for many years now. I find what he has to say about the machinations of American politics interesting, e.g. FDR's Court Packing Scheme.
Citations Needed is about "the intersection of media, PR, and power". This is a very well-produced podcast that will consistently point out how something you thought was benign is actually bad. It's interesting in a depressing way as a result, although it can be validating to have your suspicions (that something you thought was benign is actually bad) confirmed by someone else. Examples include The Pro-Gentrification Aspirationalism of HGTV's House-Flipping Shows and Incitement Against the Homeless (Part I) - The Infestation Rhetoric of Local News. My enthusiasm for this podcast started to wane after I felt like it was making me hopelessly cynical, so I only listen intermittently now.
Current Affairs is the podcast to accompany the magazine, to which I am also a subscriber. Current Affairs (both magazine and podcast) sits nicely in the intersection of serious leftist critique and frivolity. The typical podcast features some subset of the editors of CA talking about current events and more. My favourite segment is "Lefty Shark Tank", where someone proposes an absurd-sounding policy which will be judged by the others. Policies have included "Lower the voting age to zero", "Elected officials should have to wear burlap sacks", and other less memorable things that sound ridiculous but have surprising implications. It's difficult to select an episode to recommend as the panel episodes are basically miscellany and news, so take this Episode rec: An Analysis of Birds, Large and Small, or this horrifying/slightly inspiring one where two of the lawyers on the CA team talk about their work: Immigration Update: Detention and COVID. Not sure why CA has so many lawyers.
Working Class History is a history podcast, focusing on the role of normal (=working class) people in history. I got into it via a crossover they did with Srsly Wrong on mutinies. I find this podcast can be a bit dry or slightly more taxing to listen to than some of the other podcasts, so I need to be in an attentive mood, but the subject matter is generally fascinating. Slightly esoteric subject matter means even people who know more history than me will probably learn something, but you must understand I'm really starting with nothing here, August 1914 notwithstanding. Episode rec: miniseries on the Columbia Eagle Mutiny, or miniseries on The 43 Group if you're less interested in mutinies than I am.
General Intellect Unit looks at the intersection of technology and left politics. I only got onto this one recently but suspect I will have many thoughts on it in future (given I am a Tech). They're quite interested in cybernetics and a bunch of the episodes are essentially summaries of the main ideas of Stafford Beer, e.g. Designing Freedom, but they also cover the contents of related books, like Red Plenty, or sci-fi I already like, like The Dispossessed.
The Red Nation Podcast is about "Indigenous history, politics and culture from a left perspective". As an Irish person living in England, this podcast is like a dispatch from a totally different world, but it's a welcome change from the typical US-based perspectives. There's a bit of an academic energy to it at times, but mostly I find it interesting to hear what Indigenous people are saying about current and historical events. Episode rec: The Fourth of You-Lie.
We Don't Talk About the Weather is a podcast I found while trying to get away from American podcasts (sorry, rest of list). WDTATW is two London-based guys talking about current affairs, mostly in the UK. The podcast aptly describes itself as "sounding like screaming and crying", which captures some of the spirit if not the actual energy levels of the podcast. I mostly use this as a palate cleanser from all the US-based takes beaming into my brain 23 hours a day, especially if something has happened in the UK that needs analysing. Unfortunately something has always just happened in the UK that needs analysing, and it's usually terrible. I feel obligated to be at least somewhat informed about the politics of the country I live in, so here we are. "Boris Johnson is a pagan" comes up more often than I think is normal. Episode rec is a bit hard but take I Can't Believe I "Forgot" 5G.
Trashfuture is one I don't really listen to any more but want to include as part of the "Leftist Podcasts from the UK" genre, in case someone else likes it. It's a bunch of people discussing and making jokes about current events and politics. I stopped listening to it because it's too snarky and chaotic for me (people interrupting each other, etc), but if you like Spicy Takes and Internet Leftist Memes or whatever, you might like this. I will admit to enjoying the parts where they make fun of weird startups, though. Episode rec: MIT Media Lab After Dark Part 3: We Just Make Boxes here feat. Sarah Taber.
Srsly Wrong is a "utopian leftist comedy podcast", and it's my favourite podcast. The first time I listened to it I couldn't figure out what was going on because the format is "mostly serious discussion of ideas interspersed with bizarre sketch comedy". What I really like about this podcast is more the underlying ideology espoused by the hosts rather than the format of the podcast itself (contrast this to Reply All, where the content is somewhat irrelevant but the experience of listening to the podcast is generally pleasant), although I also greatly enjoy the bizarre sketch comedy. I will write more about different ideas from this show, or some steps removed from this show, in due time. Discovering this podcast probably made my 2020. Episode recs: Library Socialism and Usufruct, Trash!.

This has been a non-exhaustive list of podcasts I have listened to in 2020. If you have suggestions for other podcasts you think I might enjoy, please let me know using normal communication channels.

According to my podcast app (Pocket Casts), I have spent 19 days and 21 hours listening to podcasts since November 2018, which is about 477 hours. Is that a large number? Compared to the amount of time I have spent playing Dota 2, the answer is no.

new site for research

2020-11-05T00:00:00+00:00

I made this tweet several months ago:

(This website is now an elaborate mechanism for me to retweet myself.)

In theory apeiroto.pe counts as a personal research page, but it's not of the "glorified CV" variety I was thinking about when I wrote the tweet. When I started apeiroto.pe I liked the idea of being unashamedly multifaceted (get it?), and I still do. But for the sake of those people who are only interested in the facets of me which pertain to research, I've created a dedicated research-facing website: sthy.land.

It's very sparse, and I don't 100% love the theme (it is of course also using Pelican), but it's good enough for now. Correspondingly, the "research" page on apeiroto.pe will link there from now on.

This sort of post is usually accompanied by a promise to update more often. I refuse to make such a promise.

setting up mastodon with hometown

2020-05-07T00:00:00+01:00

This is a 'how to do something' sort of blog post, I might write about why later. The something is running a social network for my friends.

The starting point is probably run your own social, which covers a lot of the 'why' and some of the 'what'. Chances are if you are reading this post you already know about it, but if not it's a good read and probably a prerequisite for caring about the rest of this post.

In case it wasn't apparent from this website, I like running my own things. I decided that lockdown (metadata: it has been approximately 55 days since I had a face-to-face conversation with someone I know) would be a good opportunity to find new ways of interacting with my friends. Maybe that's getting too far into the why, though. I grabbed another VPS (total VPS count: 3) and decided to set up a mastodon server running the hometown fork.

The basic gist of running mastodon with hometown is to follow two sets of instructions:

This is basically comprehensive but I still messed it up the first time, so in this post I will walk through the steps with some inane recipe-tier flavour-text on the side. Also, the hometown migration doc warns that it's not for beginners, so maybe I can help pseudo-beginners? Semi-nerds like myself still deserve autonomous social networks.

Step 1: Procure a server

My existing VPSes are with Heficed, which I knew as Host1Plus. I originally got a VPS with them many years ago because I wanted to run a Tor exit out of an African country, and they had a data centre in Johannesburg and didn't seem to care about the Tor exit thing.

For reasons of price I opted for OVH this time. In case it's useful, my VPS has 8 GB of RAM and 160 GB of storage, and is running Ubuntu 18.04 (this is the recommended OS from the Mastodon guide). I put it in a London data centre, maybe one day I can visit it.

Full disclosure: I initially only paid for one month of the VPS, then ignored 2 weeks' worth of warning emails, and everything from I T E R A T I O N 1 got wiped. There was nothing I could do, in such an unprecedented time. In the current iteration I've paid for a year up front. This isn't immensely cheap (it cost about 200 pounds), but I'm one of the lucky ones right now because I have a job. Maybe if it takes off I'll ask for donations from the users.

Step 2: Install Mastodon version 2.9.3

This is where we approximately follow the mastodon installation instructions, but make sure to end up with version 2.9.3 installed. This is (at time of writing) necessary for the subsequent hometown migration*. update from 30 minutes after I wrote that: the hometown wiki looks a bit out of date, there are versions of hometown that are aligned with recent versions of Mastodon e.g. v3.1.2, but I don't want to rewrite this entire blog post so ???

This was the hardest part because you have to deviate slightly from the provided instructions. I'm assuming you're looking at them (time of writing: 7th May 2020) and basically following along, so I don't have to reproduce all the steps here. By the time that page is out of date, this one will be too.

Preparing the machine is all fine.

Installing from source is fine until it comes to installing Ruby. They recommend this:

> RUBY_CONFIGURE_OPTS=--with-jemalloc rbenv install 2.6.6
> rbenv global 2.6.6

This version of Ruby is too new for Mastodon 2.9.3! I went for version 2.6.1. I don't remember how I figured out this version would work, a dream maybe?

Postgres installation instructions are all fine. Be careful if you already have the remnants of failed installations lying around (who would have that?!) with misleading table names. In a way, losing everything to the big VPS wipe was a gift.

Vigilance is once again required while checking out the Mastodon codebase. They tell you to do this:

git clone https://github.com/tootsuite/mastodon.git live && cd live
git checkout $(git tag -l | grep -v 'rc[0-9]*$' | sort -V | tail -n 1)

That second line is trying to check out the most recent version of Mastodon, and we're not here for that. I cheated by taking the commit hash for v2.9.3 from the Hometown instructions, which is gonna give us this instead:

git clone https://github.com/tootsuite/mastodon.git live && cd live
git checkout 06f906acace5770fc10f333a203b036c5b72c849

You can sanity-check this has produced the correct version with

git describe --tags

where it will dutifully give you v2.9.3, probably.

The rest of the Mastodon instructions are basically as-is.

Before you do the interactive setup wizard

RAILS_ENV=production bundle exec rake mastodon:setup

you should probably do step 1.5, though.

Step 1.5: Get a domain and mail server

The Mastodon server running on the VPS needs some way of being known to the outside world, and domain names are a recent technology purporting to improve on the otherwise-flawless system of memorising ipv4 addresses.

I use Namecheap for my domains (including apeiroto.pe and my Mastodon server). The basic idea is you rent a domain from them, because they're internet landlords I guess. Once you own your domain you can update its A Record (for me this is in the domain management page) to include the IP address of your VPS.

I'm including 'get a mail server' in here because I did that with Namecheap as well. I briefly considered running my own mail server on the VPS, but I'm not at that level of running my own things. Not yet. During the mastodon setup wizard it will ask for various details of your mail server, which should be available from whatever provider you go with (e.g. SMTP address, username and password and such). I'm like 30% sure the way I set this up has left it open to harvesting by spambots, so I'm not going to give any further advice in case the spambots find me.

The reason you are giving it an email address is so it can send things like Forgot Password? emails to your users.

Step 2.05: Finish step 2 and then check

Before proceeding, go to your new domain in the browser and check that:

mastodon is working
mastodon is version 2.9.3 (near the bottom of the page)

You could stop here and be content with your Mastodon server running a slightly out of date version of Mastodon. You could be happy.

Step 3: Migrate to hometown

Now we refer back to the hometown wiki page on migrating from mastodon mentioned before.

Because we made sure to install version 2.9.3 before, this should mostly Just Work. The only slightly odd thing for me was getting this error:

> git fetch --tags
> git merge v1.0.1+2.9.3
merge: v1.0.1+2.9.3 - not something we can merge

This might be some git config issue but changing to

git fetch --tags --all

and then merging as described fixed it.

Following the rest of the instructions as they are should result in your Mastodon server now including a mention of hometown at the bottom, indicating that it works.

Step 4 (optional): Make local-only posting default

For me, the reason to use hometown was that sweet local-only posting, but it's still possible for users to post publicly. Following all the above steps, it ended up that public (or federated to be more specific) posting was the default. Having to toggle local-only on every tweet (I refuse to say toot because it sounds stupid as hell) was a pain and could easily result in accidental public posts. Thus began a saga of trying to figure out how to change this behaviour.

I don't know much Javascript and I know even less Ruby, so after some fruitless attempts to reverse-engineer the codebase I got some friends involved, who are faster at reverse-engineering and who did not repeatedly pause their progress with games of starcraft 2. Eventually (like, four hours later) Paddy realised that this is just an option you can set in a config file:

here is the line

In case of link rot, this is the settings.yml file in the config subfolder of the repository. The config option is default_federation, and you want it to be false for local-only to be default. I'm looking forward to discovering what the other options do.

Step 5: Give back to community

... by using acquired wisdom to update their documentation (maybe)

While double-checking things for this post I realised that the hometown installation document seems to be out of date.

Remember how we painstakingly made sure we had v2.9.3 of Mastodon installed, so we could use hometown? It looks like there's already hometown v1.0.3 for mastodon v3.1.2 so that was actually unnecessary and we could have just installed whatever Mastodon version we wanted. Anyway I hope you enjoyed this blog post, I'm going to go mass queens before they get nerfed.

the urgency of slowness

2019-05-19T00:00:00+01:00

My parents are not originally from Dublin. My father came to Ireland from Chile in the 70s, and my mother came from Wicklow a bit later. In both cases the rest of their family stayed behind for the most part, so visiting family has always meant travelling. My family regularly made the 18-odd hour journey from Dublin to Santiago during my youth. I've since learned that regular transcontinental travel is not the norm, and you don't get free food on short-haul flights.

Driving down to Wicklow we used to pass through the Glen of the Downs. In the Glen, environmental and political activists demonstrated in the forest and painted messages on the wall overlooking the valley. I grew up wanting to be an eco-warrior like them. I wanted to live in a tree in the hills, with a valley to protect. Along the way I forgot about living in a tree and I got caught up in normal life, a life lived safely within the boundaries of social acceptability. Only once have I appeared on television during a protest about anything. In the light of the (semi-)recent IPCC report about the increasingly catastrophic outlook for the planet's climate, I wonder if I and others like me will be looked back at with scorn. Scorn for our inaction, our unwillingness to sacrifice anything, our complacency in the face of unfolding disaster.

I try not to be complacent. In the calculus of personal responsibility for climate change, I think I fare better than others in my situation. I'm going on 8 and a half years of vegetarianism. I don't drive: I don't actually know how to drive. Privileged as I am to live in cities, I cycle, walk, or take public transport everywhere. I donate to environmental charities. I don't buy new things very often, I avoid plastic. Every time I brush my teeth, every time I wash dishes, I think about water conservation. I don't leave the light on. I hang my clothes out to dry. But I am still a (north-western) European, and I'm well-off. The comforts of my life come at a premium that I will barely have to pay. I take warm showers, I enjoy continuous electricity and high-end consumer devices. And I fly. A lot.

In June of 2013, I thought of myself as well-travelled. I had been to Chile countless times; once to Easter Island, and another time to the Atacama desert. I had been to California and Florida, Turkey and Vietnam. When I decided to move to the USA for a PhD I knew it was a more serious kind of emigration than moving to England had been. But at the time I had not deeply thought about being a scientist as well as an emigrant. Between conference travel and visiting home and family in Europe, over those 5.4 years my propensity for air travel increased precipitously. I started writing this en route to South Africa in January 2019 (it's been in the outbox for a while). Between November 2018 and February 2019 I will have been in Zurich (home), San Francisco, New York, Montreal (via Geneva), Dublin (other home), London, Cape Town (via Johannesburg, Paris, George, and Amsterdam), and Mexico City. That's four transcontinental trips in as many months. This would have been unthinkable to a teenage version of myself. One acclimates quickly.

The problem is that flying is very convenient. Flying is unbelievably fast. Flying allows you to convert money into time, in what is so obviously an excellent trade that anyone with the means would surely be foolish to turn it down. This speed opens up the possibility of spending the weekend in another country and returning for work as usual on Monday. Going to the other side of the planet becomes inconvenient only because 12 hours on a plane can become uncomfortable. The entertainment system might stop working, the crying of a child may interrupt already fitful sleep at 35,000 feet, and the food may be bland or unrecognisable. The idea that air travel is damaging in some abstract way to the environment does not practically feature into the considerations one makes around time, money, layovers, legroom, screaming children, discarded water bottles, city views, packing. There is a sense that flying might be contributing somehow to climate change, but for an extra 10 euro you can offset your carbon, and climate change is sort of not obviously happening right now, and there's still some time left, and anyway it's those top 100 companies that are really causing it, so it's probably fine. You move on to seat selection: windows are good for long trips because you can lean against the wall, but what if you want to go to the bathroom?

And so it goes.

In the winter of 2018 I finally got to take the "Adirondack" between Montreal and New York. This is a passenger train route which travels through the Hudson valley and the Adirondack mountains, and it takes about 11 hours end to end. I had previously thought about it in 2015 when I was still living in New York and Montreal was hosting a popular conference in my field. In the end I decided I couldn't afford to add two days of travel to the trip, even if those days could be spent preparing for the conference. I figured any such train-based preparation would be inferior to what I could do from my office, so the most efficient solution would be to minimise travel time. Besides, the money wasn't mine, and everyone else was flying. In December 2018 the conference came back to Montreal, and my PhD defense required me to return to New York, so I killed two non-metaphorical endangered species with one stone and made the transatlantic trip. This time, when the opportunity arose, I took the train.

I'm not a stranger to long rail journeys. Like many middle-class Europeans, I spent one intense summer travelling our small continent on trains of varying speed and quality. These at-times ten-hour journeys were spent eating illicit sandwiches smuggled from hostel breakfasts, debating still-intractable philosophical positions, and otherwise smoothly blurring into the rest of the trip. So I was not worried - I knew I could entertain myself, even without friends on hand to practice Hungarian with. We left Montreal at 10am and arrived in New York's Penn station after 9pm. Over eleven hours I watched the landscape change almost imperceptibly slowly. The urban area of Montreal slipped away as the snow receded. Frozen lakes cracked and melted, and bare trees gave way to the distant brown-green valleys of Vermont. I saw tree swings and tree houses and a pair of wooden chairs on a hill overlooking the river. I thought about life in these places. I speculated. After five years under the abstract sense of panic I attribute to doing a PhD, I basked in the luxury of time. For eleven hours, I didn't really do anything. I didn't try to pass the time, nor did I clamour to use it. I sat, mostly, gazing out the window as the sun moved across the sky.

Choosing to fly is a complex decision. Due to economic and political forces I have yet to understand, it's often the cheaper option, and not everyone has the same abundance of time as an academic who just finished her PhD, nor the same willingness to spend ten hours looking at variations on wintry landscapes. I can't ask people to stop flying, I wouldn't even ask it of myself. But I have come to realise that the idea that flying turns money into time is an elaborate self-deception. You can't buy time. You can only (if you are lucky) choose how to spend it. Spending it in paradoxically-monotonous bursts of speed seems to be the default choice for anyone with the money and the inclination. It was for me, but I'm done. I'm done with priding myself on my tolerance for complicated itineraries, I'm done with "at least you can watch movies!", with habitually storing my toiletries in a plastic bag. I'm done with finding the view above the clouds unremarkable. I'm done with the urgency of flying, the devaluation of time spent doing little. I'm done with a culture of treating environmental issues as sources of guilt and little else, a carbon offset to be purchased, and a hope that everything will get better by someone else's hand. Moving slowly might not save the world, but at least I'll see the trees while they last.

yes, but did it work? evaluating variational inference

2018-06-03T00:00:00+01:00

This post is about the paper Yes, but Did It Work?: Evaluating Variational Inference by Yuling Yao, Aki Vehtari, Daniel Simpson, Andrew Gelman, which will appear at ICML 2018. I'm going to try to summarise/explain the paper in my own words, largely for my own benefit. I'm also going to do this without writing any mathematical formulae, because I don't remember how to do LaTeX with my website, and I don't feel like shaving that particular yak right now.

After the accepted ICML papers were announced, I went through them hunting for relevant work. I've decided it's a better use of my time to read papers that have been accepted somewhere, rather than drowning under the firehose of my arXiv RSS feed. This paper ticked two boxes: variational inference, and knowing if it worked. It also ticked a third, secret box of "titles that make it sound like the paper will have been written in a casual, conversational style, eschewing the tradition of appearing smarter by obfuscating the point".

Single-line summary: they describe two diagnostics for evaluating the variational posterior, with different properties and use-cases.

So let's talk about these two diagnostics.

Pareto Smoothed Importance Sampling (PSIS)

At this point I realised the author overlap between this paper and Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC, which in turn builds on Pareto Smoothed Importance Sampling.

So what is PSIS and how is it useful for evaluating VI?

Importance sampling is a technique which enables us to estimate expectations under a distribution which is difficult to sample from (the target distribution), using an approximating proposal distribution. You sample from your proposal distribution (which is easy to sample from), then weight those samples by the ratio of target distribution to the proposal distribution (evaluated at the sample point). These weights are called importance ratios.
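
To make the weighting concrete, here is a minimal Python sketch of (self-normalised) importance sampling. The target (a normal with mean 1 and standard deviation 1), the proposal (a normal with mean 0 and standard deviation 2) and all the function names are toy choices of mine for illustration, not anything from the paper.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution evaluated at x."""
    u = (x - mean) / std
    return math.exp(-0.5 * u * u) / (std * math.sqrt(2.0 * math.pi))


def importance_estimate(f, target_pdf, proposal_pdf, sample_proposal, n, rng):
    """Self-normalised importance sampling estimate of E_target[f(z)]."""
    samples = [sample_proposal(rng) for _ in range(n)]
    # Importance ratio: target density over proposal density at each sample.
    ratios = [target_pdf(z) / proposal_pdf(z) for z in samples]
    weighted_sum = sum(r * f(z) for r, z in zip(ratios, samples))
    return weighted_sum / sum(ratios)


rng = random.Random(0)
estimate = importance_estimate(
    f=lambda z: z,                                    # we want the target mean
    target_pdf=lambda z: normal_pdf(z, 1.0, 1.0),     # toy target
    proposal_pdf=lambda z: normal_pdf(z, 0.0, 2.0),   # toy proposal (wider)
    sample_proposal=lambda r: r.gauss(0.0, 2.0),
    n=20000,
    rng=rng,
)
print(estimate)  # should land near the true target mean of 1
```

Dividing by the sum of the ratios (rather than by n) means the target density only needs to be known up to a constant, which becomes relevant for the VI diagnostic later on.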

Pareto smoothing comes in because in the event that the proposal distribution is a poor fit to the target distribution, these weights can have a very high variance. The proposal distribution is the denominator in the importance ratio, so if you imagine that this distribution is a lot thinner than the target distribution - that is, it's near zero in regions where the target distribution is not, you can end up with some very large importance ratios - high variance. This means that you would need a lot of samples to estimate the expectation value of interest. Pareto smoothing is a way to control this variance. It builds on the idea of simply truncating the importance ratios (Truncated Importance Sampling, Ionides 2008) by instead fitting a Pareto distribution to them.

Side note: Part of the motivation of using the Pareto distribution at this point, I think, is to use its fitted parameters to do diagnostics on the proposal distribution. This is exactly what "Yes, But Did It Work?" is doing, but they already talk about it in the original PSIS paper, so I guess part of the novelty of this ICML paper is bringing it explicitly to the VI area. More about VI when I'm done with this Pareto stuff.

So how does fitting a Pareto distribution to the importance ratios help? In practice, you fit the Pareto distribution, and then instead of simply truncating the top M importance ratios (M is chosen empirically/arbitrarily) you replace them using the inverse cumulative distribution function of the Pareto distribution you fit. This replacement operates on the ranks of these importance ratios (so the smallest of the M, the second-smallest and so on), replacing those with what you'd get in a Pareto distribution ranked by CDF. This reminds me of rank-based inverse-normal transformations I've seen used in genetics (weirdly difficult to find papers about this, here is an R vignette). They argue that this produces an IS estimate that is less biased than what you get using truncated IS. Moreover, you can inspect the parameters of the fitted Pareto distribution to do diagnostics.

The reason they use a Pareto distribution to model the top M importance ratios is because It Is Known. Rather, it is shown in Pickands, 1975 to be an appropriate choice. To be specific, they use a generalised Pareto distribution. This distribution has three parameters (location, scale, shape), and it has the property that it has finite moments up to order 1/k, where k is the shape parameter. That means that if k > 0.5, the variance of the importance ratios is infinite, but if k < 1 at least the mean exists. They point to 0.5 < k < 0.7 as a regime where the importance sampling procedure exhibits a practically useful convergence rate. Side note: I don't quite see where the jump from modelling the variance of the tail of the importance ratios to modelling all the importance ratios happened. I suppose if you observe that your tail has a finite variance, then you must have finite variance in the rest of the values, but I would have expected an additional step to extend the conclusions made about the fit of the Pareto distribution to the rest of the importance ratios.

Now, relating this back to variational inference is straightforward: replace "proposal distribution" with "variational posterior", and "target distribution" with "true posterior". PSIS, via the shape parameter of the fitted Pareto distribution, gives us a diagnostic for how well the variational posterior fits with the true posterior.

But wait... don't we need the true posterior to calculate the importance ratios? Isn't this circular? The answer is that you can use the joint distribution (p(z, x) rather than p(z|x)) because the estimate of k is invariant to a constant multiplicative factor, which will be p(x).

The diagnostic approach is thus:

Run VI, get variational distribution q(z) approximating p(z|x).
Sample a bunch of zs from q(z)
Calculate p(z, x) for all the zs (remember, x is known - it is a specific dataset), and get the importance ratios p(z, x)/q(z)
Fit a generalised Pareto distribution to the largest M importance ratios
Report the (estimated) shape parameter k
If k > 0.7, the VI approximation is not reliable
If k < 0.5, the VI approximation is good, and PSIS can additionally be used to calculate further divergence measures
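
The steps above can be sketched in a few lines of standard-library Python. This is a rough illustration under loud assumptions, not the paper's implementation: instead of a proper generalised Pareto fit I use the much cruder Hill estimator as a stand-in (it only makes sense for k > 0), the tail size rule M = min(N/5, 3*sqrt(N)) is a rule of thumb I am assuming, and the "variational posterior" is a hand-picked exponential rather than the output of any VI run. The toy is chosen so that, by my calculation, the true tail shape is 1 - 1/rate.

```python
import math
import random


def hill_shape_estimate(log_ratios, tail_size):
    """Hill estimate of the tail shape k from the largest log importance ratios.

    Stand-in for the generalised Pareto fit; only sensible when k > 0.
    """
    ordered = sorted(log_ratios)
    threshold = ordered[-tail_size - 1]   # largest value just below the tail
    tail = ordered[-tail_size:]           # the M largest log ratios
    return sum(lr - threshold for lr in tail) / tail_size


def khat(log_joint, log_q, sample_q, n_samples, rng):
    """Sample from q, form log importance ratios, estimate the tail shape."""
    zs = [sample_q(rng) for _ in range(n_samples)]
    log_ratios = [log_joint(z) - log_q(z) for z in zs]
    # Tail size M = min(N/5, 3*sqrt(N)) is an assumed rule of thumb.
    tail_size = int(min(n_samples / 5, 3 * math.sqrt(n_samples)))
    return hill_shape_estimate(log_ratios, tail_size)


def toy_khat(rate, log_c=0.0, seed=0, n_samples=20000):
    """Toy problem: unnormalised joint C*exp(-z) on z > 0, q = Exponential(rate)."""
    return khat(
        log_joint=lambda z: log_c - z,
        log_q=lambda z: math.log(rate) - rate * z,
        sample_q=lambda r: r.expovariate(rate),
        n_samples=n_samples,
        rng=random.Random(seed),
    )


for rate in (1.25, 5.0):
    k = toy_khat(rate)
    verdict = "unreliable" if k > 0.7 else "ok" if k < 0.5 else "borderline"
    print(f"rate={rate}: k-hat={k:.2f} ({verdict}), theory={1 - 1 / rate:.2f}")
```

Because the estimate only uses differences of log ratios, changing log_c (the unknown constant standing in for p(x)) leaves k-hat unchanged, which is the point about using the joint instead of the true posterior.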

They touch on two other points in this paper, regarding PSIS:

The shape parameter k is invariant under reparametrisation, but reparametrisation can influence the VI procedure and produce better/worse proposal distributions. So looking at k can help guide reparametrisation efforts
Marginal PSIS diagnostics are not useful. These marginal diagnostics would be doing the above procedure, but instead of sampling full zs, sampling only 1 dimension at a time. Compared to PSIS diagnostic evaluated from the joint distribution, these marginal ks are never larger (usually smaller) than k, and can be misleading. Also, this means you need access to the marginal distribution p(z_i, x) (or p(z_i | x)) to get the importance ratios, which may be unavailable. So don't do it.

Variational Simulation-Based Calibration Diagnostic (VSBC)

The PSIS diagnostic looks at the full approximate posterior. However, sometimes you don't need to properly approximate the full posterior, and can get away with producing useful point estimates. VSBC evaluates the quality of point estimates. It is based on Validation of Software for Bayesian Models Using Posterior Quantiles (Cook et al., 2006).

The key observation from (Cook et al., 2006) is going to be fun to explain with no proper equations. Let's try: suppose we have access to p(z), p(x|z) and p(z|x) (this will be approximated by q(z) shortly). We simulate an x by first sampling z from p(z), then x from p(x|z). We then sample multiple z's from p(z|x). We can ask what fraction of those sampled z's are smaller than the original z - we call this the calibration probability. Now, if we were to do this multiple times (picking a z, then x, then multiple resampled z's) we would get a distribution of calibration probabilities. And that should be uniform. I think this is Cook's observation.
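
Since I can't write equations, here is a small Python simulation of that observation instead. It assumes a toy conjugate model of my own choosing (prior z ~ N(0, 1), likelihood x|z ~ N(z, 1)), for which the exact posterior is normal with mean x/2 and variance 1/2; none of this setup comes from the paper.

```python
import random


def calibration_probability(rng, n_draws=200):
    """One round: pick z, simulate x, resample z's, compare with the original."""
    z_true = rng.gauss(0.0, 1.0)               # z ~ p(z) = N(0, 1)
    x = rng.gauss(z_true, 1.0)                 # x ~ p(x|z) = N(z, 1)
    post_mean, post_std = x / 2.0, 0.5 ** 0.5  # exact p(z|x): mean x/2, var 1/2
    draws = [rng.gauss(post_mean, post_std) for _ in range(n_draws)]
    return sum(d < z_true for d in draws) / n_draws


rng = random.Random(0)
probs = [calibration_probability(rng) for _ in range(2000)]

# Crude uniformity check: each of the 5 bins should hold roughly 20% of the mass.
bins = [0] * 5
for p in probs:
    bins[min(int(p * 5), 4)] += 1
print([round(count / len(probs), 3) for count in bins])
```

If the resampled z's came from something other than the true posterior, the histogram would stop being flat, which is what the diagnostic exploits.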

So to relate this to VI, we can perform the above procedure, replacing the true posterior p(z|x) with the approximate posterior q(z). (This means we have to do a full VI step for each dataset x we sample!) We could then in principle ask how far the distribution of calibration probabilities deviates from uniform, but in this paper they suggest (following on from other literature) to instead measure how asymmetric this distribution is.

Thus, the VSBC diagnostic is to test for asymmetry in the distribution of calibration probabilities. They do this using a Kolmogorov-Smirnov test between the distribution of probabilities and one minus that distribution. More specifically, they actually focus on marginal probabilities - so where I said 'z' above, imagine this is one dimension of z. Thus, they look at marginal calibration probabilities. This is necessary because z < z' only makes sense for scalars.
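
A rough Python sketch of the asymmetry check, under my own toy assumptions: a one-dimensional conjugate model (z ~ N(0, 1), x|z ~ N(z, 1)) and, in place of a real VI fit, an "approximate posterior" that is just the exact posterior with its mean shifted by a fixed amount (mean_shift is a hypothetical knob). I only compute the two-sample Kolmogorov-Smirnov statistic by hand, not a p-value.

```python
import bisect
import random


def calibration_probs(mean_shift, n_rounds=1000, n_draws=200, seed=0):
    """Calibration probabilities when q(z) = N(x/2 + mean_shift, 1/2)."""
    rng = random.Random(seed)
    probs = []
    for _ in range(n_rounds):
        z_true = rng.gauss(0.0, 1.0)
        x = rng.gauss(z_true, 1.0)
        # Stand-in for a VI fit: the exact posterior mean plus a fixed bias.
        q_mean, q_std = x / 2.0 + mean_shift, 0.5 ** 0.5
        draws = [rng.gauss(q_mean, q_std) for _ in range(n_draws)]
        probs.append(sum(d < z_true for d in draws) / n_draws)
    return probs


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for t in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, t) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, t) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def asymmetry(probs):
    """Compare the calibration probabilities with their mirror image 1 - p."""
    return ks_statistic(probs, [1.0 - p for p in probs])


d_exact = asymmetry(calibration_probs(mean_shift=0.0))
d_biased = asymmetry(calibration_probs(mean_shift=0.3))
print(d_exact, d_biased)
```

With the unbiased q the statistic should be small, and with the shifted q it should be clearly larger, because a biased point estimate pushes the calibration probabilities towards one end of the interval.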

So running the diagnostic means running VI multiple times over simulated datasets. If your generative model of the data is poor, this diagnostic won't tell you much about how your VI scheme will work on real data, or indeed on a given instance of real data, since VSBC gives average performance. An advantage of VSBC over PSIS is that it looks at marginals, so you can potentially identify which dimensions in z are problematic during fitting.

Applications, etc.

Given these two diagnostics, they then show how they can be used in a couple of different settings - Bayesian linear regression, logistic regression, a hierarchical model (the famous Eight-School model), and a cancer classification application. In all cases, they use mean-field Gaussian automatic differentiation variational inference.

The big question for me and probably a lot of other users of variational inference is how well these can be applied to the types of posteriors we try to approximate using hideous neural networks. VSBC may be computationally impractical because it puts the whole VI procedure inside an inner loop, although it's easily parallelisable. High-dimensional posteriors are problematic for importance sampling and thus PSIS, although I don't know what "high" is - 10, 100? 1000?? Multimodality in the posterior is also a challenge, as they point out in the discussion - the VI approximation could completely miss a mode, but the PSIS diagnostic would nonetheless indicate all is well. They suggest to use PSIS to evaluate some other divergence (such as a KL divergence) to diagnose this case.

In summary, this has been a post about evaluating variational inference using two diagnostics - Pareto-smoothed importance sampling, and variational simulation-based calibration. At its core this paper feels like an application of previous/existing work to a slightly new domain (variational inference). I'm curious to try these diagnostics on my own variational posteriors. Code is seemingly available (maybe just for PSIS) - R package (loo), and also a Python/Matlab port.

NIPS 2017

2017-12-20T00:00:00+00:00

I'm continuing my tradition of summarising conferences I attend. Previous posts: NIPS 2016, NIPS 2015, AAAI 2016, ICML 2016. I also went to AAAI 2017 to present my work on unitary recurrent neural networks, but didn't write a summary.

This was my third time attending NIPS, but my first time attending NIPS with jetlag. The advantage of jetlag is that it provides a topic of small talk less agonisingly self-aware than the weather (weather readily avoided by waking up at 6am). The downside of jetlag is me standing glassy-eyed in front of a poster, trying to formulate intelligent thoughts but just yawning really, really obviously.

After a few days of complaining about the jetlag I realised I was probably exhausted because NIPS is exhausting. The problem is early mornings, listening to talks, bumping into people I know, talking to people I don't know, having meetings, talking to recruiters, talking over dinner, going to poster sessions, talking at posters, finding people who I had previously talked to as strangers but who are now acquaintances and talking to them again, and so on. Having gone twice before did not teach me moderation, and I was hoarse by Thursday. I also experienced an interesting fluctuation in my desire to do research, which I have depicted in the following graph: (enthusiasm has since returned, luckily)

Figure 1: We observe that research enthusiasm of the PhD student is a nonlinear function of Days of NIPS (dNIPS), with two local maxima attained towards the ends of day 3 and 5. Data beyond day 6 could not be reliably collected due to hostility from the test subject.

This analysis clearly indicates that the optimal length of NIPS (including tutorials and workshops) is three days. Recent work (private communication) suggests that "taking breaks" can prolong the research-excitement peak, sustaining the individual beyond the end of the conference, providing hope for 2018. When I got back to Zurich I slept for 7 hours, arose for 7 (during which time I did a roller derby exam, but that's another blog post), then went back to bed for another 10. My body had no idea what was going on.

As in 2016, I'll organise this by topic. This post is rather long, but each section is largely independent so feel free to pick and choose.

Tutorials
Women in Machine Learning
Main Conference
Machine Learning for Health
Conclusion

Tutorials

The first day of WiML actually coincided with the tutorials, so I was only able to attend those in the morning. I went to A Primer on Optimal Transport. I then got Baader-Meinhof'd about it for the rest of the conference.

I was twenty minutes late to the tutorial. This is because I decided to commute to the conference on roller skates (see frame from snapchat selfie video, right), and on the first day I misjudged how long it would take (my Airbnb was about 3 miles away).

Unfortunately missing the start of a tutorial, especially a mathematical tutorial, can be fatal. I arrived in the middle of an explanation of how Kantorovich's formulation of optimal transport relates to Monge's formulation and I had no reference for what was going on. I tried to sneakily look things up on Wikipedia to catch up, but all was lost and I came away from the tutorial with only an intuitive understanding of the optimal transport problem, and that Wasserstein barycentres are better than l2 averages for combining distributions, usually. In case you missed it, here are the slides (pdf). I said to myself that I'd go and learn optimal transport real quick and give a coherent summary of it here, but I also want to write this post before I graduate.
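
For what it's worth, the one thing I did take away can be illustrated in one dimension, where (as far as I understand it) the equal-weight Wasserstein-2 barycentre of two distributions comes from averaging their quantile functions. The sketch below is my own toy example, not from the tutorial: it takes samples from two unit-variance Gaussians centred at -3 and 3, and compares the barycentre with the plain average of the two densities, which is just a 50/50 mixture. All names are made up for illustration.

```python
import random


def wasserstein_barycentre_1d(xs, ys):
    """Equal-weight W2 barycentre of two equal-sized 1D samples: average the quantiles."""
    return [(x + y) / 2.0 for x, y in zip(sorted(xs), sorted(ys))]


def l2_average_1d(xs, ys):
    """Averaging the two densities gives a 50/50 mixture, i.e. the pooled samples."""
    return list(xs) + list(ys)


def mass_near_zero(samples, radius=1.0):
    """Fraction of samples within `radius` of zero."""
    return sum(abs(s) < radius for s in samples) / len(samples)


rng = random.Random(0)
left = [rng.gauss(-3.0, 1.0) for _ in range(2000)]
right = [rng.gauss(3.0, 1.0) for _ in range(2000)]

bary_mass = mass_near_zero(wasserstein_barycentre_1d(left, right))
mix_mass = mass_near_zero(l2_average_1d(left, right))
print(f"mass within 1 of zero - barycentre: {bary_mass:.2f}, l2 average: {mix_mass:.2f}")
```

The barycentre looks like a single bump halfway between the two inputs, whereas the l2 average keeps both original bumps and puts almost nothing in the middle.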

Women in Machine Learning Workshop

WiML took place on the tutorial day (Monday), and also on symposia day (Thursday). I am not sure why they split it up like this.

Last year I said that 15% of the 6000 NIPS attendees were women. I don't recall them releasing statistics about attendee demographics this year, but apparently 10% of unique authors amongst submissions were women (amongst accepted submissions? unknown), so the gender situation in ML is still pretty dire. Fixing this is a hard problem and not really my area of expertise (except for what I know from invariably being involved in conversations about Women in STEM), but I'm pretty sure events like this help. Why do I think that? Well, this year was the first instance of the Black in AI workshop, and while I didn't attend (I was at the healthcare workshop), even seeing people tweeting about it made me way more aware of the work being done by Black researchers. So hopefully WiML also alerts people to the existence of good work getting done by women. Oh, and travel grants! I could imagine in this era of NIPS-selling-out-rapidly that pre-purchasing tickets to redistribute to minority groups could also play a part in promoting diversity. Weird to think of women as a minority group, but apparently we only comprise 49.58% of the world's population these days.

Interesting talks/posters:

(contributed talk) Peyton Greenside spoke about using graph convolutional networks to model Hi-C and also ATAC-seq data. I wanted to talk to her at the poster session, and once again (this happened last year too) her poster was on the other side of the board to mine. You can find her talk at 1:14 into this video.
(invited talk) Joelle Pineau spoke about Improving health-care: challenges and opportunities for reinforcement learning. The talk focused on slow and small research: research with small sample sizes, acquired slowly. She spoke about designing treatment strategies for epilepsy, probably referencing this paper: Adaptive Control of Epileptiform Excitability in an in vivo Model of Limbic Seizures (or this one but I can't find the PDF). The idea is that brain stimulation can prevent seizures (cool!), and you can use reinforcement learning to build a controller (controlling the frequency of applied stimulation) to achieve the same level of seizure control while minimising the required amount of stimulation. One lesson she highlighted from this work is that models (in the 'animal model' sense) are important (they use a seizure model from mouse brain cells, I think), and having existing baselines to build from also helps. She also described some work on bandits to do cancer treatment optimisation, which I think I actually already wrote about in my ICML 2016 post.
(invited talk) Nina Mishra spoke about Time-Critical Machine Learning. She spoke about anomaly detection on huge streams of data, using Random Cut Forests, and she spoke about machine learning in medical emergencies (probably this paper: Time-Critical Search). When faced with a medical emergency, people will ring the relevant emergency number, and then, a lot of people will turn to Google for help. This isn't always the most efficient way to get useful information, so they did some work on trying to detect (using search query and other metadata such as time, location, query history) whether or not a person was in an emergency situation, with the intention to give more relevant results. Someone asked if it wouldn't be easier to just make a special emergency-search app, but Nina pointed out that nobody wants to download an app in an emergency situation. (I do wonder if phones could come with such an app by default, but making that standard is a whole other challenge). She did however describe a possible emergency app, which I think was called Samaritan (reminding me of the very cool GoodSAM app), that guides a user through performing CPR. Part of the procedure involves putting the phone on the person's chest and using its accelerometer to guide CPR compressions. Nice use of ubiquitous smartphone tech.

Regarding the poster sessions, I spent all of the Monday session presenting my poster (see the Healthcare workshop below), and much of the Thursday session talking at my friend's poster (Learning the Probability of Activation in the Presence of Latent Spreaders) and sneaking peeks at the Interpretable Machine Learning symposium - video of a panel session here, and video of the debate here.

Roundtables

As in previous years, the roundtables were one of the highlights of WiML for me. It's a great opportunity to meet senior scientists I might not otherwise be able to, and also to get to know some of the other WiML attendees.

I ended up going to four tables - two career-based, two topic-based:

Choosing between academia and industry - I went to the same topic last year, but this time the table mentors were both in academia, so I got a somewhat different perspective. This is also a question I've spoken to people about and thought about, so I didn't learn much, but it's useful to have one's thoughts externally validated. The gist is that academia gives more freedom, at the cost of stability, potentially having to teach, having to supervise students, and having to ~~beg for money~~ write grants. Not all of these are necessarily negatives - some people like teaching and supervising (nobody likes writing grants). Meanwhile, industry may limit research freedom, but provides more stability, and (usually) freedom from having to run your own lab with all that entails.
Establishing collaborators/long-term career planning - the roundtable I attended wasn't especially enlightening on this topic, but the talk from Raia Hadsell touched on it, and gave some good long-term career advice. The advice was this (taken from one of her slides):
- If you like to go deep, make some room for novelty and risk.
- If you are a renaissance woman, try going deep.
- NIPS and WiML are your community - be a participant.
- speak loudly. ask questions. be strong
I'd not self-identify as a 'renaissance woman' (I go for 'attempted polymath'), but I tend to aim for multifaceted (see the name of this website), so the advice to go deep was hard to hear, and therefore useful. (I just love when people tell me things I don't want to hear, it's why I use twitter.)
Generative models - a lot of this roundtable consisted of me discussing evaluation of GANs with Ian Goodfellow. This was a bit selfish because it's a topic of direct relevance to my current work on recurrent GANs for medical data (see also below) and maybe less interesting to others. However, I also think evaluation is one of the most interesting GAN-related questions right now. There's understandably a lot of focus on the GAN objective and optimisation procedure, thinking about convergence and stability and so on, but optimisation without evaluation seems foolish.
Machine learning for healthcare - we discussed some of the big challenges facing MLHC, like data sharing, causality, and something else I've forgotten but lists should always contain three elements. I've not worked on causality before, but I'm increasingly aware of how causal reasoning (especially counterfactual reasoning) plays a role in trying to understand observational medical data. More about healthcare in the section on the healthcare workshop.

The Main Conference

Invited Talks

Long Beach in December: not bad

John Platt spoke about Powering the next 100 years (video), which was less environmentalist than I was hoping, and more about economics (also important, less exciting). He also spoke about nuclear fusion, which is very exciting, and possibly important (in the future). One issue I had with the premise of this talk is that I don't think we should be trying to expand US power usage to the rest of the world - the US uses a disproportionate amount of energy relative to other developed nations (even with high standards of living, see also the 2000-watt society), so while it would be nice if we could, I would personally rather focus on minimising our energy consumption until it is sustainable to consume more. But anyway, assuming the premise, they use machine learning both to optimise the economics of power usage and to identify promising (and safe) experiments to run on fusion reactors.

I missed Brendan Frey's talk about reprogramming the human genome, and also Ali Rahimi's talk for the Test of Time Award. I sorely regret missing the latter talk because people kept asking me about it. I had to wait until I got back to Zurich to rectify the matter, but having now watched it (available here), I get the fuss.

So, regarding Rahimi's talk: Yann LeCun quickly posted a response, and Ferenc Huszár posted another response, and I should make a separate blog post to add my incredibly important opinions on the matter, but I'll just cram them right in here. Ali Rahimi's talk claimed that much of machine learning these days is alchemy - people are building models that work, seemingly by magic, which we don't quite understand. As a relative newcomer (remember, only my third NIPS) I can't hark back to any golden days of rigour and understanding, but I can certainly say that the things he suggested - simple experiments, simple theorems - are appealing.

My take: We should not make unsubstantiated claims in science. We should design experiments to test claims we make about our models, and we should not accept speculative claims from others as fact. How often do papers today fail by these measures? Rahimi's talk implies this happens often enough to be worth calling out. I feel like I have read papers which make unsubstantiated claims, or over-explain their results, or introduce poorly-defined concepts, but I can't call any to mind, so my claim must remain purely speculative.

What really resonated with me from Rahimi and also Huszár's points is that empiricism does not imply lack of rigour. A lot of what I do is quite empirical. A lot of what I do is somewhat applied. I've struggled with feeling like it's less scientific as a result. I've felt like I am "just" doing engineering. But the best way I have come to understand this work, which was captured in this point about empiricism, is that rigour does not need to be mathematical (forgive me, I am a former theoretical physicist, so this has taken me some time to realise). Experimental design is also rigorous when done well. Building a model to solve a problem may be a kind of engineering, but trying to understand it afterwards, forming hypotheses about its behaviour and then testing them - this can, and indeed should, be done rigorously. Otherwise, you show that a model exists which can achieve a certain performance on a certain task on a certain dataset, and little else.

The next talk I actually attended was The Trouble with Bias from Kate Crawford (video here). This was a great talk, and I'm glad it got a prime spot in the program. Not only was her public speaking skill commendable (the slides just vanished near the end and she barely skipped a beat), but the talk was really interesting. I admit I was worried I'd already know most of the contents, since I read things about bias on a semi-regular basis (somehow). Even if I'd known everything she was going to say (which I didn't), I'd consider this talk a good distillation and overview of the pressing issues. She made an illuminating distinction which I shall now paraphrase.

When it comes to bias, there are harms of allocation and harms of representation. Biased allocation is easy to see - someone got a loan someone else didn't, someone got bail and someone else didn't, etc. These are concrete and tangible, immediate, and easy to quantify. Representation on the other hand relates to impressions and stereotypes. Google searches for 'CEO' returning all white men is a representational bias, and its effect is much harder to measure. Labelling images of Black people as 'gorillas' is representational bias, and while it is clearly hurtful, its impact on allocation is not immediately obvious. Many people generally accept that this kind of representation is bad, but can we blame it for any particular instance of allocation bias? Usually not. Representational bias is diffuse across culture, difficult to measure, and may not have any immediately obvious impacts. An example from me: We as a society are starting to suspect that something about how women are represented in society may be influencing the rates of women going on to study STEM subjects. This representational bias may be slowly manifesting as a tangible absence of female engineers, but it is difficult to formalise or prove that these observations are causally related. And of course, machine learning algorithms (like literally any algorithm) can be biased in either of these ways (and presumably more). Once again: watch the talk.

Pieter Abbeel spoke about Deep Learning for Robotics - really, (deep) reinforcement learning for robotics. Probably the most important takeaway from this talk was the 1 second clip of Dota 2 1v1 mid he showed, establishing an important moment in both Dota 2 and NIPS keynote history. The non-Dota content of the talk was largely focused on meta-reinforcement learning, or 'learning to reinforcement learn', and architectures to achieve this. The idea is that you want to build agents which can adapt quickly to new environments, as humans do. One interesting idea was 'Hindsight Experience Replay', which assumes whatever ended up happening was actually the goal, and derives reward from that.

Reinforcement learning agent re-evaluating its experience.

This converts the usually sparse reward in RL to plentiful reward signals, given the Q-function is augmented with a notion of a goal. He used the cake metaphor that everyone loved from Yann LeCun's keynote at NIPS last year, converting the cherry on top of a cake to multiple cherries on a cake. People can't get enough of the cake joke. It's Portal all over again.
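
The relabelling trick behind Hindsight Experience Replay can be sketched in a few lines. This is my own toy version, assuming a 1D integer-valued world with a sparse reward (0 on reaching the goal, -1 otherwise) and relabelling with the final state of the episode; the `Transition` tuple and function names are made up for illustration, and a real agent would feed the buffer to a goal-conditioned Q-function.

```python
from collections import namedtuple

Transition = namedtuple("Transition", "state action next_state goal reward")


def sparse_reward(achieved, goal):
    """0 when the goal is reached, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0


def rollout(start, actions, goal):
    """Walk along the integer line, recording goal-conditioned transitions."""
    episode, state = [], start
    for action in actions:
        next_state = state + action
        reward = sparse_reward(next_state, goal)
        episode.append(Transition(state, action, next_state, goal, reward))
        state = next_state
    return episode


def hindsight_relabel(episode):
    """Pretend the state we ended up in was the goal all along."""
    achieved_goal = episode[-1].next_state
    return [
        t._replace(goal=achieved_goal,
                   reward=sparse_reward(t.next_state, achieved_goal))
        for t in episode
    ]


episode = rollout(start=0, actions=[1, 1, -1, 1], goal=10)  # never reaches 10
relabelled = hindsight_relabel(episode)
replay_buffer = episode + relabelled  # train on both versions

print([t.reward for t in episode])
print([t.reward for t in relabelled])
```

The original episode contains no reward signal at all, while the relabelled copy does, which is where the extra cherries come from.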

I missed the talks from Lise Getoor, Yael Niv, and Yee Whye Teh because there is only so much time in a day.

Spotlights and Orals

First, a brief rant.

I was quite impressed by the quality of the spotlights and orals this year. Coming from the rather low bar of 'mumbling at a slide covered in equations' of previous years, I was glad to see that many presenters really put time into preparing their talk. These talks give people the opportunity to explain their work to potentially thousands of fellow researchers, so giving a terrible talk is insulting both to the audience and to the people who didn't get that opportunity.

I've thought about the implications of having an additional selection process for determining orals and spotlights. There's a trade-off between highlighting really good papers (with possibly terrible speakers) and highlighting less meritorious work (with a good communicator). There's also a challenge of being fair to non-native English speakers when assessing presentation quality - it would not be acceptable to condemn a talk on the basis of the speaker's command of English.

I try to assess talks by how much they have considered the audience - considering what the audience already knows, what may be obvious (or not, usually), what the really important things in the work are, and what can be skipped without degrading the story. But how to do this without (subconsciously) judging the fluency of the speaker's language and delivery is not entirely clear. I'm sure there is already bias in how the quality of one's English influences paper acceptance (either through clarity or unknowingly discriminatory reviewers), so adding an additional layer on the presentation quality may exacerbate the issue. On the other hand, communication is really important for scientists, and the conference should do what it can to ensure the content is high quality. Maybe some sort of (optional) pre-conference speaking workshop for those invited to give orals and spotlights?

Ranting aside, a selection of talks I took note of:

Bayesian Optimisation with Gradients - Jian Wu, Matthias Poloczek, Andrew Gordon Wilson, Peter I. Frazier. Augment Bayesian optimisation using gradient information - 'derivative-enabled knowledge-gradient (dKG)'. They put a Gaussian process prior over the function to be optimised, resulting in a multi-output GP for both function and gradient (the gradient of a GP is a GP). It works better than methods not using derivatives, but I rarely have access to derivatives when I'm doing hyperparameter optimisation in deep networks, so I'm not sure how useful it would be for me.
A Unified Approach to Interpreting Model Predictions - Scott Lundberg, Su-In Lee. The framework is called 'SHAP' (SHapley Additive exPlanations). The idea is to interpret the model by assigning features importance values for a given prediction. This work unifies six existing methods by proposing the notion of an 'additive feature attribution method'. They also find that their approach agrees well with human-derived feature attribution scores.
Convolutional Gaussian Processes - Mark van der Wilk, Carl Edward Rasmussen, James Hensman. They consider a patch-response function, which maps from image patches to real values, and place a Gaussian process prior on this function. Considering the sum of the patch-response function on all patches of the image as another function, its prior is also a Gaussian process. Computational complexity is a huge barrier here, which they address by using inducing points in the patch space, corresponding to using inter-domain inducing points (an idea which is already understood, if not by me).
Counterfactual Fairness - Matt J. Kusner, Joshua R. Loftus, Chris Russell, Ricardo Silva. Consider predictors as counterfactually fair if they produce the same result if a sensitive attribute were different. This means that any nodes downstream (in the causal graph) of that sensitive attribute may also be different. This implies that a predictor will necessarily be counterfactually fair if it is only a function of nodes which are not descendants of the sensitive attribute, unsurprisingly enough. They address the fact that this is rarely feasible (almost everything in a person's life may be affected by their race, for example), by considering other models. For example, using residuals of variables, after accounting for (using a linear model) the sensitive attributes. One nitpick: I take issue with the example they give in Figure 2 (level 2). They introduce a latent variable which is predictive of success (GPA, LSAT, first year law school average grade) independent of sex and race, and call this knowledge. I think this is a weird choice - surely knowledge is affected by sex/race, if only by influencing available educational opportunities and ability to study unimpeded (for example, the need to work during school/college, the need to look after family members). I am trying to think of another name for this node which is not plausibly influenced by sex or race, some sort of intrinsic attribute of the person - 'grit'? 'general intelligence'? 'luck'? (But who wants to base law school admissions on luck?) I can't imagine the authors were intending to make any kind of political statement about the nature of knowledge here, but it seems like a weird error(?) in a paper dealing with social issues.
Multiresolution Kernel Approximation for Gaussian Process Regression - Yi Ding, Risi Kondor, Jonathan Eskreis-Winkler. The popular method for scaling GPs is to approximate the kernel function using a low-rank approximation (the Nyström approximation). There are some issues with that: is a low-rank approximation reasonable? Which part of the eigenvalue spectrum of K' (that is, K + sigma I, which appears in the MAP estimate of the function) is the most important? This work proposes and develops a different kind of kernel approximation, depending on the data, where local factorisations are used, and it can be assumed that 'distant clusters [of data] only interact in a low rank fashion'. My cursory skim of the paper wasn't enough to get exactly what they're doing, but I love to see work questioning common practices and trying to understand/improve on them.
Doubly Stochastic Variational Inference for Deep Gaussian Processes - Hugh Salimbeni, Marc Deisenroth. Why do I always end up reading about GPs? I'm not even using them (right now?!). The tl;dr on this paper is that they got deep (that is, multi-layer generalisations of) GPs to work. Previously they didn't work particularly well because the variational posterior required each layer to be independent, an assumption which this work drops by introducing a new variational inference procedure (hence the title). They show that this model works even on datasets with a billion examples.
Style Transfer from Non-Parallel Text by Cross-Alignment - Tianxiao Shen, Tao Lei, Regina Barzilay, Tommi Jaakkola. Separate content from style, in text. This is interesting to me because, like, years ago (2014), we had discussed using language embeddings to remove stylistic choices from the language of doctors, to try to standardise text across multiple authors. I'm not saying we have any claim whatsoever to the idea - ideas are cheap, implementation matters - but I'm interested to see that someone has - sort of - achieved something like what we wanted. They assume they have corpora with roughly the same content distribution but different style distributions, and try to learn a latent representation (which they formulate using a probabilistic model). I have a big armchair-linguist issue with the idea that style is independent of content, because if you consider content as meaning then a lot of meaning is conveyed through how someone says something, and indeed even in their examples, they consider 'sentiment' as style, in which case I actually don't know what they mean by content. They actually mention in the introduction that one can only hope to approximately separate style and content even with parallel data, but they never really clearly define what they mean by 'content' of a sentence.
Deep Multi-task Gaussian Processes for Survival Analysis with Competing Risks - Ahmed M. Alaa, Mihaela van der Schaar. The risks are competing because the patient can only die from one thing. The model attempts to produce survival times (time-to-event) using a deep, multi-task (since multiple risks) Gaussian process. They use an intrinsic coregionalisation model for the kernel functions to account for multiple outputs, which models task(=output) dependence independently of input dependence, but simplifies calculations a lot (I tried to build a more complicated multi-task kernel function once and it was a big mess). They also point out that using a deep GP alleviates some dependence on the exact form of the kernel function. This work (unsurprisingly) uses the 'old' (2013) work on deep GPs, so I wonder how much it would benefit from the improved deep GPs (see above).
Unsupervised Learning of Disentangled Representations from Video - Emily Denton, Vighnesh Birodkar. They want to separate time-varying and stationary parts of a video. Then you can predict future frames by applying an LSTM to the time-varying components. That's pretty neat! How do they achieve this? They use four networks - two encoders (one for scene (stationary information), one for pose (time-varying)), a decoder which maps pose and scene vectors to produce a frame, and a scene discriminator which tries to tell if pose vectors came from the same video. They construct loss terms to impose their constraints (separating time-varying and static elements), including some interesting adversarial loss terms.
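
Going back to the SHAP paper in the list above: the Shapley values it builds on can be computed exactly for tiny models by enumerating feature subsets. The sketch below is a toy of my own, not the SHAP library and not the paper's estimators. It assumes a made-up three-feature model and, as a simplification, "removes" a feature by resetting it to a zero baseline rather than taking a conditional expectation. All names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every subset of the other features."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(n_features):
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                present = frozenset(subset)
                phi[i] += weight * (value_fn(present | {i}) - value_fn(present))
    return phi


def model(x):
    """Toy model: one additive term plus one pairwise interaction."""
    return 2.0 * x[0] + x[1] * x[2]


x = (1.0, 2.0, 3.0)
baseline = (0.0, 0.0, 0.0)


def value_fn(present):
    """Model output with absent features reset to the baseline (an assumption)."""
    masked = [x[j] if j in present else baseline[j] for j in range(len(x))]
    return model(masked)


phi = shapley_values(value_fn, n_features=3)
print(phi)
```

The attributions sum to the difference between the prediction and the baseline prediction, which is the 'additive' part of additive feature attribution, and the interaction term gets split evenly between the two features involved. The enumeration is exponential in the number of features, so this only works for toys.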

Posters

The quiet poster hall on Monday morning.

My experience of the poster sessions suffered the most as a result of jetlag, so I ended up looking at far fewer posters than I would have liked (even accounting for my eternally overambitious plans for poster sessions). This was also the first year where I got invited to ~cool parties~, so I went to some of those, too.

The hall for the posters included what seemed like gratuitous space between rows, but it filled up rapidly (the crowd at the Capsules poster was sizeable). I admit I always think about roller derby these days when I'm trying to get past crowds of people, but hip checking strangers isn't a great way to do poster sessions (I assume).

My poster session strategy is the following:

before the conference: go through the list of papers and note the interesting ones
don't leave any time to actually read the papers
forget about the list, fight through crowds of large men to peer at poster titles
eventually, learn things

A humble plea to poster presenters: please don't stand directly in front of your poster while you're talking about it, I can't see and I don't want to get so close to you that you start talking to me.

Here's a little caveat about this part of the blog post: I didn't visit all these posters. I'm just taking the opportunity to mention more interesting papers.

The Neural Hawkes Process: A Neurally Self-Modulating Multivariate Point Process - Hongyuan Mei, Jason Eisner. Alongside optimal transport, Hawkes processes appeared on my radar of possibly-interesting terms this NIPS, so I decided to take a look at this paper. I got so engrossed that I realised I was actually reading the paper (I usually do a cursory skim to produce these summaries), so I've had to stop myself in the interest of giving other papers a chance. In short: a Hawkes process is a kind of non-homogeneous Poisson process (the rate of the process can vary in time) where events can increase the probability of future events (the events are self-exciting). In this work they generalise the Hawkes process (allowing for inhibitory events, for example) and use a continuous-time LSTM to model the intensity functions of the given events. Also, they use a meme dataset (amongst others) to train the model, so the paper includes amusing lines like

"We attribute the poor performance of the [non-neural] Hawkes process to its failure to capture the latent properties of memes, such as their topic, political stance, or interestingness".

The idea of trying to study memes computationally is funny, because even humans barely understand memes.
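
For reference, the self-exciting part of a classic (non-neural) Hawkes process is easy to write down. The sketch below is the textbook univariate version with an exponentially decaying kernel, not the model from the paper; the parameter values (`mu`, `alpha`, `beta`) and the event times are arbitrary choices for illustration.

```python
import math


def hawkes_intensity(t, event_times, mu=0.5, alpha=0.8, beta=1.0):
    """Base rate mu plus an exponentially decaying bump for every past event."""
    return mu + sum(alpha * math.exp(-beta * (t - s))
                    for s in event_times if s < t)


events = [1.0, 1.5, 4.0]

before = hawkes_intensity(0.5, events)  # no events yet: just the base rate
burst = hawkes_intensity(1.6, events)   # right after two events in quick succession
later = hawkes_intensity(3.9, events)   # the excitement has mostly decayed

print(f"before: {before:.3f}  burst: {burst:.3f}  later: {later:.3f}")
```

Every event can only push the rate up here, which is why the paper's generalisation to inhibitory effects needs something more flexible than this additive form.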

Example of a typical "meme"

Dilated Recurrent Neural Networks - Shiyu Chang, Yang Zhang, Wei Han, Mo Yu, Xiaoxiao Guo, Wei Tan, Xiaodong Cui, Michael Witbrock, Mark Hasegawa-Johnson, Thomas S. Huang. Like a dilated CNN, but... an RNN. They achieve this using dilated recurrent skip connections. This is different to the usual skip connection (which takes information from some previous state of the RNN) in that it doesn't rely on the immediately previous state. That's what makes it a dilation. You can stack layers with different dilation lengths to get a sort of 'multiresolution' RNN. If this sounds similar to the Clockwork RNN, you're right, but see section 3.4.
Z-Forcing: Training Stochastic Recurrent Networks - Anirudh Goyal, Alessandro Sordoni, Marc-Alexandre Côté, Nan Rosemary Ke, Yoshua Bengio. Yes, I care a lot about RNNs. I work on (medical) time series data, if that wasn't already apparent. This paper adds to the growing work on combining deterministic RNN architectures with stochastic elements (like state space models), hitting an intractable inference problem, and using variational inference with an RNN-parametrised posterior approximation. So what's new here? They observe that these models can often neglect to use the 'latent' part (the stochastic elements), so they add a regularisation term to the ELBO which 'forces' the latent state at time t to be predictive of the hidden state of the backwards-running inference network. And this works better, empirically. When I first saw this paper I panicked because the title makes it sound very similar to an idea I have been cooking up, an idea which I got stuck on because I was trying to explain an additional regularisation term in terms of a prior (on something). But these authors just go ahead and use a regulariser without any probabilistic interpretation, so it's probably fine to do that. Note to self: not everything has to be mathematically beautiful.
What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? - Alex Kendall, Yarin Gal. I first learned about (and immediately loved) aleatoric and epistemic uncertainty in my applied Bayesian statistics class back in Cambridge, so despite not featuring RNNs, I was interested in this work. In this context, aleatoric uncertainty is the uncertainty inherent to the observations, whereas epistemic uncertainty arises from uncertainty about the model parameters (which could in principle be reduced with more training). So this work studies epistemic and aleatoric uncertainty in deep networks (for computer vision), and shows that modelling aleatoric uncertainty improves performance in semantic segmentation and depth regression.
Fast-Slow Recurrent Neural Networks - Asier Mujika, Florian Meier, Angelika Steger. Phew, back to RNNs. This work proposes an RNN architecture attempting to combine the advantages of multiscale RNNs and deep transition RNNs. Basically, it's a 'new model architecture' paper. They show good results on two language modelling tasks, and do further analyses of the properties of their model. Multiscale (temporally speaking) data is extremely common in medicine, so something like MIMIC-III would have been a great test-case for this model as well. Maybe I'll find a masters student to explore this (I obviously don't have time because I spend all my time writing blog posts).
Identification of Gaussian Process State Space Models - Stefanos Eleftheriadis, Thomas F.W. Nicholson, Marc Peter Deisenroth, James Hensman. A lot of work focuses on inferring the latent states of a GP state space model. Here, they (also) look at learning the model itself. An important difference between your typical GP setting and the GP-SSM is that the inputs to the GP in the latter case are latent states (of the state space model), so they have to infer both the latent states and the transition dynamics (that's the model). They use variational inference with a bidirectional RNN as the recognition network, so you know I'm on board.
On Fairness and Calibration - Geoff Pleiss, Manish Raghavan, Felix Wu, Jon Kleinberg, Kilian Q. Weinberger. This work seems to be a follow-up to this paper which was written to analyse this piece on bias in criminal sentencing from ProPublica (ProPublica also followed up on this and other research following their investigation). So first up: it's awesome to see academic research and investigative journalism interacting in this way. In the precursor paper they provide an impossibility proof (which is given a simplified geometric proof in this paper) for simultaneously satisfying calibration and equalized odds (equal false positive and false negative rates between groups). As hinted in the precursor paper, relaxing the notion of equalized odds (for example, sacrificing equal false positive rates) may allow you to keep calibration, and that's what they show in this paper.
Causal Effect Inference with Deep Latent-Variable Models - Christos Louizos, Uri Shalit, Joris Mooij, David Sontag, Richard Zemel, Max Welling. The focus of this work is on accounting for confounders (by modelling them as latent variables) while doing effect inference, particularly in the presence of noisy proxies of true confounders. They achieve this using a 'causal effect variational autoencoder' (CEVAE).
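To make the dilated recurrent skip connection from the Dilated RNN paper above a bit more concrete, here is a minimal sketch. It is not the authors' implementation: the scalar tanh cell, the weights `w_in` and `w_rec`, and the dilations (1, 2, 4) are all assumptions chosen for illustration. The only point is that the state at time t is computed from the state at time t - dilation rather than t - 1.

```python
import math

def dilated_rnn_layer(inputs, dilation, w_in=0.5, w_rec=0.9):
    """Scalar tanh RNN whose recurrence skips back `dilation` steps.

    h[t] = tanh(w_in * x[t] + w_rec * h[t - dilation]),
    with the state taken to be 0 before the start of the sequence.
    The weights are hypothetical constants, not learned parameters.
    """
    hidden = []
    for t, x in enumerate(inputs):
        prev = hidden[t - dilation] if t >= dilation else 0.0
        hidden.append(math.tanh(w_in * x + w_rec * prev))
    return hidden

def stacked_dilated_rnn(inputs, dilations=(1, 2, 4)):
    """Stack layers with different dilations; each layer reads the one below."""
    outputs = inputs
    for d in dilations:
        outputs = dilated_rnn_layer(outputs, d)
    return outputs

# With dilation 2, an impulse at t=0 only reaches the even time steps.
impulse_response = dilated_rnn_layer([1.0, 0.0, 0.0, 0.0], dilation=2)
```

Stacking layers with different dilations is what gives the 'multiresolution' behaviour: lower layers see fine-grained dependencies, higher layers see longer-range ones.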

Machine Learning for Health (ML4H)

I speculate that they're moving away from the previous acronym (MLHC - machine learning for health care) due to a collision with the MLHC conference (previously abbreviated MUCMD). Apparently MLHC (the conference) will be in Stanford in 2018, which is a shame because I feel I should attend it, but I really didn't enjoy travelling to/from California for NIPS. Also, I think conference organisers should be avoiding the USA (or any other country with restrictive or racist visa policies) if at all possible right now.

Anyway. The workshop, unrelated to the MLHC conference, was an all-day affair on the Friday of NIPS. There were all the usual things: invited talks, spotlight talks, (frustratingly short) poster sessions, and people crammed into one room for 8 hours. I missed the panel because I was stuck at lunch, and I missed Jure Leskovec's talk because I was ~ networking ~. For the rest, I took some notes.

Talks:

Zak Kohane - AI in Medicine That Counts. He distinguished between AI that does what doctors do, AI that does what doctors could do (but don't), and AI that does what doctors can't do. I am reminded of this post from Luke Oakden-Rayner which distinguishes between tasks we're building ML systems to solve, and tasks which doctors actually do. They're not the same, and Kohane made the point that they need not be, in general. We can see gains in outperforming doctors on e.g. diagnostics, but we can also see gains in doing analyses doctors simply can't do (because they're not computers). Kohane gave an example of a child with ulcerative colitis who was saved from colectomy after they ran a gene expression analysis on children with similar inflammatory bowel disease and identified an effective drug (indirubin). He also provided a good comparison between medicine and what Deepmind has been achieving with AlphaZero (on Go and other games). Achievements like AlphaZero make people think AI is about to take over the world (from what I can tell), but medicine is far from AI-led mastery:
- it's non-deterministic
- it's not fully observable
- the action space is not discrete
- we have no perfect simulators
- 'episodes' in medicine are not short (consider the number of seconds in a typical ICU stay, consider a person's entire life...)
- evaluation is unclear and slow
- trial and error is not an option (outside of controlled trials, and even then the trial is highly constrained)
In his list he also included that we have huge datasets of human play (for games like Go), but I think medicine is getting close to having large datasets (at least locally), so I don't count this as a fundamental limitation. He then went on to discuss the money end of medicine, which I'm not a fan of, but if you're to be pragmatic, you gotta understand the game you're playing. He made a point that we may come up with cool technology to improve medicine in different ways, but unless a business argument can be made for it, it likely won't be adopted. This is more clearly true in the US where healthcare is more profit-oriented than in other countries (e.g. those with socialised healthcare systems) - ML4H @ Socialised Healthcare edition, anyone? We can have it in a neutral country! (Joking aside, I am legitimately interested in the opportunities for ML to benefit from and improve socialised healthcare systems - data centralisation is an obvious point, but perhaps other types of problems are more immediately pressing in systems like the NHS, than they would be in the USA...)
Jennifer Chayes - Opportunities for Machine Learning in Cancer Immunotherapy. The immune system is an incredibly complicated and therefore cool system, and cancer immunotherapy is a very very cool use of the immune system. With the caveat that I'm not an immunologist, the tl;dr of cancer immunotherapy is: tell your immune system to target and kill cancer cells. This may be what the immune system does already, to some extent. T-cells identify specific antigens, and direct the rest of the immune system to kill cells presenting those antigens. (How do T-cells know what to identify? The thymus is the coolest organ you've never heard of.) So the challenge is to train T-cells to specifically recognise your cancer cells, but there are lots of possible (neo)antigens. You can formulate this as a matrix completion problem (T-cells v. antigens) to predict the response of new T-cells. She also described work they did for predicting response to checkpoint inhibitors (a type of cancer immunotherapy), highlighting the value of building relatively simple models on small data.
Susan Murphy - Challenges in Developing Learning Algorithms to Personalise mHealth Treatments. This was about the HeartSteps project, which tries to encourage physical activity in people who have completed cardiac rehabilitation. That is, it's an app that encourages you to go for a walk. This is a problem of sequential decision making. To maximise positive outcome (more time walking), what sort of notifications should the app send, and when? If someone is driving, you shouldn't bother them. If they just walked somewhere, or are in the middle of walking, you shouldn't tell them to go for a walk. They model it as a (contextual) bandit problem, and have to deal with noise in the data, nonstationarity (the expected reward function changes over time), and the fact that there are longer-term delayed effects from actions. Unsurprisingly (to anyone who's used apps that send them push notifications), after a while people just start ignoring them, and the effect of interventions diminishes. While the intentions in this work are noble, I can see creepy unintended uses of research like this into user engagement (like this horrible startup). Technology is always a double-edged sword, but if we have to be subjected to personalised advertising and addiction mechanics in games, and so on, at least fewer people should die of heart disease, right?
Fei-Fei Li - Illuminating the Dark Spaces of Healthcare. I think that was the title. She spoke about three projects in healthcare that use computer vision, and the room was packed. At first I thought everyone suddenly loves healthcare, but then I remembered that Fei-Fei Li is famous. The projects were all about activity recognition from non-RGB video (they had depth sensors and IR video if I recall - these alleviate some privacy concerns). First she spoke about identifying hand-washing to tackle hospital acquired infection. One challenge was in activity recognition given unusual (for research) viewpoints, e.g. cameras on ceilings looking directly down. The second project was about ICU activity recognition, to better understand what people spend time doing in the ICU. The priority here was efficiency, so they developed methods to analyse video which don't require analysis of every single frame, saving a lot of compute while still achieving high performance (on standard video understanding datasets). Finally, she spoke about applications in independent senior living, such as fall detection. This in particular is challenging due to limited training data and rare events (thankfully). They propose to use domain transfer to aid in the data scarcity issues, but she pointed out that much of this work is still in progress.
Jill Mesirov - From Data to Knowledge. I am doubtful this was the title of her talk, but we'll run with it. The topic was medulloblastoma, which is one of the most common forms of paediatric brain tumour. 70% of children survive, but only 10% go on to live independent lives. Their focus is on predicting relapse, which they achieve using a probabilistic model incorporating various clinical and genomic features. She then went on to describe a project to identify novel therapeutics for an aggressive subtype of medulloblastoma driven by Myc (this is a gene). Through mouse xenograft experiments and expression profiling, they found this subtype is likely sensitive to CDK-inhibitors, and found they could extend survival (in mice) by 20% with palbociclib, suggesting a candidate treatment. This sort of analysis is sort of 'well known' to me because my lab (alongside machine learning) works on cancer genomics, but I'd also like to pause for a moment to reflect on two things:
1. As with the example from Zak Kohane (about indirubin), a lot of the time (translational) computational biologists are hunting for threads - persistent patterns in the disease which indicate possible vulnerabilities, which they can then follow up by looking for matches in drugs with known targets. If you can optimise any point in that process, you can probably save someone's life, some day.
2. A 20% extension in survival is clinically significant, but it's not a cure as we think of it. For mice it's measured in days, for humans probably one or two years if not months. For some cancers, especially brain cancers, this is still where we're at. Fighting cancer is really, really hard.
Greg Corrado. I just stopped writing down the titles at some point. He spoke about a few different projects:
- Diagnostics: doctors working alongside algorithms to work better/faster. Examples from Google Brain: screening for diabetic retinopathy (on par with ophthalmologists), reading breast cancer biopsies.
- Care management/decision support: the idea is to have smart electronic medical records, to help reduce errors and improve care quality. Having observed clinicians interacting with EMRs, I see a lot of potential for improvement here.
He mentioned challenges with processing medical data because of how messy it is and I just laughed and laughed and then cried (silently). Apparently they built some sort of FHIR-based pipeline to integrate data from six healthcare systems, and it worked well, but I didn't write down what they were doing at the end of the pipeline. He also gave a shout-out to Google's newly open-sourced variant caller, DeepVariant.
Mihaela van der Schaar - Dynamical Disease Modelling. Her work focuses on dynamical modelling, assuming some hidden clinical state which informs observable physiological variables. You could approach this using a hidden Markov model, but she observed that transition probabilities typically depend on sojourn times, necessitating a semi-Markov model. Furthermore, patterns of missingness are informative, which suggests modelling the observation times themselves, e.g. as a Hawkes process. The informativeness of measurements in medicine may not be immediately obvious, but the rationale (at least in the ICU, my area of focus) is that some measurements are only taken when needed, and they're only needed when the doctor suspects something is up. Even if a measurement is routinely performed, the rate of measurement may increase when patients become more critical. So you have a huge case of missing-not-at-random. She also mentioned their work on modelling competing risks, which I described earlier in this blog post.
Atul Butte - Translating a Trillion Points of Data into Diagnostics, Therapies and New Insights in Health and Disease. I didn't take notes for this talk, but his slides are here - I'd recommend slide 29. In case that link at some point goes dead, that slide summarises lessons he's learned in MLHC over the years, and these are (paraphrased):
- Solve the problems that health care professionals need solved, don't just guess
- Watch out for models limited by bad inputs (e.g. from patients, from doctors)
- Learn what IRB, HIPAA, BAA, ICD-10 codes, CPT codes, CLIA, and CAP are.
- Learn patience.
- Not everything needs deep learning.
- Having all the data on someone is super rare.
- Health care inefficiency is not about friction. (He made a point that everywhere there's a cost, someone is making money and will push back against losing that money.)
- Data integration can happen, if there's a business reason for it.
- Platforms and companies are commoditized. (As subpoints to that he suggests the ML people should come with some medical knowledge, to demonstrate we care about healthcare, and so we don't cost medical collaborators time training us.)
Another point he made was that there's a lot of freely-accessible data out there, which is ripe for analysis. And possibly founding startups.
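Going back to the Hawkes process mentioned in Mihaela van der Schaar's talk: the appeal for informative missingness is that each measurement temporarily raises the rate of further measurements, which matches the "doctor suspects something is up" story. Below is a minimal sketch of the conditional intensity with an exponential kernel. The kernel choice and the parameter values `mu`, `alpha` and `beta` are my assumptions for illustration, not details from the talk.

```python
import math

def hawkes_intensity(t, event_times, mu=0.2, alpha=0.8, beta=1.0):
    """Conditional intensity of a Hawkes process with an exponential kernel.

    lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i))

    mu is the baseline measurement rate, alpha the jump caused by each past
    measurement, and beta how quickly that excitation decays (all hypothetical).
    """
    return mu + sum(
        alpha * math.exp(-beta * (t - t_i)) for t_i in event_times if t_i < t
    )

# A burst of recent measurements makes another measurement more likely soon.
measurement_times = [1.0, 1.5, 1.8]
rate_during_burst = hawkes_intensity(2.0, measurement_times)
rate_long_after = hawkes_intensity(20.0, measurement_times)
```

With no past events the intensity is just the baseline `mu`, and long after a burst it decays back towards it.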

As I mentioned, there were two poster sessions. I spent the first one presenting my poster, and much of the second one talking to people, so I didn't get to see too many posters. I've described a lot of work from other people in this post, so let me do the same for myself. At WiML and ML4H I was presenting (variations on) this poster: (right)

Summary of the related paper:

Real-valued (Medical) Time Series Generation with Recurrent Conditional GANs - Stephanie L. Hyland (that's me) and Cristóbal Esteban, Gunnar Rätsch. (For disclosure: the version that was accepted by ML4H was a 4-page version of this preprint, focusing on the medical data and aspects. They asked us to give links to the arXiv versions of our work, but I couldn't in good faith link to the full version as it wasn't reviewed by them. In case you noticed and were wondering why there's no link to the paper on the workshop page, it's because of my conscience).

The motivation for this work was that MLHC struggles with data sharing. Medical data is hard to share, with good reason. But it means a lot of work in MLHC is completely unreproducible, and nobody can directly build on it, because they don't have access to the data/task a model was built for. This stifles our progress, and MLHC is hard enough already. So wouldn't it be great if we had a synthetic dataset (without privacy concerns) that we could use to benchmark models and approaches? Shoutout to this related paper with similar motivation from Choi et al.: Generating Multi-label Discrete Patient Records using Generative Adversarial Networks (they focus on binary and count-valued data, hence our focus on real-valued time-series data).

I'd summarise what we did in this work in three points:
1. Devise a GAN architecture to generate real-valued time series. We call this a 'recurrent' GAN, or RGAN, because it uses RNNs for both discriminator and generator networks (yes, RNNs!). We also have a conditional version (the RCGAN) which takes label information, allowing it to generate data from labels.
2. Devise an evaluation scheme for GANs tailored to our setting. We do this by generating a synthetic training dataset from the RCGAN (labels + features), training a classifier (e.g. CNN, random forest) on it, and reporting its performance on a held-out real test set. We call this the TSTR (train on synthetic, test on real) score. Since we want to use the RGAN to generate synthetic medical data, the TSTR score is of particular relevance.
3. Analyse empirically whether the RGAN is 'overfitting'. By this I mean, we ask (roughly) if the GAN is more likely to produce samples very similar to training samples than it is to produce other samples (from the same distribution, e.g. the test set). If it is, then we have a problem. Firstly because reproducing the training set is boring and does not require a GAN, and secondly (more importantly) because reproducing the training data set would constitute a serious privacy breach in our setting.
On the final point, we also experimented with training the RGAN using differential privacy, just to be extra safe. If you're willing to sacrifice performance you can get some privacy, but it's a harsh trade-off and requires further research.
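The TSTR idea from point 2 is simple enough to sketch in a few lines. This is a toy stand-in rather than our actual pipeline: the 'synthetic' data here is just noisy draws from a hypothetical two-class distribution (not the output of a GAN), and the classifier is a nearest-centroid rule instead of a CNN or random forest. All function names are made up for this example.

```python
import random

def nearest_centroid_fit(features, labels):
    """Return {label: centroid} computed from the training rows."""
    sums, counts = {}, {}
    for row, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def nearest_centroid_predict(centroids, row):
    """Predict the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def tstr_score(synth_x, synth_y, real_x, real_y):
    """Train on synthetic, test on real: accuracy on held-out real data."""
    centroids = nearest_centroid_fit(synth_x, synth_y)
    hits = sum(
        nearest_centroid_predict(centroids, row) == y for row, y in zip(real_x, real_y)
    )
    return hits / len(real_y)

def toy_series(label, rng, length=8, noise=0.1):
    """Hypothetical data: a flat series near 0 (label 0) or near 1 (label 1)."""
    return [label + rng.gauss(0.0, noise) for _ in range(length)]

rng = random.Random(0)
labels = [i % 2 for i in range(40)]
real_test_x = [toy_series(y, rng) for y in labels]
# Pretend these (noisier) samples and their labels came from a conditional generator.
synthetic_x = [toy_series(y, rng, noise=0.2) for y in labels]
score = tstr_score(synthetic_x, labels, real_test_x, labels)
```

A high score says the synthetic data carries enough of the label-relevant structure to train a useful classifier. It says nothing on its own about privacy, which is why point 3 is a separate analysis.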

I held a small reading group in my lab about interesting contributions from the ML4H workshop, so I'll briefly summarise two papers of interest to me:

Generative Adversarial Networks for Electronic Health Records: A Framework for Exploring and Evaluating Methods for Predicting Drug-Induced Laboratory Test Trajectories - Alexandre Yahi, Rami Vanguri, Noémie Elhadad, Nicholas P. Tatonetti. My reason for interest should be obvious. Also, the first author emailed me to get help with our code, which possibly means they used it. I spent some time answering issues on GitHub and responding to emails, and I'm still quite a junior scientist, so it's really exciting for me to see people taking interest in and actually trying to use my work. Anyways, in this paper, as far as I understand it, they're generating cholesterol time-course data before and during exposure to statins. They do two interesting things: 1) Clustering patients based on a large set of clinical attributes, then training separate GANs on each cluster. 2) Evaluating the performance of the GAN by measuring how well it 'predicts' cholesterol level during statin exposure. They do this by matching generated samples to the closest real (hopefully test-set) sample based on the pre-exposure part of the sequence, then measuring the similarity of the synthetic and real samples during statin exposure. This evaluation method seems a little brittle (imagine there are multiple real samples that look similar to the synthetic one, but respond to statins quite differently), but it's an interesting idea.
Personalized Gaussian Processes for Future Prediction of Alzheimer's Disease Progression - Kelly Peterson, Ognjen (Oggi) Rudovic, Ricardo Guerrero, Rosalind W. Picard. I haven't spent enough time with this paper to fully understand it, but the most interesting aspects are: fitting a GP model to a source population, and personalising (i.e. tuning) it to an individual based on their observed data to date using domain-adaptive GPs, and using auto-regressive GPs. Various kinds of GPs. No RNNs.

Conclusion

This has been an exceedingly long blog post and I hope you're not as exhausted as I am, but this is basically an accurate depiction of my experience of NIPS. A lot of stuff, all the time. I have not even mentioned the Bayesian Deep Learning workshop. During the lunch break on the final day I grabbed a burrito and almost fell asleep. I was not the only one. The convention centre by that point was gradually emptying, with scattered people dozing off in chairs, and a prominent left-luggage zone where the registration tables had been. There was a clear sense of winding down, perhaps because the process had already begun for me. I stayed only briefly at the closing party (missing some unpleasantness, it sounds like), and instead walked/skated thoughtfully back to my Airbnb along the beach, pausing to look at the stars and listen to the Pacific Ocean.

bloodseeker costume

2017-10-31T00:00:00+00:00

Here are a bunch of WIP pictures of the Bloodseeker (Dota 2) cosplay I did for MCM London Comic Con on the weekend. This is the first non-Halloween costume I've made, and the first non-trivial sewing project I've done, so it's far from perfect, but I'm pretty happy with how it turned out overall.

For reference, here's what Bloodseeker looks like... (yes, this is a photograph of my screen)

And for further reference, here's what my cosplay ended up looking like... (my face doesn't usually look like that)

I took a lot of progress pictures of the crafting, so I've tried to find a smallish number for each of the components. Most of the crafting took place in my boyfriend's parents' house, saving me the need to fly with two glaives in my bag. Also he has a heat gun.

The fabric bits

This costume was my introduction to using a sewing machine (save a day-long class I took with the Thrifty Stitcher in London), so I kept it to two stitches and fumbling around making patterns.

Never have I measured myself so much.

I heard you're supposed to use muslin? Also doubles for making cold-brew coffee. This was me trying to deal with the fact that my waist is sort of, but not exactly, cone shaped. It turns out those little tucks I did are called 'darts', and I didn't invent them.

Originally had this elaborate thing where the flappy bit on the loin cloth would wrap up and around from the back of the waist band. I wanted it to fall right, but I do not understand fabric nearly well enough.

My sewing machine is a Janome Sewist 525S, because that is exactly the one I used at the aforementioned sewing class, so I avoided doing any market research at all.

Sewing interfacing onto the inside of the front of the cape, on the tiniest ironing board. I wanted it to be a bit stiff, so it would retain a mostly-round shape even if there were bones and stuff on it. I think this worked okay.

Comparing different shades of black/red paint/pen on the fabric. Is painting acrylic onto fabric a thing you're supposed to do? I did a lot of that.

This is a collar. I improvised how you're supposed to do collars. I am looking forward to learning how to actually sew things.

Plain red fabric bits! I kept them in this state for ages because I was afraid of destroying them with painting. One day the skates on my floor will kill me.

Drawing patterns onto the cape. I forgot to get like... anything you're supposed to use to draw onto fabric, and it was Sunday (which means everywhere in Zurich is closed), so I just #yolo'd it and directly drew with an 8B pencil. This particular section was also improvisation, because I couldn't find an angle of Bloodseeker where you can actually see the top of his shoulders/back of his neck. There is no evidence Bloodseeker even has a neck. Slit in the back of the head hole to enable me to put it over my massive head. (My head is massive, my roller derby helmet confirms this. Massive.)

Painting acrylic onto fabric. I accidentally bought so much red/black paint! I preferred how it looked when it was sort of vaguely-shaded, rather than solid colours in the end. I will make a costume with more exciting painting in it next time.

These wraps did not work out on the day. I did not think about how to get wraps to lie flat and not just slowly fall down my calves and bunch around my shoes. There's also supposed to be a red trim on this, and I did cut it out/sew it, but I didn't have time to do the final assembly. Things to fix for next time.

Sewing myself into the hand wraps was interesting.

bones

The basic construction of the bones is a papery core with a plaster wrap ... wrapping. Because I obsessively hoard things I suspect may be one day useful, I have an entire box of generic recycled sheets of paper. I managed to use almost a tenth of the box making these bones. Next costume: mostly paper?

Applying plaster wrap is extremely satisfying. I spent an afternoon making these while listening to podcasts. If you ever find you're too easily distracted by your phone/computer, you should try covering your hands with plaster.

Painted a few coats of English breakfast tea on these bad boys. Check out that extremely Swiss shopping bag.

helmet

EVA foam! Drafting a helmet: challenging. Building a rugby ball: all too easy. We didn't have a dremel so you can see the jagged edges where I tried to hack at the edge of the foam to create a surface I could join.

Not pictured: sanding some extremely angular corners. Still too angular. These angles haunt my nightmares. Learned I love sanding (somehow??).

Helmet... wings? Flaps? That horrible join is what the angular corners looked like before the extensive sanding. I think if I do this costume again I will just make the helmet out of fake leather.

Painting on foam is easier than fabric. I actually ended up going over this paint, because it's better if you go EVA foam -> PVA glue -> spray paint -> normal paint, apparently.

I'm pretty proud of the solution I came up with for attaching + fanning the feathers. That's a strip of softer craft foam, which I then painted black and glued to the inside of the helmet. Also, because the helmet is shaped like a rugby ball, there's space for feather shafts in the back.

glaives

I made these in the ~1.5 days before MCM, but they were a lot easier than trying to make any clothing. Trying to get things to fit is really hard. For the glaives, I just had to measure how long my forearms are.

The basic idea of the glaives is that they're a sandwich of EVA foam (the handles) around a thin-craft-foam-worbla sandwich (the blades). I decided to do it this way because it seemed physically plausible, and would result in something that wasn't too bulky-looking. This photo shows the thin and EVA foams, pre-sanding.

Worbla's Finest Art ready to be sandwiched around the thin craft foam. I did a sort of partial sandwich, where the worbla didn't completely enclose the foam (see the black extending beyond the gold), to give a sharper blade edge.

Half-sandwiching in progress. Working with Worbla is quite nice! Once we realised you can turn the heat gun down from 600C it all got a lot easier.

Love sanding.

Sanded EVA handles + partial worbla sandwich + sharp foam edge. Part of me thinks they looked better like this than post-painting. I think it's to do with the texture - acrylic paint introduces a very... plastic texture.

Post painting! Ignore the horrible worbla seams! I had to make up some designs to put on the handles. My boyfriend did most of the painting of the glaives, because I was mysteriously busy with something else at the time (possibly painting acrylic on fabric). Also, he knows how to paint things to look like metal.

Here I am with my boyfriend in his costume!

The last two things I did with the glaives were: painting shading to try to make them look a bit more three-dimensional, and painting blood on. I bought some fake blood, but it just sort of sat on the surface of the blades, jamlike.

conclusion

I think some people who make costumes also make an effort to get good photographs of themself wearing it. I did not do this. (Luckily, the (only) other Dota 2 cosplayer I met at MCM got some photos of us here). Next time, I will get some better photos.

NIPS 2016

2016-12-16T00:00:00+00:00

We return for another installment of Stephanie Summarises a Conference. My previous work in this area is NIPS 2015, AAAI 2016, and ICML 2016. I was pleasantly surprised at NIPS to be asked if I was going to write one of these again. Apparently someone somehow found my blog. Ignorance of this is one of the downsides (??) of not having creepy tracking analytics.

This time we get a table of contents so I can be guiltlessly verbose (I fear how long my PhD thesis is going to be):

Women in Machine Learning
Main Conference
- Invited Talks
- Posters
Machine Learning and the Law
Machine Learning for Healthcare
Miscellaneous Comments/Observations
Conclusion

Women in Machine Learning Workshop

"What are women and how can machine learning stop them?"

I didn't register for WiML in time last year, so this was my first time attending. I also managed to miss all the Sunday events by arriving in Barcelona at midnight that night. There was a workshop on Effective Communication where I could perhaps have learned how to write shorter blog posts.

My feelings about having 'woman-only/woman-centric' events are complex, poorly-understood and otherwise beyond the scope of this particular post, but the reality is that women are wildly underrepresented in computer science and machine learning is no exception (about 15% of the 6000-odd NIPS attendees were women, and I don't know what fraction of those were recruiters). I'm so used to being surrounded by men that I barely notice it (except for the occasional realisation that I'm the only woman in a room), so having a large conference hall full of women for this workshop was a bit surreal.

Interesting talks/posters:

(talk) Maithra Raghu, On the expressive power of deep neural networks. They study the expressive power (ability to accurately represent different functions) of neural networks and show that this depends on a quantity they call 'trajectory length'. There's also a companion paper, Exponential expressivity in deep neural networks through transient chaos.
(poster) Niranjani Prasad, Barbara Engelhardt, Li-Fang Cheng, Corey Chivers, Michael Draugelis and Kai Li. A Reinforcement Learning Approach to Weaning of Mechanical Ventilation in ICU: relevant to my ICU-interests, but this poster was unfortunately on the other side of the board to mine, so I only got to look at it briefly. They're using MIMIC-III, looking at pneumonia patients and the question of intubation. A challenge was engineering the reward function, which required consultation with clinicians.
(poster) Luisa M Zintgraf, Taco S Cohen, Tameem Adel and Max Welling. Visualizing Deep Neural Network Decisions. They propose a 'prediction difference analysis' method to visualise regions of an image which either support or oppose a particular prediction. This is based on assigning 'relevance' to parts of the input, based on the 'weight of evidence' a particular input gives to a certain class. This is a pre-existing idea, and a cursory glance at the paper doesn't highlight what's novel about their approach - possibly applying it to deep networks? Extending it to analysing the influence of multiple features at a time, possibly?
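For the curious, here is roughly what 'weight of evidence' relevance looks like for a single feature, as I understand the pre-existing idea the Zintgraf et al. poster builds on. This is a sketch under assumptions, not their method: the logistic `toy_classifier`, its weights, and the `background` values used to marginalise a feature are all hypothetical, and I've left out details like smoothing the probabilities.

```python
import math

def log_odds(p):
    """Base-2 log odds of a probability strictly between 0 and 1."""
    return math.log2(p / (1.0 - p))

def toy_classifier(x, weights=(2.0, -1.0, 0.0), bias=0.0):
    """Hypothetical logistic model returning p(class = 1 | x)."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def weight_of_evidence(model, x, index, replacement_values):
    """Relevance of feature `index` for the prediction on input tuple `x`.

    Compares the log odds with the feature known against the log odds with
    the feature marginalised out (averaging over replacement values).
    Positive values support the class, negative values oppose it.
    """
    p_full = model(x)
    p_without = sum(
        model(x[:index] + (v,) + x[index + 1:]) for v in replacement_values
    ) / len(replacement_values)
    return log_odds(p_full) - log_odds(p_without)

x = (1.0, 1.0, 1.0)
background = (-1.0, 0.0, 1.0)  # assumed values each feature could have taken
relevance = [
    weight_of_evidence(toy_classifier, x, i, background) for i in range(len(x))
]
```

In this toy setup the first feature supports the prediction, the second opposes it, and the third (which has zero weight) gets a relevance of essentially zero. For images, the 'feature' would be a pixel or patch rather than a single number.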

I accidentally presented my poster for most of the poster session and therefore missed out on going around to others. This is a compelling argument for having co-authors who can share the load. For the record, the work I was presenting was Learning Unitary Operators with Help from u(n), which I did with my advisor Gunnar Rätsch, and which will be appearing in AAAI-17. I also presented it at the Geometry in ML workshop at ICML, see my post here.

Roundtables

What I found especially valuable and unique about WiML were the roundtables - one for research advice, one for career guidance. In each one there were subtables for specific topics, with 'experts' to extract wisdom from.

I shamelessly hogged space at the healthcare research roundtable in the first session to listen to Jennifer Healey. She's a researcher at Intel Labs working on using sensor data for human health. That is, if you have continuous audio recording (as one can get from a phone microphone), you can identify a person coughing, measure qualities of it, its frequency, onset and so on. This information is incredibly valuable for making diagnoses and treatment decisions, and it's the kind of data that one could reasonably imagine everyone collecting in the future. One thing I really enjoyed about the discussion was that she was quite aware of the HORRIFYING PRIVACY IMPLICATIONS of this kind of data, and the need to avoid storing (and calculating on) this data on The Cloud. I'm really excited about this avenue of healthcare (as I say every time it comes up) and I'm really glad to hear a senior researcher from a big company talking about the importance of the privacy considerations. As was mentioned in the ML and Law symposium, all personal data you collect is a privacy vulnerability. But collecting this data could have such massive positive healthcare implications that 'solving' the privacy problem is really important. Especially if the data is going to end up getting collected anyway...

At the second roundtable I went to (about careers/advice), I spoke to some people at DeepMind about working there (me and everyone else at NIPS, it feels like...), and some other people about how to decide between industry (that is, industrial research) and academia. Both experts at the industry/academia table were in industry, so I'm not sure I got an unbiased perspective on it. The context for all of this is that I'm a 'late-stage' PhD student (the idea of that is rather scary to me - there's still so much to learn!), so I'm looking for internships (got any spare internships? contact me) and thinking about post-PhD land. The most concrete difference I learned about was that in companies, you may need to send your paper to the legal team before submitting it to a conference, in case they want to patent something first. I'd imagine this also applies to preprints and code and so on. Otherwise, the level of intellectual freedom one enjoys seems to vary, but everyone I spoke to (from a biased sample) seemed largely unconstrained by their industrial ties.

I'd imagine there's a gulf of misery between brand-new startups that have yet to become overly concerned with Product, and established tech companies with the luxury of blue-skies research labs, where you don't get to do cool things and instead must live in a box desperately trying to demonstrate the commercial viability of your research. I'd also imagine that said box-dwellers don't attend roundtables (how do you fit a round table in a square box?).

The final notable thing that happened at WiML was me apparently winning a raffle, but being shamefully absent. I was upstairs charging my laptop and catching up with a friend from MLSS, blissfully ignorant of the prize I would never receive.

The Main Conference

Invited Talks

The main conference opened with a talk (the Posner Lecture) from Yann LeCun. LeCun is famous enough in machine learning that people were excitedly acquiring and then sharing selfies taken with him (a practice I find puzzling), so the things he said will likely echo around the community and I need not repeat them in detail here. In gist he was talking about unsupervised learning (although focusing on a subtle variant he called 'predictive learning'). He used a cake analogy which spawned parodies and further cake references throughout the conference/social media. The analogy is that reward signals (as in reinforcement learning) are the cherry, labels for supervised learning are the icing, and the rest of the cake is essentially unlabelled data which requires unsupervised learning. The growing importance of unsupervised learning is not new, I can say from my intimidating one year of previous NIPS conferences.

Marc Raibert from Boston Dynamics gave an entertaining talk about dynamic legged robots. This featured many YouTube videos I'd already seen, but was happy to gormlessly rewatch. One amusing thing is the fact that they can't use hydraulics in domestic robots, because they leak. That's a great example of a real-world problem. It might be common knowledge amongst roboticists, but 'you can't use hydraulics because nobody wants oil and stuff on their carpet' would not have occurred to me if I for some reason needed to design a robot. Now, maybe I would not need to design a robot directly, but it's not entirely unlikely that I could design an algorithm making assumptions about the kinds of movements, or the cost of those movements, that a robot could make. And this is why 'domain experts' will always be needed. Probably.

At the end of the talk, someone asked if Boston Dynamics uses machine learning. They do not. Maybe they should?

Saket Navlakha spoke about 'Engineering Principles from Stable and Developing Brains'. Part of this talk was based on this PLoS CB paper where they compare neural network development in the brain to that of engineered networks. In brains, connections are created rapidly and excessively, and then pruned back over time dependent on use (they demonstrate this in mouse models). This is to be contrasted with engineered networks, where adding and removing edges in this way would be seen as wasteful. They demonstrate however that the hyper-creation and then aggressive pruning results in improved network function. They're particularly interested in routing networks, so the applicability to artificial neural networks is not immediately apparent.

Susan Holmes gave the Breiman Lecture, which exists to bridge the gap between the statistics and machine learning communities. This was the single talk of the conference where I took notes, because the relevance of the topic to me and others in my lab overwhelmed the need to preserve precious limited laptop battery. The title of the talk was "Reproducible Research: the case of the Human Microbiome", and so was mostly a story about how to do reproducible research, in the context of microbiome analysis. One really cool thing she mentioned was a web application called shiny-phyloseq, which seems to be an interactive web interface to their phyloseq package. However, it also (I think) records what you do with the data as you explore, which you can then export as a markdown file to include with your paper. I try to emulate this by pipelining my analysis in bash scripts (or within python), but having something to passively record as you interactively explore data seems additionally very beneficial. The garden of forking paths is a risk during any data exploration. Also, the garden of forgetting exactly what preprocessing steps you did.

There was a touching memorial to Sir David MacKay during one of the sessions. It's easy, as an early-stage scientist, to get swept up in the negative aspects of academic culture (looking at you, Publish or Perish) and lose sight of the reasons for doing any of this. Hearing about scientists like MacKay, who both think and care deeply, is genuinely inspirational. The only book on my Christmas wishlist this year is "Information Theory, Inference, and Learning Algorithms".

Interesting Papers/Posters

Necessarily, a subset of the interesting work.

Misc

Learning Transferrable Representations for Unsupervised Domain Adaptation - Ozan Sener · Hyun Oh Song · Ashutosh Saxena · Silvio Savarese - jointly learn representation, cross-domain transformation as well as labels to do better domain adaptation.
Examples are not enough, learn to criticize! Criticism for Interpretability - Been Kim · Oluwasanmi Koyejo · Rajiv Khanna - this was a great poster and spotlight talk. The idea is this: to help make sense of massive datasets, we ideally identify some 'representative samples' ('prototypes') which we can manually assess and use to generalise about the rest of the data. The danger is that there will be non-stereotypical data points, which are nonetheless represented in the data and should be considered. They call these examples 'criticisms', and describe an approach to generate both prototypes and criticisms from large datasets.
Disease Trajectory Maps - Peter Schulam, Raman Arora - the objective here is to find latent representations of patient trajectories, and then characterise them (i.e. through clustering). They use a fairly complicated probabilistic model to do this, so the more interesting details are in the paper. They also associate the representations with clinical outcomes to prove that they're 'clinically meaningful', comparing with some other methods of representing time series.

Reinforcement Learning

Cooperative Inverse Reinforcement Learning - Dylan Hadfield-Menell · Stuart J Russell · Pieter Abbeel · Anca Dragan - in traditional inverse reinforcement learning (IRL), the agent tries to learn the expert's reward function. However, to have benevolent robots, we would like them to maximise rewards for humans, not themselves. Additionally, in IRL the agent observes assumed-optimal expert trajectories, which may nonetheless be sub-optimal for learning - one would rather generate teaching, or demonstration trajectories. They formulate a solution to these concerns as a two-player game with learning and acting (deployment) phases.
Showing versus doing: Teaching by demonstration - Mark K Ho · Michael Littman · James MacGlashan · Fiery Cushman · Joseph L Austerweil - this work focuses on the second issue raised in the previous one - how does a teaching trajectory differ from a doing trajectory? They formulate it as 'Pedagogical Inverse Reinforcement Learning'. What's really neat about this work is that they actually did experiments with humans to validate their model's predictions about how people would behave while trying to teach versus simply doing.
Safe and Efficient Off-Policy Reinforcement Learning - Remi Munos · Tom Stepleton · Anna Harutyunyan · Marc Bellemare - 'safety' in this work refers to the capacity of the algorithm to deal with arbitrary 'off-policyness' (that is, the policy to evaluate and the behaviour policy observed need not be close), and 'efficiency' refers to using data ... efficiently. The work seems to combine previous approaches which are either safe or efficient into an algorithm enjoying the benefits of both, with various theoretical results.
Safe Exploration in Finite Markov Decision Processes with Gaussian Processes - Matteo Turchetta · Felix Berkenkamp · Andreas Krause - 'safe' here roughly has its common meaning. They address the issue where an agent, looking to maximise long-term (discounted, perhaps) reward, is willing to tolerate temporary very negative rewards. This is unacceptable for safety-critical agents - they used the example of a Mars rover getting stuck in a crater - so they develop an algorithm (SafeMDP) to safely explore, avoiding unsafe states/actions using noisy observations from nearby states. They also ensure the agent can't get stuck in states without safe escape routes.

Recurrent Neural Networks

Sequential Neural Models with Stochastic Layers - Marco Fraccaro · Søren Kaae Sønderby · Ulrich Paquet · Ole Winther - they combine state-space models (uncertainty about states) with recurrent neural networks (sequential, long time dependencies), and describe a variational inference procedure for the model.
Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences - Daniel Neil · Michael Pfeiffer · Shih-Chii Liu - they add a time gate to the LSTM unit, which has a parametrized oscillation frequency, controlling when individual parts of the memory cell can be updated. This allows for irregularly sampled sensor data to be integrated and they demonstrate improved performance on long memory tasks. They also have really nice figures.
Full-Capacity Unitary Recurrent Neural Networks - Scott Wisdom · Thomas Powers · John Hershey · Jonathan Le Roux · Les Atlas - this is pretty relevant for/similar to my recent work, so I'm going to read this paper in detail later. My initial thought upon seeing the poster is that they have some really unnecessary mathematics in there, which also appears in the manuscript - the entirety of section three in their paper is self-evident. I'm a bit concerned that reviewers might think well-known mathematical facts restated as 'theorems' may constitute novel results. Anyway cattiness aside, their model is interestingly different to my approach - they optimise on the Stiefel manifold of unitary matrices directly (I optimise in the Lie algebra), although if you define the Riemannian gradient using inner products on the tangent space, this probably becomes equivalent in some sense. It requires further analysis. Their results seem quite impressive, although they don't do a comprehensive comparison on the same experiments as Arjovsky & Shah, which are the ones I'm familiar with. I had a nice conversation with one of the authors at the poster, which is really what conferences are about.
RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism - Edward Choi · Mohammad Taha Bahadori · Joshua Kulas · Jimeng Sun · Andy Schuetz · Walter Stewart - their focus here is to have an interpretable model, so the evidence used to make a decision is easily identified. They achieve this using an attention mechanism where the recurrence is on the attention mechanism, not on the hidden state. I'm not sure why RNNs should be seen as intrinsically uninterpretable (you can get gradients of cost with respect to any input, for example), so I'm going to think about this more. Interpretability is crucial for any medical applications.
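
To make the 'optimise in the Lie algebra' remark above concrete: any skew-Hermitian matrix L (one with L† = -L) exponentiates to a unitary matrix, so one can take unconstrained steps on the real parameters of L and map back with exp. The sketch below is illustrative only and is not the code from either paper. The helper names are made up, and the truncated Taylor series for the exponential is an assumption that only holds up for small matrices with small entries.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def identity(n):
    return [[1 + 0j if i == j else 0j for j in range(n)] for i in range(n)]


def expm(a, terms=40):
    """Matrix exponential via a truncated Taylor series (fine for small norms)."""
    n = len(a)
    result = identity(n)
    term = identity(n)
    for k in range(1, terms):
        # term now holds a^k / k!
        term = [[v / k for v in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def skew_hermitian(params):
    """Build a 2x2 element of u(2) from four real parameters."""
    a, b, c, d = params
    return [[1j * a, complex(b, c)],
            [complex(-b, c), 1j * d]]


# Arbitrary (made-up) parameter values; any real numbers give a unitary U.
L = skew_hermitian([0.3, -0.5, 0.2, 0.7])
U = expm(L)

# Check that U is unitary: U† U should be the identity.
product = matmul(dagger(U), U)
eye = identity(2)
unitarity_error = max(abs(product[i][j] - eye[i][j])
                      for i in range(2) for j in range(2))
```

The appeal of this route is that the four parameters are unconstrained, whereas working on the manifold directly means every update has to be kept on (or retracted back to) the set of unitary matrices.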

Machine Learning and the Law Symposium

I was and remain confused by the choice of symposia. The options were: Deep Learning, Recurrent Neural Networks, and ML and the Law. RNNs aren't deep? What was the DL symposium covering? Deep but Not Recurrent Learning? Weight-Sharing Is OK but Not Over Time, Never Over Time? As evidenced by the title of this section, I didn't attend either of them, and I also didn't attend enough of the Counterfactual Reasoning workshop on Saturday to say what would have happened if I had gone to them, but there seems to be a naming/scope issue here. Whatever it was, the RNN Symposium was Hot Shit and had to switch rooms with us ML+Law people during the lunch break. As soon as the room change was announced, people started appearing at the fringes of the Law symposium and may have been inadvertently exposed to some meta-ethics. I'm not sure how this planning error occurred - it is natural to assume that most of the growth in NIPS attendance is coming from DEEP LEARNING, which should (??) include RNNs, so that symposium was likely to be popular. Maybe they thought enough people would go to the other DL symposium.

The real question is - did non-DL non-justice machine learners feel cheated of a symposium? Am I wrong to try to place the RNN symposium inside the DL one?

Having just published a paper (arguably) about RNNs, I should have gone to the RNN symposium, but I can't resist thinking about the broader social impact of machine learning. I've also found myself thinking about morality and justice (and therefore law) more than usual lately, so I had to attend this. Discussions of normative ethics at a machine learning conference? Yes.

I'd consider this symposium a law-oriented follow-on to the 'Algorithms Among Us: the Societal Impacts of Machine Learning' symposium at NIPS 2015 (see my summary here). Having a focus is good. The impacts of machine learning on society are widespread, so trying to cover them all forces a shallower treatment. High level talk is well and good, but getting stuff done requires being specific. This is actually a point that was raised during one of the panel discussions: how do we balance the need in computational science to formulate very specific, quantified definitions of things (like discrimination) with the requirement of margin of interpretation in law? I was surprised, as a non-lawyer, to hear that such ambiguity could be tolerated, much less desired. The example given for this was in discussions where compromise may only be attained through baking some ambiguity into an agreement, which would then (I suppose) later be argued over as necessary. This leads to another point which was made - law is not a monolith, laws are not absolute immutable statements - law is a process, an argumentative tradition (at least in the US), evolving and iterating and requiring justification at all times (get it - justice pun!). How to integrate algorithms into this process is not as simple as treating them as Truth Functions (shout out to my main man Wittgenstein) on Evidence ... or is it? I get ahead of myself.

Legal Perspectives

Ian Kerr, Learned justice: prediction machines and big picture privacy. The 'learned' in the title is partially a reference to the US judge Learned Hand (what a name). A quote from him, "If we are to keep our democracy, there must be one commandment: Thou shalt not ration justice". As an example of a 'learned AI' he mentioned 'The World's First Robot Lawyer', which helps people generate appeal letters. It's actually a pretty standard chat bot, but it's helped to overturn over 160,000 parking tickets in London and New York, which is a massive impact ('helping to protect vulnerable people from state coercion'). What could we do with more powerful algorithms? He then spoke about prediction, highlighting the links between prediction, preemption, and presumption. This brought us to the prediction theory of law, an idea coming from the legal scholar Oliver Wendell Holmes. This is the idea that 'the law' is simply about predicting what the courts will do, and nothing else. So the study of law is the study of prediction, not morality or anything else. He went on to talk about the 'reasonable expectation of privacy' which is required to understand the scope of the 4th Amendment of the US Constitution. The difficult part is not defining 'reasonable', but rather 'expectation'. What does this word mean? There are two interpretations: it could be normative, or predictive. The US courts have taken the latter stance, and one's 'expectation' of privacy therefore depends on what is possible with generally-available technology. This is terrifying - if I know my phone microphone is always on, and my phone is at risk of being hacked, do I lose the expectation of privacy whenever my phone is on me?
Mireille Hildebrandt, No Free Lunch. One particularly pertinent thing she spoke about was 'Data & Pattern Obesitas'. That is, there is a general desire to collect as much data as possible, to look for as many patterns as possible, simply because. This is dangerous for several reasons, the most obvious of which being that any personal data that is stored is a security risk (looking at you, Big Healthcare Databases). And so she highlighted the importance of salience of purpose, citing the security adage of 'select before you collect'. I think this idea likely goes against the inclinations of many researchers in machine learning/data science, who would rather grab everything, and do some sort of automated relevance detection later. This may be fine in certain domains, but when the data you're operating on is sensitive in some way, it can be fatal.
Deirdre Mulligan - Governance and Machine Learning: there was not so much machine learning in this talk, but she spoke about various ways technology and governance interact. Voting machines are one obvious place (and topical!). She spoke about how electronic voting systems failed to reproduce the traditional voting system. In pen-and-paper voting, the ballot is a physical artefact of the vote, but in these systems, apparently it was rendered on the fly and not saved. There was no storage of the ballot image, it simply incremented a counter somewhere in the backend. This is obviously a terrible system, but these machines were closed-source (!?!?!), so I guess nobody realised they were working like this until they reverse-engineered them? The mind boggles. Other examples are automobiles - you can hack them (like everything on the IoT), they were avoiding regulation (Volkswagen), product safety was compromised by software updates. The last case highlights the need for certification and verification of post-purchase software updates. If you want to run Windows XP on your computer that's your own business, but unsafe cars (from either software or hardware) are public safety risks.

Technical Perspectives

Aaron Roth: Quantitative tradeoffs between fairness and accuracy in machine learning - Rawls provides a definition of fairness, "fair equality of opportunity", which Roth formalised using a 'discrimination index' - the probability of victimisation (not being selected despite being the most qualified, I think) conditional on being present at a bad round (a round in which a sub-optimal applicant is selected). This was all formulated in a contextual bandit setting, and he described an algorithm called 'fairUCB' (from UCB - upper confidence bound, a standard bandit algorithm) and gave its regret bound.
Krishna P. Gummadi: Measures of fairness, and mechanisms to mitigate unfairness - the focus here was on discrimination, which is a specific kind of unfairness. So what is discrimination? A definition is "wrongfully imposing relative disadvantage based on membership in socially salient groups". One could ask what most of these terms mean exactly (and indeed, we must, if we want to computationally model anything), but he focused on the phrase "based on". Some attributes are sensitive, and some are not. Can you simply ignore them? The problem is that people in different sensitive attribute groups may have different non-sensitive feature distributions, which risks disparate mistreatment and disparate impact. One can test disparity of impact through, for example, proportionality tests, e.g. "an 80% rule" - if 50% of men are accepted, then at least 40% of women should be too. And a shout-out to Fairness, Accountability and Transparency in ML.
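
The "80% rule" arithmetic is simple enough to write down. This is a minimal sketch of the proportionality test as described above, assuming the rule means 'every group's acceptance rate must be at least 80% of the highest group's rate'. The function names and the counts are made up for illustration, and exact fractions are used so that the boundary case (50% vs 40%) isn't at the mercy of floating point.

```python
from fractions import Fraction


def selection_rate(selected, total):
    """Fraction of applicants in a group who were accepted."""
    return Fraction(selected, total)


def passes_eighty_percent_rule(rates, threshold=Fraction(4, 5)):
    """rates: dict mapping group name -> selection rate.

    True if every group's rate is at least `threshold` times the highest rate.
    """
    highest = max(rates.values())
    return all(rate >= threshold * highest for rate in rates.values())


# The example from the talk: 50% of men accepted, 40% of women accepted.
rates = {
    "men": selection_rate(50, 100),
    "women": selection_rate(40, 100),
}
ok = passes_eighty_percent_rule(rates)

# One fewer woman accepted and the test fails.
rates_below = {
    "men": selection_rate(50, 100),
    "women": selection_rate(39, 100),
}
not_ok = passes_eighty_percent_rule(rates_below)
```

Note this only checks outcomes per group; it says nothing about whether the individuals accepted were the right ones, which is where the disparate mistreatment notion comes in.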

There were more talks, but I was drifting into the semi-delirious pre-fever stages of the Conference Flu at this point.

Panel Discussions

The discussion spotlight was 'Regulation by Machine' from Benjamin Alarie. A question - how to use AI to make better laws? My notes are sparse but a recurring theme (also in MLHC) is that we should use machine learning to help and augment humans, not to replace them. So he was speaking about using ML to - for example - help to predict if it's 'worth' taking a case to court. Apparently many cases go to court which are 'overdetermined given the facts', and it's somewhat easy (citation needed) for an algorithm to identify which these are.

My notes on the actual panel are sketchy at best. It may have been the time or how sick I was, but it felt like people were saying a lot of interesting things without obvious argumentative structure or direction, so it's hard to summarise any salient points. Here are some decontextualised, paraphrased snippets:

Deirdre Mulligan: the judicial system is not always about applying the same law the same way. You must know the facts, the context... The law wants you to come in and argue about what it means. You can go to court to change the law (she asked how many people had been to court - a couple raised their hands - I've only been to court as a juror). Any algorithm for the law must be both performative and output-focused.
Neil Lawrence: how do judges come to opinions? Also, "I don't want to talk too much as I'm not on the panel."
??? (unknown panel member) - we're assuming the law will furnish us with specific definitions, but actually, policies breed on, thrive on, require a lack of specificity and precision - ambiguity is not an accident!
Ian Kerr: Paul the Octopus was highly accurate, but does that mean we should trust it?
Deirdre: shout-out to Nolo press, making the law easier to understand. Especially important in areas where the cost of fighting something isn't worth it...

And a final shoutout to Chief Justice John Roberts is a Robot - Ian Kerr and Carissima Mathen.

Machine Learning for Healthcare Workshop

With the caveat that these are workshop contributions, here are some interesting papers/posters (with accompanying arXiv papers, so I have a chance to remember anything about them):

Demographical Priors for Health Conditions Diagnosis Using Medicare Data - Fahad Alhasoun, May Alhazzani, Marta C. González - they look at insurance claims data from Brazil over a 15 month period - about 6.6 million visits. They represent ICD-10 codes by their distribution over ages (a 100-dimensional normalised vector) and do clustering on this representation.
Stratification of patient trajectories using covariate latent variable models - Kieran R. Campbell, Christopher Yau - they describe a kind of linear latent variable model taking patient covariates into account, and use it on a TCGA RNAseq dataset.
Learning Cost-Effective and Interpretable Regimes for Treatment Recommendation - Himabindu Lakkaraju, Cynthia Rudin - related (possibly extended version) paper here: Learning Cost-Effective Treatment Regimes using Markov Decision Processes. The 'interpretability' comes in here because their state space (of the MDP) consists of the effects on their patient population of decision lists - ordered lists of rules, each consisting of tuples of predicates (like, properties a patient must fulfill) and actions.
Modeling trajectories of mental health: challenges and opportunities - Lauren Erdman, Ekansh Sharma, Eva Unternahrer, Shantala Hari Dass, Kieran ODonnell, Sara Mostafavi, Rachel Edgar, Michael Kobor, Helene Gaudreau, Michael Meaney, Anna Goldenberg - they're interested in identifying subtypes of mental illness using time series, and predicting future phenotypic values. They use a Dirichlet Process-Gaussian Process and compare with latent class mixed models, finding that the LCMMs are actually as good as the DP-GP, although neither model is yet good enough for clinical use.
Transfer Learning Across Patient Variations with Hidden Parameter Markov Decision Processes - Taylor Killian, George Konidaris, Finale Doshi-Velez - they're concerned with patient heterogeneity, and cast this as a multitask learning problem, where different tasks are different patients. They share information between tasks using a GP-LVM, removing the requirement to visit every state to learn the dynamics (which is, of course, infeasible in medicine).
Predictive Clinical Decision Support System with RNN Encoding and Tensor Decoding - Yinchong Yang, Peter A. Fasching, Markus Wallwiener, Tanja N. Fehm, Sara Y. Brucker, Volker Tresp - they represent the patient's time series with an LSTM encoder and concatenate the static information into a representation. As a decoder, they use tensor factorisation. I'm not entirely clear on what is actually contained in this tensor, so the paper will need to be read more carefully.
Multi-task Learning for Predicting Health, Stress, and Happiness - Natasha Jaques, Sara Taylor, Ehimwenma Nosakhare, Akane Sano, Rosalind Picard - they have wearable sensors and smartphone logs from 30 days of monitoring. They looked at three multi-task approaches: multi-task multi-kernel learning, hierarchical bayes with Dirichlet process priors, neural networks (sharing hidden layers), and single-task versions of all of these.
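
The age-distribution representation from the first paper in this list is easy to picture in code. This is only a sketch of the representation step as I've described it (one normalised 100-bin histogram per diagnosis code), not the authors' pipeline. The `visits` records are made up, and lumping ages of 99 and over into the last bin is an assumption.

```python
from collections import defaultdict

N_AGES = 100  # one bin per year of age, 0..99 (99 also catches anyone older)


def age_profiles(visits):
    """Map each diagnosis code to its normalised distribution over ages.

    visits: iterable of (code, age_in_years) pairs, one per visit.
    """
    counts = defaultdict(lambda: [0] * N_AGES)
    for code, age in visits:
        counts[code][min(age, N_AGES - 1)] += 1
    profiles = {}
    for code, hist in counts.items():
        total = sum(hist)
        profiles[code] = [c / total for c in hist]
    return profiles


# Made-up visit records: (ICD-10 code, patient age).
visits = [
    ("J45", 6), ("J45", 8), ("J45", 35),
    ("I10", 60), ("I10", 72), ("I10", 72), ("I10", 85),
]
profiles = age_profiles(visits)
```

Each code ends up as a point in a 100-dimensional space, so any off-the-shelf clustering method can then group codes that affect similar age ranges.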

Mandatory shout-out to my contribution to the workshop:

Neural Document Embeddings for Intensive Care Patient Mortality Prediction - Paulina Grnarova, Florian Schmidt, Stephanie L. Hyland, Carsten Eickhoff - we used document embeddings to predict patient mortality in MIMIC-III, purely using text notes. The embedding procedure uses two layers of CNNs - word vectors are combined into sentence vectors (with a CNN), and sentence vectors are combined into patient vectors (with a CNN), and we use target replication to improve predictive accuracy. This was fairly preliminary (there are many other factors to consider, as ever), but we beat previous work using topic modelling on the task, which is encouraging, and perhaps unsurprising given LDA's inability to deal with multi-word phrases.

This is only a snippet of the interesting work presented at the workshop. I unfortunately came down with Conference Flu about half way through NIPS, and was at my sickest during the MLHC workshop (ironically), so I didn't get to speak to as many poster presenters as I would have liked.

Miscellaneous Comments/Observations

Generative Adversarial Networks are super hot right now, and by saying this I am contributing to the hype.
Despite having around 6000 attendees, NIPS didn't feel overcrowded (contrast with ICML this year). I'm guessing this was a combination of having an appropriately-sized venue and good crowd-control from the venue staff (they were closing off the top floor when it got too full), or maybe everyone was just busy enjoying Barcelona.
Being a vegetarian in Spain sucks. Given my diet was largely eggs, potatoes and bread for the week, I feel sorry for the vegans in the NIPS community. I for one devolved into a patatas-bravas guzzling monster and don't want to even think about tapas for the foreseeable future.

Conclusion

I feel less obviously exuberant about NIPS than I did last year, which I attribute to a combination of having been (and continuing to be somewhat) ill, and being in the development stage of several new projects where I just want to be getting stuff done.

As I've mentioned before, I think about approaching research in an exploration-exploitation framework. At this NIPS I realised that even within the exploration mode, one can explore exploitatively. That is, you can distinguish between diversity-increasing exploration (seeing areas of the state space/field you've never been in before) and depth-increasing exploration (refining your knowledge of partially-explored states/topics). The latter is arguably a kind of exploitation, because it's exploration with the aim to increase knowledge of things you are intending to use later. You hope.

Bringing this strained analogy back to conferences, this makes the difference between going to talks on things you already sort of know and going to totally new topics. I tried a bit of the latter, because chances are I'm going to read papers relevant to me regardless, but I found spotlight talks suboptimal for learning new ideas without sufficient background knowledge. An alternative approach would be to be incredibly exploitative, pre-emptively read the relevant papers and then talk to the authors at the poster sessions. Perhaps next year I'll be organised enough to do that, because unless you go to the tutorials, 15-minute talks of questionable presentation quality on cutting edge research are not good ways to learn new topics.

What is a good way to learn a new topic (for me, personally) is to write about it. I've been working on a pedagogical post about sparse Gaussian process classification, which will be up next, after a brief diversion into roller derby.

roller derby week 4

2016-09-29T00:00:00+01:00

See week 3 here.

It had occurred to me that I might injure myself while practising, and I had thought about avoiding practising on the weekends to give myself time to heal before Monday. I think about a lot of things that don't happen. So having somehow intensified the bruise on my thigh on Sunday afternoon, I spent Monday afternoon trying to talk myself out of going to training. On one hand, my body was fine... everywhere except the bruise. On the other hand, even light touches against the bruise were bafflingly painful. On another part of the second hand, apparently that place is where I fall, and apparently falling is a thing I do. I consulted the internet on the consequences of bruising already-bruised skin and it told me I have cancer, so I went back to pensively writing SQL queries while waiting for the subconscious to figure it out.

In the end, I printed off the rules of flat track roller derby and went to training, where I sat solemnly in the centre of the track, huddled in my antiquated Pirate Party hoodie, gazing jealously at the rest of the newbies. I figured, even if I'm not skating, I can learn something by watching and listening. Or, instead of watching and listening, I could continue to deliberate about joining in, while slowly succumbing to the clammy sense of inadequacy inspired by watching other people get better while I sit on my partially-broken ass shivering like an abandoned dog. I stand by my decisions.

They covered backwards skating, transitions (switching between forward and backwards skating), and derby stops. A derby stop is roughly when you turn around to use your toe stops to stop (in quad skates, the rubber stop thing is on the front of the shoe, as opposed to the back on inline skates). I just went looking for an illustrative gif and fell into a hole of roller derby gifs followed by kentucky derby puppy gifs, so I offer no further explanation of derby stops.

Transitions are worrying for me, because one approach (the simpler/easier one, possibly) is to do a mohawk turn. This amounts to briefly going into first or second position in ballet, where your feet form a line with toes pointing outwards. I have really tight hips, which has not served me well through many hours of ballet and yoga (I can't sit cross-legged, to the confusion of many), so achieving that position with my feet takes time and hurts. I have yet to properly learn transitions (I have yet to properly learn skating backwards), but so far I have been using a weird combination of multi-step-hops and spinning on my toes to turn around. Time will tell if these are acceptable methods, cause I don't see mohawk turns happening any time soon. Maybe I'll skip that part and go straight to jumping 180 degrees.

Later that week, something bizarre happened. Deflated after sitting out of training, and struggling to pull myself out of silver league in Overwatch, I tried some very casual, very careful derby practice at home. My room is small and the rest of the house contains far too many breakable objects to practice in, so home-practice consists of putting my skates on and then rolling carefully back and forth between my computer desk and my wardrobe. One day I will stream myself skating around my room while waiting to respawn in Overwatch, and it will be beautiful and terrible. The bizarre thing was that when I put my skates on, instead of my body seizing up in terror, I continued to feel like a normal person capable of controlling my legs. That is when I discovered I could spin on my front wheels. Had I always felt like this? Was the fear a strange dream? Had I actually found it difficult to do side lunges before? Was I suffering from some temporary bruise-induced delusion? I honestly don't know what happened. It is unsurprising that I would become more comfortable with practice, but I didn't expect it to be a step function. That's not to say that I feel entirely comfortable on skates (lord no), but it seems my fear is now focused on new things (like transitions and crossovers), instead of everything.

Gear talk interlude: The 'newbie' skates I bought are Riedell R3s, which come with PowerDyne (round, adjustable) toe stops. It turns out that these are on the smaller end for toe stops, so balancing on them is a little like balancing on high heels (except, you know, on the toe). Natalie has voiced concerns about twisting her ankle while trying to do a derby stop with these, so we are going to order bigger toe stops, most likely Gumballs. Hobbies: never not incurring costs.

On Saturday we found a carpark by Zurich airport for practice. It was a little too inclined for me to feel happy doing much more than intensely failing at slalom, but I also took some time to explain crossover mechanics to Natalie. I can understand things without being able to do them, a fact which is persistently frustrating. Part of this explanation involved me standing mostly-still and crossing one foot over the other, which is not a thing I thought I could do. In fact, that is a thing which I explicitly said I couldn't do two weeks earlier, so I was astonished and smug in equal measure for the rest of the day. Also a wizard. I have a quiet confidence that I could do crossovers if I tried now, but at the time of writing, I've had The Bruise for almost three weeks and it's still there and still (somewhat) painful. During the earlier gif tangent I found a catalogue of derby bruises, which begs the question: how is everyone's first bruise so goddamn small?

roller derby week 3

2016-09-24T00:00:00+01:00

See week 2 here.

Having discovered and then fixed an embarrassingly serious bug in my code on Friday, I spent the weekend before the paper deadline at the office rerunning experiments. This is a slow process - for whatever reason, tensorflow takes several minutes to fully initialise the computation graph for my experiments, so there's a decent lag before even first results start coming out. Delays like this are frustrating because they're too long to spend staring at the screen or otherwise doing nothing, yet too short to properly do anything else. Sure, I could practice mindfulness meditation or skim abstracts or read emails or temporarily intensify the attention I am paying to the Hamilton soundtrack constantly playing in the background, but I am a human and I have limits. That weekend, I capitalised on the solitude of a Swiss office on a Sunday to practice some skating. That means putting on all my gear, sitting quietly at my desk typing, and then doing laps up and down the corridor while tensorflow backpropagates through time.

The corridor is long and smooth and mostly empty, but it's not especially wide, so corners and crossovers and such were out of the question. I skated up and down and bumped gracelessly into walls on either end and then I somehow, just, sort of got stickyfeet. What had been demanding and somehow counterproductive became obvious and natural. What did I figure out? Physical actions are hard to explain, but here goes. It has to do with the distribution of weight/balance on the different wheels. So the situation with basic stickyfeet is that you're keeping both skates on the ground, but propelling yourself forward by moving your legs 'out', while your toes point out a bit (if your toes point in, you go backwards and then die). But it's not just 'move your legs out'. For me, it seems that I need to release some weight from my front wheels to facilitate the forward moving. I asked Natalie and she thought it might be outer/inner wheels, so maybe mileage varies here. Whatever it is, I'm pretty sure the DerbyNoob Stance of attempting to cling to the ground through one's skates is directly in opposition to the kind of subtle balance shifts required to actually do anything beyond scrabble desperately.

I also did a bit of backwards stickyfeet-skating, because the movement is the same, just reversed somehow. If you had told me I would be skating backwards (for some definition of skating - I am bad at backwards) within a month of putting skates on, I would not have believed you.

In class week three, we did:

slalom: This is something from skiing, I think? I did cross-country skiing once. When I was growing up, skiing was the domain of private school kids and my family in Chile. In derby it's weaving between cones. Current status: nah, not really, nope. I just can't turn that tightly. I tried weaving between every second cone and that was almost possible. However, bizarrely, later in the class we had to weave between skaters and I could do that. Maybe staring at cones on the ground makes skating harder. Maybe I'm just afraid of cones.
hopping: I mentioned in week 0 that I'm able to jump on the surface at the gym. This is true. You know what happens when I jump on other surfaces? I'd upload a picture of The Bruise but it's on that part of the leg where ass becomes thigh, the part of the leg you can't just display unless you're at a beach or in a sexually liberated society. The forbidden leg zone. Let me give myself some credit here. We had to jump over a small pile of cones (cones, you mysterious bastards). My initial reaction was "no, hell no, jesus no" but then I remembered that all is meaningless and came to peace with the jumble of bones and gristle that would soon constitute my body, and I gave it a go. The first few times I basically just landed too hard on the front wheels and kept going, either hitting my toe stops or falling on my knees. A goldilocks situation was afoot - that was too much front, so I overcompensated the next time and went too much back. Instead of getting eaten by bears or whatever it is that happened to goldilocks, I landed on my back wheels and fell arms-flailingly backwards onto my thigh. Apparently my left upper thigh is just where I fall. I think I crawled off the track and climbed onto a chair, because the initial impact caused my leg to go various kinds of numb. The resulting bruise was mysteriously round and violently purple, probably about eight cm in diameter. More on this later.
focus: this was really fun. We assembled a pack (roughly, 'be within arm's distance of two people at all times') and then had to skate around while identifying colours and numbers from various teachers/refs who were possibly behind us. Apparently I skate better when I am slightly distracted. I am reminded of a cognitive test I had done some years earlier at the Science Gallery where I had to memorise numbers said aloud while also solving a maze puzzle. They tested both tasks independently and then together, and I did better doing both at once. My explanation for this is that while free to either look or listen (i.e. no maze or no numbers) I got distracted by things in the environment (this was happening in a fairly busy room) which broke my concentration, whereas doing both at once required focus but not in a way that excluded my capacity at the other task. Another example: I find it very helpful to draw/doodle/fidget while listening to things, but unfortunately that comes across badly in meetings.

Another amazing thing during the class was that I did a crossover. How? Well, I just... sort of... did it. I had been trying to first practice crossing my feet over while standing still, or skating on one foot, or whatever. But then one of the teachers was like 'just try it' and I was probably already delirious from all the blood pooling in my thigh so I went for it, and it happened, and everything was beautiful and nothing hurt. And then I fell on my knees, but I think that's because I had no end-game for the move. I assumed I would try and fall, so once I wasn't falling I didn't know where to go, and I fell. So a holy grail (there are many) of skating seemed within reach. I had done it once, and I could - in theory - do it again. So that Sunday, after The Bruise had recovered enough that I could walk mostly normally again, I hit the gym.

Insert a wheel-switching montage here. Did I mention I ordered bearings for the new wheels, to make switching easier? And I forgot how to do numbers, so I got half as many as I needed? Master's degree in mathematics right here.

I had been at the gym for about twenty minutes when it happened. I had skated in circles, I had skidded weirdly on stripes on the floor (it's one of those multipurpose courts covered in every sports marking), and then I went for the crossover. And I fell. straight. onto. the. bruise. I wanted to yell at everyone and no one in particular that I already had a bruise there, that I already had a massive, deep bruise, so they would not judge me for crawling back to the bench, trying not to cry. It hurt, it hurt so bad and I felt repulsed and yet obsessed by the idea of pressing on that mess of blood and broken veins.

I sat on the bench and breathed deliberately, waiting for the waves of pain and dizzying shock to subside, psyching myself up to try again because I was so determined to get this, and then a man appeared. He was not in sports attire. He did not have a smiling face. He communicated in limited English that rollerblading is not permitted in the hall. I asked if there was somewhere else I could go. He said no. I asked again and he said he would get someone who spoke better English. He retrieved one of the basketball players who had been in the hall beside me. The basketball player told me that skating was not allowed. The skates would damage the surface. I asked if there was somewhere else I could practice. He said no. I nodded. I thanked the men. I ripped off my knee pads and elbow pads and wrist pads. I took off my skates. I looked at the pile of gear sitting beside me. I looked at the changing room door on the far side of the hall. I tried to shove my gear into my helmet and it didn't fit, so I put my helmet on and grabbed the shoes by the laces and looped my keys around a finger as I held my bottle under my arm, knee pads cupping the shoulder pads, the skates getting heavier as I left sweaty footprints on the floor.

roller derby week 2ish

2016-09-15T00:00:00+01:00

See week 1 here.

I endeavoured to revise and presumably master all the material we covered in the first class on a Sunday afternoon (nerd, remember. Historically 'Good At School'). It seemed quite straightforward. I would just repeat the difficult thing until it became easy. I've been there. I used to spend an hour a day just playing scales on my violin. Repetition is the key to mastery.

I failed to account for the fact that in physical activities, mistakes aren't free. They hurt. They can damage you. I'm writing this three and a half weeks after the failed T-stop at the first training, and my ankle still hurts. So when I put my gear on and stood up in the sports hall, instead of taking off with the casually intense focus of one committed to a task, I became very aware of how precarious my situation was. It seemed likely if not utterly predetermined that I would fall, badly, onto my back - maybe twisting my legs or ankles as I went down, or falling onto an arm and breaking it, implausibly - my wrists were too well protected but - presumably the forearm can break somehow - can you break elbows? This in mind, my plans of mastery withered to a single hope - to survive unscathed, and maybe be a bit less terrified in an hour.

Remembering the almost-comically predictable 'falling at the corners' I'd done at training, I managed to expand my rapidly-contracting ambition for the day to include learning how to turn. Because the gym is only available for a few hours on Sundays, I first went home and watched a lot of youtube videos (some relevant) and thought about acceleration, broadly defined.

Here is what I learned: you can turn by pushing the outer foot out. You lean a bit into the turn, putting weight on the inner leg, and push the outer foot roughly 'outwards' (actual trajectory is more like a curve since you're moving forwards at the same time).

When I realised this - in the sense of actually achieving it, rather than understanding it conceptually (which was easy), I felt so accomplished I almost forgot how wildly I had moved the goalposts on my afternoon to get there. I got excited for the prospect of trying the '27 in 5' again, because the secret to turning corners without losing all my speed had been unlocked and maybe I'd get a score I could say in public.

Unfortunately, I almost missed training session two, because I spent most of the day in a haze of pain and sickness, semi-conscious and clawing pitifully in the direction of Netflix. Scientist was baffled. I made a deal with myself where I would go via tram (as opposed to bike - not a fan of cycling when I'm dizzy) and take breaks as often as necessary. Probably not a good health choice, probably don't take health advice from me... or definitely do, because I miraculously got better, and it was certainly due to roller derby and not the extended nap I took in the afternoon.

Highlights of the class included me seeing side planks with leg lifts, thinking 'I can probably do this, I have done this before', and being totally incapable of lifting my legs, because that's what happens when you add a 2kg (?) weight to the end of your leg and also stop going to yoga six months earlier. We also did some 'agility', which involved things like balancing on one skate (I was pretty good at this - I have good balance), jumping to the side (somehow easier than it sounds), and stepping to the side while crossing one foot over the other (LITERALLY IMPOSSIBLE, anyone who can do this is a WIZARD). We also did a pattern of zig-zagging around the track (cutting from side to side) which exceeded my limited ability to do sticky feet (skating without lifting your feet) and involved a lot of baleful middle-distance stares as I rolled unceremoniously to a stop. But I barely fell, and someone said it looked like I had been practicing, so I left the place wearing the invisible sunglasses of a person inwardly giddy with pride.

I was then travelling for roughly a week while trying to finish and submit a paper, so I missed the next training - but it was actually an intro to the extensive rules of roller derby, which I have already largely read (nerd, remember), so the exhilarating story of learning-to-skate continues uninterrupted whenever I write the next one of these. Right now I have to find a sleeping position that doesn't involve the bruised side of my body.

roller derby week 1

2016-08-29T00:00:00+01:00

See week 0 here.

I am a nerd. I didn't want to go to class unprepared, so a few weeks before the Rookie Course started, I went to a neighbouring canton to buy derby gear so I could practice. Practice and hope to attain a minimal competency such that I would not be so horribly bruised again, so quickly. Roller derby gear purchased in Switzerland is pretty expensive. My knee pads alone cost 100 francs. But I have a job and a single yoga lesson here costs at least 20 CHF so I can deal with it. And those pads make falling forwards painless, which I rate quite highly. One must remember to fall forward.

An expedition was undertaken to find a location suitable for practice. Such a location must be:

flat, ideally completely flat, oh god a single stone will kill me dead
safely enclosed from roads, traffic, hills, chasms
devoid of other people, with their beady, judging eyes

And another thing I did not even think to think about:

of a surface appropriate for the wheels on my skates

It turns out that the wheels which came with the 'starter' skates I got (Riedell R3s) are of a hardness appropriate for concrete and other hard-ish surfaces. (They're Sonar Flat Out wheels, with a hardness of 88A, for reference.) The indoor multi-purpose court we found at our university's sports centre fulfills the first three conditions, to an extent. There are people on exercise machines overlooking the court, but I can deal with judgement better than I can deal with skating into traffic and dying. It fails badly on the last condition. It has some kind of rubberized surface (you can leave small impressions in it with your nails) and when I first put on my skates and tried it out, something felt... wrong. I could stand upright with no effort to stay in one place. I could roll to a graceful stop by ceasing movement. When I tried to move forwards, my feet lagged strangely, and made me stumble. I could jump without filling with abject terror. Something was wrong. I went home and did some research ('why roller skate sticky') and deduced that my wheels were responsible. For a surface that soft you need harder wheels to compensate, and my 88As wouldn't cut it. So I went off and bought some 95A hardness wheels and they seemed better, not perfect but good enough because I'm not made of money or willingness to spend a morning going to Aarau, as pretty as their cantonal flag may be.

Minor logistical issues aside, I managed a tiny bit of practice before the first training session. It's hard to know how much worse I would have been without it. I was bad, real bad. I fell (backwards or otherwise messily) five times. While trying to do a T-stop I messed up somehow and twisted my ankle in a way that seems like something should have broken. My issue was mostly corners. I was trying to stay on the inside of the track to avoid being in the way of the others, who are much better than me (they are mostly more experienced, so this is natural). I can deal with hurting myself but I don't want to cause someone else to fall. The inside of the track has a tighter corner however, so I kept getting off balance. Obvious solution is to stop hugging the inside of the track. Or learn how to turn (coming soon!).

At the end of the session we did a practice '27 in 5' (part of the minimal skills test is to skate 27 laps in 5 minutes) and I did twelve laps in five minutes. That's awful. So bad. Much slow ... but I was intentionally avoiding gaining speed on the straights because I had no idea how to deal with it on the corners, and I definitely didn't want to make someone else fall while they were trying to go fast. Especially not if it involves them falling on me.

We also covered plow stops, which I find a lot easier and less ankle-destroying than T-stops, probably because I can keep both feet on the ground. Overall though, my favourite stopping method is 'falling and curling up in a ball'. Fall small, they say. Be grand, they say.

I was alarmed at first by the horrible disparity in skill level at the Rookie course. Some people were doing crossovers in week 1. I mistakenly tried to keep up and got several days of hobbling around the office as my reward. I resolved to practice (once my leg regained full function) and definitely catch up, because that is definitely possible in a week (spoiler: no).

And now for some thoughts on skill-acquisition. At this point, being on skates feels terrifying. The ground moves perilously beneath me at every moment. Even staying still requires intense and tiresome muscle activity. My whole body is inexplicably involved in the task of not falling to my doom. I imagine however that this is how cycling felt, once. On cycling: I am a good cyclist. I am not an advanced cyclist: there are many things I can't do. I can't swing my leg over my bike and jump off while I'm slowing down, and I can't cycle stably with no hands. I can't turn really sharply. I don't try to do these things. I am good at the things I choose to do (causal direction left as exercise for the reader). I feel utterly comfortable on my bike - it is an extension of me, and I can go where I desire without consciously acting. It would be hard for me to explain exactly how I cycle, because so much of it is automatic. My hope and belief is that this is how experienced people feel on rollerskates. Until then, I can't play while on skates, because I am far too occupied with questions of basic mechanics. I want to get to the stage where I can think more about what I'm doing and less about how I'm doing it.

roller derby week 0

2016-08-28T00:00:00+01:00

In a turn of events baffling to those who know me, I decided to sign up to (try to) learn to play roller derby. Roller derby is a full-contact sport played on rollerskates. The game basically consists of getting in someone's way, or getting through people who are getting in your way. It's like 'walking through Times Square' but on rollerskates and the pushing is consensual.

My interest in the sport is incredibly out of character. I'm not a fan of falling over. I don't like getting hurt. I don't really like doing things which might hurt others, and while I do enjoy the sensation of moving quickly, it also terrifies me. I appear to lack any thrill-seeking bones and I've always been fine with that. Why would I do something dangerous? Also, and this is not unrelated, I'm sort of small. I'm 161cm (5'3") tall and weigh about 55kg (120lb). In a game of momentum transfer, I am going to lose. I fare a lot better at academic pursuits, like writing code while slowly horizontalising myself, or drawing amorphous manifolds on whiteboards. Also computer games. I got an accidental headshot in Overwatch last night, so you could say I'm pretty good.

However, I also moved to a new country (Switzerland) recently, and having one's non-work life unceremoniously deleted is a good opportunity to find new hobbies. (Or rediscover old ones, like competitive online video games) I had been aware of roller derby for some time because a former flatmate was involved with the Dublin Roller Girls back when we lived together in 2010. At the time I wasn't interested in the idea of sports, much less violent ones, so I never seriously considered the idea of doing it. Also, I'm so small! I was in the 'too light to donate blood' category back then. However, now that I am older and wiser and excluded from blood-donation for different reasons, I have decided to get out of my comfort zone. Wildly. So far that traditional notions of distance can no longer be meaningfully used to describe the relative locations of roller derby and my comfort zone.

So, while scouting around for Things To Take Up in Zurich I found the Zürich City RollerGirlz, noticed they had a try-out day and convinced my good friend and coworker Natalie to go with me. That was my first time ever on rollerskates. I fell twice on the same spot and had a bruise which lasted, visibly, for three weeks. But it was fun and I learned that the game is more nuanced and less openly brutal than 'shove people, also you're on rollerskates'. There are rules. I knew in a sense that there would be, but I also feared that 'full contact' means anything goes. Thankfully not.

Natalie and I signed up to the rookie course, and I intend to document my progress. Natalie has a video-camera too, so if we remember we might compile clips to make a sweet training montage video at the end. It lasts 12 weeks and culminates in the Minimal Skills Test required to be eligible to actually play roller derby. It seems implausible to me that I can go from never having skated to passing that test in a 12 week period (while also doing a PhD and becoming a professional Overwatch player), but time shall tell. I'm quite motivated to try, because everyone loves underdog stories, and I really like having things to care about that aren't related to my PhD. File this under coping mechanisms.

The course actually started two weeks ago, I just didn't think to blog about it until now. I was searching for things like 'how to roller skate' and came across some really useful and reassuring blog posts from other beginners (or ex-beginners), and realised how much I value reading about other people sucking at things. So here I am, sucking at a thing.

transcribing my accent

2016-08-21T00:00:00+01:00

An exercise in the international phonetic alphabet (IPA).

Here's a quote from the magnificent Margaret Atwood book, "The Handmaid's Tale":

"Now we walk along the same street, in red pairs, and no man shouts obscenities at us, speaks to us, touches us. No one whistles.

There is more than one kind of freedom, said Aunt Lydia. Freedom to and freedom from. In the days of anarchy, it was freedom to. Now you are being given freedom from. Don't underrate it."

I recorded myself reading it at a normal pace, not trying to enunciate correctly, trying not to think about my accent. Here is the recording.

And here is my attempt to transcribe the recording into IPA:

[nɐu wi wɒk əlɑŋ ðə seɪm st͡ʃɹitʰ, ɪn ɹɜd pɛɹz, ənd nɵʊ mæn ʃɐət͡s əbsɛnɪtiz ætʰ əs, spiks tu əs, tət͡ʃɘz əs. nɵʊ wʌn wɪsœlz.]

[dʰɛɹ ɪz mɒɹ ðɜn wʌn kɜnd əv fɹidəm, sɛd æntʰ lɪdiæ. fɹidəm tu ɶnd fɹidəm fɹɐm. ɪn ðə dez əv anœɹki, ɪtʰ wəs fɹidəm tu. nɐu jəɹ bin ɡɪvɪn fɹidəm fɹɐm. dɵʌnt əndəɹetʰ ɪtʰ]

My accent is... a bit weird. I spent my first twenty-three years in south Dublin, so my accent should be unquestionably 'Irish', but people often think I sound American. I don't know why. ¯\_(ツ)_/¯

Things I noticed during transcription:

I don't break between words, apparently (does anyone?). While transcribing 'freedom' I had to check that it wasn't actually 'freedoms', because there's no gap in the audio between the [m] and the [s].
vowels are hard and mostly schwas.
I simply cannot make out the difference between some letters, like [æ] and [a]. My ear just isn't that good (yet). I think I need to spend more time learning how the sounds are created, because I found that immensely helpful in distinguishing between, for example, [ʃ] and [ʂ] while studying the consonants.

If you send me a recording of yourself reading the quote I will try to transcribe it to IPA for comparison purposes. Comparison and judgement.

I imagine transcribing someone else speaking is much more difficult because one can't rely on slow careful repetition with internal observation of the shape of the mouth. That might be a good thing.

important site updates

2016-07-06T00:00:00+01:00

Two matters, one vastly more important than the other:

SSL is active! It was incredibly easy to set up with Let's Encrypt. Losing SSL was one sad thing about moving away from GitHub pages, but it is clearly remedied. Now you can access my content securely.
This site has a new subdomain: http://dog.apeiroto.pe/. It shows a new dog gif/image on reload. SSL is... not working on the subdomain, I think I need to poke at nginx for that. You can now emulate friendship with me by hitting refresh on that page!

The way I did the new image on reload thing is hacky so here goes: I considered trying to actually learn JavaScript and then remembered I had to go wash my horse. So I found a 'load random image on reload' script, which pulls images from a list defined in the script. Given I want to just drop files in a folder and have them enter the pupper-rotation this was no good, so I wrote a script (in bash) to compile the contents of the dog-folder and stick that in the HTML. Easy. What I need to do now is combine the 'count puppers and update HTML' script with the 'sync puppers to web server' script and I'll be sorted. Or put both on cronjobs and forget about it.
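For the curious, here is a rough sketch of the 'compile the folder and stick it in the HTML' step. The real script is bash; this is the same idea in Python, not the actual code. The marker comments, the `images` variable name and the list of extensions are all hypothetical, chosen only for illustration.

```python
import json
import re
from pathlib import Path

# Hypothetical names: the marker comments and extensions below are
# assumptions for illustration, not the site's real setup.
IMAGE_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png"}
START, END = "// BEGIN PUPPERS", "// END PUPPERS"


def list_images(folder):
    """Return the sorted file names in `folder` that look like images."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def inject_image_list(html, names):
    """Replace whatever sits between the two markers with a fresh JS array."""
    block = f"{START}\nvar images = {json.dumps(names)};\n{END}"
    pattern = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)
    if not pattern.search(html):
        raise ValueError("marker comments not found in HTML")
    # A function replacement avoids backslash surprises in file names.
    return pattern.sub(lambda _match: block, html, count=1)
```

Running `inject_image_list(page_text, list_images("dogs"))` and writing the result back out would refresh the list; because it replaces everything between the markers, running it twice gives the same page, which is what you want from something living in a cronjob.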

At the time of writing there are 210 such images. That's 2.1G of doggos. I regularly collect these pictures from twitter/imgur/giphy so it should grow slowly.

Coming soon: cat.apeiroto.pe.

ICML 2016 not by the day

2016-07-05T00:00:00+01:00

The International Conference on Machine Learning (ICML) was in NYC this year! Unfortunately(?) for me, I moved from NYC to Zürich two months ago. Fortunately for me, I was able to return to attend the conference. Instead of doing a day-by-day breakdown (as I did for NIPS and AAAI), this post will be arranged thematically. Let's see how I deal with the hard group assignment problem... Skip to the bit you care about.

Caveats:

I missed some non-trivial fraction of ICML due to finishing my poster, helping collaborators with a grant application, and coming down with illness
- Future conference goal: finish my poster before I travel.
- Also don't try to print A0 posters in the USA. It ain't pretty.
I took very patchy notes, and haven't read all the papers deeply.

Volunteering at ICML

I was a student volunteer for ICML, which consisted of working two ~five-hour shifts at the conference. For me these were both Registration Desk. I had 07.30-12.30 on the first and last days, which was possible purely by my being in European time for much of the trip. I woke up at 4am on the first day. Here are some observations:

people actually register on the last day, but more people just want to get their badge reprinted
- protip: don't forget your name badge!
- you paid hundreds of dollars to get that piece of paper
some people turn up really early to register
90% of ICML attendees were DeepMind employees
registration desk workers could easily be replaced by name-badge-printing kiosks
conference attendees expect a pile of swag upon registration: pens and bags and mugs and program booklets. Not receiving these items is cause for thinly-veiled indignation
queues for registration are worst in the gap between sessions, naturally
people manage to make it to the top of a line without attempting to find the documents they need
- I have also observed this phenomenon in airports and banks
- why
I registered a bunch of people whose papers I have read, and I maintained composure
if I were running the registration desk with excessive time to spare, we would have had a graph of cumulative registrations over time, maybe with a breakdown for geographic origin/broad affiliation

Overall it was surprisingly fun. Apparently I rather enjoy that kind of work, so if this whole research thing doesn't work out I have a bright future as a vending machine.

Tutorial on Deep Reinforcement Learning

I was only able to attend one tutorial due to volunteering, and it was Deep RL. It was so popular there were two overflow rooms. Intense community interest in deep RL continues. Here's an abbreviated version:

The deep part comes into play when you use a deep neural network to approximate your value function, policy, environment etc.

Interesting Papers/Talks

These are the papers I flagged in the conference app. Did I attend all of these talks? No. Did I attend all of the posters? Also no. In hopefully-meaningful categories:

Neural Networks

Learning to Generate with Memory: Chongxuan Li, Jun Zhu, Bo Zhang: a deep generative model with external memory and attention mechanism. The deepness comes in through some nonlinear functions on latent variables which are defined by (deterministic) deep neural networks. Each layer in the network has access to its own external memory, which is seemingly novel in this model. In each layer lower-layer information is combined with the memory to produce the output, using some attention function taking as input the information from the lower layer. I'm not entirely convinced by the experiments that the memory mechanism actually helps that much, although they say it gives better 'qualitative' results.
Unitary Evolution Recurrent Neural Networks: Martin Arjovsky, Amar Shah, Yoshua Bengio: The idea here is to use a unitary matrix as the evolution operator in an RNN, with a hope to avoid exploding gradients. It seems to result in an RNN which can retain information for longer than an LSTM, and while gradients do vanish, they do so more slowly than in other models, and don't explode. I'm working on something of an extension to this work right now, and I had the pleasure of speaking with the authors at length. More details in forthcoming paper, I guess? Or blog post, we'll see.
Strongly-Typed Recurrent Neural Networks: David Balduzzi, Muhammad Ghifary: I really like the spirit of this work. Let's try to understand RNNs! And take inspiration from functional programming and physics, because why not? The physics part is roughly to preserve 'dimensions' (think units) by preserving the basis of the space. I took issue with this because I think any map from a space to itself is already preserving something (preserving being in the space, that is), but what that means for the model is less clear. The part from functional programming is about separating state and computation, a separation into learnware (with parameters) and firmware (having no parameters, but having state).
Group Equivariant Convolutional Networks: Taco Cohen, Max Welling: Wild simplification/mild understatement: they extend convolutional layers to other kinds of symmetries, not just translational.
Training Neural Networks Without Gradients: A Scalable ADMM Approach: Gavin Taylor, Ryan Burmeister, Zheng Xu, Bharat Singh, Ankit Patel, Tom Goldstein: ADMM stands for Alternating Direction Method of Multipliers. They use this with Bregman iteration to train networks without SGD! This method scales linearly over cores, and they compare this to an asynchronous SGD model called Downpour, which scales very strangely. SGD, having many small computations, is good for GPUs, whereas CPUs are better for a smaller number of expensive calculations, preferably involving a lot of data. This approach also combats the vanishing gradient problem (unsurprising given there are no gradients to vanish: gradients come pre-vanished), and SGD's tendency towards lingering near saddle-points.
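
A toy illustration of why a unitary evolution operator (as in the Unitary Evolution RNN paper above) is attractive: repeatedly applying a norm-preserving matrix neither blows up nor shrinks a vector, whereas a slightly expanding or contracting matrix does so exponentially. This is only a sketch of the linear-algebra intuition, not the paper's model: it ignores nonlinearities and inputs, and uses a 2x2 rotation as a stand-in for a unitary matrix. All names are made up for the example.

```python
import math


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))


def rotation(theta):
    """A 2x2 rotation matrix: real orthogonal, hence unitary."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def norm_after(m, v, steps):
    """Norm of v after applying m `steps` times, as in an unrolled linear RNN."""
    for _ in range(steps):
        v = matvec(m, v)
    return norm(v)


h = [1.0, 0.0]
unitary = rotation(0.3)
expanding = [[1.1 * x for x in row] for row in unitary]  # singular values 1.1
shrinking = [[0.9 * x for x in row] for row in unitary]  # singular values 0.9

for name, m in [("unitary", unitary), ("expanding", expanding), ("shrinking", shrinking)]:
    print(name, norm_after(m, h, 100))
```

The same argument applies to gradients flowing backwards through time, since they get multiplied by the (conjugate) transpose of the recurrent matrix at each step.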

Reinforcement Learning / Bandits

Opponent Modeling in Deep Reinforcement Learning: He He, Jordan Boyd-Graber, Kevin Kwok, Hal Daume III: They develop a model called DRON: Deep Reinforcement Opponent Network, which is close enough to TRON to make me happy. It's based on Mnih's deep Q-networks. DRON has both a policy-learning module and an opponent-learning module. It's essentially two networks, and they look at ways of combining them: concatenation and using mixtures-of-experts.
Why Most Decisions Are Easy in Tetris—And Perhaps in Other Sequential Decision Problems, As Well: Ozgur Simsek, Simon Algorta, Amit Kothiyal: by 'easy' they mean: "one can choose well among the available actions without knowing an evaluation function that scores well in the game". The idea is that comparison becomes easy when some criteria are met, and the relationship between features and criterion (of the comparison) is linear. This linearity requirement seems restrictive, but holds true for the best known tetris player (BCTS).
Smooth Imitation Learning for Online Sequence Prediction: Hoang Le, Andrew Kang, Yisong Yue, Peter Carr: They're looking at imitation learning where actions and the environment are continuous, but the environment is exogenous (not affected by actions). They consider the state space to be both environment and actions (so the policy considers the previous action taken), and enforce smoothness of actions. The application is smooth camera control (the paper is from Disney research), hence smooth actions. Their approach learns a fully deterministic stationary policy, and they have some other contributions whose gravity is somewhat lost on me, but are presumably important.
Asynchronous Methods for Deep Reinforcement Learning: Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu: As an alternative to experience replay, they asynchronously run multiple agents in different instances of the environment, in parallel. This can then be run on a multi-core CPU rather than a GPU, and is more resource efficient. Some nice ggplots, too.
Conservative Bandits: Yifan Wu, Roshan Shariff, Tor Lattimore, Csaba Szepesvári: a multi-armed bandit problem where a company wants to maximise revenue while keeping revenue above a constant baseline. In this setting there exists a 'conservative default action', and they propose an extension to UCB (upper confidence bound) where a budget is accumulated using the conservative arm, and when large enough allows for 'safe' exploration.

Representation Learning

The Information Sieve: Greg Ver Steeg, Aram Galstyan: What an intriguing title. This is about representation-learning. The idea seems to be to iteratively 'sieve' the data, extracting a latent feature at a time, then passing on a version of the data with the contribution from that feature somehow removed, and so on. Sieving. It relies on the total correlation, or multivariate mutual information, and they describe a way for finding the factors which cause this total correlation to decompose into non-negative contributions.
Complex Embeddings for Simple Link Prediction: Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, Guillaume Bouchard: a scoring function for link prediction (subject, predicate, object type triples) which uses complex-valued embeddings for entities. Using the inner product in complex space amounts to taking dot products with complex conjugates, which handles asymmetry of the triples. The relationships appear to be parametrised with complex-valued vectors. At a glance it looks like a complex version of DistMult.
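
To make the asymmetry point in the Complex Embeddings item concrete, here is a small sketch of a score of that form: the real part of a three-way product of a relation vector, a subject embedding and the complex conjugate of the object embedding. The function name and the toy vectors are invented for illustration, and nothing here is trained.

```python
def complex_score(relation, subject, obj):
    """Real part of sum_k relation[k] * subject[k] * conj(obj[k])."""
    total = sum(r * s * o.conjugate() for r, s, o in zip(relation, subject, obj))
    return total.real


# Two toy entity embeddings with complex entries.
a = [1 + 2j, 0.5 - 1j]
b = [2 - 1j, -1 + 3j]

real_relation = [1 + 0j, 2 + 0j]  # purely real relation vector
imag_relation = [1j, 2j]          # purely imaginary relation vector

print(complex_score(real_relation, a, b), complex_score(real_relation, b, a))
print(complex_score(imag_relation, a, b), complex_score(imag_relation, b, a))
```

A purely real relation vector gives the same score when subject and object are swapped (a symmetric relation), a purely imaginary one flips the sign (antisymmetric), and a general complex vector mixes the two. With real-valued embeddings throughout, the conjugate does nothing and the score is always symmetric.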

Other / ???

ForecastICU: A Prognostic Decision Support System for Timely Prediction of Intensive Care Unit Admission: Jinsung Yoon, Ahmed Alaa, Scott Hu, Mihaela van der Schaar: the application here is predicting when/if a patient needs to be admitted to the ICU. They cast it as an optimal stopping problem, and try to learn the unknown stopping rule of the stochastic process: how the physician decides (on the basis of the stream of data) to admit the patient to ICU. They assume patients belong to 'stable' or 'deteriorating' classes, which describe different distributions over physiological streams.
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning: Yarin Gal, Zoubin Ghahramani: I'm not going to do this paper justice by skim-summarising it, so I'll just quote a sentence: "In this paper we give a complete theoretical treatment of the link between Gaussian processes and dropout, and develop the tools necessary to represent uncertainty in deep learning". Cool cool cool.
CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy: Ran Gilad-Bachrach, Nathan Dowlin, Kim Laine, Kristin Lauter, Michael Naehrig, John Wernsing: Homomorphic encryption! Homomorphic encryption only allows for addition and multiplication, and ideally with low-degree polynomials, so they have to approximate the usual max pool, sigmoid etc. transformations. One also has to be careful as all operations in the cryptosystem are applied modulo some number. A key thing to note here is that they're not training on encrypted data, just predicting.
The Arrow of Time in Multivariate Time Series: Stefan Bauer, Bernhard Schölkopf, Jonas Peters: Non-Gaussian noise breaks time symmetry in multivariate autoregressive moving average (VARMA) models.

Geometry in Machine Learning Workshop

Is the title of this workshop an intentional Lord of the Rings reference? I sure hope so.

I spent the whole day at this workshop, since I was presenting a poster and also yay differential geometry.

So why care about geometry for machine learning? Firstly, by geometry we're talking about differential geometry, which is focused on differentiable manifolds (manifolds which locally look flat). Data usually lies on a manifold. We often assume this manifold is Euclidean space (nice and flat), but it often isn't. A simple example is data which lies on a circle, which you've encountered if you've ever dealt with angular measurements. Gregory S. Chirikjian gave a really nice illustrating example in his talk "Learning and Lie Groups": if you consider the range of motions available to a simple noisy robot, after a certain number of steps its possible location will be given by some probability distribution (this is called the 'banana distribution'). This distribution is not Gaussian in x and y (the coordinates of the Euclidean manifold a.k.a. the plane the robot was moving on), but if you recall that its motions were constrained to come from a Lie group (specifically the planar special Euclidean group, SE(2), consisting of translations and rotations in the plane), you can define a Gaussian distribution relative to coordinates in that group space (since Lie groups are manifolds), and this distribution describes its location. For more details, see the paper: The Banana Distribution is Gaussian: A Localization Study in Exponential Coordinates.

Reasons to be careful when your data lies on a manifold seem to be:

doing statistics requires a notion of distance, so you must use the distance on the manifold
gradient-based optimisation requires, well, gradients, so you must use the gradient on the manifold
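
The circle is the easiest place to see the first point go wrong. A small sketch, with made-up function names: averaging the angles 350° and 10° as plain numbers gives 180°, which is on the opposite side of the circle from both measurements, while averaging them as points on the circle gives something sensible. The circular mean below (average the unit vectors, take the angle of the result) is one common choice and is not the only way to define a mean on a manifold.

```python
import math


def arithmetic_mean(angles_deg):
    """Treat angles as ordinary numbers on a line."""
    return sum(angles_deg) / len(angles_deg)


def circular_mean(angles_deg):
    """Average the corresponding unit vectors and return the resulting angle."""
    x = sum(math.cos(math.radians(a)) for a in angles_deg)
    y = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(y, x)) % 360.0


def circular_distance(a_deg, b_deg):
    """Shortest distance between two angles, measured around the circle."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


angles = [350.0, 10.0]
print(arithmetic_mean(angles))
print(circular_distance(circular_mean(angles), 0.0))
```

Note that the circular mean can come out a hair below 360° rather than exactly 0° because of floating point, which is why the comparison goes through `circular_distance` rather than a direct equality.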

This second point is actually highly relevant to the work I was presenting at the workshop, which will become entirely clear once I put the paper on the arXiv.

I think machine learning as a field already cares about manifolds a lot, particularly when it comes to finding low-dimensional subspaces within a dataset. This workshop was however primarily concerned with cases where the (sub-)manifold is already known.

And now, the content: (also, you can get the slides for these talks on the workshop page)

Nicolas Boumal spoke about Optimisation on Manifolds. Here is his PhD thesis on the topic. The take-homes were:

we have some convergence guarantees for non-convex optimisation on manifolds, see the paper: Global rates of convergence for nonconvex optimisation on manifolds, Boumal, Absil and Cartis.
he has developed a Matlab toolbox for optimisation on manifolds: Manopt
free book, Optimization Algorithms on Matrix Manifolds, Absil, Mahony, Sepulchre

Laura Balzano spoke about Subspace Learning by Incremental Gradient Descent on the Grassmannian.

the Grassmannian is a manifold comprised of all low-dimensional subspaces of a particular ambient space, I believe with a pre-specified dimension (so it could be the space of all lines, for example)
her focus area is streaming data, where you want to use first-order methods (not enough data to estimate hessians, for example)
doing SVD where the learned matrices are elements of the Grassmannian (that is, living in a lower-dimensional space), so gradients are on the Grassmannian
more details probably in this paper: Global Convergence of a Grassmannian Gradient Descent Algorithm for Subspace Estimation, Zhang & Balzano
also featuring a live demonstration of separating foreground from background in video! using a laptop and a webcam! More here: Online Algorithms for Factorization-Based Structure from Motion - Kennedy, Balzano, Wright, Taylor

Gregory S. Chirikjian spoke about Learning and Lie Groups as I mentioned above:

paper again: The Banana Distribution is Gaussian: A Localization Study in Exponential Coordinates, Long, Wolfe, Mashner, Chirikjian
also a book (although not free ;_;): Stochastic Models, Information Theory, and Lie Groups, Chirikjian

Tom Fletcher spoke about Probabilistic Geodesic Models. The motivation is shape analysis (with a medical application in brains), particularly for dimensionality reduction and regression.

he gave a nice introduction to the idea of shape: basically, geometry of object invariant of position, orientation, size: when you remove these things you are on the SHAPE MANIFOLD
Kendall's Shape Space: defined in a complex space. The idea here is that multiplication by a complex value is a rotation and scaling in complex space, so if you 'quotient' that out, you get Kendall's Shape Space, a complex projective space. (and amusingly for me, Projective Geometry is a class I used to sneak into)
back to the idea that statistics requires a notion of distance, he defined for us the Fréchet mean, allowing points to be 'averaged' on a manifold, and allowing you to define something that looks like a Gaussian-on-a-manifold...
but a different one than that proposed by Chirikjian, because: there are many ways to arrive at a Gaussian distribution (as solutions to heat and diffusion equations, as maximum-entropy distributions, the central limit theorem, maximum likelihood solutions to least squares, etc.) and while these seemingly converge on the much-loved Normal distribution in Euclidean space, this doesn't happen on other manifolds... so we end up having 'normal distributions' that look different depending on which definition we started with... oh dear.
I think it was at this point that someone voiced the concern that in an arbitrary manifold, the distance metric is locally defined (because it is defined on the tangent space at a point), so the normalisation constant in your Gaussian-on-a-manifold actually depends on the centre of the distribution. The solution to this is to only look at homogeneous manifolds, manifolds whose isometry group acts transitively, so the manifold 'looks the same' everywhere.
some homogeneous manifolds: spaces with constant curvature, Lie groups, Stiefel manifolds, Grassmannians, dot dot dot
Open Access Series of Imaging Studies (OASIS): open access brain (MRI) images!
then it got into geodesic regression and the manifold of diffeomorphisms, with a shout-out to the Sobolev metric, and a mention of Gaussian processes, thus ensuring my interest was piqued
generalisation of probabilistic PCA on a Riemannian manifold: Probabilistic Principal Geodesic Analysis, Zhang and Fletcher
another relevant paper: Geodesic Regression and the Theory of Least Squares on Riemannian Manifolds, Fletcher
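
For reference, the Fréchet mean mentioned in the notes above is usually written as the point minimising the sum of squared distances to the data. This is the standard textbook definition rather than something copied from the slides; here $M$ is the manifold, $d$ its geodesic distance and $x_1, \dots, x_n$ the data points:

$$\mu = \arg\min_{p \in M} \sum_{i=1}^{n} d(p, x_i)^2$$

In Euclidean space with the usual distance this reduces to the arithmetic mean. On a general manifold the minimiser need not be unique.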

Katherine St. John spoke about Dimensionality Reduction on Treespaces, specifically evolutionary trees. Hey, biology! Phylogenetics! The core issue is: you see a set of organisms (their genomes, rather) and want to find the optimal evolutionary tree, out of a very very large set of trees. What to do? Metrics on trees usually look at things like rearrangements ("remember balancing red-black trees?"), distances which are NP-hard to compute. I apparently didn't take many notes during this talk, so have some likely-relevant references:

The Shape of Phylogenetic Trees (Review Paper), St John
Characterizing Local Optima for Maximum Parsimony, Urheim, Ford, St. John

Mikhail Belkin spoke about Eigenvectors of Orthogonally Decomposable Functions: Theory and Applications. This was partially lost on me, but what I got was:

we have a well-defined notion of eigenvectors and eigenvalues for matrices, but what of tensors (multilinear forms)? There's no spectral theorem here, the idea of rank is different, 'things are just sort of unpleasant'
focusing on orthogonally-decomposable tensors makes things easier (sort of an analogue of eigen-decomposition)
then the trick is to recover the 'basis' the tensor is orthogonally-decomposable on
he said this was primarily about work with Rademacher and Voss, so this paper is likely the reference: Basis Learning as an Algorithmic Primitive, Belkin, Rademacher, Voss

Finally, Stephen Marsland spoke about Principal Autoparallel Analysis: Data Analysis in Weitzenbock Space. This talk got into discussion of connections (maps between elements of tangent spaces), and their curvature, and torsion. It had the same effect that looking at my copy of Spivak's 'A Comprehensive Introduction to Differential Geometry' has: excitement to (re)learn these things but the vague guilt of indulgence in intellectually stimulating but maybe not so directly applicable mathematics. But so cool. Also the sense of having come so close to getting fibre bundles. One of these days.

this talk included an entertaining story about the history of Weitzenbock spaces, Cartan not receiving recognition, and racist messages hidden in books. Forgetting the umlaut in Weitzenböck's name is OK, because he was a racist.
we usually look at the Levi-Civita connection, which is unique and torsion-free. This one weird non-zero torsion tensor. Mathematicians hate it!
intuitive explanation of curvature: the amount you've rotated upon returning to your original position
intuitive explanation of torsion: the amount you've failed to return to your original position, sort of, or, 'how hard it is to stay on the manifold'
Riemann-Cartan space reduces to: Riemannian if torsion is 0, and Weitzenbock if curvature is 0
cryptic statement in my notes: 'prior over tangent spaces?'

And that's where my notes end.

The poster session was really good in that I got to speak about my work a lot, but really bad in that it ended before I got to see anyone else's work, or talk much about my work at all. I had so many more things to say! Good thing I have a blog. I'm also working on a manuscript which is very almost ready to go on the arXiv, honestly.

Computational Frameworks for Personalisation Workshop

Mistakes were made. I spent the first quarter of this workshop working the registration desk, and the second quarter standing outside the workshop. The afternoon I spent at Machine Learning in Social Good Applications, which was not a mistake (although I arrived too late to get a t-shirt in my size), as I think I had already seen the work from David Blei's talk presented at the New York Academy of Sciences Machine Learning Symposium.

The name of the workshop got truncated to 'Computational Frameworks' on the sign outside, so I got to feel vaguely useful providing disambiguation services while trying to glimpse content.

The content I was most interested in (and managed to catch part of) was Joelle Pineau speaking about Contextual Bandits for Effective Discovery of Personalized Adaptive Treatment Strategies. The focus here is on adaptive protocols, such as adaptive clinical trials or adaptive treatment strategies. In each case, earlier outcomes influence subsequent decisions: it's, you know, adaptive. The computational framework they use is the multi-armed bandit: you have a set of K actions with probabilistic outcomes. You don't know the outcomes or the probabilities, but you have to select actions to maximise some expected utility. This poses the classic exploration-exploitation trade-off so integral to sequential decision making. Once you discover an 'ok' action, do you choose it repeatedly (exploiting it), or do you attempt to find yet better actions, risking stumbling upon inferior outcomes (exploration)? This also raises questions about whether it's possible to explore 'safely', which was the subject of Andreas Krause's keynote at AAAI this year.

Back to exploration-exploitation: In adaptive Bayesian trials, they use Thompson Sampling. This requires having a posterior over models, sampling one and selecting the action with highest expected utility relative to that model. So you act greedily given your belief (exploiting), but your belief is random (exploring). Another approach is to define an upper confidence bound (Auer 2002), where you estimate the confidence of the estimate of the expected utility of an action using how many times the action has been tried, and select arms maximising the estimate + the confidence bound. In this way, you select actions which are either very good, decent and uncertain, or very uncertain. The third example in her slides is BESA: Best Empirical Sampled Average (Baransi, Maillard, Mannor, 2014), which seems to involve subsampling the arm which has more data, then selecting the one with highest expected reward.
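
The first two strategies are short enough to sketch. What follows is a toy Bernoulli bandit (each arm pays 1 with some fixed unknown probability), with Thompson Sampling using a uniform Beta(1, 1) prior per arm and the upper confidence bound rule in its UCB1 form. It is a generic textbook version under those assumptions, not the setup from the talk; the function names, arm probabilities and step count are all made up.

```python
import math
import random


def ucb1_choice(counts, means, t):
    """Pick the arm maximising estimate + confidence bonus (UCB1)."""
    for arm, n in enumerate(counts):
        if n == 0:  # try every arm once before trusting any estimate
            return arm
    return max(
        range(len(counts)),
        key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
    )


def thompson_choice(successes, failures, rng):
    """Sample a payout rate per arm from its Beta posterior, then act greedily."""
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run(policy, true_rates, steps, seed=0):
    """Simulate `steps` pulls and return how often each arm was chosen."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes, failures = [0] * k, [0] * k
    for t in range(1, steps + 1):
        counts = [s + f for s, f in zip(successes, failures)]
        if policy == "ucb1":
            means = [s / n if n else 0.0 for s, n in zip(successes, counts)]
            arm = ucb1_choice(counts, means, t)
        else:
            arm = thompson_choice(successes, failures, rng)
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


if __name__ == "__main__":
    for policy in ("ucb1", "thompson"):
        print(policy, run(policy, [0.1, 0.5, 0.9], steps=2000))
```

In both cases the pull counts should end up heavily concentrated on the best arm, with the worse arms still getting tried occasionally: that residual trying is the exploration.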

The specific application was cancer, specifically trying to minimise tumour volume in mice. They did a pure exploration phase, where mice with induced tumours had random treatments of combinations of two drugs (fluorouracil and imiquimod). They then considered the adaptive problem of selecting treatments given the current tumour size. This makes it a contextual bandit problem. They used Gaussian Processes to model the reward function over the space of continuous contexts (tumour sizes) and arms (discrete treatments). Then, given a specific context, you can select the arm maximising the expected reward, using these earlier-described methods. At this point there's a reference to Durand & Pineau 2015 for the GP extension of BESA but I somehow cannot find it. The idea seems to be to re-estimate the GP using a sub-sample of the data, then using that GP to estimate the maximum expected reward. Preliminary results using the adaptive approach look promising, and they're interested in doing sequential reinforcement learning (rather than bandits) in the future.

Machine Learning In Social Good Applications

I approximately made it to the Disease section of this workshop, which is unfortunate because I would have liked to see Quantifying and Reducing Stereotypes in Word Embeddings, Bolukbasi et al. I'd consider this under the umbrella task of removing unwanted patterns from data, or perhaps more accurately, training a model such that it doesn't pick up on these patterns. See also 'racist algorithms' and this ProPublica piece on Machine Bias. Will there be a conference summary where I don't mention Fairness, Accountability and Transparency in Machine Learning? Probably not.

Anyway, I have an especially strong memory of Barbara Han's talk on Predicting Novel Tick Vectors of Zoonotic Diseases, possibly because it contained many horrifying images of ticks. This work is part of a project to use machine learning to predict zoonotic diseases, and also featured a (iirc) undergraduate researcher! The problem is basically: ticks act as disease vectors, but not all of them carry zoonoses. They mined entomological literature (and maybe other sources) to come up with feature sets for ticks, trained a supervised classifier (if I recall they used boosted regression trees), and predicted novel vectors. They also did some feature analysis to understand what differentiates these classes of tick. It turns out that a strong predictor is the number of hosts the tick feeds on. It seems like this could be confounded with the need to feed on a specific host (since that host has to be a reservoir of the zoonosis); I asked, and they hadn't done a breakdown looking at the specific species. Anyway, a straightforward machine learning task but an important problem in ecology and epidemiology.

A Rant about the Venue

Times Square is the worst. Times Square is why people hate NYC. Tunnels should be built under Times Square so we never have to look at it. I acknowledge its utility to tourists and I reserve through gritted teeth some respect for their bloody-minded dedication to milling at junctions, drifting absent-mindedly across sidewalks, and stopping suddenly. I just don't enjoy being the person trying to weave between them on my way to lunch, especially when it's summer in NYC and I'm an inappropriately-attired Irishwoman. (We don't do 'direct sunlight' very well.)

I thought of some reasons to locate a conference on Times Square:

the rest of the world has been destroyed
- Times Square stands alone in the void, a final stand for humanity against the encroaching oblivion
- there is nothing left to do but hold conferences
there are no other appropriate venues in New York City
conferences require a density of hotels only offered by Times Square
holding a conference in what is probably a very expensive hotel is a demonstration of power and status
- for... someone. ML researchers maybe? 😎

The venue itself was interesting because the conference was distributed across multiple floors. This meant lots of using the futuristic elevator system. I was involved in more than one 'what algorithm does this elevator system use' conversation. And hey, here's the chapter of the Sutton Reinforcement Learning book about Elevator Dispatching. I wonder how many interesting methods have been developed to solve simple problems arising in the work environment of engineer/scientist types. I certainly used to think about the optimal road-crossing strategy when I lived in NYC (the problem is slightly interesting because east/west and north/south aren't symmetric due to differing block lengths and crossing times, so always going with the go sign isn't an optimal policy[citation required]).

The negative side-effect of this layout was (to me) a lack of general 'focal point' for the conference, especially since there were various other things going on in the hotel. (Excitingly, on the final day there was an Edward Tufte seminar on the same floor as us.)

TL;DR limit registrations to a number your venue can comfortably accommodate. Turning people away is sad (especially if they are, like me, students who only knew they were going once their workshop submission was accepted), but overcrowding is detrimental to good conferencing.

In Conclusion

Despite missing about half the conference between volunteering, working and being sick, I saw a lot of good work and had some great discussions with people. I'm a bit disappointed there was no proper closing ceremony with summary statistics like at NIPS (unless it was at the party on the Wednesday, which I spent coughing in my hotel room). The multi-track format makes it a little hard to get an overview of the broader field, and there was a strange lack of closure on the last day. I'd say I'm looking forward to next year, but I think* it's going to be in Sydney, so we'll see about that.

*I don't know why I think this and I can't find any evidence supporting it. I did however learn that ICML also stands for:

international conference on minority languages
international congress of medical librarians
international conference on chronic myeloid leukaemia
international conference on malignant lymphoma

The more you know.

characterising treatment pathways at scale using the OHDSI network

2016-06-15T00:00:00+01:00

This post is about the paper Characterizing treatment pathways at scale using the OHDSI network from the hefty author list: George Hripcsak, Patrick B. Ryan, Jon D. Duke, Nigam H. Shah, Rae Woong Park, Vojtech Huser, Marc A. Suchard, Martijn J. Schuemie, Frank J. DeFalco, Adler Perotte, Juan M. Banda, Christian G. Reich, Lisa M. Schilling, Michael E. Matheny, Daniella Meeker, Nicole Pratt, and David Madigan.

Let's have at it. Note: including figures is needlessly time-consuming for me, so I'm going to refer to the paper assuming you have it to hand.

tl;dr

They looked at which medications patients received, for one of three diseases (type 2 diabetes, hypertension, depression), considering sequences of medications. Diabetes treatment is mostly dominated by metformin, and there is more variation for the other diseases. Many patients only ever receive metformin. They break it down by medical centre and find heterogeneity between centres (and thus countries). Heterogeneity suggests we attempt to generalise with care.

What is OHDSI?

Pronounced 'Odyssey', OHDSI is the Observational Health Data Sciences and Informatics collaboration. From the website, 'OHDSI has established an international network of researchers and observational health databases with a central coordinating center housed at Columbia University.' I was shamefully unaware of its existence, despite it being very relevant to my interests. Evidence-based medicine through data analysis! International collaboration! Open source! Reproducibility! All great. Fawning section over, on to the contents of the paper.

What did they do?

They analysed data from the OHDSI collection of databases to look at treatment pathways (ordered sequences of medications given to a patient) for three diseases: hypertension, diabetes mellitus type 2, and depression. Details in subsequent sections.

Why did they do it?

This feels like a proof-of-concept paper to me. The concept being that large-scale collaborations involving multiple health centres are possible, and that insights can be gained from analysis of the data. Essentially, the mission of OHDSI. More specifically, supporting the use of observational data to supplement medical research, which classically relies heavily on clinical trials. Observational data is 'free' in a sense (data-collection and storage, privacy-violating concerns temporarily aside), can cover wider populations and goes on indefinitely. Exploiting that has clear benefits. They highlight three key areas of benefit:

Identifying which current therapies should be compared with a new therapy (for experimental design)
Testing clinical hypotheses on observational data (acknowledging the need to do the appropriate statistical modelling)
Understanding population characteristics to aid in extrapolation of results (both observational and experimental)

This study focuses mainly on the first point, as they look at medication trends.

Data resources

OHDSI, at the time of writing, has 52 databases containing 682 million patient records. For this study they used 11 databases with 250 million records. I don't know why they didn't use all the data. These databases were: (this is Table 2)

AUSOM (Ajou University School of Medicine, Korea)
CCAE (MarketScan Commercial Claims and Encounters, I guess USA)
CPRD (UK Clinical Practice Research Datalink)
CUMC (Columbia University Medical Centre, USA)
GE (General Electric Centricity, I guess USA)
INPC (Regenstrief Institute, Indiana Network for Patient Care, USA)
JMDC (Japan Medical Data Center)
MDCD (MarketScan Medicaid Multi-state, USA)
MDCR (MarketScan Medicare Supplement and Coordination of Benefits)
OPTUM (Optum ClinFormatics, I guess USA)
STRIDE (Stanford Translational Research Integrated Database Environment, USA)

So that's one from the UK, one from Japan, one from Korea and eight from the USA. The biggest population by far was CCAE, which contributed 119 million patients. Japan and Korea only comprised 5 million patients together, and the UK 11 million, so most of these patients are in the USA.

The databases have various types of data in them, which is of great interest to me, but in this study they just extracted medications.

Data processing

Filtering for patients

So: which patients did they include in the analysis?

Patients had to satisfy:

≥ 4 continuous years in the database
- ≥ 1 year before any treatment for that disease
- ≥ 3 years of continuous treatment after that (this means patients who died during the period were excluded)
≥ 1 diagnosis code for corresponding disease
0 diagnosis codes for excluded diagnoses (these were: pregnancy for all, diabetes type 1 for diabetes type 2, and bipolar 1 disorder or schizophrenia for depression)
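
To make the criteria above concrete, here is a minimal Python sketch of the filter as I read it. Everything about the record layout is my assumption: the `Patient` class and its field names are hypothetical (this is not the OHDSI schema or the authors' code), times are simplified to fractional years, and 'continuous treatment' is modelled as a single interval.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Patient:
    # Hypothetical record; all times are in years on a shared clock.
    observed_from: float               # start of continuous observation
    observed_to: float                 # end of continuous observation
    first_treatment: Optional[float]   # first treatment for the disease, if any
    treated_until: Optional[float]     # end of continuous treatment, if any
    diagnoses: Set[str] = field(default_factory=set)


def eligible(p: Patient, disease: str, excluded: Set[str]) -> bool:
    """Apply the inclusion criteria listed above to one patient."""
    if p.first_treatment is None or p.treated_until is None:
        return False  # never treated for this disease
    return (
        p.observed_to - p.observed_from >= 4          # >= 4 continuous years
        and p.first_treatment - p.observed_from >= 1  # >= 1 year before treatment
        and p.treated_until - p.first_treatment >= 3  # >= 3 years of treatment
        and disease in p.diagnoses                    # >= 1 matching diagnosis code
        and not (p.diagnoses & excluded)              # 0 excluded diagnosis codes
    )
```

Written this way, the survivorship issue is easy to see: a patient who dies two years into treatment fails the third condition and silently drops out.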

This resulted in 1,182,792 hypertension patients, 327,110 diabetes patients, 264,841 depression patients. I'm not sure what the breakdown by centre was.

Excluding patients who died during that period seems problematic to me, because that's probably not a random event. I worry about excluding subpopulations with more aggressive forms of the disease, or excluding badly-treated patients (although that's slightly outside the scope of this paper I think, but is a question of particular interest to me). The phenotype here is already incredibly broadly defined - what if the observed heterogeneity in treatment pathways is due to such subpopulations? I'm not sure what a better approach here would have been, though - exclude patients who died of reasons unrelated to the disease, perhaps?

Data standardisation

Diagnoses were defined by mapping SNOMED (Systematized Nomenclature of Medicine) and Medical Dictionary for Regulatory Activities to ICD-9-CM (International Classification of Diseases, ninth revision, clinical modification). Medications were defined by their ingredients using RxNorm, and grouped according to classification hierarchies (such as, they state, Anatomical Therapeutic Chemical classification and First Data Bank's terminology). I'm not especially familiar with these ontologies, except for SNOMED. Most of what I've done to date involved UMLS (which contains SNOMED and possibly everything else that has existed).

Constructing medication sequences

Having filtered to these patients they queried the OHDSI databases for the sequences of medications for these patients. Some notes on this:

sequences were limited to a maximum of 20 medications
if a patient switched from one medication and then later back to it, only the first exposure was recorded
combination medications (with multiple active ingredients) were treated as prescriptions of multiple single-ingredient medicines
I don't think the time between medications is considered - they're just ordered sequences of drugs
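
The rules above are simple enough to sketch in a few lines of Python. This is my reading of them, not the authors' implementation: the input format (a list of `(start_time, ingredients)` pairs) and the function name `medication_sequence` are made up for illustration, and I am assuming the ingredients of a combination medication are recorded in the order given.

```python
def medication_sequence(prescriptions, max_len=20):
    """Build an ordered medication sequence for one patient.

    prescriptions: iterable of (start_time, ingredients) pairs, where
    ingredients is a tuple of ingredient names (length > 1 for combinations).
    """
    seen = set()
    sequence = []
    # Order by start time only; the gaps between medications are discarded.
    for _, ingredients in sorted(prescriptions, key=lambda rx: rx[0]):
        # A combination counts as several single-ingredient medications.
        for ingredient in ingredients:
            # Only the first exposure to an ingredient is recorded.
            if ingredient not in seen:
                seen.add(ingredient)
                sequence.append(ingredient)
    return sequence[:max_len]
```

For example, a patient who goes metformin, then a glipizide/metformin combination, then back to plain metformin, then insulin ends up with the three-step sequence metformin, glipizide, insulin.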

Having defined these sequences, they then counted the numbers of patients with each sequence and did other analyses. For example, they looked at medication classes, which are listed in table 1.

What did they find?

Also known as: let's look at the figures!

Figure 2

Which drugs do patients get first? Is there a standard entry into treatment-for-disease?

For diabetes, it seems yes. 76% of patients start with metformin. For hypertension, hydrochlorothiazide is sort of most popular (I am squinting at the figure), and in depression citalopram is also sort of most popular, but there's no clear winner. This is where I wonder about subpopulations. The immediate questions are: what's different about these patients? Why did they receive a different first medication? Does it vary by centre (yes - see figure 3)? By other diagnoses? Age? So many variables to consider! (I realise that this paper cannot answer all of these questions and I'm not criticising it - the results just inspire further research.)

Do patients stay on a single drug?

For diabetes, 29% of patients took only metformin. For hypertension, 6.44% took only lisinopril. For depression, 5.18% took only citalopram. Once again I wonder what this means. Was this medication especially effective for them, and if yes why? We see the potential for this large-scale observational data to shed light on differences in response to therapy that might be missed on the smaller-scale of a clinical trial. Maybe.

Unique treatment pathways?

Some patients are unique in the entire dataset: 10% of diabetes patients, 24% of hypertension patients, 11% of depression patients have unique treatment pathways. Clearly doing a nearest-neighbour treatment recommendation approach would fail for these patients, although I wonder if these patients may simply have rather long sequences of medications? It might be in the supplemental data, but I wonder what the distribution of sequence length is.
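
Both questions here (how many patients have a one-off pathway, and how long those pathways are) are cheap to answer if you have the sequences. A minimal sketch, assuming each patient's pathway is a list of medication names; the function names are mine and hypothetical, not anything from the paper:

```python
from collections import Counter


def unique_pathway_fraction(sequences):
    """Fraction of patients whose exact pathway appears only once."""
    if not sequences:
        return 0.0
    counts = Counter(tuple(seq) for seq in sequences)
    n_unique = sum(1 for seq in sequences if counts[tuple(seq)] == 1)
    return n_unique / len(sequences)


def unique_pathway_lengths(sequences):
    """Histogram of pathway length, restricted to one-off pathways."""
    counts = Counter(tuple(seq) for seq in sequences)
    return Counter(len(path) for path, n in counts.items() if n == 1)
```

If the one-off pathways turn out to be mostly long ones, that would support the guess that uniqueness is largely a side effect of sequence length.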

Figure 3

This is figure 2 but broken down by data centre, for some samples. We see immediately that metformin is less popular in the Japanese database than in the UK or US examples shown. I think the overall gist of this figure is that there is between-centre heterogeneity, and also (as in Figure 2) heterogeneity in the choice of second-line drugs. You could definitely look deeper into this data (hence my feeling that this paper is a proof of concept), but there is a risk (as always) of wading around without a clear hypothesis.

Figure 4

The y-axis here is a fraction of patients in the population. The fraction of interest is given by the lettering. x-axis is time, so we're looking at medicating trends.

A: patients on monotherapy: this became somewhat more popular
B: patients on monotherapy which is the most popular monotherapy for that disease: the medication is listed with the disease now (so this is a subset of the patients in A)
C: patients whose first medication started with the most popular starting medication for that disease (not necessarily most popular monotherapy)

The conclusion from B is that monotherapy in diabetes is somewhat dominated by metformin, whereas in hypertension and depression there is more variation.

I don't know how they decided which drug was most popular - is this over all patient trajectories over all time (I suspect yes)? It seems unlikely but the apparent absence of a dominant monotherapy in hypertension and depression could be explained by a strong bias towards some drugs being popular at some times: so at any moment in time there is a dominant monotherapy, but because its identity is always changing, it goes undetected by this analysis. Or more simply, there is a dominant monotherapy, but it's not lisinopril/sertraline. Would this be an interesting finding? Perhaps. Discovering that medication practices are highly influenced by trends could be a cause for concern. Equally, finding that medication practices lag (between centres or behind research) could also be concerning. Or heartening. Who knows.

Figure 5

This is figure 4 but now the data series correspond to data centre, and the different diseases get their own graphs. They bind the y axes together across rows, so there are inset graphs to give the zoomed-in views. Mmm, data visualisation.

There's so much going on here that looking at this figure fills me with vague dread. We have the potential to learn how data centres vary in their medicating trends.

Gravitating towards the most extreme-looking data series, something is going on in STRIDE (US) for monotherapy. 100% of diabetes patients in 2004 were on metformin? This is also when this database appears to begin, so I guess something strange was going on (like only data from diabetes patients on metformin was being recorded, or something)...

The authors draw attention to the lack of consistent bias between use of EHR data and claims data in what they report. This is potentially very interesting, because claims data is somewhat more 'available' from what I can tell (people seem to be publishing more with claims data[citation required]), but is biased towards billing (obviously) and less 'rich' than a full EHR. Being able to use claims data as a proxy for EHR would be good and useful. However, the analyses here draw on medication information, which is probably well covered by claims data, so the finding is probably less striking.

Figure 6

Once again, we see a fraction of something on the y-axis, with time on the x-axis. In this case, it's the fraction of medication changes in that year which were within the same structural class (these classes are not fully listed in table 1, and are definitely in the supplemental information).

I am not sure what to conclude from this figure. Do different structural classes correspond to very different mechanisms of action for the drug? Would changing structural class mean the doctor believes the patient's disease to be characterised differently? I am not a doctor (as might be obvious) and I'm cancer-focused so I'm speculating wildly here. There isn't much discussion of this figure in the main paper. Not much of a trend is observed, anyway.

Conclusion

I reiterate my feeling that this is a proof of concept paper, or possibly a paper to advertise the seemingly incredibly amazing data resource OHDSI is creating. There aren't really any hypotheses tested in this work, and I don't come away from it with a strong conclusion beyond 'heterogeneity exists'. Then again, I came into this paper with little by way of prior expectation for the findings.

There are some further avenues of research (some of which I mentioned in this blog post) prompted by this study, but whether they're truly worth pursuing requires further thought, as ever. And I'm definitely going to check out what else OHDSI is up to.

play me a match of doto for your heart

2016-06-11T00:00:00+01:00

Here are some strange messages I have received on OKCupid.

???

you are evil // Sorry you do not you're Sorry
hi // how are you? // what did you mean with that you are dog person
Hi. How are you? Please I need a favor
Hey bitch, are you studying computer science? (this is probably the closest thing to 'abuse' I have received on the site)
I love dogs tho
All of which are American dreams comrade
MY GOD YOU HAVE A FACE // Sorry, was that rude? Is it rude to tell a woman she has a face?

people respond to my profile

The context here is that I write very small pieces of speculative fiction in the 'explanation' box for my questions. Someone asked if they were Cormac McCarthy quotes once, which may be my greatest achievement. I've also gotten a few messages written in the same style as my responses. Dating site as platform for collaborative fiction-writing, anyone?

I could not and still can not tell if your page is a real dating profile or just a place to either copy and paste or write original satire
I have an incredible amount of respect for your commitment to this bit.
five stars for FUCKING TERRIFYING OH MY GODDDDDDDDDDDD
are you a Turing test?
play me a match of doto for your heart (amazing)
Ayyy gurl, no need to get a BKB cuz my love is pure.

Never change, internet.

moved to pelican

2016-05-17T00:00:00+01:00

Here's what happened:

My phone started acting up. The lock button turned into a 'maybe turn the phone off' button. The unlock button turned into a 'maybe turn the phone off, or open the camera?' button. I impulse-purchased a new phone, and the old one magically fixed itself. New phone went into a box and my SIM remained micro-sized for a little longer.

I moved to Switzerland, got a nano SIM, started dual-wielding phones. New phone for new SIM, old phone for everything else. I migrated my apps to new phone (this was surprisingly tricky and there's an entire story in there about me conveniently being in the USA to receive a single text message, but it's a tangent), so I got a new Signal fingerprint. (Then my old phone bricked itself, as they do.)

I went to put my new Signal fingerprint on my lovely SSLy Github-pages website and noticed it was broken. For how long this was the case I don't know. I think it was a Jekyll update? Who knows. The time had come. I'd been thinking of leaving gh-pages for a while, so this was a good excuse.

Reasons for wanting to leave:

using plugins with Jekyll and gh-pages is mildly painful
the solution, and/or Jekyll itself was increasingly frustratingly slow
gh-pages was too mysterious

and the new reason:

website was mysteriously broken

These are probably fixable, I am sure. Maybe my solution isn't the best, but it's mine. I decided to: move from Jekyll to Pelican and host my site myself. By the time you're reading this, I'll have achieved that second part. At the time I'm writing it, I haven't even started.

Why did I do this?

Jekyll uses Ruby, Pelican uses Python, I know Python a lot better than Ruby
I never knew Jekyll very well, so there were no temptingly sunk costs to care about
I saw a site using Pelican during an impressionable moment
self-hosting (in a VPS, let's not be unreasonable) affords a level of control that I apparently want

So, this site now looks rather different. That's because another thing happened: I realised that I'm not a front-end web developer, or designer, or basically a person who has touched HTML since she was making Pokémon fansites as a ten-year-old. But I'd also like my site to look good. The old version, [which I should screenshot for posterity] was my first adventure into CSS and was unsurprisingly minimalist. I got some compliments on the design of it (yay!) but it was very hand-crafted and it looked it.

Now I am balancing competing desires: a site which looks good (and works well), and a site which is my creation. The solution for now is to use an existing Pelican theme, made by someone who presumably knows a lot more about websiting than I do, and modify it to my own purposes. Apparently the one I picked was intended to look like Medium so now after several hours of mucking around I have a Medium blog with no features. Excellent.

I'm modifying the theme (see my fork) and I will continue to do so until I get distracted by something else.

This post brought to you by an internet outage at the AirBnB I currently live in.

installing tmux locally

2016-03-13T00:00:00+00:00

I have been setting myself up on a new computing cluster (CentOS 6.7), so I'm in the lovely land of installing things without root. tmux proved a bit frustrating, so here's what I ended up doing:

install libevent

tmux needs this, I didn't have it (you might, so try installing tmux first). Grabbed it from the repository, then:

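$ git clone https://github.com/libevent/libevent
$ cd libevent
$ ./autogen.sh      # a git checkout has no ./configure until this runs (needs autoconf, automake, libtool)
$ ./configure --prefix=$HOME
$ make
$ make verify       # this failed for me, oh well
$ make install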

Choose your prefix as you desire.

Now libevent should be installed in $HOME. Easy y0.

install tmux

Git all the things.

$ git clone https://github.com/tmux/tmux
$ cd tmux
$ zsh autogen.sh

Now for the thing which was required to make everything work, a slightly augmented version of the winning answer from this Stack Exchange post...

$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/lib         # possibly optional, see below
$ DIR=$HOME
$ ./configure CFLAGS="-I$DIR/include" LDFLAGS="-L$DIR/lib" --prefix=$DIR

Obviously I could have used $HOME there instead of $DIR, but I am staying consistent.

$ make
$ make install

And everything should Just Work now. This probably shouldn't have posed the difficulty that it did, but I'm a scientist, not a sysadmin.

final thing

Does the LD_LIBRARY_PATH line seem redundant to you? It seems redundant to me, but I had to add it to my .zshenv (or, try .bashrc) after this to get tmux to continue working. May or may not have been necessary during installation (what with all that redundancy) but I'm not going to uninstall just to check, because tmux is actually working and if I look at it too closely it will definitely break.

AAAI 2016 by the day

2016-02-17T00:00:00+00:00

I started writing this in Phoenix airport, so if the current trend (n=2) continues, I'll start recounting my next conference half-way through, with interesting implications for the latter half of the post. This was my first time attending the Association for the Advancement of Artificial Intelligence Conference (AAAI), so I made sure to spend most of it comparing it to NIPS. I stopped taking notes towards the end, so this coverage is a bit skewed.

Thursday

Shameless plug for a great vegan restaurant in Phoenix: Green. I would have eaten there a lot more if it were a bit closer to the conference centre. I ended up going to Vegan House a few times. A fair runner up in the list of Best Vegan(-ish) Restaurants in Downtown Phoenix (there are two).

Friday

Shortly before dawn, it became cold enough to sleep. I appreciated the vastness of the Arizona sky and the eerie absence of fellow pedestrians as I relocated to my downtown hotel. Coming from Dublin and then New York City, I find empty paths unsettling, especially coupled with wide roads and low buildings. I passed by a man selling paintings of owls from a rucksack, and order was restored.

Friday and Saturday of the conference were tutorial/workshop days (the distinction between these categories is not clear). On Friday morning I went to...

Organ Exchanges: A Success Story of AI in Healthcare: John Dickerson and Tuomas Sandholm

I'd seen John Dickerson speaking at the NIPS 2015 Workshop on Machine Learning in Healthcare (some thoughts on MLHC in my NIPS 2015 post), so I was already somewhat familiar with this work. I think he's a good speaker, so even though this topic is not entirely relevant to me, I figured I'd get something out of the tutorial. This was true to some extent - my attention started to flag at some point into what was essentially a 3.5 hour lecture.

The link to the slides is above and here, so I will just outline the main idea and skip the algorithmic details.

Kidney exchanges: you need a kidney, and a family member/friend/loved one is willing to donate one. Unfortunately, they may not be compatible. The solution is to 'trade' donors with someone else: "I'll give you my mother's kidney for your girlfriend's kidney", or, "I'll give you my mother's kidney so your girlfriend can give her kidney to that other person, and their friend can give me their kidney", and so on. This amounts to finding cycles in a graph (the second example being a 3-cycle), which brings us into the wonderful world of combinatorial optimisation. The exchange actually requires everyone to go under the knife at the same time (something about trading organs I don't quite recall), so there are physical and logistical limits on the length of the cycle.
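
To make the graph picture concrete, here is a small Python sketch that enumerates the candidate exchange cycles up to a length cap. The encoding is my own assumption for illustration: each node is a patient-donor pair, and an edge from X to Y means X's donor is compatible with Y's patient. This only lists feasible cycles; the actual clearing problem from the tutorial (choosing the best set of disjoint cycles) is the hard combinatorial part and is not attempted here.

```python
def short_cycles(compatible, max_len=3):
    """Enumerate simple directed cycles with at most max_len nodes.

    compatible: dict mapping each pair to the set of pairs its donor can
    give to. Each cycle is reported once, starting at its smallest node.
    """
    cycles = []

    def extend(path):
        start, last = path[0], path[-1]
        for nxt in sorted(compatible.get(last, ())):
            if nxt == start and len(path) >= 2:
                cycles.append(tuple(path))       # closed the loop
            elif nxt > start and nxt not in path and len(path) < max_len:
                extend(path + [nxt])             # still room to grow

    for node in sorted(compatible):
        extend([node])
    return cycles


# Hypothetical compatibility graph: A's donor can give to B's patient, etc.
pairs = {"A": {"B"}, "B": {"A", "C"}, "C": {"A"}, "D": {"A"}}
print(short_cycles(pairs))
```

In this toy graph D's donor could help A, but nobody can help D, so D is stuck. Adding an altruistic donor at the front of a chain is what unblocks pairs like this.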

They mentioned some other barter-exchange markets, such as

holiday homes (intervac)
books (paperback swap, book crossing)
odd shoes (national (US) odd shoe exchange)

These are neat. People exchanging used items instead of buying new/throwing away is obviously great, and I approve of anyone supporting such efforts. It's what the 'sharing economy' should have been... and now back to organs.

An interesting (and amazing!) thing can happen in these kidney exchanges: sometimes an altruistic donor will show up; someone who just has too many kidneys and wants to help out. These produce 'never-ending altruistic donor' chains ("a gift that gives forever"), and have apparently become more important than cycles for the kidney-matching problem.

I zoned out of the tutorial for a bit to discuss the feasibility of simultaneous translation, prompted by this article: The Language Barrier is About to Fall. My gut reaction is to say 'it's too hard', but that's motivated by my enjoyment of learning languages - part of me (selfishly) doesn't want this problem solved. However, I'm learning to temper my skepticism when it comes to what machine learning can achieve, and we're actually getting pretty good at translation (for some language pairs) so I'm pretty optimistic about this. And breaking language barriers, if it can be done cheaply, could be immense. I emphasize the relevance of cost because I see language as most prohibitive not for holiday-goers but for migrants, who may not have the resources to buy a babelfish.

There are a lot of subtleties to consider in the kidney exchange problem, and much work has been done: see the slides.

They concluded the tutorial with a discussion of other organ exchanges. Kidneys are sort of 'easy' because the cost to the donor is quite minimal, unlike in e.g. lung exchanges where the donor's quality of life (and life expectancy) are impacted. One can also do living donor liver exchanges, where some fraction of the donor's liver is removed. There are essentially no altruistic donors here. Dickerson suggested combining multiple organs, so you thread a liver and kidney chain together. Perhaps a kidney patient's donor would be willing to donate part of their liver to someone whose donor would give a kidney, and so on.

My plan was to go to AI Planning and Scheduling for Real-World Applications (Steve Chien and Daniele Magazzeni) in the afternoon, but I made the mistake of being outside for slightly too long during lunch, and I spent the rest of the afternoon recovering in a dark and cool hotel room. Irish people: handle with care, keep out of direct sunlight.

Student Welcome Reception

One really nice thing about AAAI was the student activities. Being a student at a conference can be bewildering: there are so many people who seem to know each other, talking about things they seem to know about! I was also there by myself (my group does not typically attend AAAI), so the icebreakers they ran saved me from spending the rest of the conference lurking in corners and hissing at people.

The actual ice-breaker activity was weird (although seemingly effective): we had to take photographs with an AI/AAAI/Phoenix theme (artificially intelligent fire, maybe) featuring ourselves. A ploy to get pictures for a website? Possibly. We never did find out who won the fabled prize.

Saturday

Excluding a brief foray into the tutorial about 'Learning and Inference in Structured Prediction Models', and fruitless wandering in search of coffee shops open on a Saturday, I spent much of the day at...

Workshop on AI, Ethics, and Society

This workshop had overlap in content/speakers/organisers with the 'Algorithms Among Us' symposium at NIPS 2015 (some thoughts here). My interests might be obvious by now.

This was an interesting workshop. There was a mix of machine learners, AI researchers, (possibly) philosophers and miscellaneous other. There were fewer arguments than I would have expected. It's not that I particularly wanted to see (verbal) fighting, but people seem quite passionate about, e.g. whether or not The Singularity is something to worry about, so I expected more gloves on the floor.

People are concerned about dangerous (powerful) AIs - how do we ensure they don't enslave us all in pursuit of paperclip-making? Do we have moral responsibility towards them? Should they feel pain? Should we be allowed to turn them off, once they're active/alive(?)? Are simulations of humans humans? These were some questions raised.

Some more, uhh, short-term concerns included the risks of adversarial machine learning, the effects of AI on labour markets (more on this later), the difficulty of measuring progress towards AGI, and enough other things that I didn't leave the workshop thinking everyone is feeling Existentially Threatened. I certainly am not.

I'm glad some people are thinking about long term threats (diversity of tactics!), but I am much more worried about the present and near future. AI (or rather, machine learning) already influences people, in potentially irreversibly life-altering ways (to put it mildly), and I fear the technology is becoming integrated into society faster than anyone can measure its harm (see also: vaping). It's also quite easy for us as researchers to pretend our work is apolitical, that we simply explore and create things, blissfully ignorant of negative consequences should our creations be misused. Positive applications presumably motivate much great work, and I don't wish that people stop this work, necessarily. We just need to acknowledge that we cannot un-discover things, and that people who don't understand the limitations of technology may still use it.

I am meandering to a point: efforts such as the Campaign to Stop Killer Robots are good and should be publicised and discussed. Perhaps the Union of Concerned Scientists should start thinking about 'algorithmic/autonomous threats' (to human lives, livelihoods and the environment). My ideas here are half-formed, which is all the more reason I'd like to see discussions about such issues at similar workshops. It's certainly important that AIs have ethics, but what about the ethics of AI researchers?

Sunday

The conference begins in earnest!

Steps Toward Robust Artificial Intelligence - Thomas G. Dietterich

Quantifying our uncertainty (as probabilistic approaches to AI attempt to do) is about known unknowns: that is, the thing we know we are uncertain about has to appear somewhere in the model. Dietterich drew attention to unknown unknowns: things outside the model, perhaps outside our algorithm's model of the environment.

One way to tackle this is to expand the model: keep adding terms to account for things we just thought of. A risk of this is that these terms may introduce errors if we mismodel them. He suggested that we instead build causal models, because causal relations are more likely to be robust, require less data and transfer to new situations more easily.

Regarding new situations: what happens if at 'test' (deployment, perhaps) time, our algorithm encounters something wildly different to what it has seen before? Perhaps instead of allowing it to perform suboptimally (and worse still, to not know it is performing badly), it should recognise this anomaly and seek assistance. This prompts an open question, "when an agent decides it has entered an anomalous state, what should it do? Is there a general theory of safety?"

Session: Learning Preferences and Behaviour

I'll not lie: I went to this session because it sounded creepy in a Skynet, Minority Report sort of way.

My favourite talk of the session was Learning the Preferences of Ignorant, Inconsistent Agents - Owain Evans, Andreas Stuhlmueller and Noah D. Goodman. Roughly, they are concerned with inverse reinforcement learning (IRL) (so learning utility/reward functions) from suboptimal agents, as humans often might be. A specific case they look at is time inconsistency, which is where agents make plans they later abandon. Seemingly any non-exponential discounting implies time-inconsistency, if my notes are correct. See paper for details. And a related project page: agentmodels.org
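
A tiny numerical illustration of time inconsistency, in Python. This is my own toy example, not from the paper: the rewards, the discount parameters and the function names are all made up. The agent chooses between a smaller reward at some delay and a larger reward one step later. Under exponential discounting the comparison does not depend on the delay (the ratio of the two discounted values is constant), whereas under hyperbolic discounting the preference flips as the choice moves from 'far away' to 'now'.

```python
def hyperbolic(t, k=1.0):
    """Hyperbolic discount factor 1 / (1 + k t)."""
    return 1.0 / (1.0 + k * t)


def exponential(t, delta=0.6):
    """Exponential discount factor delta ** t."""
    return delta ** t


def prefers_later(discount, delay, small=10.0, large=15.0, gap=1):
    """True if the agent picks `large` at delay+gap over `small` at delay."""
    return large * discount(delay + gap) > small * discount(delay)


# Far in advance, the hyperbolic agent plans to wait for the larger reward...
print(prefers_later(hyperbolic, delay=10))
# ...but when the moment arrives it grabs the smaller one: the plan is abandoned.
print(prefers_later(hyperbolic, delay=0))
# The exponential agent gives the same answer at every delay.
print({prefers_later(exponential, delay=d) for d in range(30)})
```

One example is obviously not a proof of the general claim, but it shows the kind of plan-then-abandon behaviour the IRL model has to account for.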

I spent the early afternoon finishing up my 'plain English explanation' for the work I was presenting at AAAI, see the page here. I wanted to have something to point my family/friends at when they ask what I work on. Also, making science accessible is good, probably.

Session: Word/Phrase Embedding

I went to this because I was speaking (briefly) at it. Also, because it is relevant to my interests, so I'll list everything.

The oral spotlights:

Inside Out: Two Jointly Predictive Models for Word Representations and Phrase Representations - Fei Sun, Jiafeng Guo, Yanyan Lan, Jun Xu and Xueqi Cheng: Modification of the word2vec-style skip-gram/continuous-bag-of-words model including morphology, project page: InsideOut.
Minimally-Constrained Multilingual Embeddings via Artificial Code-Switching - Michael Wick, Pallika Kanani and Adam Pocock: using artificial code-switching to help rapidly create multilingual tools, borrowing information across languages essentially.
Generalised Brown Clustering and Roll-Up Feature Generation - Leon Derczynski and Sean Chester: I am shamefully ignorant about Brown clustering, so a lot of this was lost on me. Link to project repository, anyway.

The poster spotlights:

Building Earth Mover's Distance on Bilingual Word Embeddings for Machine Translation - Meng Zhang, Yang Liu, Huanbo Luan, Maosong Sun, Tatsuya Izuha and Jie Hao: I may have spent this spotlight worrying about my spotlight.
A Generative Model of Words and Relationships from Multiple Sources - Stephanie L. Hyland (that's me), Theofanis Karaletsos and Gunnar Rätsch: People seemed to like the slides I made for this spotlight, so I put them in the project repository with some other 'media', see here.
Single or Multiple? Combining Word Representations Independently Learned from Text and WordNet - Josu Goikoetxea, Eneko Agirre and Aitor Soroa: work in a similar vein to mine, in the sense of combining information from 'free text' and 'structured data' (in this case WordNet).

From Proteins to Robots: Learning to Optimize with Confidence - Andreas Krause

Some interesting and important questions:

how can an AI system autonomously explore while guaranteeing safety?
how can we do optimised information gathering?

The former question is quite important for 'learning in the wild', and moving beyond the existing (rather successful) paradigm of test/train/validation that we have in machine learning - what happens when the data the algorithm sees depends on actions it takes?

The latter is quite interesting for cases where we want to probe some nearly-black-box system, but probing is expensive. One can use the framework of Bayesian Optimisation (Močkus, 1978), and score possible locations (to probe) by their utility in resolving the exploration/exploitation trade-off (via some kind of acquisition function, of which many have been proposed).

He discussed how one can use Gaussian processes and confidence bounds to help with this, and I'll include a pointer to Srinivas et al, 2010.
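
To give a flavour of how confidence bounds drive the exploration/exploitation trade-off, here is a sketch of the much simpler UCB1 rule for a finite set of options. To be clear about the swap: this is not the Gaussian-process method from the talk or from Srinivas et al. (there is no GP and no notion of nearby locations being similar), just the same 'score each option by estimate plus uncertainty bonus' idea in its most basic form. The function and parameter names are my own.

```python
import math


def ucb1(pull, n_arms, rounds):
    """Run UCB1: pull(arm) returns a reward in [0, 1] for the chosen arm.

    Returns (counts, means): how often each arm was tried and its
    average observed reward.
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1  # try every arm once before trusting any estimate
        else:
            # Score = current estimate + a bonus that shrinks with more pulls.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts, means


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    success_prob = [0.2, 0.5, 0.8]  # hypothetical, unknown to the algorithm
    counts, means = ucb1(lambda a: float(rng.random() < success_prob[a]), 3, 2000)
    print(counts, means)
```

Rarely-tried arms keep a large bonus, so they get revisited occasionally; arms that look bad after many pulls are mostly left alone. The GP version applies the same scoring to a continuous space, with the posterior standard deviation playing the role of the bonus.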

Some more paper pointers:

Stochastic Linear Optimization under Bandit Feedback - Varsha Dani, Thomas P. Hayes, Sham M. Kakade
Active Learning for Level Set Estimation - Alkis Gotovos, Nathalie Casati, Gregory Hitz, Andreas Krause
Safe Exploration for Optimization with Gaussian Processes - Yanan Sui, Alkis Gotovos, Joel Burdick, Andreas Krause
Contextual Gaussian Process Bandit Optimization - Andreas Krause, Cheng Soon Ong

(I am quite fond of Gaussian processes, in case that wasn't already obvious.)

The conclusions were:

feedback loops abound in modern ML applications
exploration is central but also delicate, and safety is crucial
statistical confidence bounds allow navigating exploration-exploitation tradeoffs in a principled manner

Poster Session 1

I was presenting at this session (see my poster here), so I didn't get to look at anything else. I struggled to eat bean tacos one-handed, and I talked a lot.

Monday

Learning Treatment Policies in Mobile Health - Susan Murphy

I have Susan Murphy's paper Optimal dynamic treatment regimes on my desk as I write this, so I was pretty excited to see her speaking. And on mHealth, too! Double excitement.

It turns out that she is also involved in the Heart Steps project with Ambuj Tewari, which I wrote about a little in my NIPS post, so I'm not going to repeat myself.

The 'treatment optimisation' aspects of mHealth are interesting because it gets into the realm of HCI and psychology. You want to send the patient reminders to do a thing, but you don't want them to become habituated and ignore them, or irritated, or distracted. She mentioned the need to avoid pointlessly reminding the patient to go for a walk while they're already walking, or dangerously alerting them while they're driving. I find it uncomfortable to be reminded that my phone knows when I'm walking/driving, but if the information is being recorded anyway, you might as well use it, right? Insert something about dragnets here.

But really, mHealth provides some very exciting opportunities to do reinforcement learning. She mentioned non-stationarity as a general challenge, and suggested one could perhaps do transfer learning within a user to tackle it.

Session: Active Learning

A POMDP Formulation of Proactive Learning (Kyle Hollins Wray and Shlomo Zilberstein) was interesting. The idea is that the agent must decide which oracle to query to label a particular data point, where the underlying state is the correctness of the current set of labels. I'm not familiar enough with the active learning field to say if this formulation is especially novel, but I liked it, possibly because I like POMDPs.

Session: Privacy

I experimented with taking no notes during this session to see how it would influence my recall of the material. The trade-off here is that taking notes is a little distracting for me (as well as providing many opportunities to notice Slack/email/etc.), but does provide a lasting record.

Logical Foundations of Privacy-Preserving Publishing of Linked Data (Bernardo Cuenca Grau and Egor V. Kostylev) was strangely fascinating. They were talking about anonymisations of RDF graphs (a data type I'd been working with for my word embedding work). I'm also quite interested in information linkage (see e.g. my talk at Radical Networks 2015), so this was up my alley.

Not sure how the experiment worked out, further data required.

Session: Cognitive Systems

I was heavily overbooked for this time-slot: I wanted to see Deep Learning 1, Discourse and Question Answering (NLP 6), the RSS talks (for my friend Ozan's talk), Cognitive Systems (largely for Kazjon's talk - see below), and Machine Learning/Data Mining in Healthcare. Time turners have yet to be invented, unfortunately.

One of the recurring themes of my AAAI v. NIPS pronouncements was that AAAI has, well... more AI stuff. This session was probably the closest I got to that (unless you count the AI and Ethics workshop: I'd consider it meta-AI). People were doing reasoning without probability distributions, using first order logic! One of the presentations included this video which I found strangely distressing (to me it is - spoilers! - clearly about domestic abuse).

The talk I had come to see, Surprise-Triggered Reformulation of Design Goals (Kazjon Grace and Mary Lou Maher), along with numerous chats with Kazjon throughout the conference made me realise that computational creativity is a thing. OK, full disclosure: I am loosely involved with some generative art folks so I did sort of know this, but it hadn't occurred to me that one might use machine learning to represent or understand mental processes surrounding creativity. Neat! The idea here is that the way humans design things is iterative: you have some loosely-formed idea, and through the process of realising it, notice things you hadn't expected (experience surprise, as it were), and modify your idea accordingly. So there is interplay between the internal representation (perhaps this is the design goal) and the external representation (the realisation). So they're interested in understanding surprises: perhaps an element of a design is unusual given other elements of the design, for example. I am going to have to actually read the paper before I elaborate any further on this, but the experiments involved generating weird (but edible) recipes so I'm looking forward to it.

Very deep question raised by all of this: "can computers be creative?"
Related: what is creativity? What is art? What are computers?

AI's Impact on Labor Markets - Nick Bostrom, Erik Brynjolfsson, Oren Etzioni, Moshe Vardi

I managed to take no notes during this panel (my notes from AAAI actually dry up around here, I hit peak exasperation with keeping my devices charged).

I have a lot of feelings about AI and labour, but I'm first going to direct attention to the Panel on Near-term Issues from the NIPS Algorithms Among Us Symposium, which had a similar lineup.

Ultimately, it is hard to solve social and political issues using technology alone, especially if those issues arise as a result of the technology itself. I'd love to automate away all the mind-numbingly boring and unfulfilling jobs humans currently do, but I don't want to remove anyone's livelihood in the process. I don't think it's sufficient to say that society will 'figure it out somehow', especially in countries such as the USA where there is so little protection from poverty and homelessness. That said, I don't know what the solution is (except for some rather radical ideas with limited empirical support for their efficacy), and I don't know if it will, or should, come from the AI research community.

Poster Session 2

I got slightly side-tracked by ranting about how broken academic publishing is. Shoutout to the Mozilla Contributorship Badges project for trying to deal with the credit-assignment problem, for one.

Tuesday

Towards Artificial General Intelligence - Demis Hassabis

Google DeepMind are arguably the machine learning success story of the last year, given their Atari Nature paper and AlphaGo result (although the match against Lee Sedol in March will be more interesting). I'm very happy to see computer games featuring so prominently for evaluating and developing AGI: so much so that I spent the session after this talk sketching out a project involving Dota 2, which I think could be a very interesting application of deep reinforcement learning, if only the metagame would stabilise long enough to allow for acquisition of sufficient training data!

Anyway, this talk mostly convinced me that DeepMind are doing cool stuff, which I imagine was the intended effect. Hassabis was coming from a pleasantly principled place. They do seem genuinely interested in AGI, rather than, for example, beating benchmarks with yet deeper networks. I don't mean to imply that beating benchmarks isn't important, but I think the types of discoveries one makes in the pursuit of larger/more abstract goals are quite important for the intellectual development of a field which can easily become dominated by engineering successes. So yes, the talk had the intended effect.

Session: Reinforcement Learning I

Distance Minimization for Reward Learning from Scored Trajectories - Benjamin Burchfiel, Carlo Tomasi and Ronald Parr: this is about IRL with suboptimal experts (a popular and interesting topic). In this case, the 'demonstrator' need not be an expert but can operate as a judge, assigning scores to demonstrators. The real-world example would be of a sports coach who's no longer capable of creating expert trajectories (that is, demonstrating optimally) but who can still accurately rate demonstrations from others, if they're available. They also study the case where the judge's scores are noisy and find the algorithm robust.

Inverse Reinforcement Learning through Policy Gradient Minimization - Matteo Pirotta and Marcello Restelli: more IRL through parametrising the expert's reward function, but here it is no longer necessary to repeatedly compute optimal policies, so this should be quite efficient. Also, this algorithm is called GIRL.

Poster Session 3

Some interesting posters (highly non-exhaustive list, but I'm exhausted):

Predicting ICU Mortality Risk by Grouping Temporal Trends from a Multivariate Panel of Physiologic Measurements - Yuan Luo, Yu Xin, Rohit Joshi, Leo Celi and Peter Szolovits
Reinforcement Learning with Parameterized Actions - Warwick Masson, Pravesh Ranchod and George Konidaris
Siamese Recurrent Architectures for Learning Sentence Similarity - Jonas Mueller and Aditya Thyagarajan

Wednesday

Sessions: Reinforcement Learning II & III

Model-Free Preference-Based Reinforcement Learning - Christian Wirth, Johannes Fürnkranz and Gerhard Neumann: I didn't actually see this talk, but the paper has a number of interesting words in its title, so it must be good.
Increasing the Action Gap: New Operators for Reinforcement Learning - Marc G. Bellemare, Georg Ostrovski, Arthur Guez, Philip S. Thomas and Remi Munos: this was a good talk. Basically, during value iteration one applies the Bellman operator to the state-action value function (Q-function). The fixed point of the operator is the optimal Q-function, which induces (greedily) the optimal policy. They argue that this operator is inconsistent, in that it suggests nonstationary policies. They resolve this by defining a 'consistent Bellman operator' which preserves local stationarity and show that it increases the action gap (the value difference between the best and second best actions). The action gap is relevant because it can allow for selecting the same (optimal) action even when estimates of the value function are noisy. And a link to the Arcade Learning Environment.
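
For reference, a minimal sketch of the standard Bellman operator and the action gap on a toy problem. The two-state MDP below (`P`, `R`, `GAMMA`) is entirely made up for illustration, and this shows only the ordinary operator, not the consistent operator proposed in the paper.

```python
GAMMA = 0.9

# Hypothetical toy MDP: P[s][a] lists (probability, next_state); R[s][a] is the reward.
P = {0: {0: [(1.0, 0)], 1: [(1.0, 1)]},
     1: {0: [(1.0, 0)], 1: [(1.0, 1)]}}
R = {0: {0: 0.0, 1: 1.0},
     1: {0: 0.0, 1: 2.0}}

def bellman(Q):
    """Apply the (standard) Bellman optimality operator once to a Q-table."""
    return {s: {a: R[s][a] + GAMMA * sum(p * max(Q[s2].values()) for p, s2 in P[s][a])
                for a in P[s]}
            for s in P}

def action_gap(Q, s):
    """Value difference between the best and second-best actions in state s."""
    best, second = sorted(Q[s].values(), reverse=True)[:2]
    return best - second

# Value iteration: repeatedly apply the operator until (numerically) at its fixed point.
Q = {s: {a: 0.0 for a in P[s]} for s in P}
for _ in range(500):
    Q = bellman(Q)

# The greedy policy induced by the fixed point.
policy = {s: max(Q[s], key=Q[s].get) for s in Q}
```

A larger action gap means the greedy choice survives more noise in the estimated values before it flips to a different action.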

Deep Reinforcement Learning with Double Q-Learning - Hado van Hasselt, Arthur Guez and David Silver: more from DeepMind. I swear I am not a DeepMind fangirl. Setup here: Q-learning can result in overestimates for some action values. Using DQN (deep Q-learning algorithm) they find that this happens often and impacts performance. They solve the problem by showing how to generalise Double Q-learning to arbitrary function approximation (rather than just tabular Q functions). So this paper seems like a natural progression for Double Q-learning.
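
The overestimation issue can be seen without any deep learning at all. Below is a minimal sketch, assuming ten actions whose true values are all zero and unit-variance Gaussian noise on the estimates (both assumptions are mine, chosen for illustration). Taking the max over one set of noisy estimates is biased upwards; selecting the action with one set and evaluating it with an independent second set, which is the idea behind the double estimator, is not.

```python
import random

random.seed(0)
N_ACTIONS, N_TRIALS = 10, 5000

def noisy_estimates():
    """Noisy estimates of action values whose true values are all zero."""
    return [random.gauss(0.0, 1.0) for _ in range(N_ACTIONS)]

single_total = 0.0
double_total = 0.0
for _ in range(N_TRIALS):
    q_a = noisy_estimates()
    q_b = noisy_estimates()
    # Single estimator: select and evaluate with the same estimates.
    single_total += max(q_a)
    # Double estimator: select with q_a, evaluate with the independent q_b.
    best = max(range(N_ACTIONS), key=q_a.__getitem__)
    double_total += q_b[best]

single_mean = single_total / N_TRIALS   # noticeably above the true value of 0
double_mean = double_total / N_TRIALS   # close to the true value of 0
```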

Conclusion

Exploration-exploitation trade-offs are everywhere. At this stage in my career, I consider going to conferences a largely exploratory activity: I can learn a little (or more) about a lot of things and get an idea for the kinds of research going on. For the people who spend conferences meeting with their collaborators, it's more about exploitation. (For the appropriate interpretation of that word.) I am a little fatigued of exploration right now - I'm still processing things I saw at NIPS, so I was not well positioned to make the most out of AAAI. I kept wanting to run off and write code in a corner, but who does that? Well, I do that. I do that right now.

linking between zotero and evernote (in OSX)

2015-12-18T00:00:00+00:00

I'm upgrading my paper-management workflow from 'labyrinth of folders' to an Evernote + Zotero mix. I already use Evernote a little for this by writing paper summaries in it, but I would rather do the 'heavy duty' management and organisation in Zotero. So, I'd like to easily switch between the Evernote note about a paper and the Zotero reference on it. I need:

Zotero → Evernote
Evernote → Zotero

Note: This is all with OSX 10.11.1 (El Capitan), Evernote 6.3, and Zotero 4.0.28.10. YMMV.

Zotero to Evernote

My solution here is to add an attachment to the reference which is a link to the Evernote URI. Zotero gracefully handles this, you just right-click on the reference and Add Attachment → Attach Link to URI...

The hard part is then getting the URI. In (my currently-up-to-date version of) Evernote, the Copy Note Link gives you a HTTP(S) link:

https://www.evernote.com/shard/asdfgh

This converts to the Evernote URI link if you paste it inside Evernote, which is pretty cool I guess but also not useful here.

I use the Evernote client and I don't want this, I want the URI like:

evernote:///view/adfgh

This is (for now, probably) known as Classic Note Link in Evernote, and you get it by holding down Alt on the menu after right-clicking on the note. Bizarre and annoying, but whatever. It works.

Now I can just double-click on the Evernote URI attachment on my reference in Zotero and it'll open the note (in my Evernote client) with my notes on it.

Evernote to Zotero

Now, we want to get the Zotero URI (to the reference) and include it in an Evernote note. The 'normal' URI you'd get from Zotero (using Item URI as your Default Output Format under Preferences → Export → Quick Copy) is:

http://zotero.org/users/local/asdfgh/items/asdfgh

What I really want is the non-HTTP URI, e.g.

zotero://select/items/asdfgh

However, if you paste that into Evernote it doesn't recognise it as a URI or anything that should be linkish. It just sits there, flat and idle and useless. This is pretty annoying given it can do this for Evernote URIs, as above, but whatever...

Solution Sketch

This solution is from brain.flush(); Connecting Zotero and Evernote and RTF-Links from Zotero in Evernote. This post isn't just a link to that blog because I fear link rot.

The workaround is pretty gross but bear with me. First, you modify Zotero so that its Quick Copy gives you a HTML-formatted URI, e.g.

<a href="zotero://select/items/asdfgh">Paper Title</a><br />

then convert the HTML in your clipboard to RTF, which can be pasted into Evernote and will act 'as desired'. The intermediate RTF representation looks something like this, btw, so clearly there's scope for further customisation:

{\rtf1\ansi\ansicpg1252\cocoartf1404\cocoasubrtf130  
{\fonttbl\f0\froman\fcharset0 Times-Roman;}  
{\colortbl;\red255\green255\blue255;\red0\green0\blue233;\red0\green0\blue0;}  
\deftab720  
\pard\pardeftab720\sl280\partightenfactor0  
{\field{\*\fldinst{HYPERLINK "zotero://select/items/asdfgh"}}{\fldrslt  
\f0\fs24 \cf2 \expnd0\expndtw0\kerning0  
\ul \ulc2 \outl0\strokewidth0 \strokec2 Paper Title}}  
\f0\fs24 \cf3 \expnd0\expndtw0\kerning0  
\outl0\strokewidth0 \strokec3 \

Ew. But you never need to look at this.
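
For illustration only, the HTML-formatted link described above can be sketched in a few lines of Python. This is not the Zotero translator (that is a separate script); the function name `zotero_select_link` is hypothetical, and the item key and title are the same placeholders used in this post.

```python
from html import escape

def zotero_select_link(item_key, title):
    """Return an HTML anchor that points at a Zotero item via the zotero:// scheme."""
    href = "zotero://select/items/" + escape(item_key, quote=True)
    return '<a href="{}">{}</a><br />'.format(href, escape(title))

link = zotero_select_link("asdfgh", "Paper Title")
```

Escaping the title matters because paper titles often contain characters such as `&` or `<`, which would otherwise produce broken HTML before the RTF conversion.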

Specific Steps

Modifying Quick Copy:
Put this script in your translators folder wherever your Zotero is.
Tell Zotero to use it for Quick Copy: Preferences → Export → Default Output Format: select ZotSelect Link (HTML) from the dropdown.
Now Cmd + Shift + C will put the HTML-formatted link in your clipboard.
Converting HTML to RTF:
The script to do the transformation is (choose your favourite UTF-8 region...) export LANG=en_US.UTF-8; pbpaste | textutil -stdin -stdout -format html -convert rtf -inputencoding utf-8 | pbcopy
You can automate this with Automator (an OSX tool):
- Create a Service, add an action of Run Shell Script, paste in the above code.
- Make sure to set the "Service receives selected..." dropdown to "no input".
- Call the service whatever you want - following the source blog I called mine "Convert HTML clipboard to RTF".
Set a keyboard shortcut for the new service under Keyboard → Shortcuts → Services. Also following the blog, I used Cmd + Shift + Ctrl + C.

So now the workflow is:

[in Zotero] Cmd + Shift + C on desired reference
[wherever] Cmd + Shift + Ctrl + C to transform contents of clipboard
[in Evernote] paste normally to get a link which opens the reference in Zotero.

NIPS 2015 by the day

2015-12-14T00:00:00+00:00

I got back from Montreal yesterday. I was at the Twenty-ninth Annual Conference on Neural Information Processing Systems (NIPS) - a rather large gathering of people interested in machine learning, neuroscience, artificial intelligence, and related topics. It's an academic conference, and it is intense. Many wonderful conversations were had, things learned, insights gained, ideas developed, coffees consumed. Old friends met and new friends made. I left physically exhausted, and this post is an attempt to summarise some of what went down. This was also my first time attending NIPS, so next time I might be a little more conservative with my energy.

If it seems like my level of detail varies wildly, it's because sometimes I took notes, sometimes I couldn't, and sometimes I didn't want to.

Sunday

When I flew from Dublin to Hamburg for 31C3 last year, the plane was full of vaguely unusual-looking people (myself included, no doubt) clearly destined for Congress. Who else would fly to Hamburg on St. Stephen's day? The flight from NYC to Montreal for NIPS was a little less homogeneous, and machine learners are harder to spot (posters are strong evidence), but I nonetheless had the same vague feeling of unified purpose with my co-passengers. Conversation about optimisation broke out on the bus to the city centre, and knowing glances were exchanged between strangers. And so NIPS began as it would continue, a bubble where the social convention of silence is broken by mutual knowledge of shared purpose (this purpose being bringing about the robot apocalypse).

Tip: don't try to register the day the conference starts. Angry Monday morning tweets mentioned waiting times some multiples of how long I spent on Sunday evening. ¯\_(ツ)_/¯

Monday

Because I forgot to register for the Women in Machine Learning workshop, I went to tutorials.

Deep Learning: Yoshua Bengio & Yann LeCun

Topics mentioned were: curse-of-dimensionality, backpropagation, convolutional nets, recurrent nets, details about backprop (e.g. with ReLUs and max pooling, GPUs), distributed training (e.g. asynchronous stochastic gradient descent), applications (e.g. vision, more about vision, speech, natural language), attention mechanisms, encoder/decoder networks (e.g. for machine translation), multi-task learning, unsupervised learning, undirected graphical models, more about auto-encoders (e.g. probabilistic interpretation, Helmholtz machines), semi-supervised learning (e.g. ladder networks), and some challenges and open problems. The future questions/areas of interest highlighted were:

unsupervised learning, and how to evaluate it?
how to include long-term dependencies?
NLP, generally
optimisation
distributed training?
bridging the gap to biology
deep reinforcement learning

Overall, I wanted to see more gory details, but deeper coverage of any one topic would have limited breadth, so it was more like a lit review/series of pointers to publications in this field. There was criticism that it placed too much attention on the work of its presenters (also Hinton, who was meant to be there but couldn't make it unfortunately), and gave an incomplete treatment of the history of the field. I'm not in a position to comment intelligently on that. Anyone giving an overview-style talk has a responsibility to adequately cover both history and breadth of research, so I can see why it might have been made.

Probabilistic Programming: Frank Wood

I had already heard some of this content at MLSS 2015 Tübingen so didn't take notes. Check out this repo for the material from the practicals on Anglican. TL;DR:

Goals of probabilistic programming: reduce coding burden, commodify inference, create weird new models, make it widely usable #NIPS2015
— Stephanie Hyland (@__hylandSL) December 7, 2015

Introduction to Reinforcement Learning with Function Approximation: Rich Sutton

I took physical notes for this tutorial, because there was a severe lack of power outlets in the convention centre. Tip: sometimes it's better to have a notebook with a long battery life than a retina screen.

I think reinforcement learning is really cool (and according to how popular the deep RL workshop was, so do other people (or maybe they just like 'deep')).

This tutorial was much more focused than deep learning: it was concerned with policy-learning through first getting an action-value function. This function gives you the expected reward (usually with discounting) upon taking a particular action from a particular state, and can therefore be used to define a policy (e.g., greedily, given your state choose the action with highest value).
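
A minimal sketch of those two ingredients, a discounted return and a greedy policy read off an action-value table. The states, actions and values in `Q` are made up for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the reward at step t is weighted by gamma ** t."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

def greedy_policy(Q):
    """For each state, choose the action with the highest estimated value."""
    return {state: max(actions, key=actions.get) for state, actions in Q.items()}

# Hypothetical action-value table: Q[state][action] is an expected discounted reward.
Q = {"start": {"left": 0.2, "right": 1.5},
     "middle": {"left": 0.7, "right": 0.1}}
policy = greedy_policy(Q)
```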

He spoke about on- and off-policy learning, where the agent is obtaining information for its estimate of the action-value function while either following the policy given by such a function (on-policy) or some other policy (off-policy), such as a random policy. I hadn't properly appreciated the significance of this difference before, so I found the exposition illuminating. He gave an example where on-policy learning resulted in the highest average reward across episodes, but its learned policy was worse than that of an off-policy learner, since the off-policy learner was able to explore 'riskier' actions. My intuition here is that this result could be altered by tweaking rewards and the inclination towards exploration in the 'off' policy, and I'm sure there is loads of (ancient) work already done on the topic. More papers to read, eh.

Another neat thing about off-policy learning is that you can gather information about many potential policies simultaneously. This might seem 'trivially obvious' (exploration leads to information about the system and its rewards which enables policy-learning) but it is always reassuring to hear one's intuitions restated by an expert in the field.
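
The on-/off-policy distinction shows up as a one-line difference in the textbook tabular update rules, Sarsa (on-policy) and Q-learning (off-policy). The sketch below uses those standard rules with a made-up two-state table; the step size and discount are arbitrary choices, and this is not a reconstruction of the example from the tutorial.

```python
import copy

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=1.0):
    """On-policy: bootstrap from the action the behaviour policy actually took next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.5, gamma=1.0):
    """Off-policy: bootstrap from the best next action, whatever was actually taken."""
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])

# Hypothetical values: in state "t" the exploratory behaviour picks the bad "risky" action.
Q0 = {"s": {"safe": 0.0, "risky": 0.0},
      "t": {"safe": 1.0, "risky": -10.0}}
Q_sarsa, Q_qlearn = copy.deepcopy(Q0), copy.deepcopy(Q0)
sarsa_update(Q_sarsa, "s", "safe", 0.0, "t", "risky")
q_learning_update(Q_qlearn, "s", "safe", 0.0, "t")
```

After the same transition, the Sarsa estimate is dragged down by the exploratory mistake, while the Q-learning estimate reflects what a greedy policy would have done from the next state.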

Overall this was the most lecture-like of the tutorials and hopefully it will appear online soon, because it was well-paced, well-motivated and overall the most useful, even if it wasn't all-encompassing for reinforcement learning (it wasn't trying to be). Sutton is a good educator.

Poster Session 1

I did this session 'wrong'. I tried navigating through an impassable crowd of humans and coats and bags to peer at each poster and then decide if I wanted to hear more. Tip: do not do this. For everyone's sake.

Taming the Wild: A Unified Analysis of Hogwild-Style Algorithms - Christopher M. De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré - I need more asynchronous SGD in my life. They look at the noise you get from asynchronous updates, derive some results and describe a lower-precision SGD algorithm. I am disproportionately likely to pay attention to posters/papers with cool titles.

A Theory of Decision Making Under Dynamic Context - Michael Shvartsman, Vaibhav Srivastava, Jonathan D. Cohen - Neuroscience! Decision making! I have apparently forgotten the main message of this poster, possibly because we rapidly started talking about psycholinguistics. The danger of NIPS is exhaustion through too many interesting conversations. Added bonus for this poster: he made it + code + paper available (see link).

Grammar as a Foreign Language - Oriol Vinyals, Łukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey Hinton - by the time I made it to this poster the session was over so I didn't get to speak to any of the authors. Main idea: parsing with LSTM + attention! The 'foreign language' part comes in because it's sequence-to-sequence (sentence to linearised parse tree), which is typically found in machine translation settings.

Tuesday

I was unreasonably tired and questioned the wisdom of staying at a poster session until after midnight. Zoubin Ghahramani spoke about Probabilistic Machine Learning while I ate pastries with a fork in the overflow room. The overflow room would have been perfect if it had any power outlets in it. I've heard some variant of Zoubin's talk roughly twice already, thanks to MLSS 2015 and GPSS 2014, so I lost focus and probably missed something new and important. He mentioned probabilistic programming and the automatic statistician. One of the questions was about whether this (the automatic statistician) will replace machine learners: a terrible thought, and ironic for a discipline which (to some extent) aims to automate away many other jobs. The answer was (as you might expect, and may even have given yourself): 'this will just make our jobs easier, allowing us to focus on more interesting problems'.

The talk after Zoubin was rather technical and about singular value decomposition. I missed some critical thread of understanding at the start (see missing focus) and sort of gave up following, although I note that the speaker was quite good, even if the topic is not directly relevant to me.

Spotlights as a concept are interesting, and their intended purpose is a little unclear to me. If the poster is personally relevant and interesting, I will (possibly) already know about it and go to it. If it's not relevant, a three-minute summary is unlikely to change my mind. The intended benefits I could imagine (for the presenters) are:

convincing other members of your field that your poster is interesting/worthwhile
convincing people from outside your area that your poster is relevant

However, each of these necessitates a very different three-minute presentation (detailed versus high-level), and it's hard to say what the presenters went with. Further possible benefit:

be able to state that one's poster was selected for highlight

In which case, the audience need not pay any attention. This is another way of saying that while I did listen, most of the spotlighted posters didn't make it into my cut for later. However, the final benefit (for the audience) was appreciated:

exposure to entirely different sub-field and its different priorities and problems

I spent much of the afternoon charging my laptop in a corner of the convention centre and doing some work, so there are some deleted scenes from the conference here. I got an especially foul latte and suffered it for too long. What sort of monster uses artificial sweetener without asking first?

Poster Session 2

New poster session policy: consult conference book, select posters, target.

Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks - Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus - Emily Denton was part of the way into an explanation of adversarial networks when I arrived at this poster. I feel like I've heard a lot about them in recent days, but it's probably just the Baader-Meinhof phenomenon. I like the idea, although I feel like there's probably a way to show that the procedure is equivalent to some other contrastive objective or falls out naturally from an appropriate model choice, but these idle thoughts are better substantiated later/elsewhere/in prior art.

Expressing an Image Stream with a Sequence of Natural Sentences - Cesc C. Park, Gunhee Kim - fairly complicated deep network architecture, the idea is to create a reasonable-looking set of sentences to describe a sequence of images. Training data is blog posts containing pictures, assumed related (they break it into image/text-block segments). Some possibly-interesting pre-processing on the text data (I am biased to find text more interesting than images!), too.

Interactive Control of Diverse Complex Characters with Neural Networks - Igor Mordatch, Kendall Lowrey, Galen Andrew, Zoran Popović, Emanuel Todorov - using a recurrent neural network to learn the dynamics under a control policy; seemingly mapping from the state to the velocities (dynamics) caused by an action.

Efficient Learning of Continuous-Time Hidden Markov Models for Disease Progression - Yu-Ying Liu, Shuang Li, Fuxin Li, Le Song, James M. Rehg - a medically-focused paper! The advance seems to be making continuous-time HMMs more feasible. How much? I'm not sure, I didn't stay too long.

Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization - Xiangru Lian, Yijun Huang, Yuncheng Li, Ji Liu - this is a theory paper about asynchronous SGD. I was initially confused because I didn't know the state of the theory here, and wasn't sure what their actual contribution was. The contribution is about the convergence rate. The take-home for me is roughly 'you can use asynchronous SGD'. See also the poster from Monday on Hogwild!.

Wednesday

I missed Tibshirani's talk, Post-selection Inference for Forward Stepwise Regression, Lasso and other Adaptive Statistical Procedures. This was unfortunate given the topic of lunchtime discussion was adaptive statistical procedures (among other things). Being interdisciplinary is interesting: I can simultaneously observe biology's obsession with p-values and machine learning's apparent lack of interest (to generalise wildly). I am not sure how many papers demonstrate statistically significant improvement over state-of-the-art, and while I should back up this speculation with reality, at the present moment (1:19am on Thursday) I'll say 'not many' and leave a generic pointer to Gelman's paper The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time.

Some of the talks/spotlights ended up being poster picks of mine, so I'll describe them below.

I was strangely entranced by Efficient Exact Gradient Update for training Deep Networks with Very Large Sparse Targets. I have a soft spot for linear algebra tricks. The idea is basically that when you have a very large, but sparse target - as you might get in a language modelling task (trying to predict which of O(100,000) words comes next) - you can do smart things to obtain gradients without actually calculating the horrible non-sparse, high-dimensional output. Lovely. The problem is that this only works for certain classes of loss functions, not including the traditional log softmax one sees in these language applications. So possibly limited benefit, but worthy of further investigation.
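
To give a flavour of the kind of linear algebra trick involved, here is a minimal sketch under strong assumptions: a linear output layer `W`, a squared-error loss, and a target with only two non-zero entries. It checks only that the loss can be computed from a small Gram matrix without forming the full output; it is not the paper's algorithm, which is also concerned with the gradient update. All sizes and values are made up.

```python
import random

random.seed(1)
D, d = 2000, 4   # large output dimension, small hidden dimension (toy sizes)
W = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(D)]
h = [random.uniform(-1, 1) for _ in range(d)]
target = {17: 1.0, 1234: 1.0}   # sparse target stored as index -> value

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def naive_loss(W, h, target):
    """Squared error computed the expensive way, forming all D outputs."""
    return sum((dot(row, h) - target.get(i, 0.0)) ** 2 for i, row in enumerate(W))

# Gram matrix W^T W (d x d). Computed once here; during training it would
# have to be kept up to date as W changes.
gram = [[sum(row[i] * row[j] for row in W) for j in range(d)] for i in range(d)]

def sparse_loss(gram, W, h, target):
    """Same loss via ||Wh - y||^2 = h^T (W^T W) h - 2 y^T W h + y^T y."""
    quad = dot(h, [dot(g_row, h) for g_row in gram])
    cross = sum(y * dot(W[k], h) for k, y in target.items())  # only non-zero targets
    return quad - 2.0 * cross + sum(y * y for y in target.values())

loss_naive = naive_loss(W, h, target)
loss_sparse = sparse_loss(gram, W, h, target)
```

Given the Gram matrix, the second computation touches only the rows of `W` that correspond to non-zero targets, so its cost does not grow with the output dimension.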

Poster Session 3

Refining the poster-session policy, I made it to too many posters and fried my brain.

Deep Visual Analogy Making - Scott E. Reed, Yi Zhang, Yuting Zhang, Honglak Lee - this was a full oral presentation so the poster was crowded. Oh, to be tall. Analogy idea: A 'is to' B as C 'is to' D, given (or simultaneously with) representations of A, ..., D, what does 'is to' mean? Oft-cited example from language modelling is the 'king is to queen as man is to woman' example (from word2vec) where 'is to' is apparently a constant offset vector in the representation space (which is a vector space). This is a very general problem and one I could rant about for quite a long time (indeed, I have a paper on a related topic) so I'll say that the new thing here seems to be the application to images, and nice results/experiments... and probably other details that will only emerge when I read the paper. Cool bonus: they used free (as in Free) game art from the Liberated Pixel Cup.
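
The constant-offset picture from the word case can be sketched in a few lines. The two-dimensional vectors below are invented so that the arithmetic works out exactly; real embeddings are learned and much higher-dimensional, and the poster's image models are a different matter entirely.

```python
import math

# Hypothetical 2-d "embeddings": first axis roughly royalty, second roughly gender.
vectors = {
    "king": [0.9, 0.8],
    "queen": [0.9, 0.2],
    "man": [0.1, 0.8],
    "woman": [0.1, 0.2],
    "apple": [0.5, -0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c):
    """Return the word d for which 'a is to b as c is to d', using b - a + c."""
    query = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], query))

answer = analogy("king", "queen", "man")
```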

Training Very Deep Networks - Rupesh K. Srivastava, Klaus Greff, Juergen Schmidhuber - (disclosure: Rupesh and Klaus are friends of mine) Deeper networks are more better, but training them is hard (vanishing gradients and whatnot) - what to do? Highway networks tackle this by putting gates on layers, choosing between 'transporting' and 'transforming' data. Transporting is just an identity operation and therefore doesn't complicate gradients at all. There are (probably very obvious, for those who know LSTMs) connections to LSTMs here also. Keeping in line with Klaus's Kubrick-inspired paper titles (previous ones being A Clockwork RNN and LSTM: A Search Space Odyssey) I'd suggest 'Highway Networks or: How I Learned to Stop Worrying and Transport the Data', but admit further work is needed in this direction.
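
A minimal sketch of a single gated layer of this kind, where a gate `T` in (0, 1) mixes a transformed signal with the untouched input: `y = T * H(x) + (1 - T) * x`. The tanh nonlinearity, the weights and the extreme gate biases are choices made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, x):
    """Compute W x + b for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def highway_layer(x, W_h, b_h, W_t, b_t):
    """Gate between transforming x (weight T) and transporting it unchanged (weight 1 - T)."""
    transform = [math.tanh(z) for z in affine(W_h, b_h, x)]
    gate = [sigmoid(z) for z in affine(W_t, b_t, x)]
    return [t * h + (1.0 - t) * xi for t, h, xi in zip(gate, transform, x)]

x = [0.5, -1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
# Very negative gate bias: the layer transports its input almost unchanged.
carried = highway_layer(x, identity, [0.0, 0.0], zeros, [-50.0, -50.0])
# Very positive gate bias: the layer behaves like an ordinary tanh layer.
transformed = highway_layer(x, identity, [0.0, 0.0], zeros, [50.0, 50.0])
```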

End-to-end Memory Networks - Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus - continuous extension of memory networks, thus can be trained end-to-end (that is, without direct supervision at each layer, just from input-output pairs). The basic idea of a memory network is that you have some memory component (surprisingly enough) which the model learns to read and write to. An obvious application is question-answering: feed it some text describing a scene, situation etc., then ask questions. I wondered how difficult these tasks could become before the methods started to break down and suggested (I think it was to Szlam) that the GRE logic puzzles might be interesting for that, but alas, restricted-access data. One of many reasons we cannot have nice things.
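
What makes a memory 'continuous' is that reading becomes a soft, differentiable lookup rather than a hard choice of one slot. Here is a minimal sketch of such a read, assuming dot-product scores and a softmax over memory slots; the vectors are made up and nothing is learned here.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read_memory(query, keys, values):
    """Soft read: weight each memory slot by how well its key matches the query."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Hypothetical memory with three slots; the query matches the first key best.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights, output = read_memory([4.0, 0.0], keys, values)
```

Because every step is a smooth function of the query and the memory contents, gradients can flow through the read, which is what allows training from input-output pairs alone.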

On-the-job Learning with Bayesian Decision Theory - Keenon Werling, Arun Tejasvi Chaganty, Percy S. Liang, Christopher D. Manning - humans are quite good at tasks you might want an algorithm to perform, but employing humans is expensive (in many ways). Algorithms scale much better in this regard, but they have unacceptably bad performance until they've seen enough data. Solution: combine both. Get the algorithm to assess its certainty on the task, and ask for help when it needs it (using Amazon Mechanical Turk). Seems quite cool/useful, although I have some Complicated Feelings about turking (is it fine? is it creepy? is it exploitative somehow?).
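
The 'ask for help when it needs it' loop can be sketched like this. It is a deliberate simplification: the paper frames the choice with Bayesian decision theory, whereas this sketch uses a fixed confidence threshold. The function name, the 0.8 threshold and the `ask_human` callback are all assumptions made for illustration.

```python
def predict_or_ask(class_probs, ask_human, threshold=0.8):
    """Return (label, source): trust the model if it is confident, else ask.

    class_probs -- dict mapping each label to the model's probability
    ask_human   -- zero-argument callable standing in for a crowd worker
    threshold   -- hypothetical confidence cut-off, not taken from the paper
    """
    label, confidence = max(class_probs.items(), key=lambda item: item[1])
    if confidence >= threshold:
        return label, "model"
    return ask_human(), "human"


confident = predict_or_ask({"cat": 0.95, "dog": 0.05}, ask_human=lambda: "dog")
unsure = predict_or_ask({"cat": 0.55, "dog": 0.45}, ask_human=lambda: "dog")
```

A decision-theoretic version would weigh the cost of a wrong answer against the cost (money, latency) of a human query instead of hard-coding the cut-off.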

A Framework for Individualising Predictions of Disease Trajectories by Exploiting Multi-Resolution Structure - Peter Schulam, Suchi Saria - carefully constructed hierarchical model of disease trajectory to identify patient subgroups. In particular, it uses a noise model (Gaussian process with particular kernel choice) which allows for transient trends such as infection, medication etc. I think the disease severity is measured by lung capacity, so it's a 1-dimensional state space (although patients have covariates etc.), but I don't see any reason why a similar model couldn't handle other state-spaces. It makes for nice graphs, anyway. I'm glad to see probabilistic graphical models for healthcare represented at NIPS.

Thursday

Some last minute poster-printing shenanigans occupied the morning. For future reference: Copie Nova printed my A1 poster in 15 minutes.

Poster Session 4

Update to policy: bump into a friend, end up chatting about twitter bots and other side projects. Miss half the poster session.

Semi-supervised Sequence Learning - Andrew M. Dai, Quoc V. Le - I marked this and have no memory of actually reading the poster. I suspect it was mobbed and I gave up. Things in the direction of unsupervised learning are interesting, so the paper is probably interesting.

Skip-Thought Vectors - Ryan Kiros, Yukun Zhu, Ruslan R. Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio Torralba, Sanja Fidler - this is the only paper I had already read before coming to the conference, so it was neat to get to talk to Kiros about it briefly! The idea is basically: as in word2vec, you learn a representation of 'meaning' by trying to predict context. This time, the prediction is of the preceding and subsequent sentences, using RNN encoders/decoders. They also use an interesting trick to augment their underlying word representations by learning a mapping from pre-trained word2vec vectors into their mapping space. This allows for any word2vec-learned word to be used in their setup. I was surprised that this worked well, since the problem should be over-determined (they solve this approximately with a L2 loss, but still). The title is also very eye-catching (the term was coined by Hinton, according to Kiros), although I think we're still a ways away from actually representing thoughts. Sentences are closer than words, but are they close enough?

Symposium: Algorithms Among Us: the Societal Impacts of Machine Learning

I am so excited to see the field talking about this. It is very easy as a scientist to divorce oneself from the social, ethical, economic etc. consequences of one's work. I was glad to see a large crowd turn out to this, although there's certainly an element of 'self preservation' here - that is, how do we make sure machine learning (and artificial intelligence) retains a positive status in the eyes of everyone else, and some element of sensationalism regarding 'scary killer robots of the future' (aka 'the children of the singularity are going to murder us all and something about Roko's basilisk'). Nonetheless, cool discussions were had.

Economic Issues

Erik Brynjolfsson spoke about the Economic Implications of Machine Intelligence. He was proposing that we are in a 'second machine age'; where previously machines were used to replace physical power (as in the industrial revolution), we now see computers providing mental power, which possibly threatens not to complement humans but replace us. This has implications for the economy (what doesn't?). He showed some graphs about income trends in the USA, which were (as usual) horrifying and enraging. It's uncertain how we can use machine learning to combat this without simultaneously bringing about other changes in society/the economy.

Legal Issues

Ian Kerr spoke about Machine Learning and the Law, which was fascinating. Question: can computers make contracts? Apparently yes! What about product liability? The manufacturer is usually responsible if there's a defect in a product, but what if your autonomous vehicle drives into a wall to save a small child? It's doing what it was programmed to do - who's to blame? On that point, should it do that? He mentions volenti non fit injuria, that people who enter into risky activities should assume the risk (and entering a self-driving car is a risky activity, arguably). More questions: how much faith should we put in the output of an algorithm? What if an automated medical diagnosis disagrees with a human? Who do we trust? There are questions of both moral and legal liability. If your instinct is to respond with 'trust the human, of course' - what if the algorithm's track record is provably much better than that of the human?

Panel on Near-Term Issues

(with Tom Dietterich, Ian Kerr, Erik Brynjolfsson, Finale Doshi-Velez, Neil Lawrence, Cynthia Dwork.)

I didn't write down who said what, so to anonymously summarise some of the points raised:

the philosophical problems (e.g. trolley/tunnel problem) aren't so clear-cut, because there is uncertainty and also split-second decision-making which may render 'consulting the human driver' an untenable option.
re: people losing jobs to automation: this has been happening for a long time, but that doesn't necessarily make it acceptable. However, arbitrarily banning/regulating things is also not desirable. Both under and over regulation are possibly dangerous.
we should look for ways that AI can enhance human capabilities, rather than trying to replicate it - this might result in very different-looking research and outcomes.
sometimes there just isn't a right answer because we don't know what the objective function is (particularly in ethics), and encoding a single system of values may be a fool's errand. (I'm reasonably confident Neil Lawrence said this.)
counter-point to the above: robust loss functions exist to allow us to optimise a possibly-misspecified objective function.
we are actually already quite forgiving of (human) mistakes in medicine!
skill/income gap: what about developing countries? Someone pointed out that China has moved to a higher-income country, but mostly by doing the low-skilled labour no longer performed in The West.

Panel: Human-level AI... If, How, and When?

(with Yann LeCun, Andrew Ng, Gary Marcus, Shane Legg, Tom Dietterich)

More semi-anonymous points:

obviously artificial general intelligence (AGI) is a crude concept, but it's still useful... "I'll know it when I see it."
generality is the main difference between task-oriented algorithms and AGI, but maybe human-level AGI is not so important.
reasons to pursue AGI include better understanding human intelligence, and other questions of psychology.
someone questions how useful AGI is to society, as individualised systems already work very well.
counter-point to above: hand-crafted systems are being outperformed in some tasks by 'less engineered' ones.

quote from Andrew Ng: "working on AGI today is like working on colonising Alpha Centauri", although he isn't opposed to other people working on it.
LeCun emphasises the importance of unsupervised learning for approaching more intelligent machines.
Ng says that seeing into the future is hard if not impossible, and reiterates the importance of unsupervised learning for progress.
re: self-driving cars: Ng suggests starting with vehicles autonomous on specific routes, and then expanding their range of activity, rather than starting with an everywhere-driving car which increases in autonomy.
"AGI will not be an event. It won't happen instantaneously. We will add capabilities. The hardware matters. Much of our meta-reasoning is about resource allocation. Different hardware infrastructures will lead to different trade-offs. We will see systems with different strengths and weaknesses to humans."
minor counter-point to above: maybe in the future, the point at which computers can read open-ended general-domain texts will be regarded as 'the turning point'
Ng: (paraphrasing): "Forming a committee about evil AI robots is like worrying about overpopulation on Mars."
LeCun: (approximately, my live-transcription has a non-zero error-rate): "We like to think of our mind as being a generally intelligent machine, but our brains are very very far from being general. We’re driven by basic instincts built into us by evolution for survival, our brains are very limited in their types of connections and functions/signals they can process/compute efficiently. We’re very slow at adding numbers together… it’s very difficult for us to imagine a different type of intelligence than human intelligence, because that’s the only example we have in front of us. Machines will look very different. They won’t have the drives which cause humans to do bad things to each other."

Friday (Workshop on Machine Learning in Healthcare)

This is technically how I came to be at NIPS: I was presenting a poster (and I got travel funding).

To my eternal shame and regret, I missed everything before the first poster session. I hope the talks will be online soon, because they sounded great:

Integrating Artificial Intelligence into Emergency Care - Steven Horng
Data-driven Phenotyping of Autism Spectrum Disorders - Finale Doshi-Velez
Behavioral Analytics in Mental Health Care - Gourab De

I also accidentally double-presented my poster, so didn't have time to thoroughly examine work from others.

Rich Caruana spoke about Accuracy on the test set is not enough --- the risk of deploying unintelligible models in healthcare: interpretability is important in healthcare! He gave an example of a rule-based model which, upon inspection, revealed that asthma appeared to predict better outcomes for pneumonia patients. Further reflection yielded the explanation that such patients are more closely monitored and may go to the hospital earlier/more often.

This reminds me of a lesson from my biostatistics class during Part III: from an entirely unspecified population, the information that a given individual has a diagnosis of breast cancer improves their life expectancy relative to the population at large. Why? Such a diagnosis means:

patient is likely female
patient is from a country with breast cancer screening programs

... both of which improve one's life expectancy relative to global average. Couple this with reasonable-ish outcomes for breast cancer diagnoses and you have the seemingly counter-intuitive result. The lesson is to always be vigilant (for confounders).

Nigam Shah spoke about Building [Machine] Learning Healthcare Systems. Apparently 91% of the increase in healthcare costs in the USA is attributable to price increases, and not specific services or ageing. Citation required, obviously, but I didn't take it down. He spent some time discussing how existing data sources (EHRs, clinical trials, chemical databases, health forums, physician query logs, PubMed) can be used to do three things (this is probably an overview of the work done in his lab):

answer clinical questions, e.g. does androgen therapy for prostate cancer influence risk of Alzheimer's, also as a function of age?
obtain insights from data, e.g. here's a pile of data, tell me something I don't know
form predictive models, e.g.
which patients will become expensive next year?
which patients have wounds that won't heal?
which patients may have latent diseases?

There were several contributed talks. My favourite was from Charles C. Onu about detecting asphyxia from a baby's cry. Problem setting: asphyxia in newborns is potentially fatal or debilitating, but typical clinical diagnosis requires resources which are not always available (e.g. in rural locations in Nigeria). It turns out that babies with asphyxia cry in a detectably different way. So he developed the tools (signal processing, classification) and an app to do this on smartphones (high penetration even in low-resource settings). This is one of the coolest applications of machine learning I've heard of, and it didn't require deep learning. He won the prize for best contribution, and deservedly so. This should be a reminder that impact comes from solving important problems, not (necessarily) using high-tech solutions.

Ambuj Tewari spoke about Personalised mHealth. I find this stuff really fascinating (and wish I had any spare time to work on it - all my spare time is occupied by lasers right now). He motivated the issue by pointing out the dire state of mental health care in India: apparently there are 343 clinical psychologists (in the country?), out of a required 13,259, and 290 psycho-social workers out of 19,064. Clearly, anything technology can do to bridge this gap is huge. He pointed out that since smartphone penetration is very high, mHealth has a lot of potential. Then my laptop battery died.

The rough idea is to use reinforcement learning and expertise from human-computer interaction to devise apps which encourage behaviour modification in their users. This idea is a little terrifying when you think of malicious actors (encouraging addiction to pay-per-use services/games for example), but their intentions here were noble (so the tech will never be misused, right?). The stated application was fitness for people at risk of heart disease, if I recall. Finding the right balance of push/pull notifications (including frequency and number) is important to encourage persistent engagement. I'm particularly excited for mHealth applications to mental health, especially for self-monitoring and anomaly-detection. These things already exist to an extent, but I'm not sure how much they rely on machine learning.

Saturday (Reasoning, Attention and Memory Workshop)

This workshop was very crowded, and I was only able to get a seat for the first session, so my notes are terrible/missing. The slides are on the workshop page, anyway, so I'm going to substitute an actual overview of this workshop with the following set of bad jokes:

No regrets.

Workshop bonus: free icecream.

Conclusion

There are a lot of smart people working on a lot of interesting problems. Industry interest in machine learning is very high, and it is possible to do research outside of academia. (This is a curiosity for a former theoretical physicist.) The deep learning wave is possibly cresting, although deep reinforcement learning seems to be pretty hot. Models including memory are exciting and potentially very powerful, for performance on standard tasks and for (the non-orthogonal problem of) modelling reasoning. Unsupervised learning is the future. Healthcare is clearly a wonderful match for interpretable models, both as an application and a source of inspiration for theoretical development (cf. complex analysis and physics). The community is vibrant (if hilariously gender-imbalanced), and I'm looking forward to next year.

vortidplenigilo

2015-11-21T00:00:00+00:00

I have been learning Esperanto lately (see here). One of the really cool features of the language is affixes. Basically, you can create new words using some simple morphological rules, e.g.:

bona (good) → bonulo (good person)
juna (young) → junulo (young person)

vorto (word) → vortaro (group of words = dictionary/vocabulary)
arbo (tree) → arbaro (group of trees = forest)

There are a lot of affixes (at time of writing I have 48 suffixes and 18 prefixes), so I thought it might be useful to write a small program to create new words by randomly attaching these affixes, then quizzing myself on them.

Here it is. (see `soup.py`)

Usage is like this, for example:

> soup(root=u'hundo', n_p=1, n_s=4, cheat=True)
hundo   + pseŭdo : false
        + uj : container for objects described by root
        + esk : similar to/in the manner of whatever is described by root
        + eg : augments or strengthens idea shown by root affix(opposite of -et)
        + ec : quality/characteristic defined by root
pseŭdohundujeskegeco

n_p is the number of prefixes, n_s suffixes. The cheat flag toggles printing the explanation.
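
For the curious, the word-gluing step on its own looks roughly like the sketch below. This is not the code in soup.py: `build_word` is a hypothetical helper, and it assumes the root's final letter is its grammatical ending and that the caller supplies the ending of the finished word.

```python
def build_word(root, prefixes=(), suffixes=(), ending="o"):
    """Glue affixes onto a root: prefixes + stem + suffixes + final ending."""
    stem = root[:-1]  # drop the grammatical ending, e.g. 'hundo' -> 'hund'
    return "".join(prefixes) + stem + "".join(suffixes) + ending


word = build_word("hundo", prefixes=["pseŭdo"],
                  suffixes=["uj", "esk", "eg", "ec"])
print(word)  # pseŭdohundujeskegeco
```

It ignores which affixes are allowed on which kind of word.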

So let's interpret pseŭdohundujeskegeco... this is an abstract noun, the quality/characteristic of being a large thing similar to a container for false dogs. Or a false quality of being a large thing similar to a container for dogs. The order of interpretation is clear for suffixes or prefixes alone, but I'm not sure how to resolve it when both are present.

This is obviously a ridiculous word which no normal person would use, but I find generating and interpreting these very entertaining. Another example...

baledejarinegestro: boss of an enormous, somehow female collection of ballet theatres

I could go on all day. To save myself the effort of doing this, I automated it. So now there's a...

Twitter bot: vortidplenigilo

vortidplenigilo:
tool to make [something] full of word derivatives, from vorto + ido + plena + igi + ilo.

Every hour (or so), it tweets a random root (grabbed from a dictionary) with a random number of suffixes and prefixes. Code is in the same repo as before, see vortidplenigilo.py. It chooses how many affixes to use based on two draws from Poisson distributions, preferring fewer prefixes. Since it's limited by Twitter's 140 character limit, words with n_s or n_p above 1 tend not to make it, unfortunately. Future work will shorten the descriptions so I can squeeze more in. The selection of which affix to use is not entirely random, however...
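
A minimal sketch of the 'two Poisson draws' step, using only the standard library. The rates 0.3 and 1.0 are assumptions chosen to favour fewer prefixes (the real values live in vortidplenigilo.py), and the function names are hypothetical.

```python
import math
import random


def poisson(rate, rng=random):
    """Draw one Poisson sample using Knuth's multiplication method."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def choose_affix_counts(prefix_rate=0.3, suffix_rate=1.0, rng=random):
    """Return (n_p, n_s); a lower prefix rate means fewer prefixes on average."""
    return poisson(prefix_rate, rng), poisson(suffix_rate, rng)


def fits_in_tweet(text, limit=140):
    """True if the word plus its explanation fits in a single tweet."""
    return len(text) <= limit
```

A bot along these lines would redraw (or drop the tweet) whenever `fits_in_tweet` returns False, which is why words with many affixes rarely appear.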

Making affixes make sense

Not every affix can go on every type of word. Some take nouns and output nouns, others take nouns and output adjectives, etc. The page I grabbed the affixes from thankfully lists which transformations are valid, so I encoded that. See affixes.py for what is essentially a rendering of the aforementioned page into Python. The sort of information I recorded is explicit in this class definition:

class affix(object):
    def __init__(self, name=u'undefined', 
                 transformations={'x': 'x'}, 
                 explanation='undefined', 
                 conflicts={},
                 category='undefined'):
        self.name = name
        self.transformations = transformations
        self.explanation = explanation
        self.conflicts = conflicts
        self.category = category

transformations is a dict of valid word-type maps, based on word-endings (since Esperanto is so very regular in this regard). In practice these dictionaries either have one element (e.g. {'a': 'o'}) or map every ending to itself ({e: e for e in valid_word_endings}), but in theory one could have an affix which turns adjectives into verbs and nouns into adjectives, I suppose. Or something like that. My code is future-proofed against complicated Esperanto dystopias. The point is that as the compound word is created, I keep track of its current 'word type' and make sure I only accept affixes which are compatible with that (and then it gets a new type from its new affix, and so on). This all takes place in the make_soup function in soup.py.
explanation is just the string explaining the affix.
conflicts is a list of other affixes (by name) which I forbid to co-exist in a compound word. The idea is to prevent illogical things like

arbarero: one of a collection of trees... a tree
dormigiĝi: to become made to be asleep... to sleep

I'm not entirely convinced I want this, though. For example,

hundetego: huge small dog

sort of makes sense. Jury is out on this decision.

The final attribute, category, records what type of affix it is, and is currently not used. A future version could restrict to true affixes or adjective suffixes or something. Future proof, yo. Maybe.
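
The word-type bookkeeping and the conflict check can be sketched as follows. This is an illustration under assumptions rather than the make_soup function itself: the class is a cut-down stand-in for the one above, `apply_suffixes` is a hypothetical name, and the three affix entries are guesses at how affixes.py might encode -ar, -er and -ul.

```python
class Affix(object):
    """Cut-down stand-in for the affix class, with only the fields used here."""

    def __init__(self, name, transformations, conflicts=()):
        self.name = name
        self.transformations = transformations  # e.g. {'a': 'o'}: adjective in, noun out
        self.conflicts = set(conflicts)         # names of affixes it must not join


def apply_suffixes(root, suffixes):
    """Attach suffixes in order, skipping any that do not fit the current word."""
    stem, word_type = root[:-1], root[-1]
    used = []
    for affix in suffixes:
        if word_type not in affix.transformations:
            continue  # this affix does not accept the current word type
        if any(affix.name in u.conflicts or u.name in affix.conflicts
               for u in used):
            continue  # forbidden combination, e.g. -ar with -er
        stem += affix.name
        word_type = affix.transformations[word_type]  # the word takes on a new type
        used.append(affix)
    return stem + word_type


# Hypothetical entries, for illustration only.
ar = Affix("ar", {"o": "o"}, conflicts=["er"])
er = Affix("er", {"o": "o"}, conflicts=["ar"])
ul = Affix("ul", {"a": "o"})

print(apply_suffixes("arbo", [ar, er]))  # arbaro
print(apply_suffixes("bona", [ul]))      # bonulo
```

In the first call -er is rejected because it conflicts with -ar, which is exactly what keeps arbarero from being generated.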

One of its first tweets was beautifully meta:

morfologiido: offspring of morphology

Feedback

I would gladly welcome comments/ideas on the GitHub repository, be it language suggestions or corrections (since I am still a komencisto), code fixes, ideas for automatically producing 'interpretations' of the generated words, or anything else. The contents of affixes.py might also be useful for other people doing things with Esperanto.

esperanto in forty-five seconds

2015-10-31T00:00:00+00:00

Esperanto is a constructed international auxiliary language designed to be simple and easy to learn. It achieves this by having a small vocabulary and very regular grammar. It's quite influenced by Indo-European (particularly Romance) languages, so knowing some of those helps a lot.

This post is not intended to be exhaustive or comprehensive; I mostly want to express how simple the grammar is and highlight other cool things.

Forty-five seconds is about how long it takes me to read this page, but excludes time spent on trying to remember things. If you want to actually learn, check out the links at the end.

Gender?

Nope. The definite article is 'la' for everything. There is no indefinite article.

Nouns?

They end in '-o':

hundo: dog (like the German Hund)
feliĉo: happiness (like the Spanish felicidad)
fromaĝo: cheese (like the French fromage)
arbo: tree (like the Latin arbor, Spanish árbol)

To pluralise, add '-j':

hundoj: dogs
arboj: trees

Pronouns?

This is just a list but people like these things.

mi: I
vi: you
ŝi: she
li: he
ĝi: it
ni: we
ili: they

There's no plural 'you', for some reason.

Verbs?

Conjugation by person isn't a thing. One ending for each tense. Using esti (to be):

-i: Infinitive: esti: to be
-as: Present: estas: is
-is: Past: estis: was
-os: Future: estos: will be
-us: Conditional: estus: would be

Examples:

Mi estas...: I am...
La arboj estis...: the trees were...
Feliĉo estus...: happiness would be...
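
The whole verb table fits in a lookup, which is a nice way to see how regular it is. A minimal sketch covering only the five forms listed above; the names `VERB_ENDINGS` and `conjugate` are made up for this example.

```python
VERB_ENDINGS = {
    "infinitive": "i",
    "present": "as",
    "past": "is",
    "future": "os",
    "conditional": "us",
}


def conjugate(infinitive, tense):
    """Swap the infinitive '-i' for the tense ending; the subject is irrelevant."""
    if not infinitive.endswith("i"):
        raise ValueError("expected an infinitive ending in -i")
    return infinitive[:-1] + VERB_ENDINGS[tense]


print("Mi " + conjugate("esti", "present"))         # Mi estas
print("La arboj " + conjugate("esti", "past"))      # La arboj estis
print("Feliĉo " + conjugate("esti", "conditional")) # Feliĉo estus
```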

Objects?

Denote the object of a verb (accusative case) with '-n':

La hundo manĝas la fromaĝon: The dog eats the cheese

This stacks with plurals:

Viro vidas la arbojn: A man sees the trees

Adjectives?

They end in '-a':

bela: beautiful
rapida: fast
malrapida: slow

Examples: (note, they must match the noun in case and number),

Mi estas bela: I am beautiful
La rapidaj viroj: The fast men
La rapida bruna vulpo transsaltas la pigran hundon: The quick brown fox jumps over the lazy dog.¹
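
Since nouns and adjectives take the same plural and accusative endings, agreement is mechanical. A small sketch, assuming words are given in their base '-o' or '-a' form; `inflect` and `noun_phrase` are hypothetical helper names.

```python
def inflect(word, plural=False, accusative=False):
    """Add '-j' for plural and '-n' for accusative, in that order."""
    if word[-1] not in ("o", "a"):
        raise ValueError("expected a noun (-o) or adjective (-a)")
    return word + ("j" if plural else "") + ("n" if accusative else "")


def noun_phrase(adjectives, noun, plural=False, accusative=False):
    """Give every adjective the same number and case as its noun."""
    words = list(adjectives) + [noun]
    return " ".join(inflect(w, plural, accusative) for w in words)


print(noun_phrase(["rapida"], "viro", plural=True))     # rapidaj viroj
print(noun_phrase(["pigra"], "hundo", accusative=True)) # pigran hundon
```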

Affixes?

Observe rapida (fast) → malrapida (slow).

mal is a prefix for negation. Esperanto has very many affixes (prefixes or suffixes) which modify word roots to form other words. This is where it gets really cool and therefore beyond the scope of this post. Briefly, some examples:

hundo → hundido: dog → puppy (offspring of dog)
ĵurnalo → ĵurnalisto: newspaper → journalist (professional of newspaper)
salo → salero: salt → grain of salt (one of the many same salt objects)

Conclusion

This was just a taste of Esperanto. For more, see Duolingo, grammar at lernu!, Mazi en Gondolando.

[1] This English pangram is not an Esperanto pangram! From here, an Esperanto pangram would be:

Laŭ Ludoviko Zamenhof bongustas freŝa ĉeĥa manĝaĵo kun spicoj.
According to Ludwig Zamenhof, fresh Czech food with spices tastes good.

I will [not] follow you into the darknet (radical networks)

2015-10-27T13:37:00+00:00

Last weekend I presented with two friends (huertanix and Caroline Sinders) at the first ever Radical Networks conference. Our talk was originally called 'Blogging on the Darknet' but after a questionably productive co-working session we changed it to 'I Will Follow You Into the Darknet', after a Death Cab for Cutie song. Caroline even wrote lyrics for it. I sort of hate the term 'darknet' but until the Tor Project settle on a nomenclature for location-hidden services (and their related -ome¹) it'll have to do. And that is how terms become accepted. Sorry, world.

Anyway, the slides are here (warning: gifs, dank memes) and there is a video recording here.

Full disclosure: we didn't practice and most of our working sessions devolved into expeditions into giphy, in case that wasn't obvious. I think the last section could have done with being an interactive/hands-on workshop, but we only had an hour slot. And then @brianloveswords turned (the technical part of) our talk into two commands. Also, in case there was any uncertainty, I'm not an opsec professional. Do those even exist?

The rest of the conference was really good, although I missed a lot of Saturday due to occupying a table of cannolis and devising grandiose schemes. Events like this are great for inspiring ideas, so my todo list is ever-growing. In an attempt to avoid the lure of shiny new projects, I'm disallowing myself from starting new things before I've cleared some of my backlog. Let's see how well that goes.

Relevant to this website, one result of the conference was (after much discussion of TLDs) learning that .io belongs to the British Indian Ocean Territory, which has an unpleasant colonial history: the native Chagossians were forced out by the British in the 60s/70s so that the USA could set up a military base, which has apparently been used in the CIA's rendition (and torture) program! It's unclear how much actually having a .io domain contributes to any of this, of course, but I'm ashamed I didn't know any of this before. It's easy to forget that country TLDs come with political baggage.

[1] An awful idea which just occurred to me is 'onion'-ome, after genome and proteome and transcriptome and epitranscriptome and connectome and interactome and literome, etc. In the grim future of Hello Kitty, everything is an ome.

thoughts from mlss 2015 tübingen

2015-07-30T13:37:00+01:00

I recently attended the Machine Learning Summer School at the MPI in Tübingen. This wasn't my first time at an event like this - I attended the Gaussian Process 'Summer' School in September 2014 - but the MLSS is a lot bigger/diverse. I'm fairly sure I didn't even speak to all the other participants (unfortunately).

The basic format is lectures all day (9am til roughly 5pm) and various academic or social activities in the evenings. I foolishly thought it would be possible to get lots of work done in the free time, which was wrong on two counts: free time was limited, and the hostel had essentially no WiFi. By that I mean it was impressively bad: pings of the order 5s, packet loss above 50%. Free café WiFi is also harder to find in Tübingen than in NYC (somehow!), so by the end of the two weeks, MLSS participants could be found sitting near eduroam hotspots across the town.

Luckily there are better things to do at a summer school than struggle with laggy ssh tunnels. The lectures provided good exposure to various topics within machine learning, although a 3-hour course is necessarily limited in depth. Most of the lectures were recorded and will probably be here eventually. Those from 2013 are still available here. There are also more from other MLSS venues here.

My favourites were Tamara Broderick's Bayesian Nonparametrics and Zoubin Ghahramani's Bayesian Inference (note my bias). Michael Hirsch's Computational Imaging and Michael Black's Learning Human Body Shape were also enjoyable, largely due to demonstrations. The former briefly covered MIT's visual microphone which prompted a similar level of disquiet as it did on infosec twitter, although more fascination. The unearthly sounds of the reconstructions do little to ease the creep level.

My favourite session overall was the practical from Frank Wood and Brooks Paige on Probabilistic Programming (bitbucket repo), possibly because I am a nascent Clojure fan, or maybe I just love sampling. I'm also quite enthusiastic about abstracting away implementation details and focusing on models, which Anglican facilitates. How much use I'll make of it in my own research has yet to be determined.

Something which cannot be replicated via video lectures or git repos (yet) is interaction with other participants. As I mentioned, there were a lot of us (about 100), and the poster sessions were probably the best opportunity to talk science. I'm not sure how participants were selected but I was impressed by the diversity of research represented. It turns out not everyone is throwing convnets at everything (but maybe they should be?). There was also a lot more theory than I was expecting, which is what happens when you assume your biased sample (of largely-applied colleagues) is representative of the whole. Lesson learned. I didn't take any notes at the poster sessions (nor did I read all of the posters), so I'll just mention a few that stand out in my memory (and have something concrete to link to).

Muhammad Bilal Zafar had a poster about Fairness Constraints: A Mechanism for Fair Classification. Quoting from the paper,

"Fairness prevents a classifier from outputting predictions correlated with certain sensitive attributes in the data."

I was really excited to see a poster about fairness, especially having just read "What Does it Mean for an Algorithm to be Fair?". The danger exists for people to believe that the recommendations from a machine learning algorithm are 'fair' (for some nebulous definition of fair, likely including 'not racist' and 'not sexist'), which could be used to avoid addressing systemic social injustices. It's important for machine learning researchers/users to stress that the output of a learning algorithm is a function of its training data (madness, I know), and as long as our historical data contains biases, models trained on it will have them too. That is, unless we do something about it. I'm sure there are more subtle factors at play that I'm not aware of, but I'm glad that these issues are being considered by the research community.

Klaus Greff presented a poster about an experiment-management tool he created called Sacred. (The name is a reference to Monty Python's Every Sperm is Sacred.) This obviously isn't research, but it seems extremely useful. It records things like config options, a snapshot of the source code(!), runtime trace(s), and saves them in a (MongoDB) database. I already have a semi-elaborate setup for running reproducible experiments (the details of which are too gory and shameful to provide), but this seems more pleasant and sane.
Tom Rainforth had a poster about Canonical Correlation Forests which I didn't actually get to look at (I was in the same poster session), but the gist I got from a chat in the pub is that they're better than random forests (my brain has a very aggressive compression algorithm, clearly). I'll need to read the paper. I have a picture of him explaining the poster to someone on the bus after the poster session, demonstrating that science never rests.
My friend Jean Maillard had a poster on Learning Adjective Meanings with a Tensor-Based Skip-Gram Model. This was by far the most similar to mine (my poster was also on distributional semantics), although this paper focuses more on language-modelling, by representing adjectives as matrices. I'm amused that Jean and I started off doing something entirely different (at the time, Part III in Mathematics at Cambridge, mostly flipping tables over quantum algorithms) and then converged (if only temporarily, for me) on something that is (at least by MLSS standards) somewhat obscure. Maybe there was something in the water at St. John's.

There were also lots of opportunities to talk non-science. On the last afternoon, a straw poll was conducted on the viability of human-level AI during our lifetime. The majority present (n ~ 10) felt it wasn't going to happen, which seems to go against popular opinion on the matter (at least judging by recent articles about the threat of such AIs). Maybe grad students are too pessimistic (optimistic?) a group, or we succumbed to our small sample size. The poll wasn't even conducted in secret.

Another outcome of the MLSS is that I reaffirmed my desire to write a Gaussian Processes for Biologists tutorial. Biologist here really means 'anyone lacking a strong mathematical background' (hopefully in the future it will be offensive for me to use biologist as a proxy for that). I'd originally planned to do this after the GPSS last year, partially out of GP evangelism (all the people doing simple linear regression could be doing GP regression!) and partially to deepen my own understanding (one learns much through teaching), but progress stalled due to lack of interest. Interest is briefly re-ignited, so maybe I'll actually do it this time[citation needed].

quotes in awk

2015-05-27T00:00:00+01:00

Lazily trying to paste a long list of strings into python, which means I need things wrapped in quotes (and commas, but I'm excluding them from this example cause they're easy). Grabbing the strings from a file using awk, but since quotes (apostrophes) are special in awk this messes things up a little. The way to do it is either

awk '{ print "'"'"'"$F"'"'"'" }' file.txt

(I kid you not), or

awk '{ print "\047"$F"\047" }' file.txt

where $F stands in for whichever field of file.txt is interesting (e.g. $1), and \047 is the octal escape for an apostrophe.

I think the second method might be a bit nicer.
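
For what it's worth, a third option is to hand awk the apostrophe as a variable with -v, which keeps the program itself free of quote gymnastics. This is a minimal sketch, not part of the original tip: the sample input, the variable name q and the choice of the first field ($1) are all assumptions for illustration, and it tacks on the trailing commas too.

```shell
# Sample input standing in for file.txt (hypothetical contents).
input=$(printf 'alpha 1\nbeta 2\n')

# Pass the apostrophe in as the awk variable q, so the program text
# can stay inside ordinary single quotes.
quoted=$(printf '%s\n' "$input" | awk -v q="'" '{ print q $1 q "," }')

printf '%s\n' "$quoted"
# 'alpha',
# 'beta',
```

With a real file, drop the printf pipeline and give awk the filename directly: awk -v q="'" '{ print q $1 q "," }' file.txt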

density plots without outlines in ggplot2

2015-05-20T13:37:00+01:00

It is not:

anything in the aes of the ggplot call
color=FALSE
color=NULL

It is:

color=NA in the geom_density call

e.g.

ggplot(data, aes(x=value, fill=grouping))+geom_density(color=NA)

Example (this is some real data I'm currently working with, but I've changed the labels so it's hopefully not meaningful):

Code:

ggplot(data, aes(x=variable, fill=switch)) +
geom_density(alpha=0.7, color=NA) +
facet_grid(case~condition) +
xlim(-3,3) + theme_bw() + 
scale_fill_manual(values=c("darkorchid2","darkturquoise")) +
ggtitle("density plots with no outline")

just the tips

2015-05-20T00:00:00+01:00

For fear of turning this blog into a Chinese Restaurant Process I've spawned a new category, called tips. This category is the closest to my original vision for a PhD-blog, where I would write down useful things as I learned them. It turns out that's not very sustainable (PhD involves a lot of learning). Every time I went to write a post I found it difficult to resist the urge to make it very comprehensive and pedagogical, which ended up in me carefully studying man pages.

The tips category will contain posts that are unashamedly brief and perhaps shamefully trivial. If I have a simple problem that took me too long (like over 5 minutes) to solve, I'll put my solution in tips. As ever, making hard categorical assignments to soft things like blog posts containing multiple topics is a fool's errand, but I'll deal with it as it comes. Perhaps I should be using LDA, or like, tags or something.

pgp 101

2015-04-28T13:37:00+01:00

I gave an 'Introduction to PGP'-type talk/tutorial at CryptoHarlem last night. PGP can be a little confusing and it gets a lot of criticism for being unusable, so I tried to focus on the higher-level aspects, like how it works and why you want that. I fear it ended up being a little too theoretical, but my rationale was that a person who understands how PGP achieves what it does will find it less confusing overall. That said, I have always fallen on the 'theory then application' side of the fence when this comes up, so I admit my bias.

I resisted the temptation to make it insufferably long and all-encompassing, so an observer may note that many details have been glossed over. My plan is to incrementally improve it until I can't stand it, so feedback is greatly appreciated. I'm also thinking to make a pgp_102 which covers things like subkeys, key-signing, and anything else that seems advanced-but-actually-useful.

Link to the slides

I think it also bears mentioning that the entire presentation was an excuse to personally indulge in The Noun Project.

And unrelatedly...

aligning vectors (animation)

2015-04-09T13:37:00+01:00

I'm working on an embedding procedure for placing things into vector spaces. Things which might not normally live in a vector space (although in a sense we all live in a Lorentzian manifold). The basic idea is to take a model, present a bunch of examples of these objects relating to each other (outside of any notion of a metric), and then it figures out where to put them. (In the event of me getting my model to do anything actually useful, I will provide excessive technical detail, but that's not the objective here). This evening I remembered that 2 dimensions are very easy to visualise, so I made an animation of how the objects move in the space as training progresses (number of training examples is depicted in the plot title).

The example here is exceedingly trivial - five objects, two pairs of which are designed (by way of engineering the training data) to end up together and apart (colour-coded), and one loner who goes wherever. So they're not so much learning to agree as learning to jealously cling to their partner.

Look at them twitch! Stochastic gradient descent in action. I'm tempted to make more of these with different learning rates and see how badly I can get it to break, but I foolishly started doing this too late, so I'll save it for next time. (Also next time: evolving energy surfaces).

lá fhéile pádraig

2015-03-17T00:00:00+00:00

Today is St. Patrick's day, which means there's a lot of this online:

Lá Féile Pádraig shona duit

or this:

Lá Fhéile Phádraig sona duit

or basically any variation of:

Lá F(h)éile P(h)ádraig s(h)ona duit

which all roughly mean 'Happy St. Patrick's Day' in Irish, but some of which are grammatically incorrect. (Note: I'm not going to get into pronunciation here, but for the curious, here's a video.)

When I was learning Irish in school this was generally the trouble. The meaning is clear most of the time, but where or when to put the séimhiú (lenition - those 'h's above), the urú (eclipsis), or the additional 'i's (slendering) was never particularly clear. I got through Irish on the vague feeling about what was correct, mostly based off the sound of the thing[1].

On some recent trip home to Dublin I bought a book on Irish grammar in some optimistic hope of fixing the situation. Inside this book must be contained a set of concise and logical rules, the mastery of which will enable me to unambiguously resolve problems like the above. (hahahaha)

Let's break the phrase down. Translated literally (and with caveats)

lá fhéile Pádraig -> the feast day of Patrick
X shona duit -> happy X to you

The second one is a bit easier so we can get that out of the way.

'Sona' means 'happy', and as an adjective it is modified to agree with its noun. As a member of the third declension, 'sona' undergoes no change to its ending, but it does pick up a séimhiú (h) when it's paired with a feminine noun.

'Duit' is a prepositional pronoun, which in this case means

duit = do + tú = to + you = to you

the 'to' of 'do' is in the preposition sense (don't try to use it to form infinitives of verbs, or anything like that)[2].

So we really have

XY sona duit -> happy XY (masc) to you
XX shona duit -> happy XX (fem) to you

Now for the first part. This is where things go wrong for me.

'Lá' means 'day'. It is masculine.
'Féile' means 'feast'. It is feminine.

Tearma.ie seems to think that one makes 'feast day' with 'lá féile', which seems to agree with my grammar book. In the compound noun, 'féile' should go into the tuiseal ginideach (genitive case). As a fourth declension noun, this means it undergoes no change. The resultant word ('feast day') is masculine (the gender is taken from the first noun), so we'll follow it up with 'sona'.

However, tearma.ie also says St. Patrick's day is 'Lá Fhéile Pádraig', so something is going on here. Is the presence of the 'Pádraig' complicating the issue?

The answer that I've found for this is: yes. My grammar book is silent on the topic of this slightly-advanced compound noun case. According to Nua Leargais, when two genitives come together (as we would naïvely try for 'féile' and 'Pádraig' after 'lá'), we encounter the functional genitive. In this case:

The first word is in the 'functional genitive', and its form is that of a lenited, nominative form. So this gives us 'fhéile', even though the genitive of 'féile' is 'féile', as I mentioned before.
The second word is in the genitive. The question is then...

What the hell is the genitive of Pádraig?

The answer seems to be 'it depends'. Situations like this are why I end up feeling like 'Phádraig' and 'Pádraig' are equally valid and get confused. It seems like:

In general, the genitive of 'Pádraig' is 'Phádraig' (so 'Patrick's house' is 'teach Phádraig'). This is supported by tearma.ie. However, in special cases it doesn't change, and those cases include 'after féile' and 'after naomh'[3].

So to summarise, if you want to say 'Happy St. Patrick's Day' in Irish, and be (probably) grammatically correct about it, it's:

Lá Fhéile Pádraig sona duit!

And now you know.

[1] This turned out to be sufficient to get an A1 in the exam (somehow), but I'll never feel capable of teaching anyone the language if I can't explain why things are how they are.

[2] We learned the prepositional pronouns using songs and hand gestures in school, so they're indelibly seared in my memory. Here's the full list of the 'do' ones:

dom = to me
duit = to you
dó = to him/it
di = to her/it
dúinn = to us
daoibh = to you (plural)
dóibh = to them

And here's a page with some more. You might notice that 'do' means 'to/for' there, which is true, but that is beyond the scope of this post.

[3] 'Naomh' means 'saint', so if you, like me, went to a school named after St. Patrick and grew up hearing 'Naomh Pádraig' all the time, you might be forgiven in thinking that's its genitive form.

Here are some pages I found useful/interesting while trying to resolve this problem:

temporal anomaly

2015-03-16T00:00:00+00:00

I'm gradually migrating old posts over from my WordPress blog, which will be backdated, ruining the timestamps-through-commits system that could have been, but never was. Hopefully the validity of the alleged timestamps on these posts will never be important for anything.

Comparing the new post[1] with the old also reminds me that WordPress has competent web designers, words which cannot be applied to me. However, my blog has the advantage of working just as well (badly) without JavaScript, which is part of my decision to design it with Tor Browser in mind.

[1] At the time of writing this I had just migrated "updating shared variables in theano".

databases all the way down

2015-02-12T14:20:00+00:00

I'm (arguably) a data scientist, so I need data to do science. A problem with data is that it's all over the place, a natural and unsurprising consequence of its many origins. Producing databases seems to be a hobby for bioinformaticians, which also includes databases of databases (Bolser et al., 2012), so I think it's fair to say some nontrivial effort is expended in trying to deal with this data problem. Large collaborations are pretty good for creating big datasets (e.g. ENCODE, TCGA), which can lessen the appearance of scattered data (or at least, heighten the attractiveness of centralised data), but these efforts are less about data curation and more about data generation. One notable effort towards more efficient (genomic) data discovery/sharing is the Global Alliance for Genomics and Health, which is working on defining standards, data formats, APIs, things like that. It's not the only project thinking about APIs and formats (of course). Skimming my notes from the Biological Data Science meeting at CSHL last year, I see: Open Science Data Framework, the NIH's bioCADDIE, Ensembl's REST API, and so on[1]. For the sake of efficiency I hope the community can come to some consensus on how best to store/index data and metadata, although a quote from Richard Durbin comes to mind[2],

"In science, always there are lots of people looking at the same thing in different ways. There are people trying out all sorts of crazy things. It's extremely successful to not have top-down control. It can look a little bit redundant when you have a person write yet another read mapper, but sometimes things will be influential. New ideas will come. Sometimes things can be relevant to individual projects. I think for sure things are done inefficiently. I accept that. It's a bit like evolution. Random mutation and testing is very powerful." The anatomy of successful computational biology software (Nature Biotech, 2013)

Which is to say I'm excessively justifying my decision to create yet another list of resources (but just a list, I have no intention of actually serving any data). In this case it's something I wanted to do for myself anyway, and more importantly I'm starting a new category of non-blog pages on my site (although technically the contact page was first). The idea is to separate pages which I feel are more time-insensitive (like tutorials), or which I intend to keep updated (like my contact details), and treat things on the blog as an unmodifiable record that will likely become outdated. So here is my dataset database. It covers 'things which are relevant to me', which you might find useful, if you're me. It might also be useful for people using convoluted methods to infer my research interests.

[1] It's reasonably likely these are not equivalent projects. I'm not really familiar with any of them, so YMMV.

[2] Duplication of effort is something I think about a lot in the context of bioinformatics/computational biology/computers. This quote makes me feel better about it.

into the gui pond

2015-02-03T13:37:00+00:00

In my previous post about getting Pond running on Yosemite, I ran into an issue with the GUI. The CLI interface seems to be fully-functional and pleasant enough for me, but GUI-errors are no guid. So yesterday I tried to figure out what the problem was, and somehow fixed it in the process. Naturally I was not keeping detailed logs.

It went something like this.

gtk3

First, I made sure I could run something else using GTK, so:

> gtk3-demo

This technically ran, but produced a lot of warnings and errors, e.g.

(gtk3-demo:99098): Gtk-WARNING **: Error loading theme icon 'image-missing' for stock: Icon 'image-missing' not present in theme
(gtk3-demo:99098): GLib-GObject-CRITICAL **: g_object_ref: assertion 'G_IS_OBJECT (object)' failed

which is possibly a related problem to the one I had with Pond, but given Pond now works for me, and I still get these errors, I suspect they have different origins/are not directly causally related.

becoming one with the error

So I tried to understand the actual error I was getting (from this log):

panic: (*gdkpixbuf._Ctype_struct__GError) (0x4383380,0x5c6be20)

This involved following the golden braid deeper into the inner workings of go and gdk-pixbuf, in the hopes that something would start to make sense.

Clarity was not forthcoming. I got to look at non-scientific code for a while, and I decided gdk-pixbuf was to blame, but otherwise learned little. This isn't terribly surprising given my lack of knowledge of Go/gtk/OSX/etc. I tried. At least I had a witch (gdk-pixbuf) to burn.

fun with reinstalling everything repeatedly

I thought about trying to test gdkpixbuf.PixbufLoaderWithType("png") on its own, but I was lazy and foolish, so I just tried reinstalling gdk-pixbuf instead.

brew install gdk-pixbuf
/usr/local/Cellar/gdk-pixbuf/2.30.8/bin/gdk-pixbuf-query-loaders --update-cache

I think this is when I introduced a new problem, thought 'damn, OK I'll just fix this problem and get back to carefully recording how I fix this GUI error', and in the process fixed the GUI. Naturally. The problem I introduced looked like this:

> client

dyld: Library not loaded: /usr/local/lib/libgdk_pixbuf-2.0.0.dylib
  Referenced from: [$GOPATH]/bin/client
    Reason: Incompatible library version: client requires version 3101.0.0 or later, but libgdk_pixbuf-2.0.0.dylib provides version 3001.0.0
    Trace/BPT trap: 5

Which looks suspiciously like a version got messed up, maybe because I installed gdk-pixbuf individually, maybe because of magic. So I then uninstalled it, reinstalled gtk+3 (which pulled gdk-pixbuf down first; error persisted), reinstalled gtkspell3, all to no avail. Then I did a brew cleanup and it deleted some old versions and cached bottles. Determined to somehow update my gdk-pixbuf (naively believing that the newest version of a thing must be the optimal one), I explored:

> brew info gdk-pixbuf

gdk-pixbuf: stable 2.30.8 (bottled)
http://gtk.org
/usr/local/Cellar/gdk-pixbuf/2.30.8 (209 files, 4.3M) *
  Poured from bottle
From: https://github.com/Homebrew/homebrew/blob/master/Library/Formula/gdk-pixbuf.rb
==> Dependencies
Build: xz ✘, pkg-config ✔
Required: glib ✔, jpeg ✔, libtiff ✔, libpng ✔, gobject-introspection ✔
==> Options
--universal
    Build a universal binary
==> Caveats
Programs that require this module need to set the environment variable
  export GDK_PIXBUF_MODULEDIR="/usr/local/lib/gdk-pixbuf-2.0/2.10.0/loaders"
If you need to manually update the query loader cache, set GDK_PIXBUF_MODULEDIR then run
  /usr/local/Cellar/gdk-pixbuf/2.30.8/bin/gdk-pixbuf-query-loaders --update-cache

maximum `gdk-pixbuf` installation

I don't feel proud of this, but I liked the sound of a 'universal binary' so I went for that.

> brew reinstall --universal gdk-pixbuf

==> Installing dependencies for gdk-pixbuf: sqlite, gdbm, makedepend, openssl, python, xz, gettext, glib, jpeg, libtiff, libpng, gobject-intros

. . . But then it was taking too long, so I cancelled it during gettext. I'm so, so sorry.

The next event I have recorded in my logs was me uninstalling gdk-pixbuf and gtk+3, then reinstalling gtk+3. All for good measure. I then tried to cleanly reinstall pond, so

> cd $GOPATH
> rm -r *
> go get github.com/agl/pond/client

. . . and . .

> client
Feb  1 02:03:50: Starting fetch from home server

It Just Worked.

I was hoping the dylib problem would go away and I could return to my GUI issues, but nope. All fixed. I tried uninstalling things (gtk+3, go, gtkspell3, mercurial, gdk-pixbuf, pond) in an attempt to recreate the original (or subsequent) issue(s), but it refused to be broken. I have a list of all the dependencies I installed during this fiasco, so I could in theory spend more time trying to break it again, or I could not.

at the end of the magical vortex, a thousand bald yaks

Incidentally, after all of this, it transpires that that One Weird Trick:

> gdk-pixbuf-query-loaders > $GOPATH/lib/gdk-pixbuf-2.0/2.10.0/loaders.cache

is no longer required for client to run. So that's fewer files and more GUIs, what could be better?

setting up pond on yosemite

2015-01-30T00:00:00+00:00

This is about Pond. If you have no idea what that is but don't want to click on the link, I shall quote directly;

Pond is not email. Pond is forward secure, asynchronous messaging for the discerning. Pond messages are asynchronous, but are not a record; they expire automatically a week after they are received. Pond seeks to prevent leaking traffic information against everyone except a global passive attacker.

If that seems interesting, go back and click on the link. Otherwise get out while you still can.

My goal here is to get Pond running. I unfortunately don't have any Pond secrets to hand right now, so testing its messaging functionality will have to come later. The instructions on the main Pond page for OSX are fairly good, so this is just a slight elaboration/modification on those. This is going to look longer than it should because I haven't solved formatting code snippets on this blog yet.

System

I'm using a ~2012 Macbook Pro 13inch Retina blah blah, running OSX Yosemite (version 10.10.1). It's a fairly recent install, so I haven't had time to ruin everything yet. That said, I have Tor Browser installed and running. Obviously.

tl;dr

GUI is b0rked, CLI seems to work. From the site, for a CLI-version:

> brew install go
> export GOPATH=$HOME/gopkg
> export PATH=$PATH:$GOPATH/bin
> go get -tags nogui github.com/agl/pond/client
> alias pond="$GOPATH/bin/client"

But now for how I spent my Friday evening:

Dependencies

Go

> brew install go

Painless success, version: go1.4.1 darwin/amd64. Then make a folder for Go packages,

> mkdir $HOME/gopkg

Export some environment variables to keep Go happy, (I also added these to my .bash_profile)

> export GOPATH=$HOME/gopkg
> export PATH=$PATH:$GOPATH/bin

One more for good measure,

> export PKG_CONFIG_PATH=/opt/X11/lib/pkgconfig:/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH

gtk+3

(These other dependencies are probably a bit pointless until I figure out how to get the GUI to reliably work, but for the sake of completeness they are here.)

> brew install gtk+3

It complains that it needs XQuartz 2.5, so that is resolved (as it suggests) with

> brew install Caskroom/cask/xquartz

gtkspell3

> brew install gtkspell3

This installed a whole pile of dependencies (including, concerningly, Python: the number of Pythons I have installed grows by the day), and my laptop crashed shortly after they finished, but I think that was random.

mercurial

> brew install mercurial

Pond

> go get github.com/agl/pond/client

It produces lots of warnings, but seemingly no errors. And who reads warnings? Since it's already in my path (see above), running it is simply (I think this could do with being more descriptive):

> client

Disaster strikes.

>Dynamic session lookup supported but failed: launchd did not provide a socket path, verify that org.freedesktop.dbus-session.plist is loaded!

and

>(<unknown>:917): GdkPixbuf-WARNING **: Cannot open pixbuf loader module file '[$GOPATH]/bin/../lib/gdk-pixbuf-2.0/2.10.0/loaders.cache': No such file or directory

>This likely means that your installation is broken.
>Try running the command gdk-pixbuf-query-loaders > [$GOPATH]/bin/../lib/gdk-pixbuf-2.0/2.10.0/loaders.cache to make things work again for the time being.

(I'm using rectangular brackets to denote that I manually replaced its output with the logical meaning, so you don't have to look at my home directory path :)

dbus

First problem is where in the hell is org.freedesktop.dbus-session.plist?

(it's not in /Library/LaunchAgents or /Library/LaunchDaemons, or ~/Library/LaunchAgents)

The answer, if you brew reinstall dbus to find out (or use some other kind of sorcery) is:

To have launchd start d-bus at login:

ln -sfv /usr/local/opt/d-bus/*.plist ~/Library/LaunchAgents

That's where it is.

Then to load d-bus now:

launchctl load ~/Library/LaunchAgents/org.freedesktop.dbus-session.plist

pixbuf

So the folder it's expecting (lib/gdk-pixbuf...) is certainly not found at [$GOPATH]/bin/..., since that's just $GOPATH.

By running gdk-pixbuf-query-loaders I can see it's probably expecting this path or something:

/usr/local/Cellar/gdk-pixbuf/2.31.2/lib/gdk-pixbuf-2.0/2.10.0/loaders/

Which would only be appropriate if $GOPATH=/usr/local/Cellar/gdk-pixbuf/2.31.2/, which it is not by construction. I can get around this as it suggests by creating a bunch of empty folders and then running

> gdk-pixbuf-query-loaders > $GOPATH/lib/gdk-pixbuf-2.0/2.10.0/loaders.cache

Which gets rid of the warnings. Now when I run

> client

I get no (immediate) errors! Success! Maybe!

GUI-sadness

Empowered by the graphics, I set a password and create an account on the default server.

. . . and it panics. That is to say, the GUI vanishes (although X11 is still running) and I get a lot of errors. Not only that, but now if I try to run it again,

> client
    Jan 30 23:46:58: Fatal error: state file is too small to be valid

Shit.

Maybe I just need the gentle embrace of the command line?

> client -cli=true

>>> Pond...
>>> >>> state file is too small to be valid

Abort, abort!

> rm $HOME/.pond
> client -cli=true

And now, everything is beautiful and nothing hurts.

first post

2015-01-26T00:00:00+00:00

Does anyone remember their first website? I know I was a nine-year-old with Microsoft Word and "Save as HTML", which seems like a decent reason to forget it, even if it was Pokemon-themed. Given my complete lack of gainful employment at the time, I took to website-making as if it were important, as if its utility would not soon be eclipsed by the simplicity afforded by services like Livejournal and Bebo[1]. I spent many hours refining and redefining my vision, a vision comprised mostly of iframes.

Here comes one now: The Only Thing We Know About Cyberspace Is That Its 640x480 (Olia Lialina @31C3)

This talk is largely about the GeoCities era of personal websites, which was already on the way out when I was getting started, but things were different in Ireland anyway. We had 'Ireland On-Line', which is now a five-line Wikipedia article, a legacy webmail service and a mausoleum of personal pages and auto-redirects.

Selected snippets:

optimised for Netscape 4.70 [source]

What's cool: Gameboy, playstation, me, Doom. [source]

While as yet, SnoopDos exists only on the Amiga, I'm happy to report that similar utilities have sprung up for DOS, Windows 3.1, Windows 95, and Windows NT. [source]

Divorce, in the Church's view, threatens to deconstruct the primary unit upon which the model for the Church is built. The battle for the retention of the form of the traditional family becomes the battle for the retention of a particular model of church organisation. This is why the debate is so fiercely fought.[2] [source]

"'Yo motherfucker,' Marx greeted. Proudhon was at a loss." [source] (somehow related to previous one)

The genesis of a new British police force can be seen in Mowlam's proposal to introduce the subtitle 'Northern Ireland Police Service' (NIPS) to the RUC.[3] [source]

Cutting that unexpectedly-engrossing tangent short, I'll clarify that this site is not a nostalgia trip. I will be keeping the animated gifs to a minimum (unless they come from David Whyte) and attempting to adhere to sane design principles. The talk I linked planted a seed of sorts in me, though. I can't argue that it caused this site, because it's been in the works for a while, but it catalysed me.

The seed is roughly this: In a time when one's internet presence was a deliberate act rather than a social necessity, making a website and appearing on cyberspace[4] was preceded by a question of surprisingly existential nature:

"What will I make my website about? What do I have to say?"

I believe that one's web presence today has similar associated questions (notably "what image do I wish to portray?"), but thanks to the existence of a set of well-established social norms, a person can largely ignore them. It is entirely possible to exist online without asking any existentialist questions. This is arguably great, because the internet is fantastic and should not be restricted to a set of people who find it necessary to frame their actions as manifestations of the struggle for self-definition, and/or people who know HTML.

I think I fall somewhere in the 'and' category.

So I'm making a website for a few reasons, which are secretly the same reason[5]. I like thinking about what I have to say. I think questions of that nature are interesting and can be quite personally fulfilling, even if the answer turns out to be 'mostly nothing'. I also like thinking about how to say things, which is part of the reason this entry has taken me far too long to finish[6].

There is also the careful deliberateness which goes into creating a website like this. My CSS was completely nonexistent when I started this a few weeks ago. I'd recommend against reading the source code if you are sensitive about CSS, because terrible, likely-forbidden things have happened and I am remorseless. Things will only get worse as I enact my weird design plans on other parts of the site[7]. I cannot wait.

So yeah, I could run a blog on an existing service, but where would be the fun in that?

[1] Bebo became extremely popular in Ireland when I was about 15, and suddenly using the internet became socially acceptable.

[2] I think this is particularly interesting, because Ireland finds itself (a mere 20 years later) on the precipice of another historic referendum (this time to legalise gay marriage), but the church makes the same old arguments. Society marches ever onwards, and the church is immobile.

[3] They went with PSNI in the end, presumably in deference to NIPS.

[4] Before it became infested with cybercriminals and c y b e r w a r f a r e.

[5] Proof left as exercise to the reader.

[6] Also fighting a losing battle against pre-ironic post-irony creeping into my tone.

[7] I intend to keep the blog as minimalist as is palatable.

updating shared variables in theano

2014-09-30T00:00:00+01:00

Background: I am running python with Theano on a GPU, and I care about speed.

Scenario: I have a largeish matrix (C) which is stored as a shared variable, and I need to update a subset of the rows (modified_rows) by some other matrix (C_delta). What should I do?

Initialising, e.g.:

    import numpy as np  
    from theano import function, shared  
    from theano.tensor import fmatrix, ivector, set_subtensor  
    C = shared(np.random.normal(size=(70000, 100)))  
    modified_rows = np.random.randint(low=0, high=70000, size=200)  
    C_delta = np.random.normal(size=(len(modified_rows), 100))  
    C_d = fmatrix('C_delta')  
    mod_rows = ivector('modified_rows')

Slow method: manually reset the values:

    C_temp = C.get_value()
    C_temp[modified_rows, :] = C_temp[modified_rows, :] + C_delta
    C.set_value(C_temp)

Speed:

    32 function calls in 0.055 seconds
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    ...
    1 0.000 0.000 0.026 0.026 sharedvalue.py:100(set_value)
    1 0.000 0.000 0.027 0.027 sharedvalue.py:80(get_value)

This is bad because it requires unpacking and repacking the value in the shared variable (via get_value and set_value). We only need to modify a small number of the rows (200 out of 70k) so having to update every single one seems extremely wasteful.

Part of the 'nice thing' about shared variables is that they can be updated by functions which use them, so you might try:

    update_C = function([C_d, mod_rows], [],
                        updates=[(C[mod_rows, :], C[mod_rows, :] + C_d)], 
                        allow_input_downcast=True)

(remember, C_d and mod_rows are symbolic variables (specifically fmatrix and ivector) defined above). The allow_input_downcast=True will deal with numpy's love of dealing in double-precision floats, which Theano rejects for GPU work. This loss of precision may be important to you.

So then a simple call to update_C(C_delta, modified_rows) will do what you want, except that what I just wrote won't work. You can't update shared variables like that. I think it's because the first element of the tuple is not really the shared variable, so Theano freaks out. (Full disclosure: little idea of theano's inner workings.)

Focusing solely on the updates=[...] part (everything else should be OK), you need to do:

    updates = [(C, set_subtensor(C[mod_rows], C[mod_rows] + C_d))]

So the full command (if you are lazily copying and pasting this into iPython to test speed):

    update_C = function([C_d, mod_rows], [],
                        updates=[(C, set_subtensor(C[mod_rows], C[mod_rows] + C_d))],
                        allow_input_downcast=True)

Things which won't work (for reasons unknown to me):

    C[mod_rows] -> C[mod_rows, :]
    C_d -> C_d[:, :]

As for the speed for this method: well,

    28 function calls in 0.001 seconds
    ncalls tottime percall cumtime percall filename:lineno(function)
    ...
    1 0.001 0.001 0.001 0.001 subtensor.py:1644(perform)

I think that solves the problem.

Possibly relevant technical information:
Theano version is 0.6.0.
Numpy version is 1.8.2, using Intel's Math Kernel Library (MKL) as part of Anaconda.
GPU is a GeForce GTX 680.

Note: this post first appeared on my wordpress blog.

deletion and replacement of strings in bash

2013-12-24T00:00:00+00:00

I try to record useful one-liners for future reference. I forgot to write down what this one does:

    mv $f ${f##START}.${f%%${f##START}}

This awkwardly-timed (time-zone troubles) blog post is atonement for my carelessness. Like all one-liners it looks complicated but is pretty simple. It does this:

    > f=STARTonetwothree 
    > echo ${f##START}.${f%%${f##START}}
    onetwothree.START

The ${...} construction is parameter expansion (not to be confused with brace expansion, which is the {a,b,c} thing), and it allows us to generate strings from a variable. In stages we have

    ${f##START}

This takes the string f and deletes the substring START from the beginning. In fact, it deletes the longest match to the substring, which makes sense when you allow your substring to be something interesting including wildcards (which may have numerous matches). A single # would delete the shortest match: try

    > echo ${f#S*T}
    ARTonetwothree

    > echo ${f##S*T}
    onetwothree

To delete the substring from the end of the string, we would use %% or %, where once again two means longest and one means shortest.
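
As a quick sketch of the suffix versions (the filename is made up purely for illustration, chosen because it has two dots in it):

```shell
# Hypothetical filename with two dots.
f=archive.tar.gz

# % deletes the shortest match of .* from the end: just ".gz"
short=${f%.*}
# %% deletes the longest match of .* from the end: ".tar.gz"
long=${f%%.*}

echo "$short"   # archive.tar
echo "$long"    # archive
```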

So my horrible code fragment creates f' by removing START from the start of f, then creates f'' by removing f' from the end of f, combines these with a period in the middle and renames the file named f with this new string-aberration. An easier way to achieve the same result would have been

    mv $f ${f##START}.START

but this does not generalise to the case of an arbitrary substring.
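
To make the "arbitrary substring" point concrete, here is a minimal sketch that wraps the trick in a function. The name prefix_to_suffix is my own invention, not a standard command, and it only prints the new name rather than renaming anything.

```shell
# prefix_to_suffix NAME PATTERN
# Hypothetical helper: prints NAME with the longest leading match of PATTERN
# moved to the end, after a period.
prefix_to_suffix() {
    rest=${1##$2}          # NAME with the matched prefix deleted
    head=${1%%"$rest"}     # whatever was deleted (quoted so it is taken literally)
    printf '%s.%s\n' "$rest" "$head"
}

prefix_to_suffix STARTonetwothree START    # onetwothree.START
prefix_to_suffix STARTonetwothree 'S*T'    # onetwothree.START
```

The pattern is left unquoted inside the expansion on purpose, so wildcards like S*T still work. If the pattern doesn't match at all you just get the original name with a trailing period.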

The obvious application of this is removing/modifying file extensions. To rename all .tgz files as .tar.gz, for example:

    for f in *.tgz; do mv $f ${f%%tgz}tar.gz; done
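
Filenames with spaces in them will trip up an unquoted $f, so a slightly more defensive sketch might look like the following. The throwaway directory and the two file names are assumptions for the sake of a self-contained demo.

```shell
# Work in a throwaway directory so nothing real gets renamed.
dir=$(mktemp -d)
touch "$dir/a.tgz" "$dir/b c.tgz"     # second name contains a space

for f in "$dir"/*.tgz; do
    [ -e "$f" ] || continue            # nothing matched: skip the literal pattern
    mv -- "$f" "${f%.tgz}.tar.gz"      # quotes keep "b c.tgz" as one argument
done

ls "$dir"
```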

If the offending substring is not at the beginning or end, you could use replacement:

    > f=all_workERROR_and_ERRORno_playERROR
    > echo ${f//ERROR/}
    all_work_and_no_play

The syntax is ${string/substring/newsubstring}, where a double first slash (as in the previous example) replaces all instances of substring. The substring can also be a pattern with wildcards, in which case the longest match is replaced:

    > echo ${f/E*R/}
    all_work

and of course replacement with something other than an empty string:

    > fp=${f/E*R/_and_all_play} 
    > echo ${fp/all/no}
    no_work_and_all_play

So if you have a bunch of messy gzipped files like

    data.gz.modified.gz.why.gz.did.gz.i.do.this.gz

the solution is

    for f in *.gz; do mv $f ${f//.gz/}.gz; done

and to wonder how you got into that mess in the first place.

putting a line in a filename (with sed)

2013-08-06T00:00:00+01:00

"How can I cut a line from a file and paste the rest into a file whose title is the line I just cut?"

If you find yourself asking yourself this question with any degree of regularity, you may have issues. Luckily for you, the help you so desperately need is at hand. It is not the help you want, but it is the help you deserve. For added complication let's assume you want to do this for every line in the file.

Solution (bash):

    > for i in $(seq `wc -l < FILENAME`)
    > do
    >   sed ''$i'd' FILENAME > something_else_maybe-`sed -n ''$i'p' FILENAME`
    > done

What is going on here is the following:

seq N prints a sequence of integers, 1 to N.
wc -l < FILENAME produces the number of lines in FILENAME. The normal way I do this is wc -l FILENAME, but that also prints the name of the file, which would confuse seq.
Enclosing a bash command in ` ` (note these are not ' or ", these are backticks (also known as grave accents (bracket nesting))) replaces the command string with the output of the command. $() also does this. Why are there two ways to do this, and why did I use both of them in my solution? We may never know.
sed 'Md' FILENAME, as per the rules of sed, has FILENAME as output, having deleted line M (in our case, '$i').
sed -n 'Qp' FILENAME runs through FILENAME, printing nothing (the -n flag) unless otherwise instructed (with p), as occurs for line Q.
The "something_else_maybe" just demonstrates that you could include other elements in the output filename. Further complication could be introduced here (say replacing it with $i or whatever you want), but that is too far. Too damn far.
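
For comparison, here is a sketch of the same idea that reads the lines with a while loop instead of calling sed -n once per line. The three-line input file is made up, and everything happens in a throwaway directory. It will still fall over if a line contains a /, since that can't go in a filename.

```shell
# Build a small hypothetical input file in a throwaway directory.
dir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$dir/FILENAME"

i=0
while IFS= read -r line; do
    i=$((i + 1))
    # Everything except line $i goes into a file named after line $i.
    sed "${i}d" "$dir/FILENAME" > "$dir/something_else_maybe-$line"
done < "$dir/FILENAME"

ls "$dir"
```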

learning hangul(한글)

2013-08-02T00:00:00+01:00

The Korean alphabet (Hangul) is - so far - my favourite writing system. It is logical and efficient. It pleases my sense of style. Since starting this post over a month ago I took up learning Mandarin so my feelings towards Hanzi are liable to threaten Hangul's dominance in the future, but for now I side with space-robot alphabet. Because that's what Hangul is.

At first glance one may assume that Hangul consists of logograms - characters representing words rather than phonemes, but this is not the case. The alphabet is very much phonetic. Each "block" is a single syllable, so for example Hangul(한글) is Han(한)+gul(글).

Since syllables are made of phonemes, it is not surprising that the blocks consist of sub-components representing these phonemes. (It was surprising the first time I learned of this, because such an elegant solution to written language had not occurred to me - though upon further reflection, the trick is just "writing words more compactly" so it's not as novel as it is aesthetically pleasing.) Some insane person wrote a Wikipedia page documenting every possible syllabic block in Korean, so all you need to do is memorise all ten thousand of these (give or take a few thousand) and reading Korean will become trivial. End of post. If this idea is appealing to you, I might suggest going to Cambridge to do Part III of the Mathematical Tripos.

The more elegant solution is to learn the alphabet. Each letter is called a "jamo", but they only occur inside blocks, sort of like quarks. Unlike quarks, we can still look at them individually. I'll include the IPA in [], and a 'translation' of IPA into my accent (mileage may vary). For pronunciation purposes, text is no replacement for audio, so I would suggest finding some videos, like this one, for example.

Simple vowels:

Simple vowels are made of horizontal or vertical lines and short strokes.

ㅣ [i] ("ee" in "tree")
ㅏ [a] ("a" in "mad")
ㅓ [ʌ] ("u" in "mud")

ㅡ [ɯ] (somewhere between the "oo" in "cool" and the "eu" in "eugh" - I have a really hard time differentiating this from ㅜ)
ㅗ [o] ("o" in "bowl")
ㅜ [u] ("oo" in "too")

Complex vowels:

Combinations of simple vowels (including diphthongs). I'm not going to include all combinations because many of them are self-evident given the simple vowels.

These ones are less obvious:

ㅐ [ɛ] ("e" in "bed")
ㅔ [e] ("e" in "grey")

Generally, ㅗ or ㅜ combined with another vowel gives a "w-" sound, so for example ㅘ is "wah", ㅙ is "weh", and ㅟ is "wee".

There's no letter for "y" in Korean, so if you want to "y" up a vowel, double up on short strokes (I believe this process is called 'iotation'. You can do something similar in Slavic languages with ь - Cyrillic comes a close second in the space-robot race.) You get:

ㅑ [ja] ("yah")
ㅕ [jʌ] ("yuh")
ㅛ [jo] ("yoh")
ㅠ [ju] ("yoo")

We can extend this to the complex vowels, to get ㅒ for "yeh" and ㅖ for a slightly different "yeh".

Consonants:

Syllables are usually a consonant-vowel sandwich, so consonants can be "initial", "medial", or "final" (I'll write [i/m/f]), and the placement makes a (small) difference to the pronunciation of the letter.

ㄱ [k/g/k̚] ("k" as in "Kant", "g" as in "gravity", "k̚" as in "quark")
ㄴ [n/n/n] ("n" as in "neutron")
ㄷ [t/d/t̚] ("t" as in "tachyon", "d" as in "down", "t̚" as in "cat")
ㅅ [s/s/t̚] ("s" as in "strange")
ㅁ [m/m/m] ("m" as in "mass")
ㅂ [p/b/p̚] ("p" as in "point", "b" as in "baryon", "p̚" as in "top")
ㅇ [-/ŋ/ŋ] (This is just a silent placeholder in the initial position. In all others it's "ng", as in "ping")
ㄹ [ɾ/ɾ/l] ("ɾ" as in "alveolar tap", a sound which is neither "r" nor "l")

Some consonants are obtained from others by aspiration. Aspiration is basically just adding air to the sound - so imagine trying to sneak a "h-" sound in after the consonant. In Hangul, the addition of a horizontal line seems to denote this aspiration, or a general 'softening' or alteration of the sound (in the case of the letter I like to think of as "j"). This produces:

ㄱ > ㅋ [kʰ/kʰ/k̚] ("kʰ" is an aspirated "k", oddly enough)
ㄷ > ㅌ [tʰ/tʰ/t̚]
ㅅ > ㅈ [tɕ/dʑ/t̚] ("tɕ" as in "charm", "dʑ" as in "jam")
ㅈ > ㅊ [tɕʰ/tɕʰ/t̚] ("tɕʰ" as in "oh god send help")
ㅂ > ㅍ [pʰ/pʰ/p̚] ("pʰ" as in strangling noises)
ㅇ > ㅎ [h/ɦ/-] ("h" as in "hello", "ɦ" as in "cool whip")

Doubled letters:

There are also "double letters":
ㄲ, ㄸ, ㅃ, ㅆ, ㅉ
... which are "tense", so they're pronounced a bit like you're after spending the last hour reading articles about phonetics and just realised it's too late to watch Breaking Bad. "Damn it!" ~ "땀읻!"

I should stress that this entire post has very little to do with the Korean language. I don't know any Korean, but transliteration can be fun, and this article was largely about IPA. Trying to cram English into a foreign language really makes you appreciate phonetic differences.

치샔시챐파에대시추러...

피탤퍼이팰챜앧아퍀어퍀랟페펤... (curse you lack of "ɘ"!)

man cut and other simple yet useful unix bits

2013-07-22T00:00:00+01:00

Instead of just reading the man page, you could read this post about cut!

Printing columns ('fields') n to m (inclusive) from a file:

    cut -d [delimiter] -f n-m filename

Thus, removing the first n-1 fields from a file:

    cut -d [delimiter] -f n- filename

[delimiter] defaults to a tab. You could also have ' ' (space), '`', ':', '-', '_'. Apparently 'HELLO' is not an acceptable delimiter (it has to be a single character), which is some kind of bug I guess.
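
A tiny example of both forms, using a made-up colon-separated line rather than a file:

```shell
# Hypothetical colon-separated record, just for illustration.
line='one:two:three:four'

echo "$line" | cut -d ':' -f 2-3    # fields 2 to 3:    two:three
echo "$line" | cut -d ':' -f 3-     # field 3 onwards:  three:four
```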

If you just want a specific column, you could use awk (replacing n with the column number):

    awk '{ print $n }' filename

Or do some fancier things like - say* we have a file containing a list of chromosome numbers and SNPids and some other information separated into columns, and we want to extract just the chromosomes and SNPids, rewriting '2' as 'chr02' etc. and including a tab space, we could write

    awk '{ if ($1<10) print "chr0" $1 "\t" $2; else print "chr" $1 "\t" $2 }' filename

The double-quotation marks are necessary here. Column numbering in awk starts from 1 (note that the chromosomes, which are in the first column, are accessed via $1); this isn't an arbitrary choice, since $0 contains the full line. So you could do

    awk '{ if ($1+$2 == 3) print $0; else print $1+$2,"is not 3" }' filename

if for some reason you wanted to pick out lines whose first two columns sum to three. If you try doing that and $2 or $1 don't contain something which could reasonably be added (e.g. in the SNPid example) awk will just give weird output and not realise the horrible things it's doing, so be careful with that.

Note the comma (e.g. in print $1+$2,"is not 3") just inserts the output field separator, which is a space by default. As per earlier, use "\t" to insert a tab.
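
If you'd rather have the comma produce a tab, you can set awk's output field separator (OFS) instead of writing "\t" everywhere. A minimal sketch on two invented rows of chromosome numbers and SNPids, using sprintf for the zero-padding instead of the if/else:

```shell
# Two made-up rows: chromosome number, then SNP id.
out=$(printf '2 rs123\n11 rs456\n' |
    awk 'BEGIN { OFS = "\t" } { print sprintf("chr%02d", $1), $2 }')

echo "$out"
```

This should print chr02 and chr11, each followed by a tab and the SNPid.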

You could do something similar to extract all the even or odd columns in a file by silencing those you don't want:

    awk '{ for(i=1;i<=NF;i+=2) $i="" }1' filename > evencols

No, the 1 is not a typo. It just tells awk to print every line. Now, this will produce some unwanted spaces between fields, so we can get rid of them with sed:

    sed "s/^ //;s/  / /g" evencols

The basic thing going on here is s/string_to_replace/with_this_string/, with ; separating one sed command from the next. In the first one we're stripping a leading space from each line - ^ indicates 'start of line', so we're replacing "space at start" with "nothing". The second command simply replaces every double space with a single space (that's what the trailing g is for). I'm sure there are more rigorous ways to do this, but this worked for me.
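
One possible alternative that avoids the cleanup step is to have awk build the line from the even columns directly. This is only a sketch on made-up input, and it joins the fields with single spaces regardless of how they were separated originally.

```shell
# Keep only the even-numbered columns of each line.
evens=$(printf 'a b c d e\n1 2 3 4 5\n' |
    awk '{ out = ""; sep = ""
           for (i = 2; i <= NF; i += 2) { out = out sep $i; sep = " " }
           print out }')

echo "$evens"
```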

What about finding things? Suppose I have a giant folder - how giant you say?

    ls . | wc -l

This just pipes the output of ls . into wc which, with the -l flag counts how many lines we have. The folder I'm looking at has 948 things in it, because I am organised like that. I want to find a file with 'wolf' in the title, so I can do

    ls -l . | grep 'wolf'

I included the -l flag on ls because I'm interested in things like the biggest file with wolf in its title. Supposing I had a worryingly large number of wolf-related files, I could get straight to the biggest one by piping more commands together:

    ls -l . | grep 'wolf' | sort -k5 -n | tail -1

The file size is the 5th column of ls -l output, hence sort -k5, and -n sorts numerically. sort outputs low to high, which is why we take the tail -1 one.

Now, let's suppose I don't know which subdirectory my wolf file is in. I could do

    find [directory] -name '*wolf*'

to find all files with 'wolf' anywhere in their title in the directory [directory] and all subdirectories of it. To search from the current directory, use . as [directory], etc. To only find wolf files over a certain size (say 1 MB) from the current directory, we have

    find . -name '*wolf*' -size +1M

(use -size -1024c to get wolf files under 1 kB; with GNU find, -size -1k rounds sizes up to whole kilobytes and so only matches empty files) or to find all wolf files, sort them by size, and pick out the biggest one, we do

    find . -name '*wolf*' -ls | sort -k7 -n | tail -1

The -ls flag tells find to give output in a sort of ls format. With GNU find the file size is the 7th column of this output (the inode number and block count come first), so we sort based on this column (sort -k7), and the rest is the same as before. Check your own output first, as the column may differ on other systems.
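
If counting columns in find's own listing feels fragile, another option is to hand the matches to ls -l, where the size is the 5th column. A sketch in a throwaway directory with invented file names (it assumes owner and group names don't contain spaces):

```shell
# Set up a few files of different sizes, one of them in a subdirectory.
dir=$(mktemp -d)
mkdir "$dir/sub"
printf 'small' > "$dir/gray_wolf.txt"
printf 'a much bigger wolf file' > "$dir/sub/wolf_pack.txt"
printf 'not relevant at all, but the largest file here' > "$dir/sheep.txt"

# -type f skips directories; -exec ... {} + runs ls -l on all the matches.
biggest=$(find "$dir" -type f -name '*wolf*' -exec ls -l {} + | sort -k5 -n | tail -1)
echo "$biggest"
```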

*based on a real event