Weekly Head Voices #148: Data stylist.

Ridiculously fun trail in Paarl somewhere. (Photo taken by Trail Friend #1. Trail Friend #2 cropped from picture, because no permission to appear on the internets!)

This post covers the week from Monday July 9 to Sunday July 15.

The business part of my week was unfairly dominated by far too much after-work obsessing over programming languages, with which I seem to have an unhealthy (or perhaps not) obsession.

I will externalise some of these thoughts further down in this post.

I’m starting with a weekend / running update, which should be reasonably safe for non-nerds to read. However, after that, the nerd dial will go up to 11 with stuff about tools and programming languages right up to the end of the post.

I would have wanted to use the adjective “face-melting”, but I’m not sure if any intensity of nerdery could ever reach that level.

We can dream.

Weekend running update

Most fortunately the weekend had other plans and supplied us with at least 2.5 parties, the first of which even culminated in a ridiculously fun trail run in the mountains on the winter morning after.

The winter morning sun was just perfect, the company was great, and I had forgotten all forms of performance tracking devices at home.

Readers with bionic eyes might notice the Lunas on my feet.

I have now run just over 260km in them, but, in a surprise twist to the regular readers of this blog, my biological equipment has still not completely adjusted to the new style of locomotion.

The latest victim seems to be one of Tom, Dick and Harry, the tendons running under the medial malleolus of my left foot, also known as that big knob on your inside ankle. Tom (the primary suspect in this case according to Trail Friend #1 who is knowledgeable with regard to these matters, being a running foot surgeon and all), Dick and Harry are also known as the *T*ibialis posterior, flexor *D*igitorum longus and the flexor *H*allucis longus.

They currently have to work extra hard to stabilise my feet while running, because, you know, no shoes.

Because doing this thing was not hard enough already, and because the Lunas are perhaps still a bit too cushiony, and because my friend the Very Flat Cat forgot that I’m very suggestible after 11:00 in the morning when my prefrontal cortex takes the rest of the day off, I am now also the very shy owner of a pair of Xero Genesis running sandals:

(Image: Xero Genesis running sandals.)

The soles are only 5mm thick, and quite hard, being rated for a few thousand miles and all. The upshot of this is that one’s feet have to work even harder than in the Lunas.

My first run in these was amazing: I could feel my feet reacting to every little pebble, and my running style having to adapt even more to the terrain.

However, there was a price to pay for all of that additional terrain feel (and the fact that I took a much longer maiden run than I should have): The next day, the tendons in my feet felt even more (ab)used than usual.

WITH GREAT POWER COMES GREAT RESPONSIBILITY, it seems.

Due to these shoes being so powerful, I have had to resign myself to introducing Xero running far more gradually than I had initially thought.

Vacation-based-thinking-driven tool sharpening aka The WHV 2018 Data Science Toolbox(tm).

During the previously blogged-about Mpumalanga vacation, the lack of alarms, devices and other work accoutrements resulted in there being ample time for staring-into-space-grade thinking sessions.

During one of these thinking sessions, I realised that I had somehow neglected my data science toolbox for a while.

At some point a few years back, I was so into ipython notebooks (what has now become jupyter) that I used them as my main work lab notes modality.

However, in the meantime I had fallen slightly out of love with the computational notebook style of data programming, because I had begun to develop doubts about their role in the analysis pipeline.

interlude 1: jupyter notebooks are nice for initial data exploration, and they’re especially useful for remote computation with embedded graphics. However, that initial momentum of discovery risks devolving into an unwieldy monolith of code snippets, data transformations and experiments. There’s a fine line to be walked between flexible experimentation on the one hand, and version-controlled, time-stamped, permutational and scientific rigour on the other.

interlude 2: I have to apologise for using the term “data science” in a non-comedic context. In spite of the inherent humour, it has turned into a usable blanket term for computational data understanding.

Due to my growing doubts in the order of Jupyter, and due to being occupied with less traditionally data sciencey work projects, I had unfortunately let my data science toolbox gather perhaps a bit too much dust.

Slightly more worrying than falling out of love with the Jupyter Notebooks (I still like them, I’m just not that madly in love anymore), was the more specific issue that I’d even let the datavis parts get a bit dusty.

Anyways.

Although I should probably write a more complete post about this, here is the list of ingredients of the official 2018 WHV Data Science Toolbox(tm):

Programming language and library ecosystem: Python.

This language, in spite of its shortcomings, dominates the data science / machine learning world thanks to its STELLAR ecosystem.

numpy, pandas, scipy, scikit-*, tensorflow, pytorch, keras, cython… this snowball has turned into a pretty sizeable planet.

For this reason, it would be hard to justify any other choice for data science.

However, since I’ve been seeing more of Lisp and the rest of the ever-expanding programming language landscape, I can see (Python’s shortcomings as a programming language) clearly now.

In terms of interactive programming, Python beats the majority of practical programming languages, with Common Lisp being one notable exception.

However, it’s not functional enough, which engenders unnecessarily imperative, side-effecting code.  More specifically, it’s not expression-oriented.

More about this slightly further down. Maybe.
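To make the expression-oriented complaint a bit more concrete, here is a minimal sketch. The function names and the 180 threshold are purely illustrative and not taken from any real project.

```python
# Statement-oriented: `if` is a statement, so the result has to be
# assigned as a side effect inside each branch.
def label_statement(cadence):
    if cadence >= 180:
        label = "quick"
    else:
        label = "slow"
    return label


# Expression-oriented: the conditional expression itself evaluates to a value.
def label_expression(cadence):
    return "quick" if cadence >= 180 else "slow"


# Python only goes part of the way: constructs such as `try`, `with`, `for`
# and `match` are statements and do not yield a value.
print(label_statement(185), label_expression(170))
```

In languages where blocks and conditionals are expressions throughout, the second style is the default rather than the exception.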

Datavis: Anything, as long as it’s Vega or Vega-Lite.

I spent a few years of my life wrangling d3.js, down to INNARD-LEVEL.

Mike Bostock’s idea of data-element-joins is genius, and internalising it was intellectually satisfying.

I thought that these d3 skillz would serve me well for decades (that’s WEEKS in javascript-time), but it turns out that there’s a new, even smarter kid in town.

(if it’s any consolation, the new kid can be considered the grand-child of d3.js.)

vega and vega-lite are so-called visualization grammars, or visualization DSLs (domain specific languages).

The upshot is that one codes up a chart, or a whole set of linked charts and their interactive behaviour, using a language that was designed for this purpose.

This chart code can be easily shared, or converted into interactive visual representations that can be embedded in applications, online or in print quality documents.

Genius!

With Altair, you can even send your pandas dataframes to vega and vega-lite charts all from the comfort of your slightly defective Python armchair.
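As a rough illustration of what "chart code" means here, the sketch below builds a small Vega-Lite bar chart specification as a plain Python dict and prints it as JSON. The data values and the helper name are made up for the example, and the schema URL points at Vega-Lite v5, which is newer than the version that existed when this post was written. Altair produces this kind of JSON for you, so this is only meant to show what sits underneath.

```python
import json

# Hypothetical toy data: weekly running distance in km (made-up numbers).
runs = [
    {"week": "W1", "km": 12},
    {"week": "W2", "km": 18},
    {"week": "W3", "km": 9},
]


def bar_chart_spec(rows, x_field, y_field):
    """Return a Vega-Lite bar chart specification as a plain dict."""
    return {
        # Assumption: v5 schema; adjust to whatever version your renderer uses.
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": rows},
        "mark": "bar",
        "encoding": {
            "x": {"field": x_field, "type": "nominal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


spec = bar_chart_spec(runs, "week", "km")
# The JSON text is the shareable chart: paste it into any Vega-Lite renderer.
print(json.dumps(spec, indent=2))
```

Because the whole chart is just data, it can be versioned, diffed and passed between tools like any other JSON document.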

Development Environment: PyCharm.

You knew it was not going to be Jupyter Notebooks, but you probably expected it to be Emacs.

Well it’s not. Surprise!

The remote interpreter support in PyCharm enables me to connect to a Python virtual environment anywhere on the planet, which I often do.

The JetBrains wizards have optimised the remote communication of code intelligence, so completion, documentation and general code understanding are almost indistinguishable from those on a completely local project.

Being able to step through a remote PyTorch neural network training iteration, or any other remote Python algorithmics, with the PyCharm debugger is insightful.

Two notable drawbacks are visualization and long-running jobs.

For the long-running jobs I do tend to use Jupyter Notebooks or when at all possible mosh, which is amazing. However, because the primary modality is not the notebook, my code is versioned and organised into separate libraries which I can call into from notebook or mosh.

For visualization, it’s either connecting to the altair chart server via SSH pipe, dumping the chart to the unison-synced project, and/or a Jupyter Notebook.

The rest.

Of course you use Postgres on an SSD for your data, and of course you know enough SQL to make short work of most of the heavy-weight transformations often required at the start of your data crunching pipeline.
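A minimal sketch of the idea of letting SQL do the heavy lifting before any Python-side analysis starts. To keep it self-contained it uses the standard library's sqlite3 with an in-memory database as a stand-in for Postgres; the table, columns and numbers are invented for the example.

```python
import sqlite3

# Stand-in for a real Postgres connection: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (runner TEXT, km REAL)")
conn.executemany(
    "INSERT INTO runs (runner, km) VALUES (?, ?)",
    [("a", 5.0), ("a", 10.0), ("b", 21.0), ("b", 8.0), ("b", 1.0)],
)

# The aggregation happens inside the database, so only the small
# summary table has to travel to the analysis code.
summary = conn.execute(
    """
    SELECT runner, COUNT(*) AS n_runs, SUM(km) AS total_km
    FROM runs
    GROUP BY runner
    ORDER BY runner
    """
).fetchall()
conn.close()

print(summary)
```

With a real Postgres database the query itself would look the same; only the connection setup and the parameter placeholder style would differ.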

For all of my lab notes, reports, books, papers and blog posts, I use Emacs Org mode.

LaTeX math with live preview, live code snippets, SVG graphics, bibtex references, export to anything. This is one of the best ways to document your science.

Programming language addiction update.

I spend far too much time obsessing over programming languages, old and new.

For the past two weeks, I wasted even more precious time than usual reading up about programming languages.

Because I would really like to spend more of my time on other, perhaps more valuable activities, I’ve been trying to better define what it is I’m actually looking for.

Of course there is no single best programming language, but a whole set of good languages that map in intricate ways to different problem domains.

In spite of this, I have been pining for a language with, in order of importance:

  1. A Functional Programming DNA, with which I’m referring to a) expression-orientedness, b) a preference for pure functions, and at a higher level, c) the modelling of reality as more or less explicit dataflows.
  2. Interactive programming, with Common Lisp being the textbook example of this.
  3. Great tooling and IDEs, meaning first-class support by something from JetBrains, Microsoft or Emacs.
  4. Great concurrency and parallelism stories.
  5. A great library ecosystem.
  6. Modest memory use.

Having just explicitly written this down for the first time (!! – it was consuming so much glucose just being kept amorphously swirling around in my brain) I can now mentally map some of my most recent language dalliances to these points.
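Point 1 of the list, pure functions wired together as an explicit dataflow, can be sketched even in Python. The helper names and the numbers below are hypothetical and only serve the illustration.

```python
from functools import reduce


def compose(*steps):
    """Chain single-argument functions left to right into one function."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


# Each step is pure: it returns a new value and leaves its input untouched.
def drop_short(runs, min_km=5.0):
    return [km for km in runs if km >= min_km]


def to_miles(runs):
    return [round(km * 0.621371, 2) for km in runs]


# The pipeline reads as the dataflow it models: filter, convert, total.
pipeline = compose(drop_short, to_miles, sum)

data = [3.0, 10.0, 21.1]
result = pipeline(data)
print(result)
```

Since none of the steps mutates `data`, each one can be tested, reordered or reused on its own.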

go

This language is far too simple for my taste, but probably really great for teams.

I did recently take a more serious look when setting up a telegram bot using tbot, and was amazed at how simple it was to build web services like these using goroutines and channels.

Go satisfies points 3 to 6 from the list above. Makes sense that I decided to file this experiment away under “check when you need to put a webservice together REALLY QUICKLY”.

rust

When I saw that rust, surprisingly, is an expression-oriented language, I flew through the O’Reilly Programming Rust book I had bought previously as part of a bundle.

Evaluating rust by the list above, we award it a fractional 1 because expression-oriented, 3 due to jetbrains plugin amongst others, 4(ish) – great memory safety, but compared to clojure, concurrency and parallelism stories still have much room to grow, a solid 5 thanks to cargo and a very strong 6.

I filed this one away under “re-evaluate whenever you reach for your trusty C++”. (also, actix-web looks amazing for super high performance microservices.)

f#

You didn’t see this one coming, did you?

Very strong 1 to 5 and a solid 6.

WAT?!

I’m currently working my way through Domain Modeling Made Functional by Scott Wlaschin, who is also the author of the brilliant f# for fun and profit website.

In addition to f# hitting all 6 of my 2018 PL-requirements above, I’m slowly starting to see the advantages of having a real type system under the hood.

f# is a member of the ML-family of functional languages, which have their origin in Lisp (some very naughty person removed all of the lovely parentheses I’m afraid…).

I hope that at some point I’ll have the opportunity to use f# in anger, at which point I’ll be able to report more concretely as to its suitability.

The End

Let me know in the comments what you think about any of this, or anything else.

I hope to meet you again in a few days, here or elsewhere.

Weekly Head Voices #146: You too can learn Kung Fu.

This post covers the period Monday June 11 to Sunday June 17. Read it to become rich, yawn at Lisp and Emacs, yearn to run free on the wide open plains and to learn Kung Fu. Not ambitious at all.

Front door nearby De Waal Park, in Cape Town. Photo taken on Sunday by GOU#1, age 12.

Social Democracy FTW

It turns out that your chances of becoming rich are the greatest if you had the good fortune to have been born in one of the Nordic social democracies, such as Norway, Sweden or Denmark.

The US trails these countries, at position 13, in terms of per capita individuals with net worth over $30 million.

Being a proponent of social democracy as the most humane form of currently practical human government, and often infuriating conservatives by pointing out that many crucial aspects of social democracies can be described as socialistic, I really enjoyed the linked TEDx talk by Norwegian Harald Eia.

This material will serve me well as the source of future mischief.

Paradigms of AI Programming in Common Lisp

I am currently working my way through “Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp”, Peter Norvig’s famous 1992 book on artificial intelligence. Although modern AI has been transformed almost unrecognisably since then (THANKS DEEP LEARNING! Norvig’s PAIP retrospective) the way in which Norvig uses Lisp to model and solve real-world problems is inspiring and quite foundational.

It’s not only that though.

My inconvenient but uncontrollable infatuation with Common Lisp also seems to be pulling the strings. I should study a real language which is not 60 years old, like Rust or something.

What attracts me about Common Lisp is the liberated and pragmatic way in which it enables one to mix functional, object-oriented and procedural programming, and, perhaps most importantly, how it was designed from the ground up for iterative and interactive programming.

Tweak the defun, eval the defun, watch the system adapt. This is what I always imagined programming would be like. Except for the Lisps, it really turned out perhaps a bit more boring than it really needs to be.

interleave-mode for working through PDF books

For the fellow Emacs users, I also wanted to mention the utility of interleave-mode for working through such a programming book, if you can find it in PDF format.

In my Emacs I have the PDF on the left, and my interleave-mode-linked orgfile on the right. On any page of the PDF I hit the i-button to add a note in the orgfile, where I can of course insert and execute live code snippets.

The sections in the orgfile remain linked to the correct pages of the PDF.

For programming books this is an amazing combination. For studying other books, having your orgfile notes linked will probably also be quite useful.

On the topic of note-taking: This past week, on Friday June 15 (I made a note of that), I was able to help a colleague solve a technical problem by searching for and retrieving an org-file note, including detailed configuration settings, that I made on May 13, 2014.

Ether as currency

Although I acquired a small amount of the Ether cryptocurrency for the first time in July of 2016, I’ve never had the opportunity to actually transact with it.

Up to now, it has functioned solely as a pretty volatile store of value.

On Saturday, I used some ether for the first time to straight-up buy something on the internet, which was a pretty exciting but in practice an uneventful procedure, fortunately.

The vendor used a payment processor which presented me with an address and corresponding QR code. I scanned the QR code with the relevant mobile app (Luno in this case), paid the requested amount, and waited for a few minutes for it to be multiply confirmed by the blockchain. The sending fee was about 0.04% of the transaction.

Barefoot-style running update

On Sunday I went for a long(ish) run, bringing my total on the Luna sandals to just over 200km.

My feet, ankles and calves are much stronger than they used to be, but the barefoot conversion clearly still has some way to go. I have to take at the very least two rest days (instead of one) between runs to give my feet some extra time to recover.

What I have recently started doing, is that instead of trying to micro-manage my form (put your foot down like this, bend your ankle like that, let your achilles tendon shoot back like this, and so on), I am following the advice of some new random person on reddit/r/BarefootRunning who gave the advice, often echoed elsewhere by barefoot-runners, to try and maintain a cadence (steps-per-minute) of at least 180.

That sounds pretty high for a normal person like me, but it turns out that when I do that, and I try at the same time to run as silently as possible (I often just APPEAR right beside someone, hehe), my legs and feet figure out their elastic bio-kinematics all by themselves.

As yet another random reddit expert (I wish I could find the post) quipped:

You can’t overthink proprioception.

(that’s a running nerd joke)

I know Kung Fu

Do you remember this scene from The Matrix (1999)?

The other day at the Old People Reunion, friend T. Monster, a highly capable pragmatist but also backyard theoretician, talked about how often it happened these days that you had to deal with some DIY issue, tapped or spoke the question into youtube, watched a video or two, and then fixed the issue like a pro.

This, along with my recent pseudo-expert repair of a number of stripped cabinet hinge screw holes with toothpicks and cold glue (this works, I kid you not), made me think that, although The Matrix version was perhaps far more spectacular, we in fact now find ourselves in a real, shared reality where a large subset of skills can be acquired a la carte.

Some may take longer than a few minutes, but it still is pretty amazing how far YouTube has managed to democratise so many different forms of modern Kung Fu.

 

 

Weekly Head Voices #138: Born to run.

I am currently in a place with no to extremely little internet. Just getting the photo above uploaded was an adventure.

I briefly debated breaking my current WHV posting streak due to exceptional circumstances, but decided against it, at least for now.

Anyways, I might have no internet, but the scenery here is phenomenal.

(It later turned out that just getting this blog post uploaded on Sunday evening was not going to happen.)

Sometimes focus falls and slips into Emacs

During the past week I had a fairly difficult technical puzzle to deal with. It’s one of those puzzles that can only be solved with multiple days of research and concerted focus.

It’s funny how my mind manages to sort of slip away when faced with these sorts of puzzles where the solution, if it even exists (this is probably the main reason for the continual slippage), seems to be weeks away, instead of a few hours or days.

It’s like a usually sharp(ish) knife which simply refuses to bite into the thing that I so desperately want to cut with it.

In my specific case, especially later in the afternoons when prefrontal cortex is long-gone, mindlessly drinking its beer while staring into space somewhere, or even later in the evenings when everyone else is also drinking beer while staring into space, I wake up to find myself working on some obscure Emacs hack.

This week, primary thought slippage resulted in:

  • Hooking up my Emacs, via helm-for-files, with mdfind (the command-line interface to spotlight) on macOS and the tracker file indexer on Linux. This means that with a simple press of the C-x c o keys, I can instantly open any file in Emacs which is already open somewhere, which I’ve recently worked on, whose filename faintly resembles what I’m typing, or whose contents (or tags) faintly resemble what I’m typing, no matter where that file is hiding in the hundreds of gigabytes on my SSD.
  • My efforts getting the above working for Linux are now part of helm, via the wonderful system of github pull requests.
  • Setting up Emacs dired to do rsync-based network copying in the background, which culminated in a github contribution which will hopefully also find its way into the main repository soon. (I do most of my serious file management in Emacs dired. You should try it.)

The Running People

I finally bought Born to Run by Christopher McDougall.

In between travelling and other activities, I was not able to put this book down.

As I was reading the final pages on Sunday morning, I had trouble keeping my eyes dry. I had connected with the story and all of its nested stories on so many levels.

One strand of the story makes the case that humans, or more specifically homo sapiens, evolved to run their prey down on the savannah.

We are able to cool ourselves down during running thanks to being mostly hairless and sweaty, whereas an antelope is not able to pant while galloping, and has no choice but to stop.

So the trick is simple: We can run fast enough to keep a galloping antelope in sight. Eventually it will lose the battle, overheat and collapse.

This is what homo sapiens did for millions of years for food. Homo neanderthalensis, our intelligent and stronger competition who used to dominate during colder times, had no chance.

McDougall connects with a number of scientists and sports trainers to flesh out this part of the story. Below is an interesting (and related) video about Prof Daniel Lieberman and his work on the evolutionary biology and biomechanics of barefoot running:

(Being internet-deprived, I’m not currently able to find one of the cited Nature papers discussing other elements of our biology underlining our running heritage. Remind me in the comments so I can update this later.)

Another strand of the story is about the Tarahumara of Mexico, also known as Rarámuri, or The Running People, a legendary tribe of natural super athletes who are masters of avoiding other humans (due to past persecution and other shenanigans) and of running 30 miles in the mountains, in sandals.

Even more intriguing than their home-made sandals, is that they run throughout their healthy lives with joy and exuberance.

The final strand I want to mention here is McDougall’s personal journey from injury-prone runner all the way to finally taking part in the very first edition of a gruelling 50 mile trail race (the centre-piece of the story I would argue), together with the world’s best ultra marathoners and the Tarahumara in the Copper Canyons of Mexico.

For a large part of this journey, he and a number of other key actors are propelled along by Caballo Blanco, the White Horse, a supernaturally gifted runner who lived off the land in the Copper Canyons, and one of the few foreigners who seemed to be completely accepted by the Tarahumara.

Caballo is the one who managed to bring together, for the first time, the Tarahumara and the best ultra-marathoners in the small town of Urique for this humanity-affirming 50 mile trail run. He did so from his stone hut in the middle of nowhere, from where he would have to run for 30 miles to the closest settlement that had a telephone line that he could use.

Often that line was down.

Caballo passed away in 2012, shortly after that year’s edition of the Copper Canyon race. Soon afterwards, the race was officially dubbed the Ultramaraton Caballo Blanco.

Right after I put the book down, I put on my shoes and went for a run up the mountain-side and in the valley over here. On the balls of my feet as they landed right under me, really small steps, straight back, trying my best to float over the earth, just like the running people in the book.

I could not help but smile for most of the way.