Just sharing a link to my post here so that it reaches Planet Clojure: Building ETL pipelines with Clojure and transducers.
An annoying issue I ran into lately: servers that we launch inside AWS from the
Ubuntu 16.04 AMI had a gst-plugin-scanner process running and eating 100% of
one CPU core. Investigation revealed that this is a gstreamer-related process
which has no business running in a non-graphical environment.
Any sufficiently advanced Clojure namespace is indistinguishable from a Java
class. Gosh, thankfully that's false. But what does actually happen is that the
ns form gets large and bloaty, just like the imports blob in your everyday
Java file. Gone are the times when Clojure namespace declarations were small
cozy gardens that you wanted to cultivate and tend to manually. The tooling kept
up, so now Cursive promises you "to never have to look at your namespace form
again," and clj-refactor also has very handy features to auto-require
namespaces, sort and clean up requires, etc.
Perseverance: flexible retries for Clojure
Few of us have the luxury to work exclusively with reliable systems. Sure, nothing is 100% reliable, but as long as you use only the resources local to a single machine, you may disregard the possibility of failure. However, as soon as the network is involved, a whole array of problems emerges. You can no longer be sure that an HTTP request succeeds, or a file gets downloaded, or a TCP stream doesn't close midway. Thus, you have to protect your program from such calamities and devise a recovery plan in advance.
Generic fault tolerance in distributed systems is an incredibly difficult topic to contemplate, so we won't trouble ourselves with it today. Instead, I want to discuss simpler matters, where an unsuccessful operation can just be tried again. This is usually true for actions that are immutable or idempotent, such as HTTP GET requests. We run into these scenarios quite often, so we created Perseverance.
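To make the "just try again" idea concrete, here is a minimal hand-rolled retry loop. It is only a sketch and does not use Perseverance's API: the `retry` function, its option keys, and the `flaky-fetch` stand-in are all hypothetical names made up for this illustration, and the choice to retry only on `java.io.IOException` is an assumption.

```clojure
(defn retry
  "Hypothetical helper (not part of Perseverance): calls (f) up to
  max-attempts times, sleeping delay-ms between attempts. Returns the first
  successful result and rethrows the last IOException if all attempts fail."
  [{:keys [max-attempts delay-ms]} f]
  (loop [attempt 1]
    ;; recur is not allowed inside catch, so the outcome is passed out
    ;; of the try block as a value first.
    (let [result (try
                   {:value (f)}
                   (catch java.io.IOException e
                     (if (< attempt max-attempts)
                       ::retry
                       (throw e))))]
      (if (= result ::retry)
        (do (Thread/sleep (long delay-ms))
            (recur (inc attempt)))
        (:value result)))))

;; A stand-in for an unreliable network call: fails twice, then succeeds.
(def calls (atom 0))

(defn flaky-fetch []
  (if (< (swap! calls inc) 3)
    (throw (java.io.IOException. "connection reset"))
    :ok))

(retry {:max-attempts 5 :delay-ms 10} flaky-fetch)
;; => :ok
```

A fixed delay keeps the sketch short; a real setup would more likely want a growing delay and a way to decide per call site which failures are worth retrying.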
Beware of assertions
What I write below might be universal knowledge, but I was personally bitten by this more than once, so I feel the need to emphasize it once again.
Clojure's assert form can be used for quick-and-dirty verification and
consistency checking in your code. It is convenient to have assertions during
development because they may discover incorrect usage of functions before the
mistake drops deeper into the callstack and fails with something like
NullPointerException or java.lang.Long cannot be cast to clojure.lang.IFn. And
for this task assert is perfectly fine: it is there out of the box, and you can
disable the assertions in production by setting the clojure.core/*assert* var
to false.
The problems begin when you stick to assertions for prod-time data validation (once again, you may already be smart enough not to do it. I wasn't).
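The excerpt does not spell out which problems it means, so the following is only one pitfall that seems likely to be relevant: a failed assert throws java.lang.AssertionError, which is an Error rather than an Exception, so a `(catch Exception ...)` handler does not see it. The function names below are hypothetical, and the sketch assumes `*assert*` is left at its default of true.

```clojure
(defn parse-age
  "Hypothetical validator that relies on assert."
  [x]
  (assert (integer? x) "age must be an integer")
  x)

(defn safe-parse-age
  "Tries to turn any validation failure into :invalid."
  [x]
  (try
    (parse-age x)
    (catch Exception _
      :invalid)))

;; The handler above never runs: AssertionError is not an Exception.
(try
  (safe-parse-age "42")
  (catch Throwable t
    (class t)))
;; => java.lang.AssertionError

(defn parse-age-strict
  "Same check with an explicit ex-info, which is a RuntimeException."
  [x]
  (when-not (integer? x)
    (throw (ex-info "age must be an integer" {:value x})))
  x)

(try
  (parse-age-strict "42")
  (catch Exception _
    :invalid))
;; => :invalid
```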
I finally managed to polish a thing that lived on my hard drive in a disassembled state for years. ns-graph started as a fork of the abandoned clojure-dependency-grapher. As vain as it may seem, this tool is often very useful to me. It is surprisingly descriptive when you need to understand the high-level architecture of some project. Recently I wanted to catch up with the changes in cider-nrepl and figure out where exactly it uses its new dependencies. Well, sometimes you should be careful what you wish for.
Implementing leaky channels with core.async
Imagine the following setup: service A serves a stream of data items by exposing the latest batch of these items as an HTTP endpoint. Application B polls service A every now and then, fetches the recent batch and puts the unwrapped items onto a core.async channel. The consumer(s) on the other side of the channel may come and go. The nature of the data is also such that it is droppable (not every item has to be processed) and becomes stale over time.
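One simple way to get the "droppable" half of this behavior is a channel backed by a sliding buffer, which discards the oldest items when it is full. This is a sketch under assumptions and not necessarily what the post ends up implementing: the buffer size of 3 is arbitrary, the integers stand in for the unwrapped items from service A, and the sketch does nothing about items going stale over time.

```clojure
(require '[clojure.core.async :as a])

;; A channel that keeps only the 3 most recent items. Puts never block:
;; when the buffer is full, the oldest item is dropped.
(def items (a/chan (a/sliding-buffer 3)))

;; Application B pushes a batch while no consumer is listening.
(doseq [i (range 10)]
  (a/>!! items i))

;; A consumer that shows up later only sees what is left in the buffer.
(a/close! items)
(a/<!! (a/into [] items))
;; => [7 8 9]
```

If the newest items should be the ones discarded instead, `a/dropping-buffer` has the same shape with the opposite policy.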
Implementing "cycle" transducer in Clojure
This post was originally written in a notebook. Check out Gorilla REPL if you haven't done it yet!
If you don't know yet what a transducer is, or how they work, there are two amazing talks to get started, both by Rich Hickey: Transducers and Inside Transducers. I watched those talks a couple of times each, and I moderately use transducers in my projects; thus I have a decent understanding of transducers. Or so I thought.
Yesterday my coworker and I delved into reading the already implemented
transducers in clojure.core, and started pondering why some seq-operating
functions don't have a transducer counterpart. Multiple-collection map?
Probably missing because it doesn't fit the single-stream semantic very well.
take-last and drop-last? Those two can actually be implemented as
transducers by using a queue. And then… cycle? Can we do cycle?
As it turned out, my knowledge of transducers was limited. I could use them, and
I got the idea behind them, but until I implemented one I lacked a feel for
the underlying machinery. Today I will guide you step by step through how we
implemented the cycle transducer, so you might also get a better comprehension
of this topic.
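As a warm-up before cycle, the remark that take-last can be built on a queue might look like the sketch below. `take-last-xf` is a hypothetical name (clojure.core has no such transducer), and this is just one way to write it: buffer the last n inputs in a persistent queue and flush them in the completion arity.

```clojure
(defn take-last-xf
  "Hypothetical transducer counterpart of take-last: passes only the final
  n inputs downstream, and only once the input is exhausted."
  [n]
  (fn [rf]
    (let [buf (volatile! clojure.lang.PersistentQueue/EMPTY)]
      (fn
        ([] (rf))
        ([result]
         ;; Completion: flush the buffered items, then complete downstream.
         ;; reduce stops and unwraps if rf returns a reduced value.
         (let [items @buf]
           (vreset! buf clojure.lang.PersistentQueue/EMPTY)
           (rf (reduce rf result items))))
        ([result input]
         ;; Step: remember the input, keeping at most n items in the queue.
         (vswap! buf (fn [q]
                       (let [q (conj q input)]
                         (if (> (count q) n) (pop q) q))))
         result)))))

(into [] (take-last-xf 2) [1 2 3 4 5])
;; => [4 5]

(into [] (comp (take-last-xf 3) (take 2)) (range 10))
;; => [7 8]
```

Nothing is emitted until completion, so this only makes sense for finite inputs, which hints at why cycle is the more interesting puzzle.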
What Is a List? The Ultimate Predicate Showdown
This post was originally written in a notebook. Check out Gorilla REPL if you haven't done it yet!
A sizeable number of Clojure developers had some Common Lisp experience in the
past. When asked what the main Clojure advantages over CL are, they often
mention having a single equality operator (compared to Lisp's eq, eql, equal,
equalp). It might come as a minor point, but in practice, it is very
cognitively exhausting to keep track of which one you should use. What's even
more jarring is that eql, not the most intuitive one, is usually the
default. Can you explain with a straight face to a beginner that their
string-keyed hashtable didn't work because it was created with the wrong
equality operator? I never could.
But Clojure has a similar sin of its own: the multitude of list type
predicates. If woken up at 3 A.M. and asked what the standard Clojure data
structures are, you would likely name lists, vectors, maps, and sets. But how do
you tell if the given object is a data structure of a certain type? "Well",
you'd say, "there are vector?, map?, set?, and… list? Leave me alone,
man, I'm trying to catch some z's."
Gotcha! The thing is, list? is a weak predicate. It checks if the object is
precisely a PersistentList, but there are plenty of things in Clojure that look
like lists without being ones.
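A few REPL checks illustrate the point. They are picked here for illustration and are not taken from the post; the results reflect how these core predicates behave on standard values.

```clojure
;; Things that print like lists...
(def candidates
  {:quoted-list '(1 2 3)
   :list-fn     (list 1 2 3)
   :lazy-seq    (map inc [0 1 2])
   :cons-cell   (cons 1 '(2 3))
   :vector-seq  (seq [1 2 3])})

;; ...but only the first two satisfy list?
(into {} (for [[k v] candidates] [k (list? v)]))
;; => {:quoted-list true, :list-fn true, :lazy-seq false,
;;     :cons-cell false, :vector-seq false}

;; seq? accepts all of them
(every? seq? (vals candidates))
;; => true

;; sequential? additionally accepts vectors, which seq? rejects
(sequential? [1 2 3]) ;; => true
(seq? [1 2 3])        ;; => false
```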
Omniconf: comprehensive configuration library for Clojure
TL;DR: Omniconf is a new configuration library for Clojure that unifies environment variables, command-line options, and config files, and ensures the configuration is complete and correct before the main application code runs.
Configuring your application is often a daunting and thankless task. It very much depends on the way you launch your program, so it is difficult to solve the configuration problem for the general case. Twelve-factor app guidelines suggest using environment variables for everything, but that rule is unnecessarily rigid. If you have many options, configuration files are more appropriate. When you launch the app from the command line (on a dev machine, for example), command-line arguments are preferable. But now you suddenly need unified access to all those configuration sources.
There are already libraries that solve the configuration problem in Clojure, namely: Environ, Aero, Nomad, Fluorine. All of them are quite good at what they do; however, we at Grammarly needed extra functionality: to check the final configuration state before the main program executes, and make sure there are no missing or incorrect options. What began as a few helper functions was extracted into a separate library and called Omniconf.
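The idea of "merge several sources, then verify before the app starts" can be sketched in plain Clojure. This is not Omniconf's API: `schema`, `merge-sources`, and `verify` are hypothetical names, the option keys are invented, and the precedence order (file, then environment, then command line) is an assumption made for the example.

```clojure
;; Hypothetical description of the options the app expects.
(def schema
  {:port      {:required true  :valid? pos-int?}
   :db-url    {:required true  :valid? string?}
   :log-level {:required false :valid? keyword?}})

(defn merge-sources
  "Merges config maps; later sources override earlier ones."
  [& sources]
  (apply merge sources))

(defn verify
  "Returns a seq of problem descriptions; empty when the config is usable."
  [schema config]
  (for [[k {:keys [required valid?]}] schema
        :let [v (get config k ::missing)]
        :when (if (= v ::missing)
                required
                (not (valid? v)))]
    (if (= v ::missing)
      (str k " is required but missing")
      (str k " has an invalid value: " (pr-str v)))))

(def config
  (merge-sources {:port 8080 :log-level :info}  ; from a config file
                 {:db-url "jdbc:example"}       ; from the environment
                 {:port 9090}))                 ; from the command line

(:port config)
;; => 9090

(verify schema config)
;; => ()

(set (verify schema {:port -1}))
;; => #{":port has an invalid value: -1" ":db-url is required but missing"}
```

Running such a check at the top of the main function means a missing or malformed option fails the launch immediately, not somewhere deep in the application an hour later.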
For more posts see the Archives page.