This is a very brief overview of our 2024 HackUTD hackathon
project. There were a couple of very interesting sponsor prompts, and
I’m still a little sad we never got the chance to use Pinata. We went
with PNC’s prompt: build tooling for observability and data
lifetimes. Of course, the words “observability” and “lifetimes”
immediately made us think of Kafka and message queueing. So, for
our project, we decided to build a platform to model and display the
end-to-end life cycle of data cleanly and efficiently. To that end, with
only one all-nighter pulled, we produced Sauron (a very clever name I
came up with myself, thank you very much). Here is a link to a
demo that, due to time constraints, we had to make before we could
fully hook it up to the Confluent instance that we were simulating our
Extract, Transform, Load pipeline on. And here is the GitHub repo.

Briefly, we used Terraform to spin up a Confluent instance, as well
as several small Python CLI programs (containerized and orchestrated
on k8s) that acted as modular consumer-producers to simulate
microservices in our pipeline. We used some minimal (and very
under-documented) OpenTelemetry Python instrumentation, written
directly into each service, which hooked up to a Jaeger
instance (also deployed on k8s). Finally, we somehow (I was asleep)
relayed the Jaeger data into our front end, which you can see in the
video.
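
To give a feel for the shape of those services, here is a minimal sketch; it is not the code from our repo. It uses only the Python standard library: `queue.Queue` stands in for Kafka topics, and a plain `SPANS` list stands in for OpenTelemetry and Jaeger. Every name in it (`Message`, `Span`, `run_stage`, `run_pipeline`) is made up for illustration. The idea it shows is that each stage consumes a message, times its work under the trace id carried in the message headers, and produces the result downstream with the same headers.

```python
import queue
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: str
    # Headers travel with the message, much like Kafka record headers.
    headers: dict = field(default_factory=dict)


@dataclass
class Span:
    trace_id: str
    service: str
    start: float
    end: float


# Stand-in for a tracing backend; a real service would export spans instead.
SPANS = []


def run_stage(name, transform, inbox, outbox):
    """Consume everything in inbox, transform it, record a span, produce to outbox."""
    while not inbox.empty():
        msg = inbox.get()
        start = time.perf_counter()
        result = transform(msg.payload)
        end = time.perf_counter()
        SPANS.append(Span(msg.headers["trace_id"], name, start, end))
        # Copy the headers forward so the next stage joins the same trace.
        outbox.put(Message(result, dict(msg.headers)))


def run_pipeline(records):
    """Push records through a hypothetical two-stage transform/load pipeline."""
    raw, cleaned, loaded = queue.Queue(), queue.Queue(), queue.Queue()
    for record in records:
        # The "extract" step starts a new trace for every record.
        raw.put(Message(record, {"trace_id": uuid.uuid4().hex}))
    run_stage("transform", str.strip, raw, cleaned)
    run_stage("load", str.upper, cleaned, loaded)
    return [loaded.get() for _ in range(loaded.qsize())]
```

Calling `run_pipeline(["  a ", "b  "])` pushes two records through both stages and leaves four entries in `SPANS`: one per record per stage, with each record's two spans sharing a trace id. In a real setup the headers would be Kafka record headers, and the spans would be exported by the OpenTelemetry SDK rather than appended to a list.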

Overall it was a very good, although exhausting, hackathon, and we produced a very interesting project. I learned a lot, and it was definitely the most hands-on experience that I’ve had with Kafka. A couple of takeaways:
- Maybe Python isn’t the best language for a hackathon. Go is synonymous with cloud-native architecture, and it really felt like Jaeger, and really the entire stack, only tolerated Python.
- Tooling, as opposed to consumer software, is difficult to build, partly because the standards and protocols involved have “rougher edges”. This is very similar to my experience working in embedded spaces, where protocols designed for I/O with a non-technical user (e.g. USB, SCPI) are much easier to leverage than whatever is going on with QUADSPI.