I know this may sound weird, but I’d like to start this post with both an apology and a promise: I’m sorry for bringing you yet another Azure-based topic, and I promise that sometime in the next few years I’ll go back to other providers.
This time, instead of writing something based solely on the Azure ecosystem, I actually wanna talk about integrating an old friend of Ruby on Rails developers worldwide into this mess: good old Sidekiq.
Sidekiq is a gem (not gonna explain what a gem is, not in our scope) that can be used to run background jobs in Ruby projects. It can handle all those “report generation” and “data crunching” and “super amazing AI that is actually a bunch of ifs and elses” tasks for you, instead of having your web server’s threads deal with them. A user submits a form, you start a worker with the received data, and you can immediately tell the user that something’s cooking while Sidekiq creates their new avatar with a collage of cat pictures. Or whatever you want, really.
Something to consider when using Sidekiq is that it runs as a completely separate process from your Puma/Unicorn/WEBrick/whatever server, and in some cases that requires a different machine: a new Docker container, a new Heroku dyno, or, in our case, a new Web App Service (arguably you can work around this with some tricks and hacks, but again, that’s not in the scope of this post). Also, while there is a nifty web interface that you can add to your RoR project, Sidekiq itself doesn’t listen on any outward-facing port, and the information in the dashboard is actually obtained by querying Redis (ah, yes, Sidekiq uses Redis).
Image is taken from the official GH repository: https://github.com/mperham/sidekiq
Now that you all know at least as much about Sidekiq as I do, let’s go back to our scenario.
You have an RoR app. You need some tasks to be run in the background every once in a while. You bring in the big guns (ahem, Sidekiq), you force everyone on the team to install the gem, you add it to your start script, you set up a new Web App Service, you change the CI/CD pipeline to deploy Sidekiq to that Web App Service, and you take a 30-minute power nap while said pipeline does its thing. It looks quick and easy, and you’ve done it a dozen times for other projects, so…what could go wrong, right?
Well, a few days after deploying the new service, you open the dashboard and notice that the Sidekiq process has only been up for a few minutes. The queue is empty, though, so you assume some glitch made the process restart, add some instrumentation to keep an eye on it, and go on with your life. Except you don’t. As soon as you deploy the new instrumentation, you notice that the process is restarting every few minutes. At first it’s mostly annoying, but eventually some tasks end up held in the queue because the process restarted before they could finish. And the problem won’t solve itself, so you gotta find a solution as fast as possible.
Yay.
This part took me a lot longer to figure out than I’d like to admit, but it’s actually related to something I mentioned above. Some tools decide whether a container/dyno/whatever is alive by checking that a process is running, and while it might look like Azure Web App Services do the same, they don’t. You can usually spin up whatever processes you want and have them doing their thing, but these Web Apps were designed specifically for running, well, web apps. And apparently, the best way to make sure a web app is running isn’t to check whether its process exists, but to send it a request and draw conclusions from whatever comes back. If you get a 200 OK, for example, your web app is good to go; a 500 Internal Server Error means the server has issues. Now, remember when I said above that Sidekiq doesn’t listen on any outward-facing port? Try to guess what happens when Azure probes the port it expects your app to be listening on. I’ll help you: Azure assumes that the web app is not running, and restarts it. I’m not saying this is wrong (it sounds like a fair approach, really, even if not ideal), but it means your hands are tied if you try to do something out of the ordinary, or even something as ordinary as running Sidekiq.
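To make that a bit more concrete, here is a rough Ruby sketch of what a probe like that boils down to. This is my own simplification, not Azure’s actual implementation. The method name `web_app_alive?` and the 2-second timeout are made up for illustration. The idea is that the probe counts any 2xx answer as alive, and a 500, a refused connection or a timeout as not alive. A process that never opens the port, like Sidekiq, always lands in the last group.

```ruby
require "net/http"

# Hypothetical liveness probe, a simplification of the idea (NOT Azure's code).
# Returns true only if something answers the HTTP request with a 2xx status.
def web_app_alive?(host, port, path = "/", timeout: 2)
  response = Net::HTTP.start(host, port, open_timeout: timeout, read_timeout: timeout) do |http|
    http.get(path)
  end
  # 200 OK and friends mean "good to go"; a 500 means "the server has issues".
  response.is_a?(Net::HTTPSuccess)
rescue SystemCallError, SocketError, IOError, Timeout::Error
  # Connection refused, reset or timed out: nobody answered, so "not running".
  false
end
```

Calling `web_app_alive?("127.0.0.1", 8080)` against a box where only Sidekiq runs would return `false` on every attempt. From the platform’s point of view, restarting the app then looks like the reasonable thing to do.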
I don’t want to make this a weirdly long post, so I’ll just say that there are solutions for this specific issue out there (sidekiq_alive, for example), which basically add a worker listening on a port and returning a 200 OK, and in all fairness, you could implement such a solution yourself. Could I be missing some other solution?
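If you want to roll your own, a minimal sketch could look like the Ruby below. It uses only the standard library. The `HEALTH_PORT` variable and the 8080 default are hypothetical, so point it at whatever port your App Service is configured to probe. The method answers every request on that port with a 200 OK from a background thread.

```ruby
require "socket"

# Minimal health-check listener to run inside the Sidekiq process (a sketch).
# Assumption: HEALTH_PORT is a hypothetical env var; use the port Azure probes.
# It answers EVERY request with 200 OK and does NOT check Sidekiq's real state.
def start_health_server(port: Integer(ENV.fetch("HEALTH_PORT", "8080")), host: "0.0.0.0")
  server = TCPServer.new(host, port)
  thread = Thread.new do
    loop do
      client = server.accept
      begin
        # Read and discard the request line and headers; their content doesn't matter.
        while (line = client.gets) && !line.strip.empty?
        end
        body = "OK"
        client.write("HTTP/1.1 200 OK\r\n" \
                     "Content-Type: text/plain\r\n" \
                     "Content-Length: #{body.bytesize}\r\n" \
                     "Connection: close\r\n\r\n#{body}")
      rescue IOError, SystemCallError
        # Client went away mid-request; just move on to the next one.
      ensure
        client.close
      end
    end
  rescue IOError, SystemCallError
    # server.close was called from another thread: stop accepting connections.
  end
  [server, thread]
end
```

You would call `start_health_server` from whatever boots your Sidekiq process. A `Sidekiq.configure_server` block is one option. Keep in mind that this only proves the Ruby process is up, not that jobs are actually being processed. If you need the answer to reflect Sidekiq’s real state, a dedicated gem is probably the better choice.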
I’m not very good at writing conclusions.