Testing an LLM integration in Rails

Alicia Rojas
4 min read · May 8, 2024

As AI-powered features become increasingly prevalent in web development, it’s crucial to ensure that our applications behave as expected. In this tutorial, we’ll walk through the process of setting up a comprehensive system (also called “feature”) test to verify that your application’s AI-powered features function correctly for end-users.
Some code samples have references to RSpec, but you can apply the general idea to any framework of your choice.

Understanding the implementation

The idea of “testing behavior, not implementation” is useful in pretty much all types of tests, but is particularly relevant in feature tests. However, understanding the implementation helps us understand the scope of our testing code and how to set up the testing environment to ensure that the feature is tested as realistically as possible.

As we can see in the diagram, the interactions between the app and the external service are encapsulated inside an adapter class that we call “ApiConnector”. In the following sections, we’ll see how this pattern comes in handy for our tests.

This is roughly what a TextGeneratorService would look like:

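class TextGeneratorService < ApplicationService
  # Class-level adapter, so tests can swap in a different client
  cattr_accessor :client
  self.client = OpenAi::ApiConnector # This is the real adapter

  attr_reader :prompt

  def initialize(prompt)
    @prompt = prompt
  end

  def call
    # Build the configured adapter and delegate the request to it
    self.class.client.new(prompt).send_request
  end
end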

And the ApiConnector class:

class OpenAi::ApiConnector
  attr_reader :prompt

  def initialize(prompt)
    @prompt = prompt
  end

  def send_request
    # make request and parse response
  end
end

We will assume that we have unit and intermediate-level tests that cover the behavior of the ApiConnector, the TextGeneratorService, the requests made by the controller, the model (in case you want to store the LLM responses in your database), and so on. In our feature test we will make sure that all these pieces fit together properly in a “happy path”, that is, when everything just works.

First problem: the external service

When testing an LLM integration (or any API integration), we likely want to avoid performing real requests to the external service, for various reasons: network reliability, test suite performance, and unnecessary charges (if it’s a paid API).

Solution: fake adapter approach

We mentioned before that we encapsulated the interaction with the API inside an adapter class consumed by a service. Common approaches to testing external services include stubbing the adapter instead of making the network request, or using gems such as WebMock or VCR. However, in feature tests we try to avoid stubbing so that we exercise the entire system. For that reason, we took the idea of injecting a fake adapter from thoughtbot’s excellent guide “Testing Rails”, which lets us avoid real requests during test execution.

This fake adapter implements the minimum required methods so we make little to no changes to the service that consumes the real adapter. This is a simplified version of such a class:

class OpenAi::FakeConnector
  attr_reader :prompt

  def initialize(prompt)
    @prompt = prompt
  end

  def send_request
    { "response" => "This is a test response to #{prompt}" }
  end
end
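To make the injection concrete, here is a minimal plain-Ruby sketch of the same idea that runs without Rails. It is an illustration under assumptions, not the app's actual code: the class-level `attr_accessor` stands in for ActiveSupport's `cattr_accessor`, the stand-in `ApiConnector` raises instead of making a request, and the final method comparison is just one possible way to check that the fake keeps up with the real adapter's public interface.

```ruby
# Plain-Ruby stand-ins for the classes in this post (no Rails required).
module OpenAi
  # Stand-in for the real adapter; it raises so the sketch never hits the network.
  class ApiConnector
    attr_reader :prompt

    def initialize(prompt)
      @prompt = prompt
    end

    def send_request
      raise NotImplementedError, "real HTTP call omitted in this sketch"
    end
  end

  class FakeConnector
    attr_reader :prompt

    def initialize(prompt)
      @prompt = prompt
    end

    def send_request
      { "response" => "This is a test response to #{prompt}" }
    end
  end
end

class TextGeneratorService
  # Class-level accessor standing in for ActiveSupport's cattr_accessor.
  class << self
    attr_accessor :client
  end
  self.client = OpenAi::ApiConnector

  def initialize(prompt)
    @prompt = prompt
  end

  def call
    self.class.client.new(@prompt).send_request
  end
end

# Inject the fake adapter, then call the service exactly as production code would.
TextGeneratorService.client = OpenAi::FakeConnector
result = TextGeneratorService.new("Hello").call
puts result["response"]

# Sanity check: list any public method the real adapter has but the fake lacks.
missing = OpenAi::ApiConnector.public_instance_methods(false) -
          OpenAi::FakeConnector.public_instance_methods(false)
puts "missing methods: #{missing.inspect}"
```

The service code does not change at all between the two adapters; only the value of `client` does, which is what makes the fake cheap to maintain.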

Now we need to tell the service to consume this adapter during the test execution. There are several ways we can accomplish that. My preferred method is to write an around hook inside rails_helper.rb like so:

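RSpec.configure do |config|
  config.around(:each, type: :system) do |example|
    # Remember whatever adapter was configured before the example
    original_client = TextGeneratorService.client
    TextGeneratorService.client = OpenAi::FakeConnector
    example.run
  ensure
    # Always restore it, so other test types keep using the real adapter
    TextGeneratorService.client = original_client
  end
end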

This is useful if you want to use the fake adapter in some types of tests and the real one in others. In my case, I had intermediate-level examples that were testing side effects on the real adapter without actually making a request (like spying on the class to check that a given method was being called). Thus I wanted the fake client to be active only in system tests.
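The swap-and-restore idea behind the hook can also be sketched in plain Ruby. The `with_client` helper and the two empty connector classes below are hypothetical names introduced only for this illustration; the point is that an `ensure` clause puts the previous adapter back even when the wrapped code raises.

```ruby
# Minimal stand-ins so the sketch runs without Rails or RSpec.
class RealConnector; end
class FakeConnector; end

class TextGeneratorService
  class << self
    attr_accessor :client
  end
  self.client = RealConnector
end

# Hypothetical helper: swap the adapter for the duration of the block,
# then put back whatever was configured before, even if the block raises.
def with_client(service, temporary_client)
  original_client = service.client
  service.client = temporary_client
  yield
ensure
  service.client = original_client
end

# The block's return value is passed through, so we can capture what it saw.
inside = with_client(TextGeneratorService, FakeConnector) { TextGeneratorService.client }

begin
  with_client(TextGeneratorService, FakeConnector) { raise "boom" }
rescue RuntimeError
  # The error propagates, but the ensure clause has already restored the client.
end

puts inside                       # client seen inside the block
puts TextGeneratorService.client  # client after both blocks have finished
```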

Otherwise, you can set the fake adapter based on the environment inside the consumer class. I don’t recommend this approach since it feels like a smell to have test-related code inside production code. But it’s up to you.

Second problem: background jobs

When dealing with external services, and especially if you make several requests per minute, it is best practice to defer API calls to background jobs. In our case, we use Sidekiq to handle this behavior.
Sidekiq provides a dedicated module, Sidekiq::Testing, to facilitate testing. If you require sidekiq/testing in your RSpec configuration, Sidekiq switches to its fake mode by default: all jobs are pushed into an array that acts as a fake queue, which means jobs will not execute.

Another option is Sidekiq::Testing.disable!, which means that jobs will be pushed to Redis. We might want to test that our job is enqueued, but we are already testing that in our intermediate tests. This means we don’t need to enqueue jobs during the feature test (so we don’t make it any slower than it already is!).

This long explanation was to say that we prefer Sidekiq::Testing.inline!, which runs the jobs immediately instead of enqueuing them. Following Evil Martians’ guide to configuring system tests, we have a dedicated system_helper.rb file that holds configuration related to system tests. It made sense to place that call there, since system tests are the only ones in which we want jobs to actually run.
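As a rough sketch of that configuration, assuming the sidekiq and rspec-rails gems and a spec/system_helper.rb that loads rails_helper, the inline mode can be scoped to system tests with an around hook. The file layout is an assumption based on the setup described above; the block form of Sidekiq::Testing.inline! applies the mode only while the block runs.

```ruby
# spec/system_helper.rb (sketch; assumes the sidekiq and rspec-rails gems)
require "rails_helper"
require "sidekiq/testing" # switches Sidekiq to fake mode by default

RSpec.configure do |config|
  config.around(:each, type: :system) do |example|
    # Run jobs synchronously for this example only; the block form
    # restores the previous testing mode once the example finishes.
    Sidekiq::Testing.inline! { example.run }
  end
end
```

Scoping the mode per example keeps other test types on the default fake queue, so they can still assert that a job was enqueued without running it.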

Final thoughts

With these changes, we can now write a system test that will simulate receiving a response from the LLM’s API, without significantly altering the production code.

It may seem like too much work for a test, but if your app’s main feature depends on an external service, consider it an investment: a reliable test suite means less manual testing every time you deploy new code and fewer regression headaches.

If you face the need for refactoring code to write your test, take it as an opportunity! Readable and maintainable code is usually easy to test. If you feel this is not your case, writing tests is a great and fun way to encourage a necessary refactor.
