Thursday, Oct 10, 2024
 
Case Study: AI-enabled Virtual Assistants
Chatbots, virtual assistants, and virtual customer service representatives have been under development for the last 60 years. With the advent of ChatGPT and other large language models (LLMs), it seems we finally have the tools to make these entities ubiquitous. In this article, we’ll take a quick look at the architecture of some of these programs and explore what you can look forward to if you choose to create a production-ready assistant.
Collectively, these programs are known as conversational agents: software applications designed to simulate human conversation through text or voice. Leveraging advances in natural language processing (NLP), machine learning, and artificial intelligence (AI), these agents can understand, interpret, and respond to user queries in a human-like manner. They enhance customer engagement by providing instant, 24/7 support, and they automate routine inquiries and tasks, which reduces operational costs and allows human employees to focus on more complex and value-added activities.
Some historical perspective
ELIZA, one of the best-known chatbots of all time, captured imaginations in the mid-1960s. It was an impressive tour de force for its time, showcasing advances in computing techniques and architectures and giving users a textual interface to a primitive virtual psychotherapist. Nowadays, you can play with an implementation of ELIZA in Python’s NLTK library (the `nltk.chat.eliza` module).
One of the first things you’ll notice is that it is not that “smart.” An experienced programmer could probably play with it for 5 minutes and know almost exactly how it works. We will contrast this with ChatGPT later, but it’s instructive to examine how ELIZA works, so let’s dwell on it for a moment.
The elementary objects used to implement ELIZA are:
- regular expressions: these match the user input against predefined patterns, each of which selects a set of possible responses.
- string templates: these are predefined responses, parameterized by the values the regular expression captures from the input.
The main data structure that ELIZA uses is a MatchList, which can be described by the Python type `list[tuple[re.Pattern, list[str]]]`. The basic algorithm, in Python, would be:
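```python
import random
import re

MatchList = list[tuple[re.Pattern, list[str]]]

class NoMatchError(Exception):
    """Raised when no pattern matches the user input."""

def respond(user_input: str, match_list: MatchList) -> str:
    for pattern, template_list in match_list:
        if match := pattern.search(user_input):
            random_template = random.choice(template_list)
            # named groups such as (?P<emotion>\w+) fill the matching
            # {emotion} placeholders in the template
            return random_template.format(**match.groupdict())
    raise NoMatchError(user_input)
```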
To illustrate how ELIZA works, consider the following interaction:
     - User: "I feel sad."
     - ELIZA: "Why do you feel sad?"
Here, the regular expression might match the phrase "I feel (emotion)" and extract "sad" as the emotion. The string template "Why do you feel {emotion}?" is then used to generate the response. While ELIZA can simulate conversation, *it does not understand the meaning of the text*. It relies entirely on pattern matching and predefined templates, which can lead to nonsensical or repetitive responses. Despite its simplicity, ELIZA laid the groundwork for modern conversational agents. Today's chatbots and virtual assistants use more advanced techniques, such as deep learning and maintaining conversational context, but **the basic principles of pattern matching and template-based responses are still relevant**.
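A concrete, runnable version of the “I feel sad” example might look like the following. The two rules and their templates are hypothetical, and unlike the `respond` sketch earlier, this variant falls back to a generic reply instead of raising an error. Real ELIZA scripts also “reflect” pronouns (turning “my” into “your”, for instance), which this sketch omits.

```python
import random
import re

# Hypothetical ELIZA-style rules: (pattern, response templates).
# Named groups such as (?P<emotion>...) become str.format keyword arguments.
MATCH_LIST = [
    (re.compile(r"\bI feel (?P<emotion>\w+)", re.IGNORECASE),
     ["Why do you feel {emotion}?", "How long have you felt {emotion}?"]),
    (re.compile(r"\bmy (?P<relative>mother|father)\b", re.IGNORECASE),
     ["Tell me more about your {relative}."]),
]


def respond(user_input: str) -> str:
    # Rules are tried in order; the first pattern found in the input wins.
    for pattern, templates in MATCH_LIST:
        if match := pattern.search(user_input):
            return random.choice(templates).format(**match.groupdict())
    # No rule fired: return a generic, content-free prompt.
    return "Please go on."


print(respond("I feel sad."))        # one of the two "feel" templates
print(respond("My mother called."))  # Tell me more about your mother.
print(respond("Nice weather."))      # Please go on.
```

Nothing here “understands” the text: adding behavior means adding another pattern and more templates, which is exactly the limitation described above.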
A Virtual Assistant for Property Managers
ELIZA, while groundbreaking in its time, is now nearly 60 years old and not particularly smart by today’s standards. Enter ChatGPT, which appears to be quite intelligent, though its inner workings remain somewhat opaque. ChatGPT’s impressive capabilities have led some to believe that the “chatbot problem” is essentially solved. This perception was evident among some of our clients, who assumed that integrating ChatGPT into their systems would be a straightforward task.
In this article, we’ll take a closer look at an assistant we developed for property managers. Here is a simplified, plain-language description of what the assistant should do:
1. Open maintenance tickets for users reporting issues.
2. Answer basic questions about their lease and the policies regarding their stay.
3. Forward any requests about new leases directly to the property manager.
Each of these presents specific challenges that we addressed in various ways during development, but first let’s verify that such an assistant actually provides the value we claimed for conversational agents. (1) and (2) certainly fit the mold of automating routine inquiries and tasks, and the whole system provides instant, 24/7 support. (3) looks more like a filtering activity, but its point is to hand one of those more complex, value-added activities (sales) to a human; it seems quite reasonable for a business to want its human agents to handle sales. Also note that there is an implicit task not listed:
4. Understand which of these three tasks is appropriate for the most recent message received.
Let’s look at these tasks in some detail.
Task 1: Opening maintenance tickets
Property managers often received reports of maintenance issues. Residents already had access to a web app where they could report these issues, but many of them were not aware of it. Our first idea was to handle the filing of tickets entirely through the chat interface. This worked to some extent, but we could always find a scenario in which the bot misunderstood what it should do about a maintenance request. We finally settled on the following solution: instead of filing the ticket automatically, the assistant gathers information and directs the resident to the online form, pre-filled with whatever information the resident has already provided. This solution, while not as “sexy” as a fully automated system, tends to be more acceptable because it keeps a human in the loop.
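How the pre-filled link is built depends on the web app. One plausible approach, assuming the form accepts field values as URL query parameters, is sketched below; the form URL, parameter names, and the “yes”/“no” encoding for booleans are all hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical address of the residents' maintenance form; the real URL and
# its query-parameter names are not given in this article.
FORM_URL = "https://example.com/maintenance/new"


def prefilled_form_link(extracted: dict) -> str:
    """Build a link to the maintenance form with known fields pre-filled.

    Fields the model left as None are omitted so the resident fills them in;
    booleans are sent as "yes"/"no" (an assumed encoding).
    """
    params = {}
    for field, value in extracted.items():
        if value is None:
            continue
        if isinstance(value, bool):
            value = "yes" if value else "no"
        params[field] = value
    # urlencode percent-encodes the values and joins them with "&".
    return f"{FORM_URL}?{urlencode(params)}" if params else FORM_URL


print(prefilled_form_link({
    "complaint": "There's a leak under my kitchen sink.",
    "category": "plumbing",
    "permission-to-enter": None,
}))
```

Omitting unknown fields, rather than sending empty ones, leaves the form’s own validation in charge of anything the chat didn’t collect.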
Nonetheless, this task provides us with a good opportunity to practice some entity extraction:
Entity extraction, or named entity recognition (NER), is the process of identifying and categorizing key pieces of information (entities) within a text, such as names, dates, locations, and numerical values. In a virtual assistant, entity extraction helps the system understand user queries more accurately by pinpointing relevant information, enabling more precise and contextually relevant responses.
Conversation Example
Here is a demonstration of the capability we are talking about. Suppose we get this message from a resident:
- Resident: Hi, there’s a leak under my kitchen sink.
We could process this message with the following prompt:
Given the following conversation:
"""
**Resident**: "Hi, there's a leak under my kitchen sink."
"""
Please extract information and craft a response using the following JSON schema:
"""
{
    "complaint": "(string) the complaint from the resident",
    "category": "'plumbing'|'electrical'",
    "permission-to-enter": "(boolean|null) may the maintenance crew enter the apartment if the resident is not present",
    "response": "(string) a response for the resident"
}
"""
JSON:
As of 16 May 2024, the gpt-4o model gives this output:
{
    "complaint": "Hi, there's a leak under my kitchen sink.",
    "category": "plumbing",
    "permission-to-enter": null,
    "response": "Thank you for reporting the leak under your kitchen sink. We will send a maintenance crew to address the issue. Could you please let us know if it is okay for our team to enter your apartment if you are not present?"
}
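The surrounding code isn’t shown here, so the following is one minimal way to produce the prompt above from a running list of conversation turns. The function and variable names are hypothetical, and the model call itself is left out because it depends on the client library you use.

```python
SCHEMA = """{
    "complaint": "(string) the complaint from the resident",
    "category": "'plumbing'|'electrical'",
    "permission-to-enter": "(boolean|null) may the maintenance crew enter the apartment if the resident is not present",
    "response": "(string) a response for the resident"
}"""


def build_prompt(conversation: list[tuple[str, str]]) -> str:
    """Render the conversation so far into the extraction prompt.

    `conversation` is a list of (speaker, message) turns; the whole list is
    re-rendered on every turn, so the model always sees the full context.
    """
    turns = "\n".join(f'**{speaker}**: "{text}"' for speaker, text in conversation)
    return (
        "Given the following conversation:\n"
        f'"""\n{turns}\n"""\n'
        "Please extract information and craft a response using the following JSON schema:\n"
        f'"""\n{SCHEMA}\n"""\n'
        "JSON:"
    )


conversation = [("Resident", "Hi, there's a leak under my kitchen sink.")]
print(build_prompt(conversation))
```

Keeping the turns in a list is what makes “updating the conversation in the prompt” cheap: append the assistant’s reply and the resident’s answer, then call `build_prompt` again.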
In code, we can inspect the extracted information and see that the permission-to-enter field is missing. We assume the model has asked for this information in its response, so we send that response to the resident. Maybe they reply like so:
- Resident: Hi, there’s a leak under my kitchen sink.
- Assistant: I’m sorry to hear that you have a leak under your kitchen sink. Our maintenance team will be dispatched as soon as possible. Could you please let us know if it’s okay for them to enter your apartment if you’re not present?
- Resident: Yes, they can enter if I’m not there.
Updating the conversation in the prompt above and processing it again, we get:
{
    "complaint": "There's a leak under my kitchen sink.",
    "category": "plumbing",
    "permission-to-enter": true,
    "response": "Thank you for providing the necessary information. Our maintenance team has been notified about the leak under your kitchen sink and will be dispatched as soon as possible. Since you have given us permission, they will enter your apartment even if you're not there to fix the issue. If you have any further questions or need additional assistance, feel free to reach out."
}
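The check described above, looking at the extracted fields and noticing that one is still missing, can be as simple as the following sketch. The field list, status strings, and function name are illustrative assumptions, and the two replies are abbreviated versions of the model outputs shown above.

```python
import json

# Fields that must be known before the request is complete. A value of
# false is a real answer ("do not enter"), so only null/absent counts as missing.
REQUIRED_FIELDS = ("complaint", "category", "permission-to-enter")


def next_step(model_output: str) -> tuple[str, list[str]]:
    """Parse the model's JSON reply and decide whether to ask again."""
    data = json.loads(model_output)
    missing = [field for field in REQUIRED_FIELDS if data.get(field) is None]
    # In both cases data["response"] is relayed to the resident; we rely on
    # the model having asked for whatever is still missing.
    return ("ask_again" if missing else "complete"), missing


first_reply = json.dumps({
    "complaint": "Hi, there's a leak under my kitchen sink.",
    "category": "plumbing",
    "permission-to-enter": None,
    "response": "Could you please let us know if it is okay for our team to enter your apartment if you are not present?",
})
second_reply = json.dumps({
    "complaint": "There's a leak under my kitchen sink.",
    "category": "plumbing",
    "permission-to-enter": True,
    "response": "Thank you for providing the necessary information.",
})
print(next_step(first_reply))   # ('ask_again', ['permission-to-enter'])
print(next_step(second_reply))  # ('complete', [])
```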
And the complete conversation is:
- Resident: Hi, there’s a leak under my kitchen sink.
- Assistant: I’m sorry to hear that you have a leak under your kitchen sink. Our maintenance team will be dispatched as soon as possible. Could you please let us know if it’s okay for them to enter your apartment if you’re not present?
- Resident: Yes, they can enter if I’m not there.
- Assistant: Thank you for providing the necessary information. Our maintenance team has been notified about the leak under your kitchen sink and will be dispatched as soon as possible. Since you have given us permission, they will enter your apartment even if you’re not there to fix the issue. If you have any further questions or need additional assistance, feel free to reach out.
Not too bad! As you can see, we can use the model as a powerful NLP tool to perform many tasks at once. This is what the model has to say for itself:
I excel at entity extraction, identifying and categorizing key information from the resident's message with precision. My contextual understanding allows me to infer issue categories, like recognizing "plumbing" from a "leak under the kitchen sink," effortlessly. I shine in dynamic response generation, crafting responses to gather missing details and making every interaction seamless. My ability to handle iterative interaction ensures that all necessary information is collected through multiple conversation turns, showcasing my conversational prowess.
Okay, and how would I accomplish all this 10 years ago?
Ten years ago, accomplishing these tasks would have been much more challenging. Entity extraction relied on rule-based systems and regular expressions, requiring extensive manual effort and lacking flexibility. Contextual understanding was rudimentary, often using basic keyword matching and heuristics, leading to frequent errors. Dynamic response generation involved pre-scripted replies and complex logic, lacking the adaptability of modern models. Iterative interaction was managed through rigid state machines or dialogue trees, making the system cumbersome and less responsive. Overall, the process was labor-intensive, less accurate, and far less efficient than today’s NLP capabilities.
I tend to say, tongue-in-cheek, that ELIZA was state of the art until ChatGPT. While that’s a pretty hot take, it wouldn’t be far-fetched to say that the pre-LLM state of the art was closer to ELIZA than to ChatGPT.
Task 2: Answer basic questions
In our situation, this appears to be an instance of the well-known retrieval-augmented generation (RAG) task:
In RAG, a retrieval component first searches a large corpus of documents to find relevant information based on the input query. This retrieved information is then fed into a generative model, which uses it to produce a coherent and contextually relevant response.
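A minimal sketch of that retrieve-then-generate loop, with the retriever and the model passed in as plain functions. `retrieve` and `generate` are hypothetical stand-ins for a vectordb similarity query and an LLM call; the prompt layout mirrors the one used later in this article.

```python
from typing import Callable


def rag_answer(
    question: str,
    retrieve: Callable[[str, int], list[str]],
    generate: Callable[[str], str],
    k: int = 3,
) -> str:
    """Retrieve-then-generate with pluggable (hypothetical) components."""
    # 1. Retrieval: fetch the k chunks most similar to the question.
    chunks = retrieve(question, k)
    # 2. Augmentation: number the chunks and place them ahead of the question.
    context = "\n".join(f"{i}. {chunk}" for i, chunk in enumerate(chunks, start=1))
    prompt = (
        f"Given the following information:\n{context}\n"
        f"Try to answer the following question:\n> {question}"
    )
    # 3. Generation: the model receives the retrieved text as its context.
    return generate(prompt)
```

Because the model only sees what `retrieve` returns, the quality of the answer is bounded by the quality of retrieval, which is where the trouble described below begins.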
We were not completely unprepared for this task when we set off: five years ago, we built some question answering (QA) systems using the new and (at the time) large BERT models available from the budding 🤗 Hugging Face. We already knew about slicing documents into chunks and storing them in a vector database (ingestion), and we were aware of the difficulties surrounding response selection and generation. We were hoping that the emergent AI models would help us overcome some of these challenges, and they did, to an extent.
A Vector Database (vectordb) is a specialized database for storing and managing numerical representations of text, known as vectors. In Natural Language Processing (NLP), these vectors capture the meanings and relationships of words, sentences, or documents. A vectordb is optimized to quickly find similar vectors, making it useful for tasks like finding similar texts, answering questions, and improving search results in applications such as chatbots and recommendation systems.
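To make “optimized to quickly find similar vectors” concrete, here is a toy in-memory stand-in for a vectordb. It is a sketch, not any particular product’s API: it brute-forces cosine similarity over every stored vector, and the hand-made 2-d vectors stand in for real embeddings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for vectors pointing the same way, 0.0 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


class InMemoryVectorStore:
    """Toy vectordb: keeps (vector, text) pairs and returns the texts whose
    vectors are most similar to a query vector. This is a brute-force scan;
    production vector databases typically use approximate nearest-neighbor
    indexes to stay fast at scale."""

    def __init__(self) -> None:
        self._items: list[tuple[list[float], str]] = []

    def add(self, vector: list[float], text: str) -> None:
        self._items.append((vector, text))

    def query(self, vector: list[float], k: int = 3) -> list[tuple[float, str]]:
        scored = [(cosine(vector, v), text) for v, text in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
# Hand-made 2-d vectors standing in for real embeddings (an assumption).
store.add([1.0, 0.0], "text about pets")
store.add([0.0, 1.0], "text about parking")
print(store.query([0.9, 0.1], k=1))
```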
The vectordb is a key component of scalable QA systems: when we get a query from the user, we gather relevant information from the vectordb and use it to try to answer the question. Probably the most important part of building these data structures is document ingestion: taking some (say) PDF document and deciding what text should be stored as information. In practice, this involves “splitting” the text into smaller “chunks,” a process sometimes called “chunking.” For example, we might split on whitespace, take windows of 100 words, and store these in our vectordb for later retrieval.
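The naive whitespace-window strategy just described fits in a few lines. This is a sketch of that one strategy (non-overlapping windows, no regard for sentence or section boundaries), not a recommended ingestion pipeline; the function name is illustrative.

```python
def chunk_words(text: str, window: int = 100) -> list[str]:
    """Naive ingestion: split on whitespace and group the words into
    consecutive, non-overlapping windows of `window` words."""
    if window < 1:
        raise ValueError("window must be a positive number of words")
    words = text.split()
    return [" ".join(words[i:i + window]) for i in range(0, len(words), window)]


document = "The resident is required to pay the utility. " * 30  # 240 words
chunks = chunk_words(document)
print(len(chunks), [len(c.split()) for c in chunks])  # 3 [100, 100, 40]
```

Note that the first window ends in the middle of a sentence: the splitter knows nothing about the document’s structure, which is what makes the context loss discussed next so easy to trigger.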
However, a significant issue arises with naive document ingestion: context loss. When documents are divided into smaller text segments (chunks) during ingestion for use in a vectordb, essential contextual information can be lost, and when those text segments are placed in a new context, they may be incorrectly interpreted.
What we in fact found was that we had some documents for which no splitting would be appropriate. To demonstrate this, let’s take a pretty-close-to-real-life example. We have a document describing what utilities the resident must pay, and which are covered by the landlord. Here is what it might look like:
### Example Document: Utilities
| Utility/Service | Responsible Party |
|-----------------|-------------------|
| Electricity     | Resident          |
| Water           | Landlord          |
#### Responsible Party: Resident.
The resident is required to pay the utility.
#### Responsible Party: Landlord.
The landlord is required to pay the utility.
Of course, the real document is a PDF where the table is much more complicated, and it’s three pages of legalese.
Anyway, let’s suppose that we automatically ingest this document by splitting it into chunks of text. We may well end up with the following two chunks in our vectordb:
- The resident is required to pay the utility.
- The landlord is required to pay the utility.
Now suppose someone asks, “Do I need to pay my water utility bill?” Given how a vectordb’s similarity search works, it is not at all unreasonable to suppose that both, neither, or only one of those chunks could end up in the results of a similarity query. And that’s the issue: suppose you are an AI model and you really, really want to answer questions. I give you a prompt like:
Given the following information:
1. The resident is required to pay the utility.
Try to answer the following question:
> do I need to pay my water utility bill?
To understand that you shouldn’t just answer “yes” in this situation, you don’t need artificial intelligence; you need artificial omniscience! This demonstrates an inescapable problem with this approach: sometimes information is correct in its original context and incorrect when placed in a different one.
It’s obvious once you say it: we all know that to understand something, you sometimes need to read all of it and not just a few sentences. To put it more formally: natural language does not provide a context-free representation of what is true.
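To see why retrieval can’t be trusted to pick the right chunk in the utilities example, consider a toy similarity score. The sketch below uses bag-of-words counts and cosine similarity as a crude stand-in for a learned embedding (an assumption; real embeddings capture far more), yet it shows the essence of the problem: the words that decide the answer, “resident” versus “landlord,” don’t appear in the question, and “water” appears in neither chunk.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Lowercase word counts: a crude stand-in for a learned embedding.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


chunks = [
    "The resident is required to pay the utility.",
    "The landlord is required to pay the utility.",
]
query = "do I need to pay my water utility bill?"
scores = {chunk: cosine(bag_of_words(query), bag_of_words(chunk)) for chunk in chunks}
for chunk, score in scores.items():
    print(f"{score:.3f}  {chunk}")  # both chunks score 0.316
```

The two chunks tie exactly, so a top-1 query returns whichever one happens to come first. With real embeddings the scores would differ slightly, but nothing in either chunk ties it to water, so which one wins says nothing about which one is true.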
Task 3/4: Understanding when (not) to do something
In business logic, there are lots of conditions: if this, then do that; otherwise do this, unless that other thing, in which case do Y instead of Z (except for this client). It’s no different for us: we want to answer questions for the resident, but if they ask about getting a new lease, we must not try to answer it ourselves. Let’s try to reflect this in our schema, also taking the maintenance functionality into account:
MaintenanceRequest:
{
    "complaint": "complaint from the resident",
    "category": "'plumbing'|'electrical'",
    "permission-to-enter": "may the maintenance crew enter the apartment if the resident is not present"
}
ResidentQuestion:
{
    "question": "(string) complete question asked by resident (not related to getting a new lease)"
}
LeaseQuestion:
{
    "question": "(string) complete question asked by resident (must be related to getting a new lease)"
}
Now we need a schema that lets the model choose one of these types. The most obvious type to create for this purpose would be the union type MaintenanceRequest|ResidentQuestion|LeaseQuestion. In our experience, this tends to be quite taxing on the model and leads to degraded performance on other tasks. This schema also suffers from one particularly fatal flaw: ResidentQuestion and LeaseQuestion cannot be distinguished by their fields alone! That means we will get an entity like
{
    "question": "how long before move out can I renew my lease?"
}
And we won’t be able to tell which type of question it was! Essentially, for entity extraction, we have to let go of nominal thinking, where a type is identified by its name, and move toward a structural, algebraic understanding, where a type is identified only by its fields and variants are told apart by an explicit tag (think TypeScript vs C#). To correct our current schema, we could do something like:
MaintenanceRequest:
{
    "complaint": "complaint from the resident",
    "category": "'plumbing'|'electrical'",
    "permission-to-enter": "may the maintenance crew enter the apartment if the resident is not present"
}
ResidentQuestion:
{
    "question": "(string) complete question asked by resident",
    "is-lease-question": "(boolean) is the question related to getting a new lease?"
}
Analysis: MaintenanceRequest|ResidentQuestion
Note that the “two different types” have been combined into “one type” with a tag.
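On the code side, the tagged schema makes dispatch mechanical. A minimal sketch, assuming the model returns a single JSON object matching one of the two schemas; the route names are hypothetical placeholders for whatever handlers your system has.

```python
import json


def route(model_output: str) -> str:
    """Dispatch one extracted entity of type MaintenanceRequest|ResidentQuestion.

    The two variants have disjoint fields, so the shape alone identifies
    them; within ResidentQuestion, the boolean tag decides who handles it.
    """
    entity = json.loads(model_output)
    if "complaint" in entity:
        return "maintenance"
    if "question" in entity:
        tag = entity.get("is-lease-question")
        if not isinstance(tag, bool):
            raise ValueError(f"missing or non-boolean tag: {tag!r}")
        return "forward_to_property_manager" if tag else "answer_with_rag"
    raise ValueError(f"entity matches neither schema: {entity!r}")
```

Rejecting a missing tag, rather than defaulting it, is a deliberate choice in this sketch: silently treating an untagged question as answerable is exactly the failure the tag was added to prevent.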
Now our schema is looking better, but let’s take another look at the question:
- how long before move out can I renew my lease?
How would you classify this? Is it a lease-question that should be forwarded to the property manager? Or is it some detail that’s probably provided in the lease agreement or stated in the policies somewhere? If in doubt, go complain to your client that they haven’t specified the problem well enough (that’s a joke).
And remember, I cooked this example up just now; your users will never stop finding new ways to confuse your system. What we found is that, eventually, we needed to decompose the task into smaller subtasks, such as running a classifier first to choose which entities to extract. While some subtasks can run in parallel, this typically leads to at least some serialization of queries to the model and greater latency in responses (its effect on token usage may vary, however, as you may save a lot of tokens by using smaller schemas).
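A classifier-first decomposition can be sketched as two sequential model calls: one that picks an intent label, and one that extracts entities using only that intent’s (smaller) schema. `call_model`, the label names, and the prompt wording are hypothetical; plug in whatever client and prompts your system uses.

```python
import json
from typing import Callable

# Hypothetical per-intent schemas; stage 2 sends only the one it needs.
SCHEMAS = {
    "maintenance": '{"complaint": "(string)", "category": "\'plumbing\'|\'electrical\'", '
                   '"permission-to-enter": "(boolean|null)"}',
    "resident-question": '{"question": "(string) complete question asked by resident"}',
    "lease-question": '{"question": "(string) complete question about getting a new lease"}',
}


def classify_then_extract(
    message: str, call_model: Callable[[str], str]
) -> tuple[str, dict]:
    """Two sequential model calls: pick an intent, then extract its entities.

    `call_model` takes a prompt string and returns the model's text reply.
    """
    # Stage 1: a small classification prompt that must return one label.
    labels = ", ".join(SCHEMAS)
    intent = call_model(
        f"Classify the message as one of: {labels}.\n"
        f"Message: {message}\nLabel:"
    ).strip()
    if intent not in SCHEMAS:
        raise ValueError(f"unexpected intent label: {intent!r}")
    # Stage 2: extraction with only the matching schema. This call cannot
    # start until stage 1 returns, which is where the extra latency comes from.
    reply = call_model(
        f"Message: {message}\n"
        f"Extract information using the following JSON schema:\n{SCHEMAS[intent]}\nJSON:"
    )
    return intent, json.loads(reply)
```

Because the label now comes from stage 1, the two question schemas are free to share the same shape again.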
Rule-based systems in chatbot NLP, exemplified by ELIZA, offer high precision in narrow domains, are easy to understand and debug, and provide greater control over behavior. However, they struggle with scalability, flexibility, and adaptability to new scenarios. On the other hand, machine learning approaches are scalable, flexible, and adaptable to new data and scenarios, but they often operate as "black boxes," can be resource-heavy, and sometimes exhibit unacceptable behavior. Rule-based systems are ideal for well-defined tasks, such as answering frequently asked questions, while machine learning is best for tasks requiring flexibility, such as sentiment analysis. Combining both methods can leverage their respective strengths.
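One way to combine them, sketched under assumptions: cheap, high-precision rules handle the narrow cases they were written for, and anything they don’t match falls through to a model. The regular expressions and labels below are illustrative, not rules from our assistant, and `model_classify` is a hypothetical stand-in for an ML classifier or LLM call.

```python
import re
from typing import Callable

# Illustrative high-precision rules: if one fires, its label is used
# without calling the model at all.
RULES: list[tuple[re.Pattern, str]] = [
    (re.compile(r"\b(sign|start|get)\b.*\bnew lease\b", re.IGNORECASE), "lease-question"),
    (re.compile(r"\b(leak|leaking|clogged)\b", re.IGNORECASE), "maintenance"),
]


def hybrid_classify(message: str, model_classify: Callable[[str], str]) -> tuple[str, str]:
    """Return (label, source): rules first for precision and predictability,
    the model as a fallback for everything the rules don't cover."""
    for pattern, label in RULES:
        if pattern.search(message):
            return label, "rule"
    return model_classify(message), "model"
```

Returning the source alongside the label makes it easy to log how often the rules fire, which in turn shows where new rules might pay off.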
Conclusion
While there are many more issues and technical details we could explore, the main point is clear: with powerful, barrier-reducing tools like ChatGPT, virtual assistants are now a viable option for smaller businesses. However, don’t be fooled by the power of these shiny new tools: achieving the desired behavior will still require real development effort. And while far less expertise is needed to undertake such tasks now compared to a few years ago, it is not zero. If you have a basic understanding of NLP and chatbot functionality, then an LLM can be a powerful asset; otherwise, it may turn into an expensive crutch.
Ultimately, the key to success lies in understanding the unique requirements of your business and carefully designing your virtual assistant to meet those needs. With the right approach, virtual assistants can significantly enhance customer service, streamline operations, and provide valuable insights, making them an invaluable asset for modern businesses.