
Prompt Engineering in QA and Software Testing

In the age of artificial intelligence (AI) and automation, software testing has evolved considerably. One emerging practice is “prompt engineering,” which is especially relevant when testing large language models such as OpenAI’s GPT series. But what exactly is prompt engineering, and how does it fit within the software testing landscape? Let’s look into it, and then see how testRigor uses its own AI model for faster and more efficient test creation.

Key Takeaways:
  • Clear Prompts = Better AI Performance: Well-structured, unambiguous instructions help AI generate accurate and consistent test cases.
  • Prompt Engineering is Iterative: Refining prompts through feedback loops improves model understanding and testing effectiveness.
  • Diverse Inputs are Essential: Testing with varied prompts ensures fairness, robustness, and broader coverage of real-world scenarios.
  • Manual Intervention Complements AI: When AI gets stuck, adding specific steps helps it resume and complete the task effectively.
  • New QA Skills are Required: Modern testers need to understand AI behavior, language nuances, and domain context—not just code.

What is Prompt Engineering?

At its core, prompt engineering involves designing, analyzing, and refining the inputs (or “prompts”) used to elicit responses from AI models, ensuring that the outputs are as desired. Just as a skilled interviewer can frame questions in various ways to get the most accurate answers from a human interviewee, prompt engineers frame their inputs to AI systems in a way that maximizes the accuracy, relevance, and clarity of the system’s outputs.

For many people, the phrase “using prompt engineering” is synonymous with “using ChatGPT”. However, this isn’t necessarily the case, as prompt engineering can be applied to a broad spectrum of models. Additionally, many companies have security concerns about using ChatGPT, which is one of the reasons why testRigor’s AI engine does not use it.

Read: How to Test Prompt Injections?

Levels of Prompt Engineering in QA

Prompt engineering in QA progresses from basic task instructions to structured control over how AI reasons, applies context, and evaluates correctness. Recognizing these levels enables QA professionals to design reliable, repeatable, and context-aware AI-driven testing workflows instead of relying on ad-hoc prompts; an example after the list below shows each level in practice.

  • Instruction-level Prompt: Instruction prompts tell the AI exactly what action to perform, such as generating test cases or analyzing requirements. They are useful for simple, well-defined QA tasks but provide limited control over relevance and depth.
  • Context-driven Prompts: Context prompts supply background such as application type, user roles, workflows, and data states. This ensures the AI produces test scenarios that align with real-world usage and system behavior.
  • Chain-of-thought Prompts: Chain-of-thought prompts guide the AI to reason step by step before producing output. They help uncover complex logic paths, dependencies, and edge cases in multi-step QA scenarios.
  • Constraint-based Prompts: Constraint prompts define boundaries by explicitly stating what the AI must not include or assume. They prevent scope creep, invalid test cases, and violations of compliance or testing rules.
  • Evaluation Prompts: Evaluation prompts instruct the AI to assess correctness, coverage, or risk instead of generating new content. They enable AI to act as a reviewer, identifying gaps, inconsistencies, or missing test scenarios.
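For illustration, here is how these five levels might look for a hypothetical e-commerce checkout flow (the application details and wording are assumptions, not a prescribed syntax):

Instruction: generate test cases for the checkout page
Context: the application is an e-commerce web store; guests can check out without an account, and payment is by card only
Chain-of-thought: reason step by step about what happens if the card is declined after the shipping address has been saved
Constraint: do not generate test cases that assume a logged-in user or a saved payment method
Evaluation: review the generated test cases and list any missing negative scenarios for invalid card details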

Why is Prompt Engineering Crucial in Software Testing?

  • Improves Model Understanding: Different prompts shed light on the AI model’s functioning, assisting in troubleshooting and behavior refinement. Read: What are LLMs (Large Language Models)?
  • Enhances Model Utility: Consistent and appropriate responses to a broad spectrum of user queries make models like chatbots or virtual assistants more valuable. Prompt engineering is the key.
  • Safety and Reliability: It’s imperative to identify and rectify potentially problematic outputs from AI models in sensitive applications, and diverse prompts play a pivotal role here.
    • Real-world Consequences: Inadequately tested AI models, especially in sectors like healthcare or autonomous vehicles, can have grave implications. This emphasizes the necessity of prompt engineering.
    • Contingency Measures: It’s beneficial for AI systems to have built-in mechanisms, like deferring to a human or providing generic answers, when faced with unfamiliar prompts.
  • Modern AI Failure Modes Addressed by Prompt Engineering: As AI adoption has accelerated, new failure modes have emerged that require deliberate prompt design to control and mitigate; see the example after this list.
    • Hallucinations: Constrained prompts, explicit data boundaries, and evaluation criteria help contain hallucinated outputs by limiting what the AI is allowed to infer or invent. Read: What are AI Hallucinations? How to Test?
    • Overgeneralization: Context-specific prompts prevent AI models from applying broad assumptions across unrelated scenarios, which is critical when testing domain-specific applications.
    • Context Drift: Context anchoring within prompts ensures the AI retains earlier decisions, assumptions, and system states throughout multi-step test flows, reducing logical inconsistency. Read: AI Context Explained: Why Context Matters in Artificial Intelligence.
    • Vision Misinterpretation in UI-Based AI: In vision-driven testing, precise prompts help resolve UI ambiguity by clarifying which elements, states, or visual cues the AI should prioritize, minimizing false positives and missed defects. Read: How to do visual testing using testRigor?
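As a simple illustration of the hallucination case above, a constraint-style prompt can bound what the AI is allowed to assume (the wording below is an assumed example, not a fixed syntax):

validate the registration form using only the fields visible on the current screen
do not invent field names; if an expected field is missing, report it as a gap instead of assuming it exists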

Key Aspects of Prompt Engineering in Software Testing

  • Diverse Inputs:
    • Examples: Testing a chatbot, for instance, requires prompts in different languages, colloquialisms, and cultural contexts, as shown in the example after this list.
    • Impact on Model Fairness: Ensuring models don’t discriminate against specific groups mandates testing with diverse demographic inputs.
  • Iterative Refinement:
    • Feedback Loop Creation: Continuous improvement is realized when insights from one test cycle inspire the next set of prompts.
    • Integration with Other Testing Methods: Prompt engineering works best when integrated with other methods, like adversarial testing.
  • Collaboration with Model Training:
    • Fine-tuning with Custom Prompts: Insights can guide further model refinement. If a style of prompt is consistently misinterpreted, it indicates a training gap.
    • Active Learning: Challenging examples unearthed by prompt engineering can be incorporated into model retraining.
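To illustrate the diverse-inputs point above, a single chatbot intent such as checking an order status could be probed with deliberately varied phrasings (hypothetical examples):

Where is my order?
yo where's my package at??
¿Dónde está mi pedido?
I placed an order two weeks ago and nothing has arrived, what should I do

Consistent behavior across such variations is a stronger signal of robustness than passing a single canonical phrasing.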

Comparison with Traditional Testing

Traditional QA methodologies often focus on fixed scenarios with predictable outputs. In contrast, prompt engineering, tailored for AI, accepts and even expects variability. While the former might rely heavily on predefined test cases, the latter leans into adaptability and exploration, navigating the vast landscape of potential AI responses to ensure consistency and reliability. 

Prompt Engineering vs. Traditional Test Case Design

Traditional test case design focuses on defining steps, data, and expected results for a system to be validated by humans or automation frameworks. Prompt engineering serves the same purpose, but expresses test intent in a structured natural language format that an AI system can execute, interpret, and adapt.

In this sense, prompts act as executable test intent, where clarity, structure, and scope determine how accurately the AI performs validation. When prompts are designed with instruction, context, constraints, and evaluation criteria, natural language does not introduce ambiguity; instead, it replaces brittle scripts with resilient, intent-driven test definitions, as the comparison table and the example that follow illustrate.

| Traditional Test Case Design | Prompt Engineering |
| --- | --- |
| Test case defines what to validate | Prompt defines what the AI should validate |
| Written as steps and expected results | Written as structured natural language intent |
| Preconditions describe the system state | Context prompts describe application, user, and data state |
| Test steps control execution flow | Chain-of-thought prompts control reasoning flow |
| Test data supplied explicitly | Data embedded as contextual input |
| Expected results define pass/fail | Evaluation prompts define correctness criteria |
| Negative tests written as separate cases | Constraints specify what must not happen |
| Scope controlled by test plan | Scope controlled by prompt constraints |
| Stability depends on script accuracy | Stability depends on prompt clarity and structure |
| Maintenance requires script updates | Maintenance requires prompt refinement |
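To make the comparison concrete, here is the same login check expressed both ways (the labels and error text are assumptions):

Traditional test case:
1. Navigate to the login page
2. Enter a valid email and an invalid password
3. Click “Log in”
4. Expected result: the error “Invalid credentials” is shown and the user remains on the login page

Prompt-style equivalent:
log in with a valid email and an invalid password
check that an error about invalid credentials is shown and that the user is not logged in
do not assume any specific error wording or redirect behavior

The prompt version states intent, an evaluation criterion, and a constraint, leaving the exact interaction details to the AI.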

Training and Skillset for QA Testers

With AI becoming central to software solutions, the required skill set for QA engineers is also evolving. In the age of prompt engineering, understanding AI behavior, linguistic intricacies, and domain knowledge is as crucial as understanding code structure. Training programs are emerging to equip QA professionals with these competencies, ensuring they are primed to navigate the challenges AI presents.

Challenges and Considerations

  • Bias Mitigation: Testing prompts must be unbiased, ensuring the model’s fairness and wide applicability. Read: AI Model Bias: How to Detect and Mitigate.
  • Complexity of AI Responses: AI models, unlike traditional software, produce a broad range of responses, complicating the testing process. Read: What is Explainable AI (XAI)?
  • Subjectivity in Evaluating Responses: The “correctness” of AI responses can be open to interpretation, posing unique challenges.
  • Scalability:
    • Automated Prompt Generation: Given the vastness of potential prompts, automated tools may be needed to generate large sets of test prompts, or AI itself can be employed to craft challenging prompts for other AI systems.
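One lightweight way to do this is to use a generation-style prompt itself, for example (an illustrative prompt, not any specific tool’s syntax):

generate 15 different ways a user might ask to cancel their subscription, including slang, typos, abbreviations, and at least three non-English languages
for each phrasing, state the intent the chatbot is expected to detect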

Prompt Engineering in Software Testing Example

Now, let’s talk about how you can use prompt engineering to build your automated tests in testRigor. Before diving into the details, the sections below walk through an example of how to use prompt engineering for your test cases.

Prompt Engineering for Vision-Based Testing

Vision-based testing introduces a fundamentally different challenge compared to traditional DOM or locator-driven automation. Instead of interacting with fixed identifiers, AI must interpret visual elements, layouts, labels, and intent, making prompt engineering critical for accuracy and resilience. Read: Vision AI and how testRigor uses it.

Handling Ambiguous UI Elements

Modern user interfaces often contain visually similar elements such as multiple buttons, icons without labels, or repeated controls across sections. Well-crafted prompts disambiguate intent by describing purpose and context rather than relying on visual appearance alone.

For example, instead of referencing position or color, prompts can specify function, user goal, or screen state, guiding the AI toward the correct element even in visually dense interfaces.
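As an illustration, compare an appearance-based prompt with an intent-based one (the screen and its labels are assumed):

click the blue button on the right side of the page
click the button that submits the payment form on the checkout screen

The second version stays valid even if the button’s color, position, or label changes, because it describes purpose and screen state rather than appearance.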

Visual Similarity vs. Semantic Meaning

Two UI elements may look identical while serving entirely different purposes, such as “Save,” “Submit,” and “Continue” buttons styled the same way. Prompt engineering allows testers to emphasize semantic meaning over visual similarity, ensuring the AI selects elements based on user intent rather than shape or style.

This distinction is essential in enterprise and SaaS applications where design systems prioritize consistency, but functionality varies widely.
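For instance, on a form where “Save” and “Submit” are styled identically, an intent-focused prompt removes the ambiguity (a hypothetical example):

click the button that saves the record as a draft without submitting it for approval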

Prompting for Resilience Against UI Changes

UI layouts evolve frequently due to redesigns, responsive behavior, or A/B testing, often breaking traditional automation. Vision-based prompting enables testers to describe what the user is trying to accomplish rather than where or how an element appears.

Instead of prompting “click Submit,” testers can prompt “click the primary action button used to complete the form,” allowing the AI to adapt automatically even if labels, placement, or styling change. This approach dramatically reduces test fragility and maintenance effort.

How Does Prompt Engineering in Software Testing Work?

As a prompt engineer, you copy and paste your test case into testRigor, which then breaks it down line by line. Each line is treated as a prompt and executed step by step by the AI. The system examines your screen at each step and determines what action should be taken based on the content displayed. To learn about all the easy ways to create or generate tests in testRigor, read: All-Inclusive Guide to Test Case Creation in testRigor.

High-level approach

When you are creating a test case, you can specify test case steps or sections in plain English, such as:
find a kindle
and add it to the shopping cart

This is how it will look in the UI:

[Screenshot: Enter Prompt]

After pressing confirm, sit back and relax. The testRigor engine will create the test case based on the criteria you’ve specified. However, upon execution, you might discover that it doesn’t perform as you intended:

As illustrated in the example, since no Kindle was selected, the system wandered around trying to satisfy the second prompt: add it to the shopping cart.

The problem arises because ‘find a Kindle’ is an instruction to merely locate it, not to select it. Meanwhile, the ‘add to cart’ instruction assumes the item has already been selected. The clear solution is to fine-tune the prompt to explicitly instruct the system to select a Kindle, as follows:
find a kindle and select it
and add it to the shopping cart

This is how it will look in the UI:

In essence, the primary responsibility in this scenario is ensuring that the prompt is lucid and straightforward. It may require supplementary clarifications or additional context to guide the system effectively and guarantee it operates as intended. You can follow this 3-step process to make sure that your prompting works as expected.

Step 1: Provide a Detailed Description of the AUT: testRigor will generate tests based on the app description that you provide. Make sure that it is detailed enough to give the AI clear context, so the AI-generated tests are relevant. For example, this is a good description for the Salesforce application: This is a full-featured web- and mobile-based CRM system. As a user, you can create Contacts and Deals, set up associations between those and other objects, and much more. You can also build custom forms backed by the built-in Apex programming language, and search across all types of available objects.

Step 2: Provide a Non-ambiguous Test Case Description: testRigor also generates test steps based on the test case description. Make sure you provide a non-ambiguous description to help the AI generate relevant test steps. You can also choose to select ‘AI Context’ to get more meaningful test steps. In the example below, we can provide the Test Case Description as “Find, Select, and Add Kindle to Cart” instead of “Checkout test”. This description helps the AI generate relevant test steps.

Step 3: Provide Manual Inputs When Needed: AI may sometimes get stuck while building a test case, even with clear instructions. When this happens, step in to manually guide it by adding the specific steps needed to overcome the hurdle. After providing this help, click Use AI to complete creating this test so the AI can resume and finish the process from where you left off.
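For example, if the AI stalls on an unexpected cookie banner, you might insert a manual step such as the following before letting it continue (the banner text is an assumption about the application under test):

click “Accept All Cookies”

Once the step is in place, clicking Use AI to complete creating this test lets the engine pick up from that point and finish the rest of the flow.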

Other Prompt Engineering Techniques

Prompt engineering is a multifaceted field comprising numerous techniques. Let’s consider the ones that would be helpful in a QA environment.

Least-to-most prompting technique for QA prompts

Rooted in the principle of gradation, the ‘least-to-most’ technique guides AI systems incrementally. There might be instances where an AI doesn’t behave as anticipated. Drawing from this technique, one effective countermeasure is to break the primary instruction down into more granular, explicit steps, making it easier for the AI to understand and execute them.

Take, for instance, the directive:
add a kindle to the cart
When processed by the AI, this may result in ambiguity. To combat this, it might be better to articulate the steps more clearly, such as:
find a kindle and select it
and add it to the shopping cart
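If ambiguity still remains, the same instruction can be broken down further into even more explicit steps (illustrative wording; the exact labels depend on the site under test):

search for “kindle”
select the first Kindle device in the results
add it to the shopping cart
check that the cart shows one item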

By employing such granular instructions, we can bridge the gap between the AI’s interpretation and the desired outcome, ensuring smoother and more predictable system interactions. For more examples and dos and don’ts, read this detailed guide: How to use AI effectively in QA.
