Senior SDET · AI Quality Engineer

Hey! I am Rana Usman Shahid, Senior SDET & AI Quality Engineer

Your AI fails silently — wrong answers, hallucinated facts, broken context. I catch what your error logs miss. For 6+ years I've built quality systems for production AI — LLM evaluation, RAG testing, agent quality, and performance under load — alongside high-stakes testing in FinTech and cybersecurity, where silent failures cost millions.

6+ Years Experience
1,200+ Prompts Evaluated
About Me

Quality systems for production AI — where silent failures cost millions.

Quality Assurance

Senior SDET with 6+ years specializing in production AI & LLM evaluation, RAG testing, agent quality, and prompt regression, alongside high-stakes testing in FinTech and cybersecurity. Deep expertise testing institutional investment management systems ($100M+ AUM), AI governance platforms (guardrails, PII filtering, observability, RBAC/ABAC), conversational AI agents, and semantic search validated to 94% relevance across 10,000+ queries. Proficient in Playwright, Appium, K6, Postman, REST Assured, JavaScript, Python, and AWS CloudWatch.

I've evaluated 1,200+ prompts and surfaced failure patterns to ML teams, lifted chatbot accuracy from 78% to 92%, and reduced critical production defects by 85%. My work isn't about writing more tests; it's about asking the questions that don't get asked until something goes wrong, and answering them before launch.

85% Fewer critical production defects
14-point Chatbot accuracy lift (78% to 92%)
94% Semantic search relevance
$100M+ AUM systems tested
Skills

My Core Competencies

AI Quality & Evaluation

LLM Evaluation · RAG Testing · Agent Quality · Prompt Regression · Semantic Search Testing · Chatbot Testing · Hallucination Identification · Guardrails & PII Filtering · Bias Detection

Automation & Frameworks

Playwright · Appium · K6 · REST Assured · API Testing

Testing Types

Load & Performance · Manual Testing · Regression · Exploratory & Risk-Based · Functional · Cross-browser · Usability

Languages & Tools

JavaScript · TypeScript · Python · SQL · Postman · AWS CloudWatch · JIRA

Methodologies

Agile/Scrum · TDD · BDD · Shift-Left Testing

Documentation

Test Plans · Test Cases · Bug Reports · Test Scenarios

Platforms

Web App · Mobile (iOS & Android)

OS / Browsers

Windows · macOS · Chrome · Firefox · Safari · Edge

Let's Work Together

Have an AI system that has to be trusted before it ships? Let's find what breaks — before your users do.

Contact Me At: +92 311 4502708
Experience

Where I've Worked

April 2026 — Present

Senior Software Development Engineer in Test (SDET)

Kualimate

  • Founded Kualimate, an AI quality engineering practice — the diagnostic layer between AI models and the users they ship to. Services span LLM evaluation, RAG testing, agent quality, prompt regression, and pre-launch AI audits for B2B and B2C teams.
  • Leading the end-to-end load and performance program for a production LLM-powered virtual agent platform — concurrent-conversation simulation, latency degradation curves, and capacity planning.
  • Building a K6-based load testing framework in JavaScript with AI-specific metrics, including Time To First Token (TTFT), error rate under sustained load, and tail latency at scale.
  • Designing a Python evaluation pipeline that scores agent responses across five quality dimensions — accuracy, relevance, safety, instruction adherence, and conversational coherence — surfacing regressions before each release.
  • Engagements span B2B (enterprise AI, FinTech, SaaS) and B2C (consumer AI, conversational agents) across North America, EU, UK, and Australia.
January 2025 — Present

Senior Software Quality Assurance Engineer

CodingCops

  • Lead QA strategy across four AI and consumer products in parallel, mentoring a team of 12 QA engineers and embedding shift-left, risk-based testing into Agile delivery.
  • Designed validation frameworks for semantic search and chatbot systems; tested 1,200+ prompts and surfaced 80+ hallucination patterns to ML teams.
  • Improved chatbot accuracy from 78% to 92% via structured prompt regression, and lifted semantic search relevance to 94% across 10,000+ queries.
  • Validated chatbot access controls under RBAC and ABAC; exploratory and risk-based testing uncovered 120+ critical defects, including 15 high-severity security vulnerabilities.
  • Architected Playwright and Appium automation, cutting regression cycles from 12 hours to 3 (75% reduction) and enabling twice-weekly releases at a 98% pre-release defect resolution rate.
November 2022 — January 2025

Software Quality Assurance Engineer

Techverx

  • QA owner for four enterprise platforms in regulated, high-stakes environments — most notably a $100M+ AUM investment management system (OMS/EMS/PMS) with a zero-critical-defect streak for 18 consecutive months.
  • Prevented 5 critical financial calculation errors in portfolio valuation, trade execution, and interest accrual workflows using state-transition and boundary-value analysis.
  • Executed 1,200+ test cases per quarter at a 96% pass rate, using risk-based prioritization across concurrent streams.
  • Validated 30+ payment and financial API endpoints in Postman — data integrity, authentication, error handling, and performance.
January 2021 — October 2022

Software Quality Assurance Engineer

LeapSofts

  • QA delivery for 8 high-traffic eCommerce platforms generating $2M+ in monthly transactional revenue, owning checkout flows, inventory systems, and payment-gateway integrations.
  • Caught 25+ critical payment integration defects pre-launch, safeguarding an estimated $200K+ in transaction revenue.
  • Held consistent performance for 95% of active users through cross-browser and cross-device testing across 15+ environment combinations; usability findings tied to a 20% lift in customer satisfaction.
Want to Hire Me?

Let's Work Together On a Project

Tell me about your AI system or product. I'll tell you where it's most likely to fail — and how to prove it won't.