Plaid Integration & Merchant Intelligence

How this platform uses Plaid to connect user accounts, sync transactions, detect subscriptions, and power a centralized merchant directory that bridges bank data with AI-parsed email intelligence.
Plaid Link SDK
/transactions/sync
/transactions/recurring/get
NestJS Backend
MongoDB
Cursor-Based Pagination
Fuzzy Match Engine
Confidence Scoring
AI Merchant Resolution

10-20 mins

Sync Time After
↓ 95% reduction (was 4.5 hrs)

-75%

AI Token Cost Reduction
via prompt caching + unified calls

Zero

Silent AI Features Post-Fix
real retry with exponential backoff

Overview

A Multi-Portal Workforce Platform for the Healthcare Industry

Physician scheduling in the US still runs on text threads, spreadsheets, and informal word-of-mouth. Shiftolic was built to replace this with a structured, real-time marketplace — connecting doctors who need coverage with hospitals and staffing companies that need reliable providers.

THE PROBLEM

Physicians needing shift coverage have no centralized platform. Locum companies manually manage pools of providers across spreadsheets and email chains. There is no structured way to post, discover, or claim shifts — resulting in last-minute scrambles and poor visibility for all parties.

The Solution

A four-portal SaaS platform where doctors can post and claim shifts, join specialty groups, verify credentials, and sync their calendar — while locum companies manage provider pools, post open shifts, and build out their facilities. One shared backend serves all portals.

Technology Stack

NestJS
MongoDB
Mongoose
JWT Auth
React Native
Expo
Ant Design
Redux
AWS S3
Sendgrid
Twilio
Stripe
OpenAI
Google OAuth

Overview

Why Plaid is Central to This Platform

The platform's core value proposition is matching refunds found in a user's Gmail inbox against actual credits received on their credit and debit cards. Plaid is the financial data layer that makes this matching possible. Without it, there is no way to verify whether a refund email resulted in a real credit on the user's account.

THE PROBLEM

Users receive refund confirmation emails but have no reliable way to verify whether the money actually landed on their card. Manual reconciliation across multiple banks and cards is impractical at scale.

THE SOLUTION

Once a user connects Plaid, the platform automatically syncs up to 2 years of transactions from all linked cards and runs a continuous matching pipeline that pairs email-detected refunds with actual credits in the transaction feed.

Account Linking

Multi-Card Linking via Plaid Link SDK

Users connect their bank accounts and credit cards through Plaid Link, an OAuth-based flow embedded directly in the onboarding experience. When a user completes the Link flow, the frontend receives a public_token, which the backend exchange endpoint trades for a persistent access_token stored per user in MongoDB.

The system supports multiple cards per user from the ground up: each connected institution produces its own access_token and item_id. The Connected Accounts page renders all linked cards with institution name, account type, masked card number, and live connection status.

Token Expiry & Reconnect Flow

When a Plaid Item expires or a bank forces re-authentication, the system catches the error code from Plaid's API response and surfaces a "Reconnect" prompt on both the dashboard attention cards and the Connected Accounts page. A dedicated reconnect endpoint handles re-linking without disrupting previously synced transaction history.
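The error-to-prompt mapping described above could look roughly like the sketch below. The error body fields (error_type, error_code, error_message) follow Plaid's documented error shape, and ITEM_LOGIN_REQUIRED is Plaid's code for an Item that needs re-authentication. The status names and the function itself are hypothetical, not the platform's actual code.

```typescript
// Subset of the error body Plaid returns on a failed API call.
interface PlaidErrorBody {
  error_type: string;
  error_code: string;
  error_message: string;
}

// Hypothetical status values used by the UI to decide what to render.
type ConnectionStatus = "connected" | "reconnect_required" | "error";

// ITEM_LOGIN_REQUIRED is a real Plaid error code. Treating only this code
// as reconnectable is an assumption; a real system may list more.
const RECONNECT_CODES = new Set<string>(["ITEM_LOGIN_REQUIRED"]);

// Maps the outcome of a token validation call to a connection status.
// Pass null when the Plaid call succeeded.
function statusFromPlaidError(err: PlaidErrorBody | null): ConnectionStatus {
  if (err === null) return "connected";
  return RECONNECT_CODES.has(err.error_code) ? "reconnect_required" : "error";
}
```

A card whose status resolves to "reconnect_required" would get the "Reconnect" prompt; any other failure is kept separate so that transient API errors do not ask the user to re-link.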

Edge Case: 8-Card Timeout
A timeout was discovered when users had 8 or more cards connected simultaneously — the token validation loop hit Plaid's API sequentially, causing the request to exceed the server timeout window. Fixed by batching validation requests in parallel with a bounded concurrency limit.
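A minimal sketch of parallel validation with a bounded concurrency limit, assuming a generic async worker per card. The helper name mapWithConcurrency and the limit value are illustrative; the source does not say which limit or library the fix uses.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in the same order as the input items.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item that has not been claimed yet

  // Each runner repeatedly claims the next unclaimed index until none remain.
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  const runners: Promise<void>[] = [];
  for (let i = 0; i < runnerCount; i++) runners.push(run());
  await Promise.all(runners);
  return results;
}
```

With eight linked cards and a limit of three, the slowest path is roughly three validation round-trips instead of eight, while Plaid never sees more than three simultaneous calls from one request.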
Upsert on Re-link
Duplicate accounts are prevented with an upsert pattern — if an existing item_id is detected for the user, the system updates the token rather than creating a new record, preserving all historical transaction data.
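The upsert rule can be illustrated with an in-memory Map standing in for the MongoDB collection. The record shape and the function name are assumptions; with Mongoose, the same idea is usually expressed as a findOneAndUpdate call with the upsert option enabled.

```typescript
// Hypothetical shape of a stored Plaid Item link.
interface LinkedItem {
  userId: string;
  itemId: string;
  accessToken: string;
  cursor: string | null; // sync cursor, kept across re-links
  linkedAt: Date;
}

// Inserts a new link, or refreshes the token on an existing one.
// Keyed by (userId, itemId) so a re-link never creates a second record.
function upsertLinkedItem(
  store: Map<string, LinkedItem>,
  userId: string,
  itemId: string,
  accessToken: string,
  now: Date = new Date(),
): LinkedItem {
  const key = `${userId}:${itemId}`;
  const existing = store.get(key);
  const record: LinkedItem = existing
    ? { ...existing, accessToken } // keep cursor and history pointers
    : { userId, itemId, accessToken, cursor: null, linkedAt: now };
  store.set(key, record);
  return record;
}
```

Because only the token field changes on a re-link, the stored cursor and everything that references the item_id stay valid.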

Transaction Sync

Cursor-Based Transaction Sync: Up to 2 Years of History

2 yr
Transaction history fetched per user on initial sync via Plaid's /transactions/sync endpoint

Once accounts are linked, the system fetches up to 2 years of transaction history using Plaid's /transactions/sync endpoint, which uses a cursor-based pagination model. Each sync call stores a cursor per linked item in MongoDB, so subsequent syncs only retrieve net-new, modified, or removed transactions — not the full history on every run.
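A sketch of the sync loop for one linked item follows. The page fields (added, modified, removed, next_cursor, has_more) match the /transactions/sync response; the fetchPage callback, the Map-based store, and the trimmed transaction shape are stand-ins for the real Plaid client and MongoDB collection.

```typescript
// Trimmed transaction shape; real Plaid transactions carry many more fields.
interface Txn {
  transaction_id: string;
  amount: number;
  name: string;
}

// One page of a /transactions/sync response.
interface SyncPage {
  added: Txn[];
  modified: Txn[];
  removed: { transaction_id: string }[];
  next_cursor: string;
  has_more: boolean;
}

// Hypothetical wrapper around the Plaid call; null means "no cursor yet".
type FetchPage = (cursor: string | null) => Promise<SyncPage>;

// Pages through all pending updates for one item and applies them to the
// store. Returns the final cursor so the caller can persist it afterwards.
async function syncItem(
  fetchPage: FetchPage,
  store: Map<string, Txn>,
  startCursor: string | null,
): Promise<string | null> {
  let cursor = startCursor;
  let hasMore = true;
  while (hasMore) {
    const page = await fetchPage(cursor);
    for (const t of page.added) store.set(t.transaction_id, t);
    for (const t of page.modified) store.set(t.transaction_id, t);
    for (const r of page.removed) store.delete(r.transaction_id);
    cursor = page.next_cursor;
    hasMore = page.has_more;
  }
  return cursor;
}
```

Persisting the cursor only after the loop finishes means a failure midway can be retried from the previously stored cursor without skipping updates.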

Transactions are stored per user with all relevant fields and served to the Connected Account Details screen in the frontend, which provides a paginated, filterable transaction history view. Users can search by merchant, filter by date range, and drill down per account.

Rate Limit Handling

To handle Plaid's per-item API rate limits during periods of high concurrent user activity, the sync was moved to a batched execution model. Transactions are processed in defined batch sizes, and the cron-driven sync jobs are spread across users with staggered scheduling to avoid synchronized spikes in API consumption.
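One way to stagger cron-driven syncs is to derive a stable per-user offset from the user ID and to process users in fixed-size batches. The hash choice (an FNV-1a style hash), the function names, and the window size are assumptions for illustration; the source does not describe the actual scheduling scheme.

```typescript
// Deterministic 32-bit FNV-1a style hash over the string's UTF-16 code units.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned
}

// Minute offset inside the sync window at which this user's sync starts.
// The same user always lands on the same offset, and users spread evenly.
function syncOffsetMinutes(userId: string, windowMinutes: number): number {
  return fnv1a(userId) % windowMinutes;
}

// Splits a list into batches of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}
```

For example, with a 60-minute window each user's daily sync starts at a fixed minute between 0 and 59, so load is spread across the hour instead of spiking when the cron fires.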

Cursor Continuity
The cursor value persists across sync runs. If a user connects a new card mid-sync or re-authenticates, the cursor for each item is preserved independently, ensuring no transactions are double-counted or missed during the overlap window.

Recurring Transactions

Subscription Detection & Recurring Payment Management

Beyond one-time transactions, the platform integrates with Plaid's dedicated recurring transactions endpoint to detect subscription streams and give users a full view of what they're being charged regularly.

01

Fetch Recurring Streams

Call /transactions/recurring/get after initial transaction sync. Plaid analyzes transaction history to detect patterns by merchant, frequency, and amount consistency.
Foundation

02

Classify Subscription Type

Custom logic distinguishes true subscriptions (same merchant, same amount, 3+ consecutive months) from recurring charges (same merchant, variable amounts like utility bills).
Classification

03

Display & Persist State

Active/Cancelled status displayed per stream. User insights (ignored items) stored in MongoDB, not localStorage, ensuring state persists across all devices and browsers.
Persistence
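The classification rule in step 02 can be sketched as a pure function over the charges of a single merchant. The Charge shape, the cent-level amount comparison, and the function names are assumptions; the source only states the rule (same merchant, same amount, 3+ consecutive months).

```typescript
// One charge from a single merchant; date is an ISO string (YYYY-MM-DD).
interface Charge {
  date: string;
  amount: number;
}

type StreamKind = "subscription" | "recurring_charge";

// Converts a date to a running month number so consecutive months differ by 1.
function monthIndex(isoDate: string): number {
  const parts = isoDate.split("-");
  return Number(parts[0]) * 12 + (Number(parts[1]) - 1);
}

// "subscription": the same amount charged in 3 or more consecutive months.
// Anything else (for example variable utility bills) is a recurring charge.
function classifyStream(charges: readonly Charge[]): StreamKind {
  const sorted = charges.slice().sort((a, b) => a.date.localeCompare(b.date));
  let run = 1; // length of the current same-amount, month-after-month run
  for (let i = 1; i < sorted.length; i++) {
    const sameAmount =
      Math.round(sorted[i].amount * 100) === Math.round(sorted[i - 1].amount * 100);
    const nextMonth = monthIndex(sorted[i].date) - monthIndex(sorted[i - 1].date) === 1;
    run = sameAmount && nextMonth ? run + 1 : 1;
    if (run >= 3) return "subscription";
  }
  return "recurring_charge";
}
```

Three identical monthly charges classify as a subscription, while three utility bills with different amounts stay in the recurring-charge bucket.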

The /transactions/recurring/get endpoint returns structured data for each recurring transaction stream, including merchant name, frequency (weekly, monthly, annual), average and last amount, and status — either active or early detection (fewer than 3 occurrences observed).

This data powers the Subscriptions tab, with filters for All, Active, and Cancelled states. The subscription view was designed to surface the distinction between two different types of recurring activity that users often conflate.

The Plaid Limitation & Our Fix

Plaid does not natively provide a "paused" or "cancelled" status; it only reflects what it can detect from the raw transaction stream. A subscription that stops charging simply stops appearing with an active next-date prediction.

Plaid's Delayed Data Refresh
Active subscriptions were incorrectly showing as cancelled because Plaid doesn't immediately update the predicted next charge date after a transaction posts. We introduced a buffer window that waits for Plaid's data refresh cycle before marking a stream as cancelled, validated by observing actual update timing in dev across multiple accounts.
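The buffer window amounts to a small date comparison, sketched below. The function name and the idea of expressing the buffer in days are assumptions; the source does not state the buffer length that was chosen.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// A stream is only treated as cancelled once the predicted next charge date
// is overdue by more than `bufferDays`. The buffer absorbs the lag between a
// charge posting and Plaid refreshing its prediction.
function isLikelyCancelled(predictedNextDate: Date, now: Date, bufferDays: number): boolean {
  const deadline = predictedNextDate.getTime() + bufferDays * DAY_MS;
  return now.getTime() > deadline;
}
```

Without the buffer (bufferDays set to 0), a subscription would flip to cancelled the moment its predicted date passed, even if the charge had posted and Plaid simply had not refreshed yet.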

Merchant Intelligence

Centralized Merchant Directory: Bridging Plaid & Email Data

The most technically complex piece of the Plaid integration. Bank statements use abbreviated merchant names like "AMZN Mktp US" while refund emails say "Amazon." The merchant directory resolves this gap with fuzzy matching, confidence scoring, and a growing database of aliases.

Email Merchants are entities created automatically when the AI parsing pipeline extracts a merchant name from a refund email. Each record stores the normalized name, aliases, logo URL fetched from logo.dev, and a list of source users — the users whose emails produced this merchant — so reliability can be assessed before linking.

Plaid Merchants are raw merchant names from bank statement data, which typically use abbreviated, bank-statement-style formats that differ significantly from the brand name a user would recognize.

The matching engine runs an extensive string and fuzzy match algorithm combining exact match, tokenized partial match, and Levenshtein distance scoring with a final confidence value. This runs automatically every 24 hours to pick up new merchant data from fresh syncs, and can also be triggered manually on demand from the admin portal.
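A simplified sketch of the three matching layers (exact, tokenized partial, Levenshtein) combined into one confidence value. The 0.9 and 0.8 weights, the normalization rules, and the function names are illustrative assumptions, not the production scoring.

```typescript
// Lowercases and strips punctuation so "AMZN Mktp US" becomes "amzn mktp us".
function normalize(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Classic dynamic-programming edit distance, keeping one row at a time.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Edit distance rescaled to a 0..1 similarity.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

// Confidence in 0..1 that a bank-statement name refers to an email merchant.
function matchConfidence(plaidName: string, emailName: string): number {
  const p = normalize(plaidName);
  const e = normalize(emailName);
  if (p === "" || e === "") return 0;
  if (p === e) return 1; // exact match
  const pTokens = p.split(" ");
  const eTokens = e.split(" ");
  // Tokenized partial match: share of email tokens present in the bank name.
  const shared = eTokens.filter((t) => pTokens.indexOf(t) !== -1).length;
  const tokenScore = shared / eTokens.length;
  // Levenshtein layer: best similarity between any pair of tokens.
  let bestEdit = 0;
  for (const pt of pTokens) {
    for (const et of eTokens) bestEdit = Math.max(bestEdit, similarity(pt, et));
  }
  return Math.max(0.9 * tokenScore, 0.8 * bestEdit);
}
```

Under these weights, "Starbucks Store 123" against "Starbucks" scores 0.9 through the token layer, while "AMZN Mktp US" against "Amazon" only reaches a mid-range score through edit distance. That mid-range band is the kind of suggestion that would go to admin review before being stored as an alias.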

Impact on AI Costs

Once a Plaid merchant is linked to an email merchant, the system uses the pre-built alias map for all future transaction matching — eliminating the need for a Gemini AI call to resolve the merchant name at runtime. In production load testing, this resulted in a 90.3% DB match rate, meaning only 9.7% of merchant lookups required AI fallback.
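The alias-first lookup can be sketched as below: a normalized index is checked first, and only misses are flagged for AI fallback. The record shape, function names, and the in-memory index are hypothetical; the actual AI call is left out.

```typescript
// Hypothetical merchant directory record.
interface MerchantRecord {
  id: string;
  name: string;
  aliases: string[];
}

type Resolution =
  | { source: "alias_map"; merchantId: string }
  | { source: "needs_ai_fallback"; merchantId: null };

// Collapses case and whitespace so lookups are insensitive to both.
function aliasKey(raw: string): string {
  return raw.toLowerCase().replace(/\s+/g, " ").trim();
}

// Builds a lookup from every known name and alias to its merchant id.
function buildAliasIndex(merchants: readonly MerchantRecord[]): Map<string, string> {
  const index = new Map<string, string>();
  for (const m of merchants) {
    index.set(aliasKey(m.name), m.id);
    for (const alias of m.aliases) index.set(aliasKey(alias), m.id);
  }
  return index;
}

// Resolves from the index when possible; otherwise signals that the caller
// should fall back to the (more expensive) AI resolution step.
function resolveMerchant(rawName: string, index: Map<string, string>): Resolution {
  const id = index.get(aliasKey(rawName));
  return id !== undefined
    ? { source: "alias_map", merchantId: id }
    : { source: "needs_ai_fallback", merchantId: null };
}

// Share of lookups answered without AI, comparable to the DB match rate.
function dbMatchRate(rawNames: readonly string[], index: Map<string, string>): number {
  if (rawNames.length === 0) return 0;
  const hits = rawNames.filter((n) => resolveMerchant(n, index).source === "alias_map").length;
  return hits / rawNames.length;
}
```

Each alias approved in the directory turns a future AI call into a map lookup, which is why the match rate grows as the alias database grows.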

Email Merchants Admin Page
A dedicated page in the admin portal lets operators view all merchants, edit aliases, review suggested links, approve or reject fuzzy matches, and see exactly which users' emails produced each merchant record — providing full traceability on merchant data quality.
90%
Of merchant lookups resolved from DB — only 10% required AI fallback. Measured across a full 2-year backfill on a power-shopper account with 11,727 emails.

Refund Matching

How Email Refunds Map to Plaid Transactions

Once transactions are synced and merchant records are linked, the matching pipeline runs a three-step confidence-scored algorithm to determine whether a Plaid credit corresponds to an email-detected refund.

01

Merchant Match — Foundation Confidence

Check if the merchant in the email matches the merchant in the bank transaction via the alias map. Exact name match yields high confidence. Fuzzy match (e.g. "AMZN Mktp" ≈ "Amazon") yields medium confidence. No match → transaction rejected immediately.
Foundation Signal

02

Date Window — Time Confidence

Only transactions that fall within the expected refund timeframe are considered. Transactions too early or too late are rejected. Matching the window increases confidence. Window width is configurable and varies by merchant return policy.
Time Signal

03

Amount Matching — Strongest Signal

The system calculates all realistic refund amounts from the email (full refund, partial, subtotal, order total) and checks each against the transaction amount to 1 decimal place precision. Partial refunds are handled by computing percentage-of-order calculations.
Strongest Signal
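The three steps above can be sketched as a single scoring function. The point values (30/20/20/50), the rule that credits posted before the email are rejected, and all type and function names are assumptions chosen for illustration; only the ordering of the checks and the 1-decimal-place comparison come from the description.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

type MerchantMatch = "exact" | "fuzzy" | "none";

// Refund details extracted from an email (hypothetical shape).
interface EmailRefund {
  candidateAmounts: number[]; // full refund, partial, subtotal, order total
  emailDate: Date;
  windowDays: number; // configurable per merchant return policy
}

// A credit from the Plaid transaction feed (hypothetical shape).
interface Credit {
  amount: number;
  postedDate: Date;
}

function roundTo1dp(x: number): number {
  return Math.round(x * 10) / 10;
}

// Returns a 0..100 score; 0 means the credit is rejected as a match.
function scoreRefundMatch(merchant: MerchantMatch, refund: EmailRefund, credit: Credit): number {
  // Step 1: merchant match is the foundation; no match rejects immediately.
  if (merchant === "none") return 0;
  let score = merchant === "exact" ? 30 : 20;

  // Step 2: the credit must post inside the expected refund window.
  const days = (credit.postedDate.getTime() - refund.emailDate.getTime()) / MS_PER_DAY;
  if (days < 0 || days > refund.windowDays) return 0;
  score += 20;

  // Step 3: amount is the strongest signal, compared at 1 decimal place.
  const target = roundTo1dp(credit.amount);
  if (refund.candidateAmounts.some((a) => roundTo1dp(a) === target)) score += 50;
  return score;
}
```

An exact merchant match inside the window with a matching amount scores 100 under these weights; the same credit with a fuzzy merchant match scores 90, and a credit with no matching amount stays at 50 or below.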
Card Mask Extraction for Link Card Status
When a refund email contains a card mask (e.g. "Amex 3045"), the AI parser extracts it and stores it against the refund record. The system then checks whether a Plaid-connected card with that mask exists for the user. If not, the refund is assigned "Link Card" status, prompting the user to connect the missing card. When a match is found, the card's stored mask is updated and the refund is automatically re-matched.
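The "Link Card" decision reduces to a membership check against the masks of the user's connected accounts (Plaid exposes a mask field on each account). The status names and the handling of emails without a mask are assumptions in this sketch.

```typescript
// Hypothetical status values for a refund record.
type RefundCardStatus = "link_card" | "ready_to_match";

// If the email named a card mask that none of the user's Plaid-connected
// accounts carries, the refund waits for the user to link that card.
// Emails without a mask proceed to matching directly (an assumption).
function statusForCardMask(
  emailMask: string | null,
  connectedMasks: readonly string[],
): RefundCardStatus {
  if (emailMask === null) return "ready_to_match";
  return connectedMasks.indexOf(emailMask) !== -1 ? "ready_to_match" : "link_card";
}
```

Re-running this check whenever a new card is linked is what lets a waiting refund move out of "Link Card" status and into the matching pipeline.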

Outcomes

Key Outcomes

Measured results from the Plaid integration running in production across real user accounts.

Multi-Card Support with Graceful Reconnect

Users can connect an unlimited number of cards. Expired tokens surface targeted reconnect prompts without disrupting any existing data. Timeout bug resolved for 8+ card accounts.

2 Years of Transactions — Incrementally Synced

Cursor-based pagination ensures only new data is fetched on subsequent runs. Initial sync retrieves the full 2-year history; daily background syncs remain lightweight and rate-limit safe.

Subscription Detection with Custom Classification

True subscriptions separated from recurring charges. Cancelled state handled with a buffer for Plaid's delayed refresh. Insights state persists across devices via backend storage.

90.3% DB Merchant Match Rate

The pre-built merchant alias map resolves 90 out of every 100 merchant lookups without any AI call. Only 9.7% require Gemini fallback — dramatically reducing AI cost and latency.

Automated Daily Merchant Re-Linking

The fuzzy matching engine runs every 24 hours automatically, incorporating new merchant data from fresh syncs and growing the alias database continuously over time.

Full Admin Observability

The Email Merchants admin page provides complete visibility into merchant linking status, confidence scores, pending review items, and source user traceability for every merchant in the directory.
