This tutorial builds on the LangChain chatbot from example 01. Instead of answering documentation questions, this version connects to a MySQL database and lets users query ecommerce data in plain English. The LLM translates questions into SQL, runs them against a read-only database endpoint, and formats the results for the user. It also uses Upsun’s MySQL read-only endpoints, because letting an LLM generate SQL against a database with write access is asking for trouble.

What you’re building

A chat interface where users type business questions (“What are the top 10 best-selling products?”, “Show me the monthly revenue trend”) and get answers from a real database. The app converts each question to SQL, validates it, runs it on a read-only MySQL connection, and sends the raw results back through the LLM for formatting. The pipeline:
  1. User asks a question in plain English
  2. The LLM generates a SQL query based on the database schema
  3. The SQL guard validates the query (SELECT only, enforced LIMIT, no dangerous keywords)
  4. The query runs on a read-only MySQL endpoint
  5. Results go back to the LLM
  6. The LLM formats an answer with markdown tables and summaries
  7. The answer streams to the user

What changed from the chatbot

If you’ve read the chatbot tutorial (example 01), here’s what’s different. Gone:
  • Context stuffing (loading all Upsun docs into the prompt)
  • Datamarking and message enclosure (security features for document context)
  • Output filtering and n-gram leakage detection
  • Prompt injection pattern detection
  • The build-context.ts script and context.txt file
New:
  • MySQL database with an ecommerce schema (products, customers, orders, carts, reviews)
  • A seed script that populates the database with coherent fixture data on deploy
  • A SQL guard that validates generated queries before execution
  • Separate read-only and write database connection pools
  • A two-step LLM flow (generate SQL, then format results)
  • Upsun MySQL service with read-only and admin endpoints
The frontend is the same chat UI from the LangChain chatbot tutorial with different starter prompts and markdown table rendering.

Prerequisites

Node.js 22+, pnpm, an OpenAI API key from platform.openai.com, a local MySQL server for development, the Upsun CLI (docs.upsun.com/administration/cli), and Git.

Project setup

Create the project:
mkdir bi-tool
cd bi-tool
pnpm init
Install dependencies:
pnpm add express dotenv cors @langchain/core @langchain/openai mysql2
pnpm add -D typescript @types/node @types/express @types/cors tsx @biomejs/biome
mysql2 is the MySQL client. It has built-in promise support, so no callback wrangling. Configure TypeScript (tsconfig.json):
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ES2022",
    "moduleResolution": "node",
    "outDir": "./dist",
    "rootDir": ".",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "forceConsistentCasingInFileNames": true,
    "resolveJsonModule": true
  },
  "include": ["src/**/*", "scripts/**/*"],
  "exclude": ["node_modules", "dist"]
}
Note: rootDir is set to . instead of ./src because the seed script lives in scripts/ and imports from src/. Both directories need to compile. Add scripts to package.json:
{
  "type": "module",
  "scripts": {
    "dev": "tsx watch src/index.ts",
    "build": "tsc",
    "seed": "tsx scripts/seed.ts",
    "start": "node dist/src/index.js"
  }
}

The database layer

Database connection

Create src/database.ts:
import mysql from "mysql2/promise";

let pool: mysql.Pool | null = null;
let readPool: mysql.Pool | null = null;

function parseUpsunRelationship(prefix: string): mysql.PoolOptions | null {
  const host = process.env[`${prefix}_HOST`];
  const port = process.env[`${prefix}_PORT`];
  const user = process.env[`${prefix}_USERNAME`];
  const password = process.env[`${prefix}_PASSWORD`];
  const database = process.env[`${prefix}_PATH`];

  if (!host) return null;

  return {
    host,
    port: port ? Number.parseInt(port, 10) : 3306,
    user: user || "root",
    password: password || "",
    database: database || "main",
    waitForConnections: true,
    connectionLimit: 5,
  };
}

function getLocalConfig(): mysql.PoolOptions {
  return {
    host: process.env.MYSQL_HOST || "127.0.0.1",
    port: Number.parseInt(process.env.MYSQL_PORT || "3306", 10),
    user: process.env.MYSQL_USER || "root",
    password: process.env.MYSQL_PASSWORD || "",
    database: process.env.MYSQL_DATABASE || "bi_tool",
    waitForConnections: true,
    connectionLimit: 5,
  };
}

export function getWritePool(): mysql.Pool {
  if (!pool) {
    const upsunConfig = parseUpsunRelationship("DATABASE");
    const config = upsunConfig || getLocalConfig();
    pool = mysql.createPool(config);
  }
  return pool;
}

export function getReadPool(): mysql.Pool {
  if (!readPool) {
    const upsunConfig = parseUpsunRelationship("REPORTS");
    const config = upsunConfig || getLocalConfig();
    readPool = mysql.createPool(config);
  }
  return readPool;
}

export async function queryReadOnly(sql: string) {
  const pool = getReadPool();
  const [rows, fields] = await pool.query(sql);
  return {
    rows: rows as Record<string, unknown>[],
    fields: fields.map((f) => f.name),
  };
}
Two connection pools. getWritePool() connects through the DATABASE relationship (admin privileges), used only by the seed script. getReadPool() connects through the REPORTS relationship (read-only), used by the app at runtime. Locally, both pools use the same MySQL config from your .env. On Upsun, they get separate credentials. parseUpsunRelationship reads the environment variables that Upsun injects when you define a relationship. A relationship named database gives you DATABASE_HOST, DATABASE_PORT, DATABASE_USERNAME, DATABASE_PASSWORD, and DATABASE_PATH.

The schema definition

Create src/schema.ts:
export const DB_SCHEMA = `
Database: bi_tool (MySQL)

Tables:

1. categories
   - id INT PRIMARY KEY AUTO_INCREMENT
   - name VARCHAR(100) NOT NULL
   - description TEXT

2. products
   - id INT PRIMARY KEY AUTO_INCREMENT
   - name VARCHAR(200) NOT NULL
   - description TEXT
   - category_id INT (FK -> categories.id)
   - price DECIMAL(10,2) NOT NULL
   - stock INT NOT NULL DEFAULT 0
   - created_at DATETIME DEFAULT CURRENT_TIMESTAMP

3. customers
   - id INT PRIMARY KEY AUTO_INCREMENT
   - first_name VARCHAR(100) NOT NULL
   - last_name VARCHAR(100) NOT NULL
   - email VARCHAR(200) NOT NULL UNIQUE
   - city VARCHAR(100)
   - country VARCHAR(100)
   - created_at DATETIME DEFAULT CURRENT_TIMESTAMP

4. orders
   - id INT PRIMARY KEY AUTO_INCREMENT
   - customer_id INT NOT NULL (FK -> customers.id)
   - status ENUM('pending','processing','shipped','delivered','cancelled')
   - total DECIMAL(10,2) NOT NULL
   - created_at DATETIME DEFAULT CURRENT_TIMESTAMP

5. order_items
   - id INT PRIMARY KEY AUTO_INCREMENT
   - order_id INT NOT NULL (FK -> orders.id)
   - product_id INT NOT NULL (FK -> products.id)
   - quantity INT NOT NULL
   - unit_price DECIMAL(10,2) NOT NULL

6. carts
   - id INT PRIMARY KEY AUTO_INCREMENT
   - customer_id INT NOT NULL (FK -> customers.id)
   - created_at DATETIME DEFAULT CURRENT_TIMESTAMP

7. cart_items
   - id INT PRIMARY KEY AUTO_INCREMENT
   - cart_id INT NOT NULL (FK -> carts.id)
   - product_id INT NOT NULL (FK -> products.id)
   - quantity INT NOT NULL DEFAULT 1

8. reviews
   - id INT PRIMARY KEY AUTO_INCREMENT
   - product_id INT NOT NULL (FK -> products.id)
   - customer_id INT NOT NULL (FK -> customers.id)
   - rating TINYINT NOT NULL (1 to 5)
   - comment TEXT
   - created_at DATETIME DEFAULT CURRENT_TIMESTAMP

Relationships:
- products.category_id -> categories.id
- orders.customer_id -> customers.id
- order_items.order_id -> orders.id
- order_items.product_id -> products.id
- carts.customer_id -> customers.id
- cart_items.cart_id -> carts.id
- cart_items.product_id -> products.id
- reviews.product_id -> products.id
- reviews.customer_id -> customers.id
`.trim();
This string goes straight into the system prompt. The LLM needs exact table names, column names, types, and relationships to generate correct SQL. We hardcode it as plain text rather than querying information_schema at runtime because it’s simpler and the schema doesn’t change.
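If you did want to derive the schema string at runtime instead, the transformation is straightforward. A sketch of a pure formatter (a hypothetical helper, not part of the project) that turns rows from information_schema.columns into the same plain-text format:

```typescript
// Hypothetical helper: formats rows from information_schema.columns
// into the numbered plain-text schema layout used in src/schema.ts.
type ColumnRow = {
  TABLE_NAME: string;
  COLUMN_NAME: string;
  COLUMN_TYPE: string;
  IS_NULLABLE: string; // "YES" | "NO"
};

function formatSchema(rows: ColumnRow[]): string {
  // Group columns by table, preserving the order rows arrive in.
  const tables = new Map<string, string[]>();
  for (const row of rows) {
    const cols = tables.get(row.TABLE_NAME) ?? [];
    cols.push(
      `   - ${row.COLUMN_NAME} ${row.COLUMN_TYPE.toUpperCase()}` +
        (row.IS_NULLABLE === "NO" ? " NOT NULL" : ""),
    );
    tables.set(row.TABLE_NAME, cols);
  }
  return [...tables.entries()]
    .map(([name, cols], i) => `${i + 1}. ${name}\n${cols.join("\n")}`)
    .join("\n\n");
}
```

You would feed it the result of a query like SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE FROM information_schema.columns WHERE table_schema = DATABASE() ORDER BY TABLE_NAME, ORDINAL_POSITION. The hardcoded string is still the better default here: it lets you annotate foreign keys and value ranges, which a plain columns query doesn’t capture.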

The SQL guard

The SQL guard sits between the LLM output and the database. LLMs generate text. They can be prompted to generate anything, including DROP TABLE. The guard rejects anything that isn’t a safe read query. Create src/sql-guard.ts:
const MAX_LIMIT = 500;

const FORBIDDEN_KEYWORDS = [
  "INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE",
  "TRUNCATE", "REPLACE", "GRANT", "REVOKE",
  "CALL", "EXEC", "EXECUTE",
  "INTO OUTFILE", "INTO DUMPFILE", "LOAD_FILE", "LOAD DATA",
];

export function validateSQL(sql: string) {
  const trimmed = sql.trim().replace(/;+$/, "").trim();

  if (!trimmed) {
    return { valid: false, error: "Empty query" };
  }

  // Must start with SELECT or WITH (for CTEs)
  const upper = trimmed.toUpperCase();
  if (!upper.startsWith("SELECT") && !upper.startsWith("WITH")) {
    return { valid: false, error: "Only SELECT queries are allowed" };
  }

  // Check for forbidden keywords
  for (const keyword of FORBIDDEN_KEYWORDS) {
    const pattern = new RegExp(`\\b${keyword}\\b`, "i");
    if (pattern.test(trimmed)) {
      return { valid: false, error: `Forbidden keyword: ${keyword}` };
    }
  }

  // No multiple statements
  if (trimmed.includes(";")) {
    return { valid: false, error: "Multiple statements are not allowed" };
  }

  // Enforce a LIMIT if none present
  let sanitized = trimmed;
  if (!/\bLIMIT\b/i.test(sanitized)) {
    sanitized = `${sanitized} LIMIT ${MAX_LIMIT}`;
  }

  return { valid: true, sanitized };
}
Four checks:
  1. The query must start with SELECT or WITH (for CTEs). Everything else is rejected.
  2. Certain keywords are forbidden even inside a SELECT. INSERT, UPDATE, DELETE, DROP are obvious. INTO OUTFILE and LOAD DATA prevent file system access. GRANT and REVOKE prevent privilege escalation.
  3. Semicolons inside the query are blocked. This stops stacking attacks like SELECT 1; DROP TABLE products.
  4. If the LLM forgets a LIMIT, the guard appends one (capped at 500 rows). No accidental full table scans.
The read-only MySQL endpoint would already reject writes at the database level. But catching bad queries in the application means faster feedback, better error messages, and less work for the database server.
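Note that the word-boundary regex the guard builds avoids false positives on identifiers that merely contain a forbidden word, because underscores and adjacent letters count as word characters. A quick check using the same pattern construction:

```typescript
// Same word-boundary pattern the guard builds for each forbidden keyword.
const forbids = (keyword: string, sql: string): boolean =>
  new RegExp(`\\b${keyword}\\b`, "i").test(sql);

console.log(forbids("DELETE", "DELETE FROM orders"));           // true: rejected
console.log(forbids("DELETE", "SELECT delete_flag FROM t"));    // false: allowed
console.log(forbids("DROP", "SELECT dropped_at FROM events"));  // false: allowed
console.log(forbids("INTO OUTFILE", "SELECT 1 INTO OUTFILE '/tmp/x'")); // true: rejected
```

So a column named delete_flag or dropped_at sails through, while actual write statements and file-access clauses are caught regardless of casing.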

The chat logic

Create src/chat.ts:
import { AIMessage, type BaseMessage, HumanMessage, SystemMessage } from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";
import { queryReadOnly } from "./database.js";
import { DB_SCHEMA } from "./schema.js";
import { validateSQL } from "./sql-guard.js";

const SYSTEM_PROMPT = `You are a Business Intelligence assistant for an ecommerce store. You help users explore their data by answering questions in plain English.

You have access to a MySQL database with the following schema:

${DB_SCHEMA}

When the user asks a question about the data, you MUST follow this process:
1. Generate a SQL query to answer it.
2. Wrap the query in a <sql> tag like this: <sql>SELECT ...</sql>
3. You will then receive the query results and must provide a clear, helpful answer based on them.

Rules:
- Only generate SELECT queries. Never modify data.
- Always include a reasonable LIMIT (max 500 rows).
- Use JOINs when the question spans multiple tables.
- Format numbers, currencies, and dates nicely in your answers.
- If the question is ambiguous, pick the most reasonable interpretation and explain your assumption.
- If a question cannot be answered from the available data, say so.
- When presenting results, use markdown tables for tabular data and bullet points for summaries.
- Keep answers concise and focused on what was asked.`;
Simpler than the chatbot version. No security sandwich, no datamarking. The context is a fixed schema definition rather than user-contributed documents, so the injection surface is smaller. The LLM wraps SQL in <sql> tags, which makes extraction reliable. The chat flow happens in two LLM calls:
export function createChat() {
  const model = process.env.OPENAI_MODEL || "gpt-4o-mini";
  const llm = new ChatOpenAI({ model });

  return {
    async *stream(message: string, history: BaseMessage[]) {
      const messages = [new SystemMessage(SYSTEM_PROMPT)];

      // Add conversation history
      // ...

      messages.push(new HumanMessage(message));

      // Step 1: Ask the LLM to generate SQL (not streamed to user)
      let sqlGenerationResponse = "";
      const sqlStream = await llm.stream(messages);
      for await (const chunk of sqlStream) {
        sqlGenerationResponse += chunk.content;
      }

      const sql = extractSQL(sqlGenerationResponse);

      if (!sql) {
        // No SQL needed, stream a direct answer
        // ...
        return;
      }

      // Step 2: Validate
      const validation = validateSQL(sql);
      if (!validation.valid) {
        // Tell the user and ask the LLM to try without SQL
        // ...
        return;
      }

      // Step 3: Execute on read-only pool
      const { rows, fields } = await queryReadOnly(validation.sanitized);

      // Show the SQL to the user
      yield `**Query:**\n\`\`\`sql\n${validation.sanitized}\n\`\`\`\n\n`;

      // Step 4: Send results back to LLM for formatting
      messages.push(new AIMessage(sqlGenerationResponse));
      messages.push(new HumanMessage(
        `Here are the query results:\n\n${JSON.stringify({ fields, rows })}\n\nProvide a clear answer based on these results.`
      ));

      const answerStream = await llm.stream(messages);
      for await (const chunk of answerStream) {
        yield chunk.content;
      }
    },
  };
}
The first LLM call collects the full response without streaming it to the user. It extracts the SQL, validates it, runs it. Only then does the second call stream the formatted answer. There’s a pause while SQL generation and execution happen, but the typing indicator covers it. If the LLM doesn’t produce SQL (for a question like “What tables are available?”), the response streams directly. If the SQL guard rejects a query, the user sees why, and the LLM gets a second chance to answer without SQL.
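The extractSQL helper isn’t shown above. A minimal version, assuming it simply pulls out the <sql> tag the prompt asks for, could look like this:

```typescript
// Pull the first <sql>...</sql> block out of the model's response.
// Returns null when the model answered without generating SQL.
function extractSQL(text: string): string | null {
  const match = text.match(/<sql>([\s\S]*?)<\/sql>/i);
  return match ? match[1].trim() : null;
}
```

The non-greedy [\s\S]*? stops at the first closing tag, so a response that happens to contain two queries only yields the first, and [\s\S] (unlike .) matches across newlines in multi-line queries.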

Fixture data

The seed script generates deterministic ecommerce data using a seeded PRNG (mulberry32, seed 42). Same data every time, which makes testing predictable. Create scripts/seed.ts. The full script is in the repository. It generates:
  • 10 categories (Electronics, Clothing, Home & Kitchen, Books, Sports, Beauty, Toys, Food, Office, Garden)
  • 100 products, 10 per category, with realistic names, prices ($5 to $300), and stock levels
  • 500 customers, names from common first/last name pools, spread across 20 cities in 10 countries
  • 750 orders with 1 to 5 items each, weighted toward “delivered” status (60%), totals calculated from actual product prices
  • 100 active carts for random customers, 1 to 4 items each
  • 800 reviews, ratings skewed toward 3 to 5 stars, one review per customer per product
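mulberry32 itself is only a few lines. A sketch of the PRNG plus the kind of integer helper a seed script would build on top of it (the randInt name is illustrative, not taken from the repository):

```typescript
// mulberry32: a tiny 32-bit seeded PRNG. The same seed always yields
// the same sequence, which is what makes the fixture data deterministic.
function mulberry32(seed: number): () => number {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    // Map the 32-bit result into [0, 1), like Math.random().
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const rand = mulberry32(42);

// Illustrative helper: a uniform integer in [min, max] drawn from the stream.
const randInt = (min: number, max: number): number =>
  min + Math.floor(rand() * (max - min + 1));
```

Because every random choice flows through one seeded stream, re-running the script regenerates byte-identical fixtures, so queries in tests return stable results.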
The seed is idempotent. It checks if data exists before inserting:
const [rows] = await pool.query("SHOW TABLES LIKE 'categories'");
if (Array.isArray(rows) && rows.length > 0) {
  const [countRows] = await pool.query("SELECT COUNT(*) AS cnt FROM categories");
  const count = (countRows as { cnt: number }[])[0].cnt;
  if (count > 0) {
    console.log("[seed] Database already seeded, skipping");
    return;
  }
}
Redeploys don’t wipe the database. The seed only runs when the tables are empty.

The Upsun configuration

This is the interesting part. Create .upsun/config.yaml:
applications:
  bi-tool:
    type: "nodejs:22"

    variables:
      env:
        OPENAI_MODEL: "gpt-4o-mini"

    build:
      flavor: none

    dependencies:
      nodejs:
        pnpm: "9.15.4"

    hooks:
      build: |
        set -e
        pnpm install --frozen-lockfile
        pnpm build

      deploy: |
        set -e
        node dist/scripts/seed.js

    web:
      commands:
        start: "node dist/src/index.js"
      locations:
        /:
          passthru: true
          allow: false
          scripts: false
          rules:
            \.(css|js|gif|jpe?g|png|svg|ico|woff2?|ttf|eot)$:
              allow: true

    mounts:
      "/.npm":
        source: "storage"
        source_path: "npm_cache"
      "/.pnpm-store":
        source: "storage"
        source_path: "pnpm_store"

    relationships:
      database: "mysql:admin"
      reports: "mysql:reader"

services:
  mysql:
    type: mariadb:11.4
    configuration:
      schemas:
        - main
      endpoints:
        admin:
          default_schema: main
          privileges:
            main: admin
        reader:
          default_schema: main
          privileges:
            main: ro

routes:
  "https://{default}/":
    type: upstream
    upstream: "bi-tool:http"
  "https://www.{default}/":
    type: redirect
    to: "https://{default}/"

MySQL read-only endpoints

The services.mysql.configuration section defines two endpoints instead of one:
  • admin: full read/write access to the main schema. The seed script uses this during deploy.
  • reader: read-only (ro) access to the same schema. The app uses this at runtime.
The application maps relationships to these endpoints:
relationships:
  database: "mysql:admin"
  reports: "mysql:reader"
Upsun injects credentials for each relationship as environment variables. The database relationship gives you DATABASE_HOST, DATABASE_PORT, DATABASE_USERNAME, DATABASE_PASSWORD, and DATABASE_PATH. The reports relationship gives you the same variables with the REPORTS_ prefix. The reader endpoint gets its own MySQL user with only SELECT privileges. If the application sends a write query through the read pool, the database rejects it. Hard stop, at the infrastructure level, regardless of what the application code does. This matters because you’re trusting a probabilistic text generator to follow your instructions. It usually does. “Usually” is not a word you want next to DROP TABLE. The SQL guard catches problems at the application level. The read-only endpoint catches everything the guard misses at the database level.

The deploy hook

The deploy hook runs node dist/scripts/seed.js on every deployment. Since the seed script is idempotent, it only inserts data the first time. After that it detects existing data and exits. The seed script uses the database (admin) relationship because it needs write access. The application server uses reports (reader) because it should never write.

Local development

Create a MySQL database for local development:
mysql -u root -e "CREATE DATABASE IF NOT EXISTS bi_tool"
Set up your .env:
OPENAI_API_KEY=sk-your-key-here
OPENAI_MODEL=gpt-4o-mini
PORT=3000
MYSQL_HOST=127.0.0.1
MYSQL_PORT=3306
MYSQL_USER=root
MYSQL_PASSWORD=
MYSQL_DATABASE=bi_tool
Seed the database:
pnpm seed
You should see output listing the inserted records. Start the dev server:
pnpm dev
Open http://localhost:3000 and try asking questions.

Deploying to Upsun

Initialize Git and push:
git init
git add .
git commit -m "Initial commit: AI BI tool"
Create the Upsun project:
upsun login
upsun project:create
Set the OpenAI API key:
upsun variable:create \
  --level project \
  --name env:OPENAI_API_KEY \
  --value "sk-your-key-here" \
  --sensitive true \
  --visible-build false \
  --visible-runtime true
Deploy:
upsun push
Upsun provisions the Node.js container and the MariaDB service, runs the build hook (install deps, compile TypeScript), then the deploy hook (seed the database), then starts the server. First deploy takes a bit longer because of the seeding. Access your BI tool:
upsun url

Testing

Try these questions:
  • “What are the top 10 best-selling products?”
  • “Which customers have spent the most money?”
  • “Show me the monthly revenue trend”
  • “What are the highest rated products by category?”
  • “How many orders are still pending?”
  • “What’s the average order value by country?”
  • “Which products have the most reviews but the lowest ratings?”
Each response shows the SQL that was generated, then the formatted answer. Monitor logs:
upsun logs --tail
You’ll see the SQL queries, response times, and any validation rejections.

How the safety layers stack up

Three layers between the user and a write query:
  1. The system prompt tells the LLM to only generate SELECT queries. Works most of the time, but LLMs can be prompted to ignore instructions.
  2. The SQL guard validates the query in application code. Checks it starts with SELECT, has no forbidden keywords, no stacked statements, and a LIMIT. Deterministic, so it catches what the LLM gets wrong.
  3. The read-only MySQL endpoint rejects writes at the database level. If both layers above fail, the connection itself cannot modify data.
The prompt is soft. The guard is deterministic but still application code, so a bug could bypass it. The endpoint is a MySQL user that literally cannot write. You want all three.

Last modified on March 10, 2026