KSQL Getting Started

This guide provides data analytics leaders and technical teams with a clear, practical overview of ksqlDB capabilities within the Kadeck KSQL Lab environment. ksqlDB is a purpose-built streaming database for Apache Kafka that enables organizations to perform real-time data processing, enrichment, aggregation, and serving using familiar SQL syntax — delivering low-latency insights directly from event streams without the overhead of traditional batch pipelines.

Enterprise Context and Business Value

In large-scale environments, organizations generate continuous streams of business events (orders, transactions, sensor readings, customer interactions, etc.). ksqlDB allows analytics teams to:

  • Transform and enrich raw event streams in real time
  • Build materialized views that stay continuously up to date
  • Support both continuous (push) and on-demand (pull) queries
  • Enable operational analytics, fraud detection, real-time dashboards, and microservice architectures with minimal custom code

The following examples use a simple fictional retail order-processing scenario to illustrate core patterns that scale to production enterprise workloads.

Kadeck KSQL Lab organizes ksqlDB queries into domains and worksheets, providing comprehensive role-based access control (RBAC) aligned with each user's organizational domain and permissions.

Example Data Model

Topic: orders (Stream)
Represents individual customer orders in a global retail operation.

Example record (JSON):

{
  "order_id": "ORD-12345",
  "user_id": "U-789",
  "product_id": "P-456",
  "quantity": 2,
  "price": 29.99,
  "status": "SHIPPED",
  "order_time": 1714500000000
}

Topic: users (Table)
Contains the current state of customer profiles (updated whenever a profile changes).

Example record (JSON):

{
  "user_id": "U-789",
  "username": "alice_smith",
  "country": "DE",
  "loyalty_tier": "GOLD",
  "signup_date": "2024-01-15"
}

Core Concepts

Concept | Description                               | Business Analogy               | Typical Use Case
Stream  | Append-only, unbounded sequence of events | Transaction log / event ledger | Orders, payments, IoT telemetry
Table   | Materialized current state per key        | Traditional database table     | Customer profiles, product catalog
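The distinction matters in practice, because the same Kafka topic can be registered either way. As an illustration (the user_changes name is hypothetical and not part of the lab environment), registering the users topic as a stream would surface every profile change event rather than only the latest state per key:

```sql
-- Hypothetical: register the 'users' topic as a stream of change events.
-- Unlike a users TABLE, this stream keeps every update, not just the
-- latest row per user_id.
CREATE STREAM user_changes (
  user_id VARCHAR KEY,
  username VARCHAR,
  country VARCHAR,
  loyalty_tier VARCHAR,
  signup_date VARCHAR
) WITH (
  KAFKA_TOPIC = 'users',
  VALUE_FORMAT = 'JSON'
);
```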

Exploring the Environment

SHOW TOPICS;
SHOW STREAMS;
SHOW TABLES;
SHOW QUERIES;

-- Inspect raw data
PRINT 'orders' FROM BEGINNING LIMIT 10;

-- View schema and metadata
DESCRIBE orders;
DESCRIBE EXTENDED orders;

Registering Streams and Tables

-- Register orders as a Stream
CREATE STREAM orders (
  order_id VARCHAR,
  user_id VARCHAR,
  product_id VARCHAR,
  quantity INTEGER,
  price DOUBLE,
  status VARCHAR,
  order_time BIGINT
) WITH (
  KAFKA_TOPIC = 'orders',
  VALUE_FORMAT = 'JSON',
  KEY_FORMAT = 'KAFKA'
);

-- Register users as a Table
CREATE TABLE users (
  user_id VARCHAR PRIMARY KEY,
  username VARCHAR,
  country VARCHAR,
  loyalty_tier VARCHAR,
  signup_date VARCHAR
) WITH (
  KAFKA_TOPIC = 'users',
  VALUE_FORMAT = 'JSON'
);

Query Patterns

Push Query (continuous results as new data arrives):

SELECT * FROM orders 
EMIT CHANGES
LIMIT 20;

Pull Query (one-time snapshot, similar to traditional SQL). Note that pull queries require a materialized table: a table registered with a plain CREATE TABLE statement must first be materialized, for example by deriving it with CREATE TABLE … AS SELECT.

SELECT * FROM users
WHERE loyalty_tier = 'GOLD';

Filtering, Projection, and Transformation

SELECT
  order_id,
  user_id,
  quantity * price AS total_amount,
  status,
  TIMESTAMPTOSTRING(order_time, 'yyyy-MM-dd HH:mm:ss') AS order_datetime
FROM orders
WHERE status = 'SHIPPED'
  AND quantity > 1
EMIT CHANGES;

Aggregations and Windowed Analytics

-- Current totals per product
CREATE TABLE order_summary AS
SELECT
  product_id,
  COUNT(*) AS order_count,
  SUM(quantity * price) AS total_revenue,
  LATEST_BY_OFFSET(status) AS latest_status
FROM orders
GROUP BY product_id
EMIT CHANGES;
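Because order_summary is a materialized table, it can be served directly with pull queries. A minimal sketch, using the product id from the sample record above:

```sql
-- One-time lookup against the materialized view
SELECT product_id, order_count, total_revenue
FROM order_summary
WHERE product_id = 'P-456';
```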

-- Windowed revenue (tumbling 1-hour windows)
CREATE TABLE hourly_revenue AS
SELECT
  product_id,
  WINDOWSTART AS window_start,
  WINDOWEND AS window_end,
  COUNT(*) AS orders_in_window,
  SUM(quantity * price) AS revenue_in_window
FROM orders
WINDOW TUMBLING (SIZE 1 HOUR)
GROUP BY product_id
EMIT CHANGES;
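Windowed tables can be served the same way: a pull query keyed on product_id returns one row per window. A sketch (supported window-bound predicates vary by ksqlDB version):

```sql
-- Fetch the hourly windows for one product
SELECT *
FROM hourly_revenue
WHERE product_id = 'P-456';
```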

Joins and Data Enrichment

Stream + Table Join (highly efficient for enrichment):

CREATE STREAM enriched_orders AS
SELECT
  o.order_id,
  o.product_id,
  o.quantity,
  o.price,
  o.status,
  u.username,
  u.country,
  u.loyalty_tier
FROM orders o
LEFT JOIN users u
  ON o.user_id = u.user_id
EMIT CHANGES;
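ksqlDB also supports stream-stream joins for correlating two event streams, which require a WITHIN clause to bound the join window. This sketch assumes a hypothetical shipments stream (not part of the lab data model) carrying the order_id and carrier of each shipment:

```sql
-- Hypothetical 'shipments' stream: correlate each order with a shipment
-- event arriving within 24 hours of the order.
CREATE STREAM orders_with_shipments AS
SELECT
  o.order_id,
  o.status,
  s.carrier
FROM orders o
INNER JOIN shipments s
  WITHIN 24 HOURS
  ON o.order_id = s.order_id
EMIT CHANGES;
```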

Persistent Materialized Views (CTAS / CSAS)

Persistent queries run continuously and write results back to Kafka topics, enabling downstream consumption by dashboards, microservices, or other systems.

-- High-value orders stream
CREATE STREAM high_value_orders AS
SELECT *
FROM enriched_orders
WHERE (quantity * price) > 100
EMIT CHANGES;

-- Daily top products table
CREATE TABLE top_products AS
SELECT
  product_id,
  COUNT(*) AS sales_count,
  SUM(quantity * price) AS total_revenue
FROM orders
WINDOW TUMBLING (SIZE 1 DAY)
GROUP BY product_id
EMIT CHANGES;

Data Ingestion for Testing

INSERT INTO orders (order_id, user_id, product_id, quantity, price, status, order_time)
VALUES ('ORD-99999', 'U-123', 'P-777', 3, 49.99, 'PENDING', 1714503600000);
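The users lookup table can be seeded the same way; the profile values below are illustrative only (the user id matches the order inserted above):

```sql
INSERT INTO users (user_id, username, country, loyalty_tier, signup_date)
VALUES ('U-123', 'bob_jones', 'US', 'SILVER', '2024-03-02');
```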

Administration and Governance

SHOW QUERIES;
EXPLAIN SELECT * FROM enriched_orders;
TERMINATE <query-id>;

-- Cleanup
DROP STREAM IF EXISTS enriched_orders;
DROP TABLE IF EXISTS order_summary;

Quick Reference Cheat Sheet

Objective                          | Example Statement
Live order monitoring              | SELECT * FROM orders EMIT CHANGES LIMIT 50;
Current customer status lookup     | SELECT * FROM users WHERE loyalty_tier = 'GOLD';
Real-time revenue by product       | Create order_summary table (see above)
Enriched orders with customer data | Stream + Table JOIN (see above)
High-value order alerting          | CREATE STREAM high_value_orders AS …