KSQL Getting Started
This guide gives data analytics leaders and technical teams a practical overview of ksqlDB capabilities within the Kadeck KSQL Lab environment. ksqlDB is a streaming database purpose-built for Apache Kafka: it lets organizations process, enrich, aggregate, and serve event data in real time using familiar SQL syntax, delivering low-latency insights directly from event streams without the overhead of traditional batch pipelines.
Enterprise Context and Business Value
In large-scale environments, organizations generate continuous streams of business events (orders, transactions, sensor readings, customer interactions, etc.). ksqlDB allows analytics teams to:
- Transform and enrich raw event streams in real time
- Build materialized views that stay continuously up to date
- Support both continuous (push) and on-demand (pull) queries
- Enable operational analytics, fraud detection, real-time dashboards, and microservice architectures with minimal custom code
The following examples use a simple fictional retail order-processing scenario to illustrate core patterns that scale to production enterprise workloads.
Kadeck KSQL Lab organizes ksqlDB queries into domains and worksheets, providing comprehensive role-based access control (RBAC) aligned with each user's organizational domain and permissions.
Example Data Model
Topic: orders (Stream)
Represents individual customer orders in a global retail operation.
Example record (JSON):
{
  "order_id": "ORD-12345",
  "user_id": "U-789",
  "product_id": "P-456",
  "quantity": 2,
  "price": 29.99,
  "status": "SHIPPED",
  "order_time": 1714500000000
}
Topic: users (Table)
Contains the current state of customer profiles (updated whenever a profile changes).
Example record (JSON):
{
  "user_id": "U-789",
  "username": "alice_smith",
  "country": "DE",
  "loyalty_tier": "GOLD",
  "signup_date": "2024-01-15"
}
Core Concepts
| Concept | Description | Business Analogy | Typical Use Case |
|---|---|---|---|
| Stream | Append-only, unbounded sequence of events | Transaction log / event ledger | Orders, payments, IoT telemetry |
| Table | Materialized current state per key | Traditional database table | Customer profiles, product catalog |
Exploring the Environment
-- List registered resources
SHOW TOPICS;
SHOW STREAMS;
SHOW TABLES;
SHOW QUERIES;
-- Inspect raw data
PRINT 'orders' FROM BEGINNING LIMIT 10;
-- View schema and metadata
DESCRIBE orders;
DESCRIBE EXTENDED orders;
Registering Streams and Tables
-- Register orders as a Stream
CREATE STREAM orders (
  order_id VARCHAR,
  user_id VARCHAR,
  product_id VARCHAR,
  quantity INTEGER,
  price DOUBLE,
  status VARCHAR,
  order_time BIGINT
) WITH (
  KAFKA_TOPIC = 'orders',
  VALUE_FORMAT = 'JSON',
  KEY_FORMAT = 'KAFKA'
);
-- Register users as a Table
CREATE TABLE users (
  user_id VARCHAR PRIMARY KEY,
  username VARCHAR,
  country VARCHAR,
  loyalty_tier VARCHAR,
  signup_date VARCHAR
) WITH (
  KAFKA_TOPIC = 'users',
  VALUE_FORMAT = 'JSON'
);
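Depending on the ksqlDB version, a table declared with a plain CREATE TABLE is not materialized into queryable state, so pull queries against it may fail. In that case the topic can be registered as a source table instead (supported in ksqlDB 0.22 and later); the name users_queryable below is illustrative:

```sql
-- Materialize the users topic into queryable state;
-- the name users_queryable is illustrative
CREATE SOURCE TABLE users_queryable (
  user_id VARCHAR PRIMARY KEY,
  username VARCHAR,
  country VARCHAR,
  loyalty_tier VARCHAR,
  signup_date VARCHAR
) WITH (
  KAFKA_TOPIC = 'users',
  VALUE_FORMAT = 'JSON'
);
```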
Query Patterns
Push Query (continuous results as new data arrives):
SELECT * FROM orders
EMIT CHANGES
LIMIT 20;
Pull Query (one-time snapshot, similar to traditional SQL). Pull queries read from materialized state: lookups by primary key are always supported, while non-key predicates like the one below may require table scans to be enabled (ksql.query.pull.table.scan.enabled) depending on the ksqlDB version:
SELECT * FROM users
WHERE loyalty_tier = 'GOLD';
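The most efficient pull-query pattern is a point lookup on the table's primary key, which needs no table scan. The key value below is taken from the example record, assuming the table is materialized for pull queries in your environment:

```sql
-- Point lookup on the table's primary key
SELECT * FROM users
WHERE user_id = 'U-789';
```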
Filtering, Projection, and Transformation
SELECT
  order_id,
  user_id,
  quantity * price AS total_amount,
  status,
  TIMESTAMPTOSTRING(order_time, 'yyyy-MM-dd HH:mm:ss') AS order_datetime
FROM orders
WHERE status = 'SHIPPED'
  AND quantity > 1
EMIT CHANGES;
Aggregations and Windowed Analytics
-- Current totals per product
CREATE TABLE order_summary AS
SELECT
  product_id,
  COUNT(*) AS order_count,
  SUM(quantity * price) AS total_revenue,
  LATEST_BY_OFFSET(status) AS latest_status
FROM orders
GROUP BY product_id
EMIT CHANGES;
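Because order_summary is materialized by a persistent query, it can serve on-demand lookups by key; the product ID below comes from the example data:

```sql
-- Serve the current aggregate for one product
SELECT order_count, total_revenue, latest_status
FROM order_summary
WHERE product_id = 'P-456';
```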
-- Windowed revenue (tumbling 1-hour windows)
CREATE TABLE hourly_revenue AS
SELECT
  product_id,
  WINDOWSTART AS window_start,
  WINDOWEND AS window_end,
  COUNT(*) AS orders_in_window,
  SUM(quantity * price) AS revenue_in_window
FROM orders
WINDOW TUMBLING (SIZE 1 HOUR)
GROUP BY product_id
EMIT CHANGES;
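Windowed tables such as hourly_revenue can also be queried on demand, bounding results by the WINDOWSTART pseudo-column; the timestamp literal below is illustrative:

```sql
-- Fetch hourly buckets for one product since a given time
SELECT * FROM hourly_revenue
WHERE product_id = 'P-456'
  AND WINDOWSTART >= '2024-04-30T00:00:00';
```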
Joins and Data Enrichment
Stream + Table Join (highly efficient for enrichment). The table side is looked up by its primary key; if the stream's join column is not its key, ksqlDB repartitions the stream automatically so the two sides are co-partitioned:
CREATE STREAM enriched_orders AS
SELECT
  o.order_id,
  o.product_id,
  o.quantity,
  o.price,
  o.status,
  u.username,
  u.country,
  u.loyalty_tier
FROM orders o
LEFT JOIN users u
  ON o.user_id = u.user_id
EMIT CHANGES;
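ksqlDB also supports stream-stream joins, which must be bounded by a time window using WITHIN. The shipments stream and its columns below are hypothetical and not part of the example data model:

```sql
-- Join orders to a hypothetical shipments stream,
-- matching events that occur within one hour of each other
CREATE STREAM order_shipments AS
SELECT
  o.order_id,
  o.product_id,
  s.carrier,
  s.shipped_time
FROM orders o
INNER JOIN shipments s
  WITHIN 1 HOUR
  ON o.order_id = s.order_id
EMIT CHANGES;
```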
Persistent Materialized Views (CTAS / CSAS)
Persistent queries run continuously and write results back to Kafka topics, enabling downstream consumption by dashboards, microservices, or other systems.
-- High-value orders stream
CREATE STREAM high_value_orders AS
SELECT *
FROM enriched_orders
WHERE (quantity * price) > 100
EMIT CHANGES;
-- Daily top products table
CREATE TABLE top_products AS
SELECT
  product_id,
  COUNT(*) AS sales_count,
  SUM(quantity * price) AS total_revenue
FROM orders
WINDOW TUMBLING (SIZE 1 DAY)
GROUP BY product_id
EMIT CHANGES;
Data Ingestion for Testing
INSERT INTO orders (order_id, user_id, product_id, quantity, price, status, order_time)
VALUES ('ORD-99999', 'U-123', 'P-777', 3, 49.99, 'PENDING', 1714503600000);
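After inserting, a bounded push query can confirm that the test record arrived in the stream:

```sql
-- Confirm the test record is in the stream
SELECT order_id, status
FROM orders
WHERE order_id = 'ORD-99999'
EMIT CHANGES
LIMIT 1;
```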
Administration and Governance
SHOW QUERIES;
EXPLAIN SELECT * FROM enriched_orders;
TERMINATE <query-id>;
-- Cleanup
DROP STREAM IF EXISTS enriched_orders;
DROP TABLE IF EXISTS order_summary;
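By default, DROP removes only the stream or table registration and leaves the backing Kafka topic in place. Adding DELETE TOPIC also removes the topic itself, which is irreversible, so use it cautiously:

```sql
-- Also delete the backing Kafka topic (irreversible)
DROP STREAM IF EXISTS high_value_orders DELETE TOPIC;
```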
Quick Reference Cheat Sheet
| Objective | Example Statement |
|---|---|
| Live order monitoring | SELECT * FROM orders EMIT CHANGES LIMIT 50; |
| Current customer status lookup | SELECT * FROM users WHERE loyalty_tier = 'GOLD'; |
| Real-time revenue by product | Create order_summary table (see above) |
| Enriched orders with customer data | Stream + Table JOIN (see above) |
| High-value order alerting | CREATE STREAM high_value_orders AS … |