How to Choose Data Types in Redshift

Galaxy Glossary

How do I choose the right Redshift data type?

Redshift data types control storage size, precision, and performance for each table column.

Welcome to the Galaxy, Guardian!
You'll be receiving a confirmation email

Follow us on twitter :)

Oops! Something went wrong while submitting the form.

Description

Example H2

Example H3

What are Redshift data types?

Redshift stores every value in a column according to a specific data type such as INT, DECIMAL, or VARCHAR. The type fixes how many bytes are used, the valid range, and how values are compared and sorted.

Which integer size should I use?

Use SMALLINT (-32,768 to 32,767, 2 bytes) for flags or small counters, INT (-2.1 billion to 2.1 billion, 4 bytes) for IDs, and BIGINT (8 bytes) for large sequences like Order IDs across shards. Smaller types reduce disk and memory usage.

How do I store money or precise totals?

Choose DECIMAL(p,s) when you need fixed precision, e.g., DECIMAL(10,2) for price up to 99,999,999.99. Avoid FLOAT for currency because binary rounding causes inaccuracies.

When should I use VARCHAR vs CHAR?

VARCHAR(n) stores strings up to n characters without trailing space padding, ideal for names and emails. CHAR(n) always pads to n, useful for fixed-length codes like country ISO.

How do I handle dates and time zones?

DATE stores calendar dates, TIMESTAMP stores date+time without zone, and TIMESTAMPTZ stores with zone. Convert client times to UTC on load for consistency.

Can I keep semi-structured JSON?

The SUPER data type lets you ingest JSON or Parquet and query with PartiQL. Ideal for flexible product attributes or event logs.

Best practices for choosing types

Prefer the smallest sufficient type

Smaller types speed up scans and reduce costs. Always profile max value before choosing.

Set VARCHAR length realistically

Over-allocating VARCHAR(65535) blocks compression. Size it to the real-world max plus margin.

Avoid implicit casts

Compare columns of the same type to keep predicates SARGable and enable zone-map pruning.

Why How to Choose Data Types in Redshift is important

How to Choose Data Types in Redshift Example Usage


-- Sum total revenue by day using proper DECIMAL aggregation
SELECT order_date::date   AS day,
       SUM(oi.quantity * p.price)::DECIMAL(18,2) AS daily_revenue
FROM   Orders      o
JOIN   OrderItems  oi ON oi.order_id = o.id
JOIN   Products    p  ON p.id = oi.product_id
GROUP  BY day
ORDER  BY day;

How to Choose Data Types in Redshift Syntax


-- Numeric types
SMALLINT | INT | BIGINT
DECIMAL(precision, scale)
REAL | DOUBLE PRECISION

-- Character types
CHAR(n)
VARCHAR(n)

-- Date & Time
DATE
TIMESTAMP [WITHOUT TIME ZONE]
TIMESTAMPTZ

-- Boolean
BOOLEAN

-- Semi-structured
SUPER

-- Example table definition
CREATE TABLE Products (
    id          INT          IDENTITY(1,1),
    name        VARCHAR(200) NOT NULL,
    price       DECIMAL(10,2) NOT NULL,
    stock       INT,
    created_at  TIMESTAMPTZ  DEFAULT current_timestamp
);

Common Mistakes

Storing currency in FLOAT leads to rounding errors. Fix by using DECIMAL(precision, scale) instead.
Using VARCHAR(65535) for every text column blocks compression and wastes space. Fix by sizing VARCHAR to expected max length.