How to Choose Data Types in Redshift

Galaxy Glossary

How do I choose the right Redshift data type?

Redshift data types control storage size, precision, and performance for each table column.

Sign up for the latest in SQL knowledge from the Galaxy Team!
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.

Description

What are Redshift data types?

Redshift stores every value in a column according to a specific data type such as INT, DECIMAL, or VARCHAR. The type fixes how many bytes are used, the valid range, and how values are compared and sorted.

Which integer size should I use?

Use SMALLINT (-32,768 to 32,767, 2 bytes) for flags or small counters, INT (-2.1 billion to 2.1 billion, 4 bytes) for IDs, and BIGINT (8 bytes) for large sequences like Order IDs across shards. Smaller types reduce disk and memory usage.

How do I store money or precise totals?

Choose DECIMAL(p,s) when you need fixed precision, e.g., DECIMAL(10,2) for price up to 99,999,999.99. Avoid FLOAT for currency because binary rounding causes inaccuracies.

When should I use VARCHAR vs CHAR?

VARCHAR(n) stores strings up to n characters without trailing space padding, ideal for names and emails. CHAR(n) always pads to n, useful for fixed-length codes like country ISO.

How do I handle dates and time zones?

DATE stores calendar dates, TIMESTAMP stores date+time without zone, and TIMESTAMPTZ stores with zone. Convert client times to UTC on load for consistency.

Can I keep semi-structured JSON?

The SUPER data type lets you ingest JSON or Parquet and query with PartiQL. Ideal for flexible product attributes or event logs.

Best practices for choosing types

Prefer the smallest sufficient type

Smaller types speed up scans and reduce costs. Always profile max value before choosing.

Set VARCHAR length realistically

Over-allocating VARCHAR(65535) blocks compression. Size it to the real-world max plus margin.

Avoid implicit casts

Compare columns of the same type to keep predicates SARGable and enable zone-map pruning.

Why How to Choose Data Types in Redshift is important

How to Choose Data Types in Redshift Example Usage


-- Sum total revenue by day using proper DECIMAL aggregation
SELECT order_date::date   AS day,
       SUM(oi.quantity * p.price)::DECIMAL(18,2) AS daily_revenue
FROM   Orders      o
JOIN   OrderItems  oi ON oi.order_id = o.id
JOIN   Products    p  ON p.id = oi.product_id
GROUP  BY day
ORDER  BY day;

How to Choose Data Types in Redshift Syntax


-- Numeric types
SMALLINT | INT | BIGINT
DECIMAL(precision, scale)
REAL | DOUBLE PRECISION

-- Character types
CHAR(n)
VARCHAR(n)

-- Date & Time
DATE
TIMESTAMP [WITHOUT TIME ZONE]
TIMESTAMPTZ

-- Boolean
BOOLEAN

-- Semi-structured
SUPER

-- Example table definition
CREATE TABLE Products (
    id          INT          IDENTITY(1,1),
    name        VARCHAR(200) NOT NULL,
    price       DECIMAL(10,2) NOT NULL,
    stock       INT,
    created_at  TIMESTAMPTZ  DEFAULT current_timestamp
);

Common Mistakes

Frequently Asked Questions (FAQs)

Can I change a column’s data type later?

Yes, use ALTER TABLE ... ALTER COLUMN ... TYPE, but it rewrites data. Copy large tables into a new table for zero-downtime migrations.

What is the size limit for VARCHAR?

Maximum is 65535 bytes. Remember multibyte UTF-8 characters may use more than one byte per character.

Does Redshift support ARRAY types?

Native ARRAY is not supported; use SUPER with PartiQL, or normalise into child tables for high-performance analytics.

Want to learn about other SQL terms?

Trusted by top engineers on high-velocity teams
Aryeo Logo
Assort Health
Curri
Rubie
BauHealth Logo
Truvideo Logo
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.