How to Set Up Amazon Redshift on Linux

Galaxy Glossary

How do I set up Amazon Redshift completely from a Linux terminal?

Install the AWS CLI, create a Redshift cluster, connect with psql, and load data—all from any Linux distribution.

Sign up for the latest in SQL knowledge from the Galaxy Team!
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.

Description

Why set up Amazon Redshift on Linux?

Running Redshift from Linux lets you automate cluster creation, load data, and query warehouses directly in scripts and CI pipelines. The steps below work on Ubuntu, Debian, RHEL, CentOS, and Amazon Linux.

Which tools are required?

Install AWS CLI v2 for cluster management, the PostgreSQL client (psql) for SQL access, and optionally the Amazon Redshift ODBC/JDBC driver for GUI or BI tools. An AWS account and IAM user with AmazonRedshiftFullAccess are mandatory.

How do I install AWS CLI and psql on Linux?

Use curl -O https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip, unzip, then run sudo ./aws/install. On Debian-based systems install psql via sudo apt install postgresql-client; on RPM-based systems use sudo yum install postgresql.

How do I authenticate the AWS CLI?

Run aws configure and enter Access Key, Secret Key, default region, and output format. Credentials are stored in ~/.aws/credentials.

How do I create a Redshift cluster from Linux?

Execute aws redshift create-cluster with identifier, node type, node count, master username, password, security-group IDs, and IAM role. The cluster endpoint appears once status becomes available.

How do I open port 5439?

Modify the attached security group to allow inbound TCP 5439 from your IP. Alternatively, keep the port closed and connect through an SSH tunnel.

How do I connect with psql?

Run psql -h ENDPOINT -p 5439 -U admin -d dev. If SSL errors appear, add sslmode=require.

How do I create ecommerce tables?

After connecting, run CREATE TABLE statements for Customers, Orders, Products, and OrderItems. Use DISTKEY on customer_id and SORTKEY on order_date to optimize joins.

How do I load data from S3?

Stage CSV or Parquet files in S3, then execute COPY with an IAM role ARN. Redshift ingests data in parallel and skips headers with IGNOREHEADER 1.

How do I automate setup and teardown?

Store commands in shell scripts or CI jobs. Delete non-production clusters via aws redshift delete-cluster --skip-final-cluster-snapshot to cut costs.

Best practices checklist

  • Enable server-side encryption (KMS).
  • Rotate the master user password.
  • Use separate IAM roles for COPY/UNLOAD.
  • Schedule snapshots and set retention.
  • Add cost-allocation tags.

Common mistakes and how to fix them

Wrong AWS region: a psql timeout usually means the cluster lives in another region. Re-run commands with --region set correctly.

Missing IAM role in COPY: error 500 occurs when the role isn’t associated. Fix with aws redshift associate-iam-roles.

Need a faster SQL editor?

Try Galaxy—an AI-powered desktop SQL editor that connects to Redshift, autocompletes queries, and lets teams share vetted SQL in one click.

FAQ

Is Redshift Serverless manageable from AWS CLI?

Yes. Use aws redshift-serverless create-workgroup and create-namespace. Connect with the endpoint returned by get-workgroup.

Can I avoid opening port 5439 publicly?

Absolutely. Create a bastion host in the same VPC and establish an SSH tunnel: ssh -L 5439:ENDPOINT:5439 ec2-user@BASTION.

Why How to Set Up Amazon Redshift on Linux is important

How to Set Up Amazon Redshift on Linux Example Usage


SELECT c.name,
       SUM(oi.quantity * p.price) AS lifetime_value
FROM Customers   c
JOIN Orders      o  ON o.customer_id  = c.id
JOIN OrderItems  oi ON oi.order_id    = o.id
JOIN Products    p  ON p.id           = oi.product_id
GROUP BY c.name
ORDER BY lifetime_value DESC
LIMIT 10;

How to Set Up Amazon Redshift on Linux Syntax


# 1. Create a two-node RA3 cluster
aws redshift create-cluster \
  --cluster-identifier ecommerce-cluster \
  --node-type ra3.xlplus \
  --number-of-nodes 2 \
  --master-username admin \
  --master-user-password 'Str0ngPass!' \
  --vpc-security-group-ids sg-0123456789abcdef \
  --iam-roles arn:aws:iam::123456789012:role/RedshiftCopyRole \
  --region us-east-1

# 2. Connect from Linux using psql
psql -h ecommerce-cluster.abc123.us-east-1.redshift.amazonaws.com \
     -p 5439 -U admin -d dev

# 3. Load product data from S3
COPY Products (id, name, price, stock)
FROM 's3://my-bucket/products/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS CSV
IGNOREHEADER 1;

Common Mistakes

Frequently Asked Questions (FAQs)

How long does cluster creation take?

A multi-node RA3 cluster typically takes 10-15 minutes to reach status available. Smaller DC2 clusters finish in 5-10 minutes.

Can I use Redshift Spectrum from Linux?

Yes. After connecting with psql, create an external schema linked to an AWS Glue Catalog and query data in S3 alongside native tables.

How do I stop paying when I’m done?

Run aws redshift delete-cluster --cluster-identifier ecommerce-cluster --skip-final-cluster-snapshot. This deletes compute nodes but preserves snapshots if you didn’t skip them.

Want to learn about other SQL terms?

Trusted by top engineers on high-velocity teams
Aryeo Logo
Assort Health
Curri
Rubie
BauHealth Logo
Truvideo Logo
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.