Snowflake Postgres and pg_lake: Real-Time Data Mirroring Without ETL
🔄 Update — June 25, 2026: pg_lake Goes Open Source to Enable Postgres Lakehouses
The pg_lake extension powering Zero-ETL mirroring has been officially open-sourced. Developers can now connect PostgreSQL databases directly to Apache Iceberg and data lakehouses, allowing Postgres to natively query, write, and manage Iceberg tables using standard SQL.
What’s new?
- Open Source Release: pg_lake is now available as a set of open-source Postgres extensions, enabling integration into self-hosted and cloud-based PostgreSQL environments.
- Native Lakehouse Access: PostgreSQL can natively query and write to Apache Iceberg tables stored in object storage (like AWS S3) without external replication layers or ETL pipelines.
- DuckDB Integration: The extension leverages a DuckDB sidecar engine to optimize and accelerate large-scale analytical queries directly on object storage.
Why this adds to the article
This open-source release brings the Zero-ETL mirroring concepts, previously limited to managed Snowflake Postgres deployments, to the broader PostgreSQL community. Native lakehouse storage becomes available to any Postgres deployment that installs the extensions.
Summary
Snowflake has announced “Snowflake Postgres,” a fully managed PostgreSQL database service. A core highlight of this release is pipeline-free, near real-time data mirroring powered by pg_lake, an open-source extension. This architecture allows organizations to bridge the gap between transactional OLTP databases and analytical data lakehouses by writing PostgreSQL data directly into Apache Iceberg format on cloud object storage.
What happened?
- Product Announcement: Snowflake Postgres launched as a fully managed service featuring a 99.95% uptime SLA, connection pooling, and built-in extensions such as pgvector and PostGIS.
- pg_lake Integration: Originally developed by Crunchy Data, the pg_lake extension is now natively embedded, allowing PostgreSQL to write and manage data directly in the open Apache Iceberg format.
- Zero-ETL Mirroring: Operational database tables are mirrored into Iceberg tables. Because Snowflake queries Iceberg natively, the mirrored data is available to Snowflake without external data movement tools.
- Transactional Consistency: Writes to the data lake adhere to Postgres ACID semantics; if the cloud storage write fails, the entire Postgres transaction rolls back.
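To make the all-or-nothing behavior in the Transactional Consistency point concrete, here is a toy Python model. It is a minimal sketch under stated assumptions, not pg_lake's implementation: `ObjectStore`, `Database`, and `commit()` are hypothetical names, and an in-memory dict stands in for cloud object storage. A new data file is published to readers only after the storage write succeeds. A failed write leaves both the row store and the lake snapshot unchanged.

```python
from dataclasses import dataclass, field


class StorageWriteError(Exception):
    """Raised when the simulated object-storage PUT fails."""


@dataclass
class ObjectStore:
    """In-memory stand-in for a cloud bucket (hypothetical)."""
    files: dict = field(default_factory=dict)
    fail_next_put: bool = False  # inject a single failure for testing

    def put(self, key: str, rows: list) -> None:
        if self.fail_next_put:
            self.fail_next_put = False
            raise StorageWriteError(f"PUT {key} failed")
        self.files[key] = list(rows)


@dataclass
class Database:
    """Toy database whose commits span local rows and lake data files."""
    store: ObjectStore
    committed_rows: list = field(default_factory=list)  # what OLTP readers see
    snapshot: list = field(default_factory=list)        # data files lake readers see
    _seq: int = field(default=0, init=False)

    def commit(self, rows: list) -> bool:
        """Write a new data file first; publish rows only if that write succeeds."""
        self._seq += 1
        # Key name only; no real Parquet file is produced in this toy model.
        key = f"data/part-{self._seq:05d}.parquet"
        try:
            self.store.put(key, rows)
        except StorageWriteError:
            # Roll back: nothing is published to either side.
            return False
        self.committed_rows.extend(rows)
        self.snapshot.append(key)
        return True
```

Setting `store.fail_next_put = True` makes the next `commit()` return `False` and leaves `committed_rows` and `snapshot` exactly as they were.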
Why it matters
Maintaining complex ETL pipelines between transactional (OLTP) and analytical (OLAP) environments is one of the most common friction points in modern data engineering. By establishing Apache Iceberg as a shared storage layer, Snowflake Postgres lets applications create and write Iceberg-backed tables with standard SQL (e.g., CREATE TABLE ... USING iceberg), while Snowflake analytics and Cortex AI applications get immediate access to the same data without ingestion lag.
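As a hedged illustration of the `CREATE TABLE ... USING iceberg` pattern, the Python sketch below builds the DDL and runs it through a DB-API connection such as one from psycopg. It assumes a Postgres server with pg_lake installed. The table name `orders_lake`, its columns, and both helper functions are hypothetical; only the `USING iceberg` clause comes from the text above. Identifiers are interpolated directly, so they must come from trusted code, never user input.

```python
def iceberg_ddl(table: str, columns: dict) -> str:
    """Build a CREATE TABLE statement for an Iceberg-backed table.

    Identifiers are not quoted or escaped; pass trusted names only.
    """
    if not columns:
        raise ValueError("at least one column is required")
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return f"CREATE TABLE {table} ({cols}) USING iceberg"


def create_and_load(conn, rows) -> None:
    """Create a hypothetical Iceberg table and insert rows.

    `conn` is assumed to be a DB-API 2.0 connection (e.g. psycopg) whose
    cursors support the `with` statement and %s placeholders.
    """
    ddl = iceberg_ddl("orders_lake", {"order_id": "bigint", "amount": "numeric(12,2)"})
    with conn.cursor() as cur:
        cur.execute(ddl)
        cur.executemany(
            "INSERT INTO orders_lake (order_id, amount) VALUES (%s, %s)", rows
        )
    # Commits the CREATE and the INSERTs as one Postgres transaction.
    conn.commit()
```

For example, `iceberg_ddl("t", {"a": "int"})` returns `CREATE TABLE t (a int) USING iceberg`.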
Evidence
- Official Blog Post: Detailed technical specs are outlined in the blog post "Snowflake Postgres Unifies Your Apps, Analytics and AI."
- AI Pulse Announcements: The feature was highlighted alongside adaptive compute updates in "Snowflake AI Pulse – June 2026 Product Announcements."
- Architecture Details: Under the hood, pg_lake uses optimized query execution engines such as DuckDB to perform object storage operations efficiently.
Analysis
The integration of pg_lake represents a major shift toward Zero-ETL architectures. By outputting Parquet/Iceberg files directly from the database engine, Snowflake Postgres bypasses the complexity of Change Data Capture (CDC) pipelines. However, writing to object storage is inherently slower than writing to local NVMe drives. While pg_lake mitigates this with DuckDB-based buffering, the performance under high-concurrency transactional write pressure remains a key metric to watch in production environments.
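The write-latency concern can be framed with a simple amortization model. This is an assumption for illustration, not pg_lake's documented cost model: each commit pays one fixed object-storage request latency plus a small per-row cost, so putting more rows into each commit spreads the fixed cost. The function names and parameters are hypothetical; plug in latencies measured in your own environment.

```python
def per_row_write_ms(batch_rows: int, request_latency_ms: float, per_row_ms: float = 0.0) -> float:
    """Amortized write latency per row when one storage request carries a batch."""
    if batch_rows < 1:
        raise ValueError("batch_rows must be >= 1")
    return request_latency_ms / batch_rows + per_row_ms


def max_serial_commits_per_sec(request_latency_ms: float) -> float:
    """Upper bound for one writer that waits on each request before the next commit."""
    if request_latency_ms <= 0:
        raise ValueError("request_latency_ms must be positive")
    return 1000.0 / request_latency_ms
```

With a hypothetical 40 ms request latency, single-row commits cost 40 ms each and one serialized writer tops out at 25 commits per second, while 1,000-row batches amortize the fixed cost to 0.04 ms per row. Under this model, small, frequent transactions are the workload most exposed to object-storage latency.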
Practical Takeaways
- Eliminate CDC Complexity: Consider Snowflake Postgres for new operational applications to avoid setting up external CDC brokers like Debezium.
- Leverage Apache Iceberg: Store analytical datasets in open Iceberg formats to retain portability and avoid cloud vendor lock-in.
- Real-time Cortex AI: Point Snowflake’s Cortex AI features at the mirrored Iceberg tables to give models real-time context.
Open Questions
- What is the latency and throughput overhead on OLTP writes when writing to object storage via pg_lake under heavy workloads?
- How does the pricing model of Snowflake Postgres compare to standard Amazon RDS or Google Cloud SQL offerings?
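One way to start answering the latency question is to time commits against an ordinary heap table and an Iceberg table under the same load, then compare percentiles. The harness below is a minimal sketch. `do_commit` is a hypothetical callable you supply, for example one that INSERTs a row and COMMITs on a connection. It measures a single client only, so concurrency effects need separate runs.

```python
import statistics
import time
from typing import Callable


def measure_commit_latency(do_commit: Callable[[], None], iterations: int = 200) -> dict:
    """Time `iterations` calls to do_commit and return latency stats in ms."""
    if iterations < 2:
        raise ValueError("need at least two samples for percentiles")
    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        do_commit()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    # 99 cut points (p1..p99); "inclusive" keeps them within the observed range.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50_ms": cuts[49], "p99_ms": cuts[98], "mean_ms": statistics.fmean(samples_ms)}


def overhead(baseline: dict, candidate: dict) -> dict:
    """Latency the candidate (e.g. Iceberg table) adds over the baseline (heap table)."""
    return {k: candidate[k] - baseline[k] for k in baseline}
```

Running `overhead(measure_commit_latency(heap_commit), measure_commit_latency(iceberg_commit))` with two such callables gives the per-percentile cost of writing through to object storage.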