Build production-grade data pipelines

For data teams who want reliable ETL with native DuckDB and Ducklake integrations. No Airflow or Spark clusters to manage.

Write pipeline steps in Python, TypeScript, SQL, Go, Bash and 15+ other languages
Zero-config native integrations with DuckDB, Ducklake and Polars
Built-in S3 / Azure Blob / GCS workspace storage with dataset browsing
Run on the fastest workflow engine in the industry

Trusted by 4,000+ organizations, including 300+ Enterprise Edition customers running at scale:

Zoom, Kahoot, Investing.com, CFA Institute, Axians, Photoroom, Pave, Panther Labs, NOCD

Everything you need to build and run production-grade data pipelines

Write each step in Python, TypeScript, SQL, Go, Bash or any supported language, connect to your data sources with native DuckDB and Ducklake integrations, and deploy with built-in scheduling, retries and observability.

Steps as code

Write each pipeline step in the language that fits best: Python, TypeScript, SQL, Go, Bash, Rust, PHP and more. Mix and match freely within a single pipeline.
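
For example, a Python step is just a script exporting a main function: its typed parameters become the step's inputs and its return value feeds the next step. A minimal sketch, with an illustrative endpoint and field names:

```python
# A single pipeline step in Python: the exported `main` function is the
# step, its typed parameters become the step's inputs, and the return
# value is handed (as JSON) to the next step, in whatever language.
import json
import urllib.request

def main(source_url: str, limit: int = 100) -> list[dict]:
    # Hypothetical extraction: pull rows from an HTTP API and normalize.
    with urllib.request.urlopen(f"{source_url}?limit={limit}") as resp:
        rows = json.loads(resp.read())
    return [{"id": r["id"], "value": r["value"]} for r in rows[:limit]]
```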

DAG visualizer

See your entire pipeline as an interactive graph. Inspect inputs and outputs at each node, follow data as it flows between steps, spot bottlenecks at a glance and restart from any step without replaying the full run.

Connect to any service

Connect to databases, APIs and third-party services using typed resources. Credentials are stored centrally and injected at runtime. Share connections across scripts and flows without duplicating secrets.
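
As a sketch, a Python step can declare a parameter typed as a resource and receive the stored credentials at runtime. The exact field set below is an assumption for illustration:

```python
# Sketch of a step consuming a typed `postgresql` resource. Credentials
# are injected at runtime from the central store; the field set shown
# here is an assumption for illustration.
from typing import TypedDict

class postgresql(TypedDict):
    host: str
    port: int
    user: str
    password: str
    dbname: str

def main(db: postgresql) -> int:
    import psycopg2  # assumes the driver is available on the worker

    conn = psycopg2.connect(**db)
    with conn, conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM orders")  # illustrative table
        return cur.fetchone()[0]
```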

Parallel branches

Fan out extraction steps across independent sources with parallel branches and collect results automatically. Run for-loops with configurable parallelism to process batches concurrently.
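
Conceptually, the engine does the fan-out/collect below for you, spread across workers rather than threads. A local sketch with made-up sources:

```python
# Local illustration of the fan-out / collect semantics of parallel
# branches and for-loops. In a real pipeline the engine distributes the
# iterations across workers; the sources and work here are made up.
from concurrent.futures import ThreadPoolExecutor

def extract(source: str) -> dict:
    return {"source": source, "rows": 42}  # stand-in extraction step

sources = ["stripe", "hubspot", "warehouse"]
with ThreadPoolExecutor(max_workers=2) as pool:  # the parallelism knob
    results = list(pool.map(extract, sources))   # collected in input order
print(results)
```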

Restart from any step

Fix a bug and re-run from the failing step in the flow editor. No need to replay the entire pipeline or re-extract data from upstream sources.

Retries & error handlers

Each step can have its own error handling strategy and configurable retries with exponential backoff. Run a custom script on failure (send a Slack alert, create a ticket), stop the pipeline early or mark non-critical steps to continue on error.
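
The retry policy and error handler are step configuration rather than code you write, but their semantics are roughly this sketch:

```python
# Rough semantics of per-step retries with exponential backoff plus an
# error handler. In practice this is configured on the step, not
# hand-written; the alerting call is a stand-in.
import time

def on_failure(exc: Exception) -> None:
    print(f"step failed, alerting: {exc}")  # e.g. Slack message, ticket

def run_with_retries(step, max_retries: int = 3, base_delay: float = 1.0):
    for attempt in range(max_retries + 1):
        try:
            return step()
        except Exception as exc:
            if attempt == max_retries:
                on_failure(exc)
                raise                    # or continue, for non-critical steps
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```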

Trigger from anywhere

Start pipelines from cron schedules, webhooks, Postgres CDC, Kafka, SQS, or manually from the UI. Combine multiple trigger types on the same pipeline.
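
Every deployed flow gets a webhook, so any system that can make an HTTP request can start a run. A sketch; the URL shape, workspace, flow path and token below are illustrative, so copy the real values from the flow's page:

```python
# Triggering a deployed flow via its webhook with a plain HTTP POST.
# The URL, workspace, flow path and token are illustrative -- the real
# values come from the flow's detail page.
import json
import urllib.request

req = urllib.request.Request(
    "https://app.windmill.dev/api/w/my_workspace/jobs/run/f/u/me/daily_etl",
    data=json.dumps({"run_date": "2024-01-01"}).encode(),
    headers={
        "Authorization": "Bearer <token>",
        "Content-Type": "application/json",
    },
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # the id of the queued job
```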

Data tables

Data tables give you built-in relational storage with zero setup. Query them from Python, TypeScript, SQL or DuckDB. Credentials are managed internally and never exposed.

Deploy & version control

Promote pipelines from draft to production in one click with full deployment history and instant rollbacks. Sync your workspace with GitHub or GitLab, use your existing code review workflows and deploy via the UI, the CLI or CI/CD pipelines.

Full observability

Every pipeline run is logged with inputs, outputs, duration and status. Filter by success or failure, inspect logs and re-run with one click. Track resource usage, monitor worker groups and set up alerts for failures.

The native DuckDB and Ducklake orchestrator

The only orchestrator with zero-config DuckDB, Ducklake and S3 support. Credentials and connections are handled automatically: just write your query.

DuckDB

Query S3 files with SQL. DuckDB scripts auto-connect to your workspace storage. No credentials to manage, no connection strings to configure.
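
Outside the platform, the same query needs the plumbing spelled out. The sketch below shows what the zero-config step replaces; the bucket path and region are illustrative:

```python
# What an auto-connected DuckDB step boils down to. In Windmill the
# httpfs setup and S3 credentials are injected from workspace storage
# settings; standalone, the plumbing looks like this.
import duckdb

con = duckdb.connect()
con.sql("INSTALL httpfs")
con.sql("LOAD httpfs")
con.sql("SET s3_region = 'us-east-1'")  # plus access keys when standalone

rows = con.sql("""
    SELECT country, sum(amount) AS total
    FROM read_parquet('s3://my-bucket/events/*.parquet')
    GROUP BY country
    ORDER BY total DESC
""").fetchall()
```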

Ducklake

Store massive datasets in S3 and query them with SQL. Full data lake with catalog support, versioning and ACID transactions.
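
A sketch of what a Ducklake step amounts to, with the ATTACH made explicit; in the platform the attach and credentials are handled for you, and the metadata and data paths below are illustrative:

```python
# Sketch of a Ducklake-backed step: attach the lake once, then use
# plain SQL with catalog, versioning and ACID semantics.
import duckdb

con = duckdb.connect()
con.sql("INSTALL ducklake")
con.sql("LOAD ducklake")
con.sql(
    "ATTACH 'ducklake:metadata.ducklake' AS lake "
    "(DATA_PATH 's3://my-bucket/lake/')"
)
con.sql("CREATE TABLE IF NOT EXISTS lake.events (id BIGINT, amount DOUBLE)")
con.sql("INSERT INTO lake.events VALUES (1, 9.99)")  # an ACID transaction
con.sql("SELECT count(*) FROM lake.events").show()
```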

Workspace S3

Link your workspace to S3, Azure Blob, GCS, R2 or MinIO. Browse and preview Parquet, CSV and JSON directly from the UI.

Polars

Lightning-fast DataFrames in Python. Read and write Parquet directly from your workspace S3 bucket with zero config.
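
A sketch of a Polars step against object storage; inside the platform the storage options and credentials come from workspace settings, and the bucket and keys below are illustrative:

```python
# Sketch of a Polars step reading and writing Parquet in S3. Standalone
# you pass region/credentials via storage_options; in the platform they
# come from the workspace. Paths are illustrative.
import polars as pl

storage = {"aws_region": "us-east-1"}  # plus credentials when standalone

df = pl.read_parquet("s3://my-bucket/raw/events.parquet",
                     storage_options=storage)
totals = (
    df.filter(pl.col("amount") > 0)
      .group_by("country")
      .agg(pl.col("amount").sum().alias("total"))
)
totals.write_parquet("s3://my-bucket/marts/country_totals.parquet",
                     storage_options=storage)
```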

Assets lineage

Pipeline steps pass datasets as lightweight JSON pointers to S3 objects. No serialization overhead, no memory limits.
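
In other words, steps exchange small references rather than payloads. A sketch of the pattern; the exact pointer shape is an assumption for illustration:

```python
# Sketch of dataset passing by reference. The producing step writes to
# workspace storage and returns only a small JSON pointer; the consuming
# step resolves it lazily. The {"s3": key} shape is an assumption.

def produce() -> dict:
    key = "out/daily/events.parquet"  # illustrative object key
    # ... write the Parquet file to workspace storage under `key` ...
    return {"s3": key}                # tiny pointer, not the dataset itself

def consume(dataset: dict) -> None:
    key = dataset["s3"]
    # ... stream the object from storage; the flow state never holds
    # the data, so there is no serialization overhead or memory cap ...
    print(f"reading s3://workspace-bucket/{key}")
```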

And 50+ more: PostgreSQL, MySQL, BigQuery, Snowflake, Redshift, ClickHouse, MongoDB, REST APIs. See all integrations

Challenging the status quo of data warehouses

Stop paying per query. Run DuckDB and Ducklake locally on your workers.

| | Windmill + DuckDB | Snowflake / BigQuery |
| --- | --- | --- |
| Compute | Local on your workers | Remote warehouse |
| Cost model | Flat, pay for infra only | Per-query pricing |
| Data storage | Your S3 bucket, open formats | Vendor-managed, proprietary |
| Vendor lock-in | No | Yes |
| Orchestration | Built-in (flows, retries, schedules) | Separate tool needed |
| Setup | Zero config, auto-connected | Credentials, drivers, networking |
| Data egress fees | No | Yes |

Windmill also orchestrates Snowflake, BigQuery and other warehouses. You can mix local DuckDB steps with remote warehouse queries in the same pipeline.

Production-grade performance that replaces Spark

Polars and DuckDB process data on a single node far faster than distributed frameworks for the vast majority of ETL workloads.

TPC-H benchmark: 8 queries on m4.2xlarge (8 vCPUs, 32 GB RAM)

Build your internal platform on Windmill

Scripts, flows, apps, and infrastructure in one place.